net.sf.saxon.expr

Class Tokenizer


public final class Tokenizer
extends java.lang.Object

Tokenizer for expressions and inputs. This code was originally derived from James Clark's xt, though it has been greatly modified since. See the copyright notice at the end of the source file.

Field Summary

static int
BARE_NAME_STATE
State in which a name is NOT to be merged with what comes next, for example "("
static int
DEFAULT_STATE
Initial default state of the Tokenizer
static int
OPERATOR_STATE
State in which the next thing to be read is an operator
static int
SEQUENCE_TYPE_STATE
State in which the next thing to be read is a SequenceType
int
currentToken
The number identifying the most recently read token
int
currentTokenStartOffset
The position in the input expression where the current token starts
String
currentTokenValue
The string value of the most recently read token
String
input
The string being parsed
int
inputOffset
The current position within the input string
int
startLineNumber
The starting line number (for XPath in XSLT, the line number in the stylesheet)

Method Summary

int
getColumnNumber()
Get the column number of the current token
int
getColumnNumber(int offset)
Return the column number corresponding to a given offset in the expression
long
getLineAndColumn(int offset)
Get the line and column number corresponding to a given offset in the input expression, as a long value with the line number in the top half and the column number in the lower half
int
getLineNumber()
Get the line number of the current token
int
getLineNumber(int offset)
Return the line number corresponding to a given offset in the expression
int
getState()
Get the current tokenizer state
void
lookAhead()
Look ahead by one token.
void
next()
Get the next token from the input expression.
char
nextChar()
Read next character directly.
String
recentText()
Get the most recently read text (for use in an error message)
void
setState(int state)
Set the tokenizer into a special state
void
tokenize(String input, int start, int end, int lineNumber)
Prepare a string for tokenization.
void
treatCurrentAsOperator()
Force the current token to be treated as an operator if possible
void
unreadChar()
Step back one character.

Field Details

BARE_NAME_STATE

public static final int BARE_NAME_STATE
State in which a name is NOT to be merged with what comes next, for example "("
Field Value:
1

DEFAULT_STATE

public static final int DEFAULT_STATE
Initial default state of the Tokenizer
Field Value:
0

OPERATOR_STATE

public static final int OPERATOR_STATE
State in which the next thing to be read is an operator
Field Value:
3

SEQUENCE_TYPE_STATE

public static final int SEQUENCE_TYPE_STATE
State in which the next thing to be read is a SequenceType
Field Value:
2

currentToken

public int currentToken
The number identifying the most recently read token

currentTokenStartOffset

public int currentTokenStartOffset
The position in the input expression where the current token starts

currentTokenValue

public String currentTokenValue
The string value of the most recently read token

input

public String input
The string being parsed

inputOffset

public int inputOffset
The current position within the input string

startLineNumber

public int startLineNumber
The starting line number (for XPath in XSLT, the line number in the stylesheet)

Method Details

getColumnNumber

public int getColumnNumber()
Get the column number of the current token
Returns:
the column number

getColumnNumber

public int getColumnNumber(int offset)
Return the column number corresponding to a given offset in the expression
Parameters:
offset - the character offset in the expression
Returns:
the column number

getLineAndColumn

public long getLineAndColumn(int offset)
Get the line and column number corresponding to a given offset in the input expression, as a long value with the line number in the top half and the column number in the lower half
Parameters:
offset - the character offset in the expression
Returns:
the line and column number, packed together
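
As an illustration of the packed return value, the following is a minimal sketch, not the Saxon implementation. It assumes "top half" means the upper 32 bits of the long and "lower half" the lower 32 bits; the class and method names (LineColumnPacking, pack, lineOf, columnOf) are hypothetical.

```java
// Hypothetical helper, not part of Saxon: shows one way a line and column
// can share a single long, assuming a 32/32 bit split.
final class LineColumnPacking {

    // Line number goes in the upper 32 bits, column number in the lower 32 bits.
    static long pack(int line, int column) {
        return ((long) line << 32) | (column & 0xFFFFFFFFL);
    }

    // Recover the line number from the upper half.
    static int lineOf(long packed) {
        return (int) (packed >>> 32);
    }

    // Recover the column number from the lower half.
    static int columnOf(long packed) {
        return (int) packed;
    }

    public static void main(String[] args) {
        long packed = pack(12, 5);
        System.out.println(lineOf(packed) + ":" + columnOf(packed)); // prints 12:5
    }
}
```

Under that assumption, a caller holding the result of getLineAndColumn(offset) could unpack it with the same shift and cast.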

getLineNumber

public int getLineNumber()
Get the line number of the current token
Returns:
the line number

getLineNumber

public int getLineNumber(int offset)
Return the line number corresponding to a given offset in the expression
Parameters:
offset - the character offset in the expression
Returns:
the line number
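
To show how an offset can be mapped to a line and column, here is a minimal sketch that counts newline characters before the offset. It is not the Saxon implementation, which may record line boundaries differently. The class OffsetLocator, its method names, and the choice of 1-based columns are all assumptions made for illustration.

```java
// Hypothetical helper, not part of Saxon: maps a character offset in an
// expression to a line number and a column number by scanning for '\n'.
final class OffsetLocator {

    // Line number = starting line + number of newlines before the offset.
    static int lineNumber(String input, int offset, int startLineNumber) {
        int line = startLineNumber;
        for (int i = 0; i < offset && i < input.length(); i++) {
            if (input.charAt(i) == '\n') {
                line++;
            }
        }
        return line;
    }

    // Column number = distance from the preceding newline (1-based by assumption).
    static int columnNumber(String input, int offset) {
        int lastNewline = input.lastIndexOf('\n', offset - 1); // -1 if none
        return offset - lastNewline;
    }

    public static void main(String[] args) {
        String expr = "a or\nb and c";
        int offset = 7; // the 'a' of "and" on the second line
        System.out.println(lineNumber(expr, offset, 10) + ":" + columnNumber(expr, offset)); // prints 11:3
    }
}
```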

getState

public int getState()
Get the current tokenizer state
Returns:
the current state

lookAhead

public void lookAhead()
            throws XPathException
Look ahead by one token. This method does the real tokenization work. The method is normally called internally, but the XQuery parser also calls it to resume normal tokenization after dealing with pseudo-XML syntax.
Throws:
XPathException - if a lexical error occurs

next

public void next()
            throws XPathException
Get the next token from the input expression. The token type is stored in the currentToken field, and the string value of the token in currentTokenValue.
Throws:
XPathException - if a lexical error is detected

nextChar

public char nextChar()
            throws StringIndexOutOfBoundsException
Read next character directly. Used by the XQuery parser when parsing pseudo-XML syntax.
Returns:
the next character from the input
Throws:
StringIndexOutOfBoundsException - if an attempt is made to read beyond the end of the input

recentText

public String recentText()
Get the most recently read text (for use in an error message)
Returns:
a chunk of text leading up to the error

setState

public void setState(int state)
Set the tokenizer into a special state
Parameters:
state - the new state
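
The following sketch illustrates one plausible reading of the BARE_NAME_STATE description: outside that state a name may be merged with a following "(", inside it the name is returned on its own. This is an assumption drawn from the field descriptions only, not a description of Saxon's actual logic. The class NameMerging and the method readName are hypothetical; the two constant values are taken from the Field Details above.

```java
// Hypothetical illustration, not Saxon code: how a tokenizer state could
// decide whether a name is merged with the "(" that follows it.
final class NameMerging {
    static final int DEFAULT_STATE = 0;   // value as documented for Tokenizer
    static final int BARE_NAME_STATE = 1; // value as documented for Tokenizer

    // Reads a name at the start of the input and returns the token text.
    static String readName(String input, int state) {
        int i = 0;
        while (i < input.length() && Character.isLetterOrDigit(input.charAt(i))) {
            i++;
        }
        String name = input.substring(0, i);
        boolean followedByParen = i < input.length() && input.charAt(i) == '(';
        if (state != BARE_NAME_STATE && followedByParen) {
            return name + "("; // merged token, e.g. the start of a function call
        }
        return name; // bare name: never merged with what comes next
    }

    public static void main(String[] args) {
        System.out.println(readName("count(x)", DEFAULT_STATE));   // prints count(
        System.out.println(readName("count(x)", BARE_NAME_STATE)); // prints count
    }
}
```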

tokenize

public void tokenize(String input,
                     int start,
                     int end,
                     int lineNumber)
            throws XPathException
Prepare a string for tokenization. The actual tokens are obtained by calling next().
Parameters:
input - the string to be tokenized
start - start point within the string
end - end point within the string (last character not read): -1 means end of string
lineNumber - the line number in the source where the expression appears
Throws:
XPathException - if a lexical error occurs, e.g. unmatched string quotes
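
The prepare-then-pull protocol described above can be sketched with a small stand-in class. MiniTokenizer is hypothetical and far simpler than the real Tokenizer: its token codes (EOF, NAME, NUMBER, SYMBOL) are invented for the example, it omits the lineNumber parameter and lookahead, and it reports no lexical errors. It only shows how the start and end arguments bound the input, with -1 meaning end of string, and how each call to next() updates currentToken and currentTokenValue.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical stand-in, not the Saxon Tokenizer: demonstrates tokenize()
// followed by repeated next() calls that update public token fields.
final class MiniTokenizer {
    static final int EOF = 0, NAME = 1, NUMBER = 2, SYMBOL = 3; // invented codes

    int currentToken = EOF;
    String currentTokenValue = null;
    int currentTokenStartOffset = 0;

    private String input;
    private int inputOffset;
    private int inputLength;

    // Prepare the string; end == -1 means "up to the end of the string".
    void tokenize(String input, int start, int end) {
        this.input = input;
        this.inputOffset = start;
        this.inputLength = (end == -1) ? input.length() : end;
    }

    // Advance to the next token, leaving its type and text in the fields.
    void next() {
        while (inputOffset < inputLength && Character.isWhitespace(input.charAt(inputOffset))) {
            inputOffset++;
        }
        currentTokenStartOffset = inputOffset;
        if (inputOffset >= inputLength) {
            currentToken = EOF;
            currentTokenValue = null;
            return;
        }
        char c = input.charAt(inputOffset);
        if (Character.isLetter(c)) {
            while (inputOffset < inputLength && Character.isLetterOrDigit(input.charAt(inputOffset))) {
                inputOffset++;
            }
            currentToken = NAME;
        } else if (Character.isDigit(c)) {
            while (inputOffset < inputLength && Character.isDigit(input.charAt(inputOffset))) {
                inputOffset++;
            }
            currentToken = NUMBER;
        } else {
            inputOffset++;
            currentToken = SYMBOL;
        }
        currentTokenValue = input.substring(currentTokenStartOffset, inputOffset);
    }

    // Convenience for the example: collect all token values in a range.
    static List<String> values(String input, int start, int end) {
        MiniTokenizer t = new MiniTokenizer();
        t.tokenize(input, start, end);
        List<String> out = new ArrayList<>();
        for (t.next(); t.currentToken != EOF; t.next()) {
            out.add(t.currentTokenValue);
        }
        return out;
    }

    public static void main(String[] args) {
        System.out.println(values("price + 42", 0, -1)); // prints [price, +, 42]
        System.out.println(values("price + 42", 0, 5));  // prints [price]
    }
}
```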

treatCurrentAsOperator

public void treatCurrentAsOperator()
Force the current token to be treated as an operator if possible

unreadChar

public void unreadChar()
Step back one character. If this steps back to a previous line, adjust the line number.
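
The interaction between character-level reading and line tracking can be sketched as follows. CharCursor is a hypothetical class, not Saxon code. It assumes that lines are separated by '\n', that reading a newline advances the line number, and that stepping back over a newline restores the previous line number.

```java
// Hypothetical illustration, not Saxon code: reading and un-reading single
// characters while keeping a line counter consistent.
final class CharCursor {
    private final String input;
    private int inputOffset = 0;
    private int lineNumber;

    CharCursor(String input, int startLineNumber) {
        this.input = input;
        this.lineNumber = startLineNumber;
    }

    // Read the next character; String.charAt throws
    // StringIndexOutOfBoundsException when the input is exhausted.
    char nextChar() {
        char c = input.charAt(inputOffset);
        inputOffset++;
        if (c == '\n') {
            lineNumber++;
        }
        return c;
    }

    // Step back one character; if that character was a newline, the cursor
    // is back on the previous line, so the line number is decremented.
    void unreadChar() {
        if (inputOffset == 0) {
            throw new IllegalStateException("nothing to unread");
        }
        inputOffset--;
        if (input.charAt(inputOffset) == '\n') {
            lineNumber--;
        }
    }

    int getLineNumber() {
        return lineNumber;
    }

    public static void main(String[] args) {
        CharCursor cursor = new CharCursor("a\nb", 1);
        cursor.nextChar();                          // reads 'a'
        cursor.nextChar();                          // reads '\n', now on line 2
        System.out.println(cursor.getLineNumber()); // prints 2
        cursor.unreadChar();                        // steps back over '\n'
        System.out.println(cursor.getLineNumber()); // prints 1
    }
}
```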