net.sf.saxon.dotnet

Class DotNetRegexTranslator


public class DotNetRegexTranslator
extends SurrogateRegexTranslator

This class translates XML Schema regex syntax into .NET regex syntax. Author: James Clark, Thai Open Source Software Center Ltd. See statement at end of file. Modified by Michael Kay (a) to integrate the code into Saxon, (b) to support XPath additions to the XML Schema regex syntax, and (c) to target the .NET regex syntax instead of the JDK 1.4 regex syntax.

This version of the regular expression translator treats each half of a surrogate pair as a separate character, translating anything in an XPath regex that can match a non-BMP character into a .NET regex that matches the two halves of a surrogate pair independently. This approach doesn't work under JDK 1.5, whose regex engine treats a surrogate pair as a single character.
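The idea of matching the two halves of a surrogate pair independently can be illustrated with a minimal sketch. It is not taken from the Saxon source: the class name SurrogateHalvesSketch and the method asSurrogatePairRegex are hypothetical, and the sketch only shows how a single non-BMP code point decomposes into two UTF-16 code units that a regex can then name one after the other.

```java
final class SurrogateHalvesSketch {

    /**
     * Hypothetical helper: returns a regex fragment that names the high and
     * low surrogate of a non-BMP code point as two separate escaped
     * UTF-16 code units.
     */
    static String asSurrogatePairRegex(int codePoint) {
        if (!Character.isSupplementaryCodePoint(codePoint)) {
            throw new IllegalArgumentException("not a non-BMP code point: " + codePoint);
        }
        // toChars returns {high surrogate, low surrogate} for supplementary code points.
        char[] halves = Character.toChars(codePoint);
        // Cast to int so that the hexadecimal conversion is applicable.
        return String.format("\\u%04X\\u%04X", (int) halves[0], (int) halves[1]);
    }

    public static void main(String[] args) {
        // U+1D11E (MUSICAL SYMBOL G CLEF) lies outside the BMP.
        System.out.println(asSurrogatePairRegex(0x1D11E));
    }
}
```

A character class that spans many non-BMP characters needs ranges over the high and low halves rather than a single pair; that generalisation is beyond this sketch.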

This translator is currently used for Saxon on .NET 1.1. It's almost the same as the JDK 1.4 version, except that it avoids use of the "&&" operator, which isn't available on .NET.
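To show what avoiding "&&" can involve, the sketch below expresses the same character set twice: once with the java.util.regex intersection operator, and once with a negative lookahead followed by a plain class, a form that does not rely on "&&". This is only one known equivalent and an assumption for illustration; it does not claim to be the rewriting this class actually performs. The class name IntersectionFreeSketch and its members are hypothetical, and the comparison runs on the Java regex engine.

```java
import java.util.regex.Pattern;

final class IntersectionFreeSketch {

    // Lower-case ASCII consonants, written with the Java-only "&&" operator.
    static final Pattern WITH_INTERSECTION = Pattern.compile("[a-z&&[^aeiou]]");

    // The same set without "&&": reject vowels by lookahead, then match [a-z].
    static final Pattern WITHOUT_INTERSECTION = Pattern.compile("(?![aeiou])[a-z]");

    /** Returns true if both patterns give the same whole-string match result. */
    static boolean agree(String input) {
        return WITH_INTERSECTION.matcher(input).matches()
                == WITHOUT_INTERSECTION.matcher(input).matches();
    }

    public static void main(String[] args) {
        for (String s : new String[] {"a", "b", "z", "B", "1", ""}) {
            System.out.println("'" + s + "' agree: " + agree(s));
        }
    }
}
```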

Nested Class Summary

Nested classes/interfaces inherited from class net.sf.saxon.regex.SurrogateRegexTranslator

SurrogateRegexTranslator.BackReference, SurrogateRegexTranslator.CharClass, SurrogateRegexTranslator.CharRange, SurrogateRegexTranslator.Complement, SurrogateRegexTranslator.Dot, SurrogateRegexTranslator.Empty, SurrogateRegexTranslator.Property, SurrogateRegexTranslator.SimpleCharClass, SurrogateRegexTranslator.SingleChar, SurrogateRegexTranslator.WideSingleChar

Nested classes/interfaces inherited from class net.sf.saxon.regex.RegexTranslator

RegexTranslator.Range

Field Summary

Fields inherited from class net.sf.saxon.regex.SurrogateRegexTranslator

categoryCharClasses, subCategoryCharClasses

Fields inherited from class net.sf.saxon.regex.RegexTranslator

ALL, NONE, NOT_ALLOWED_CLASS, SOME, SURROGATES1_CLASS, SURROGATES2_CLASS, captures, caseBlind, curChar, currentCapture, eos, ignoreWhitespace, inCharClassExpr, isXPath, length, pos, regExp, result, xmlVersion

Constructor Summary

DotNetRegexTranslator()
Create a regular expression translator for the .NET platform

Method Summary

int
getNumberOfCapturedGroups()
Get the number of captured groups for this regular expression
static void
main(String[] args)
Convenience main method for testing purposes.
String
translate(CharSequence regExp, int xmlVersion, boolean xpath, boolean ignoreWhitespace, boolean caseBlind)
Translates a regular expression in the syntax of XML Schemas Part 2 into a regular expression in the syntax of java.util.regex.Pattern.
protected boolean
translateAtom()

Methods inherited from class net.sf.saxon.regex.RegexTranslator

absorbSurrogatePair, advance, copyCurChar, expect, highSurrogateRanges, isAsciiAlnum, isBlock, isJavaMetaChar, lowSurrogateRanges, makeException, makeException, parseQuantExact, recede, sortRangeList, translateAtom, translateBranch, translateQuantifier, translateQuantity, translateRegExp, translateTop

Constructor Details

DotNetRegexTranslator

public DotNetRegexTranslator()
Create a regular expression translator for the .NET platform

Method Details

getNumberOfCapturedGroups

public int getNumberOfCapturedGroups()
Get the number of captured groups for this regular expression
Returns:
the number of captured groups

main

public static void main(String[] args)
            throws RegexSyntaxException
Convenience main method for testing purposes. Note that the actual testing is done using the Java regex engine.

translate

public String translate(CharSequence regExp,
                        int xmlVersion,
                        boolean xpath,
                        boolean ignoreWhitespace,
                        boolean caseBlind)
            throws RegexSyntaxException
Translates a regular expression in the syntax of XML Schemas Part 2 into a regular expression in the syntax of java.util.regex.Pattern. The translation assumes that the string to be matched against the regex uses surrogate pairs correctly. If the string comes from XML content, a conforming XML parser will automatically check this; if the string comes from elsewhere, it may be necessary to check surrogate usage before matching.
Parameters:
regExp - a CharSequence containing a regular expression in the syntax of XML Schemas Part 2
xmlVersion - the version of XML in use - this affects the meanings of the \i and \c character class escapes
xpath - a boolean indicating whether the XPath 2.0 F+O extensions to the schema regex syntax are permitted
ignoreWhitespace - true if the x flag is set, allowing ignorable whitespace in the regex
caseBlind - true if the i flag is set, allowing case blind comparisons
Returns:
a String containing a regular expression in the syntax of java.util.regex.Pattern
Throws:
RegexSyntaxException - if regExp is not a regular expression in the syntax of XML Schemas Part 2, or XPath 2.0, as appropriate
See Also:
java.util.regex.Pattern, XML Schema Part 2
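When the input string does not come from a conforming XML parser, the surrogate check mentioned in the description of translate could look like the following minimal sketch. It is not part of this class; the class name SurrogateCheckSketch and the method hasWellFormedSurrogates are hypothetical. It only verifies that every high surrogate is immediately followed by a low surrogate and that no low surrogate appears on its own.

```java
final class SurrogateCheckSketch {

    /** Returns true if every surrogate code unit in s belongs to a well-formed pair. */
    static boolean hasWellFormedSurrogates(CharSequence s) {
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (Character.isHighSurrogate(c)) {
                // A high surrogate must be followed directly by a low surrogate.
                if (i + 1 >= s.length() || !Character.isLowSurrogate(s.charAt(i + 1))) {
                    return false;
                }
                i++; // skip the low half that was just validated
            } else if (Character.isLowSurrogate(c)) {
                // A low surrogate with no preceding high surrogate is malformed.
                return false;
            }
        }
        return true;
    }

    public static void main(String[] args) {
        String clef = new String(Character.toChars(0x1D11E));
        System.out.println(hasWellFormedSurrogates("abc" + clef));
        System.out.println(hasWellFormedSurrogates(clef.substring(0, 1)));
    }
}
```

A caller would run such a check on the string to be matched, not on the regular expression itself, before handing the string to the regex engine.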

translateAtom

protected boolean translateAtom()
            throws RegexSyntaxException
Overrides:
translateAtom in class RegexTranslator