funcparserlib.lexer — Regexp-based tokenizer

funcparserlib.lexer.make_tokenizer(specs)

Make a function that tokenizes text based on the regexp specs.

Type: (Sequence[TokenSpec | Tuple]) -> Callable[[str], Iterable[Token]]

A token spec is a TokenSpec instance.

Note

For legacy reasons, a token spec may also be a tuple of (type, args), where type sets the value of Token.type for the token, and args are the positional arguments for re.compile(): either just (pattern,) or (pattern, flags).
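The two accepted shapes of args in a legacy tuple can be illustrated with the standard library alone. The sketch below does not call funcparserlib; the spec names and the dictionary of compiled patterns are hypothetical and only show how (pattern,) and (pattern, flags) unpack into re.compile().

```python
import re

# Hypothetical legacy-style specs: (type, args), where args holds the
# positional arguments for re.compile().
legacy_specs = [
    ("space", (r"\s+",)),                  # (pattern,) needs the trailing comma
    ("word", (r"[a-z]+", re.IGNORECASE)),  # (pattern, flags)
]

# Illustration only: unpack args positionally, as the note describes.
compiled = {name: re.compile(*args) for name, args in legacy_specs}

word_match = compiled["word"].match("Hello")
space_match = compiled["space"].match("  x")
```

Note that (r"\s+",) is a one-element tuple; without the trailing comma it would be a plain string.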

It returns a tokenizer function that takes a string and returns an iterable of Token objects, or raises LexerError if it cannot tokenize the string according to its token specs.

Examples:

>>> tokenize = make_tokenizer([
...     TokenSpec("space", r"\s+"),
...     TokenSpec("id", r"\w+"),
...     TokenSpec("op", r"[,!]"),
... ])
>>> text = "Hello, World!"
>>> [t for t in tokenize(text) if t.type != "space"]  # noqa
[Token('id', 'Hello'), Token('op', ','), Token('id', 'World'), Token('op', '!')]
>>> text = "Bye?"
>>> list(tokenize(text))
Traceback (most recent call last):
    ...
lexer.LexerError: cannot tokenize data: 1,4: "Bye?"

funcparserlib.lexer.TokenSpec

A token specification for generating a lexer via make_tokenizer().

funcparserlib.lexer.TokenSpec.__init__(type, pattern, flags=0)

Initialize a TokenSpec object.

Parameters:

Name     Type  Description                                                         Default
type     str   User-defined type of the token (e.g. "name", "number", "operator")  required
pattern  str   Regexp for matching this token type                                 required
flags    int   Regexp flags, the second argument of re.compile()                   0

funcparserlib.lexer.Token

A token object that represents a substring of a certain type in your text.

You can compare tokens for equality using the == operator. Tokens also define custom repr() and str().

Attributes:

Name   Type                       Description
type   str                        User-defined type of the token (e.g. "name", "number", "operator")
value  str                        Text value of the token
start  Optional[Tuple[int, int]]  Start position (line, column)
end    Optional[Tuple[int, int]]  End position (line, column)