funcparserlib.lexer — Regexp-based tokenizer
funcparserlib.lexer.make_tokenizer(specs)
Make a function that tokenizes text based on the regexp specs.
Type: (Sequence[TokenSpec | Tuple]) -> Callable[[str], Iterable[Token]]
A token spec is a TokenSpec instance.
Note
For legacy reasons, a token spec may also be a tuple of (type, args), where type sets the value of Token.type for the token, and args are the positional arguments for re.compile(): either just (pattern,) or (pattern, flags). See the sketch after the examples below.
It returns a tokenizer function that takes a string and returns an iterable of Token objects, or raises LexerError if it cannot tokenize the string according to its token specs.
Examples:
>>> tokenize = make_tokenizer([
... TokenSpec("space", r"\s+"),
... TokenSpec("id", r"\w+"),
... TokenSpec("op", r"[,!]"),
... ])
>>> text = "Hello, World!"
>>> [t for t in tokenize(text) if t.type != "space"] # noqa
[Token('id', 'Hello'), Token('op', ','), Token('id', 'World'), Token('op', '!')]
>>> text = "Bye?"
>>> list(tokenize(text))
Traceback (most recent call last):
...
lexer.LexerError: cannot tokenize data: 1,4: "Bye?"
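The legacy tuple-based specs described in the note above can be sketched like this, reusing the token types and regexps from the example. This is a sketch assuming the tuple form compiles to the same patterns as the TokenSpec version:
>>> tokenize = make_tokenizer([
... ("space", (r"\s+",)),
... ("id", (r"\w+",)),
... ("op", (r"[,!]",)),
... ])
>>> [t for t in tokenize("Hello, World!") if t.type != "space"] # noqa
[Token('id', 'Hello'), Token('op', ','), Token('id', 'World'), Token('op', '!')]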
funcparserlib.lexer.TokenSpec
A token specification for generating a lexer via make_tokenizer().
funcparserlib.lexer.TokenSpec.__init__(type, pattern, flags=0)
Initialize a TokenSpec object.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
type | str | User-defined type of the token (e.g. "id", "op") | required |
pattern | str | Regexp for matching this token type | required |
flags | int | Regexp flags, the second argument of re.compile() | 0 |
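For illustration, a minimal sketch of a spec that uses the flags argument; the identifier pattern and flag choice here are assumptions for the example, not part of the library:
>>> import re
>>> ident = TokenSpec("id", r"[a-z_][a-z0-9_]*", re.IGNORECASE)
>>> tokenize = make_tokenizer([TokenSpec("space", r"\s+"), ident])
>>> [t for t in tokenize("Hello world") if t.type != "space"] # noqa
[Token('id', 'Hello'), Token('id', 'world')]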
funcparserlib.lexer.Token
A token object that represents a substring of a certain type in your text.
You can compare tokens for equality using the == operator. Tokens also define custom repr() and str().
Attributes:
Name | Type | Description |
---|---|---|
type | str | User-defined type of the token (e.g. "id", "op") |
value | str | Text value of the token |
start | Optional[Tuple[int, int]] | Start position (line, column) |
end | Optional[Tuple[int, int]] | End position (line, column) |