
parsing with patterns

a JSON parser in 25 lines. seriously.

the core idea

Zef has a parser-combinator library built on types. Yes, the same type system from chapter 7. Types describe shapes; you use parse2 to match a shape against input.

parsing = set-theoretic types on sequences

A pattern is a type. parse2(pattern) | collect either produces the parsed value or raises an error. Because patterns compose with | (union) and & (intersection), grammars compose naturally too.
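To make "patterns compose with | and &" concrete, here is a minimal plain-Python sketch of the idea. It is not Zef's implementation: the Pattern class and the literal, of_type, and parse helpers are hypothetical names invented for this illustration.

```python
# Hypothetical mini-library, NOT Zef: it only illustrates composable patterns.
class Pattern:
    """Wraps a function: value -> (matched, result)."""
    def __init__(self, match):
        self.match = match

    def __or__(self, other):
        # Union: try self first, fall back to other.
        def match(value):
            ok, result = self.match(value)
            return (ok, result) if ok else other.match(value)
        return Pattern(match)

    def __and__(self, other):
        # Intersection: both patterns must accept the value.
        def match(value):
            ok, _ = self.match(value)
            return other.match(value) if ok else (False, None)
        return Pattern(match)

def literal(expected):
    return Pattern(lambda v: (v == expected, v))

def of_type(t):
    return Pattern(lambda v: (isinstance(v, t), v))

def parse(pattern, value):
    ok, result = pattern.match(value)
    if not ok:
        raise ValueError(f"parse error: {value!r}")
    return result

yes_no = literal('yes') | literal('no')
short_string = of_type(str) & Pattern(lambda v: (len(v) <= 3, v))

print(parse(yes_no, 'yes'))        # yes
print(parse(short_string, 'abc'))  # abc
```

Because yes_no and short_string are ordinary values, they can be combined again with | or & to build larger patterns, which is the property the grammar examples later in this chapter rely on.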

the syntax card

syntax              meaning
[Type]              zero-or-more (like regex *)
Repeated[...]       one-or-more (like regex +)
Optional[...]       zero-or-one (like regex ?)
Literal['a', 'b']   match any listed char
String[...]         string pattern
Sequence[...]       array/sequence pattern
Type[... : fn]      wrap with a transform on match
z.Forward           forward declaration for recursion
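The card compares the three repetition forms to regex operators. As a quick reminder of what those operators accept, this sketch uses only Python's standard re module; the accepts helper is a hypothetical name, and nothing here calls Zef.

```python
import re

# Regex analogues of the three repetition forms in the card.
zero_or_more = re.compile(r'a*')   # analogous to [Type]
one_or_more = re.compile(r'a+')    # analogous to Repeated[...]
zero_or_one = re.compile(r'a?')    # analogous to Optional[...]

def accepts(pattern, text):
    """True if the whole text matches the pattern."""
    return pattern.fullmatch(text) is not None

# The empty string separates zero-or-more from one-or-more.
print(accepts(zero_or_more, ''))    # True
print(accepts(one_or_more, ''))     # False
print(accepts(zero_or_one, 'aa'))   # False
```

The empty-input case matters in grammars: a "rest of the list" rule written with a one-or-more form rejects lists that have exactly one element.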

simple patterns first

# exact match
'hello' | parse2('hello') | collect       # 'hello'

# pick one of
'yes' | parse2('yes' | 'no') | collect   # 'yes'

# typed
42 | parse2(Int) | collect                # 42
42 | parse2(String) | collect             # parse error

sequences: the workhorse

Array mode (positional, unnamed):

[1, 'hello', 3.14] | parse2(Sequence[Int, String, Float]) | collect
# [1, 'hello', 3.14]

Dict mode (each element gets a name):

['Alice', 30] | parse2(Sequence[
    'name': String,
    'age':  Int,
]) | collect
# {'name': 'Alice', 'age': 30}

transforms on capture

Use : to post-process a matched value:

['alice', 30] | parse2(Sequence[
    'name': String : to_upper_case,
    'age':  Int    : add(1),
]) | collect
# {'name': 'ALICE', 'age': 31}
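What a named sequence with transforms does can be sketched in plain Python: check each position against a type, apply the transform, and collect the results under their names. The parse_named function and its (name, type, transform) tuple layout are assumptions made for this illustration, not Zef's API.

```python
def parse_named(fields, values):
    """fields: list of (name, type, transform); values: positional input."""
    if len(fields) != len(values):
        raise ValueError("length mismatch")
    out = {}
    for (name, typ, transform), value in zip(fields, values):
        if not isinstance(value, typ):
            raise ValueError(f"{name}: expected {typ.__name__}, got {value!r}")
        # The transform runs only after the type check has succeeded.
        out[name] = transform(value)
    return out

person = [
    ('name', str, str.upper),
    ('age', int, lambda n: n + 1),
]

print(parse_named(person, ['alice', 30]))
# {'name': 'ALICE', 'age': 31}
```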

the big one: a JSON parser in 25 lines

# Forward declarations for recursive types
JsonValue  = z.Forward
JsonArray  = z.Forward
JsonObject = z.Forward

# Whitespace
WS = [Space]

# Primitives
JsonNull   = String['null' : constant_func(None)]
JsonBool   = String['true' : constant_func(True)] | String['false' : constant_func(False)]
JsonString = Type[String['"', [~Literal['"']], '"'] : second]
JsonNumber = Type[String[
    Optional['-'],
    Repeated[Digit],
    Optional['.', Repeated[Digit]],
    Optional[Literal['e', 'E'], Optional[Literal['+', '-']], Repeated[Digit]]
] : to_float]

# Compound (the comma-separated tail is zero-or-more, so one-element containers parse)
JsonArray  = Sequence['[', WS, Optional[JsonValue, [WS, ',', WS, JsonValue]], WS, ']']
JsonPair   = Sequence[JsonString, WS, ':', WS, JsonValue]
JsonObject = Sequence['{', WS, Optional[JsonPair, [WS, ',', WS, JsonPair]], WS, '}']

# Union everything
JsonValue  = JsonNull | JsonBool | JsonNumber | JsonString | JsonArray | JsonObject

# Use it:
'{"users": [{"name": "Alice", "score": 95.5}], "active": true}' | parse2(JsonValue) | collect
# {'users': [{'name': 'Alice', 'score': 95.5}], 'active': True}

twenty-five lines. a parser.

Because the "grammar" is just type expressions, and type expressions compose with pipes/unions/intersections, you can assemble a real recursive parser from small pieces. No parser-generator, no PEG DSL, no regex nightmare. Types go in; structured values come out.
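One way to gain confidence in a hand-assembled grammar like this is to compare its output with a reference parser on a few inputs, including edge cases such as empty and one-element containers. The sketch below uses Python's standard json module as the reference. The disagreements helper and the SAMPLES list are hypothetical, and json.loads stands in for the grammar so the block runs on its own.

```python
import json

# Inputs worth feeding to any JSON grammar, including the edge cases
# (empty and single-element containers) that repetition rules often miss.
SAMPLES = [
    '{"users": [{"name": "Alice", "score": 95.5}], "active": true}',
    '[]',
    '[1]',
    '{"a": null}',
    '-1.5e3',
]

def disagreements(parse_fn, samples=SAMPLES):
    """Return the samples where parse_fn differs from json.loads."""
    bad = []
    for text in samples:
        try:
            ok = parse_fn(text) == json.loads(text)
        except Exception:
            ok = False
        if not ok:
            bad.append(text)
    return bad

# Stand-in parser so the sketch is runnable; with Zef you would pass
# something like: lambda s: s | parse2(JsonValue) | collect
print(disagreements(json.loads))  # []
```

Since Python treats 1 == 1.0 as true, a grammar that converts every number with to_float still compares equal to the reference, which returns ints for integer literals.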

other useful parser examples

S-expressions

SExpr  = z.Forward

Atom   = String[Repeated[Alphanumeric | Literal['_', '-', '+', '*', '/']]]
Number = Type[String[Optional['-'], Repeated[Digit]] : int]

SExpr  = Number | Atom | Sequence['(', [Space], [SExpr, [Space, SExpr]], [Space], ')']

'(+ 1 (* 2 3))' | parse2(SExpr) | collect

CSV line

Field  = String[[~Literal[',']]]
CsvLine = Sequence[Field, [',', Field]]  # zero-or-more tail, so a one-field line also parses

'alice,30,berlin' | parse2(CsvLine) | collect

HTTP request line

Method = 'GET' | 'POST' | 'PUT' | 'DELETE'
Path   = String[Repeated[~Space]]
Req    = Sequence[
    'method':  Method,
    '_':       Space,
    'path':    Path,
    '_':       Space,
    'version': String['HTTP/1.', Digit],
]

'GET /users HTTP/1.1' | parse2(Req) | collect
# {'method': 'GET', 'path': '/users', 'version': 'HTTP/1.1'}

z.Forward: for recursive grammars

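Without forward declarations, you can't write mutually recursive definitions like A = B; B = A | ... because the first line refers to B before B exists. z.Forward gives you a placeholder that you can reference first and define later: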

Expr = z.Forward

Factor = Int | Sequence['(', Expr, ')']
Term   = Sequence[Factor, [Literal['*', '/'], Factor]]  # zero-or-more tail: a lone Factor is a Term
Expr   = Sequence[Term,   [Literal['+', '-'], Term]]    # now defined
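The placeholder idea can be sketched in plain Python with a tiny combinator set. This is not how Zef implements z.Forward: the Forward class, its define method, and the char, seq, and alt helpers are all hypothetical names chosen for the illustration.

```python
# Each parser is a callable: (text, pos) -> (value, new_pos), or None on failure.

def char(c):
    def p(text, pos):
        return (c, pos + 1) if text.startswith(c, pos) else None
    return p

def seq(*parsers):
    def p(text, pos):
        values = []
        for parser in parsers:
            r = parser(text, pos)
            if r is None:
                return None
            value, pos = r
            values.append(value)
        return values, pos
    return p

def alt(*parsers):
    def p(text, pos):
        for parser in parsers:
            r = parser(text, pos)
            if r is not None:
                return r
        return None
    return p

class Forward:
    """Placeholder parser whose real definition is supplied later."""
    def __init__(self):
        self.target = None

    def define(self, parser):
        self.target = parser

    def __call__(self, text, pos):
        if self.target is None:
            raise RuntimeError("Forward used before it was defined")
        return self.target(text, pos)

# Nested = 'x' | '(' Nested ')'
# The grammar is passed around as a value before it is complete,
# so it needs a placeholder to refer to itself.
nested = Forward()
nested.define(alt(char('x'), seq(char('('), nested, char(')'))))

print(nested('((x))', 0))
# (['(', ['(', 'x', ')'], ')'], 5)
```

The placeholder is looked up only when parsing runs, by which time define has filled it in. That late lookup is what lets a grammar built from values refer to itself.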

why this design wins

grammars as values

Your JsonValue is just a Python variable. You can pass it around, store it, compose it with another grammar, generate it from configuration. Parsers-as-values make grammar engineering much nicer than a parser-generator tool that demands a separate build step.
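As an illustration of generating a parser from configuration, here is a plain-Python sketch that builds a line parser from a dict. The config shape ('delimiter' and 'fields' keys) and the make_line_parser name are assumptions for this example; a Zef version would presumably assemble a Sequence pattern from the same data instead.

```python
def make_line_parser(config):
    """Build and return a parser function described by a config dict."""
    delimiter = config['delimiter']
    fields = config['fields']  # list of (name, converter) pairs

    def parse_line(line):
        parts = line.split(delimiter)
        if len(parts) != len(fields):
            raise ValueError(f"expected {len(fields)} fields, got {len(parts)}")
        return {name: convert(part) for (name, convert), part in zip(fields, parts)}

    return parse_line

# The parser is an ordinary value: store it, pass it around, or build several.
parsers = {
    'person': make_line_parser({
        'delimiter': ',',
        'fields': [('name', str), ('age', int), ('city', str)],
    }),
    'point': make_line_parser({
        'delimiter': ';',
        'fields': [('x', float), ('y', float)],
    }),
}

print(parsers['person']('alice,30,berlin'))
# {'name': 'alice', 'age': 30, 'city': 'berlin'}
```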

quick reference

goal                pattern
one of              A | B | C
and-then            Sequence[A, B, C]
zero or more        [A]
one or more         Repeated[A]
optional            Optional[A]
negate char         ~Literal['"']
transform match     Type[A : fn]
forward-declare     X = z.Forward
invoke              input | parse2(Pattern) | collect

parse a simple email address

Write a pattern that parses alice@example.com into {'user': 'alice', 'domain': 'example.com'}.

solution
LocalPart = String[Repeated[~Literal['@']]]
Domain    = String[Repeated[~Space]]

Email = Sequence[
    'user':   LocalPart,
    '_':      '@',
    'domain': Domain,
]

'alice@example.com' | parse2(Email) | collect
# {'user': 'alice', 'domain': 'example.com'}

Next up: worker processes for isolated, fault-tolerant compute. →