a JSON parser in 25 lines. seriously.
Zef has a parser-combinator library built on types. Yes, the same type system from chapter 7. Types describe shapes; you use parse2 to match a shape against input.
A pattern is a type. parse2(pattern) | collect either produces
the parsed value or raises an error. Because patterns compose with |
(union) and & (intersection), grammars compose naturally too.
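To make "patterns compose" concrete, here is a minimal plain-Python sketch of parser values combined with `|`. It is not Zef's implementation and does not use Zef at all; the `Parser` class and the `lit` helper are hypothetical names invented for this illustration.

```python
from typing import Callable, Optional, Tuple

# Hypothetical illustration only: this is NOT how Zef implements parse2.
# A parser takes (text, position) and returns (value, new_position) or None.
Result = Optional[Tuple[object, int]]

class Parser:
    def __init__(self, fn: Callable[[str, int], Result]):
        self.fn = fn

    def __or__(self, other: "Parser") -> "Parser":
        # Union: try self first, fall back to other if self fails.
        def run(text: str, pos: int) -> Result:
            result = self.fn(text, pos)
            return result if result is not None else other.fn(text, pos)
        return Parser(run)

    def parse(self, text: str):
        # Succeed only if the whole input is consumed.
        result = self.fn(text, 0)
        if result is None or result[1] != len(text):
            raise ValueError(f"parse error on {text!r}")
        return result[0]

def lit(s: str) -> Parser:
    # Match the exact string s at the current position.
    def run(text: str, pos: int) -> Result:
        return (s, pos + len(s)) if text.startswith(s, pos) else None
    return Parser(run)

yes_or_no = lit("yes") | lit("no")
print(yes_or_no.parse("no"))  # no
```

Because `yes_or_no` is an ordinary object, it can itself appear on either side of another `|`, which is the property the grammar examples in this chapter rely on.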
| syntax | meaning |
|---|---|
| [Type] | zero-or-more (like regex *) |
| Repeated[...] | one-or-more (like regex +) |
| Optional[...] | zero-or-one (like regex ?) |
| Literal['a', 'b'] | match any listed char |
| String[...] | string-pattern |
| Sequence[...] | array/sequence pattern |
| Type[... : fn] | wrap with a transform on match |
| z.Forward | forward declaration for recursion |
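The three repetition forms in the table differ only in how many matches they accept. The sketch below shows that difference with plain Python functions; `char_in`, `zero_or_more`, `one_or_more`, and `optional` are hypothetical stand-ins for `Literal[...]`, `[A]`, `Repeated[A]`, and `Optional[A]`, not Zef's own code.

```python
# Hypothetical plain-Python analogues of [A], Repeated[A] and Optional[A].
# A parser here is a function (text, pos) -> (value, new_pos) or None.

def char_in(chars):
    # Match a single character that appears in `chars`.
    def run(text, pos):
        if pos < len(text) and text[pos] in chars:
            return text[pos], pos + 1
        return None
    return run

def zero_or_more(p):
    # Like regex *: collect matches until p fails; never fails itself.
    def run(text, pos):
        values = []
        while True:
            result = p(text, pos)
            if result is None:
                return values, pos
            value, new_pos = result
            if new_pos == pos:  # guard against looping on empty matches
                return values, pos
            values.append(value)
            pos = new_pos
    return run

def one_or_more(p):
    # Like regex +: same as zero_or_more, but an empty result is a failure.
    many = zero_or_more(p)
    def run(text, pos):
        values, new_pos = many(text, pos)
        return (values, new_pos) if values else None
    return run

def optional(p):
    # Like regex ?: on failure, succeed with None and consume nothing.
    def run(text, pos):
        result = p(text, pos)
        return result if result is not None else (None, pos)
    return run

digit = char_in("0123456789")
print(zero_or_more(digit)("abc", 0))    # ([], 0)
print(one_or_more(digit)("abc", 0))     # None
print(one_or_more(digit)("42x", 0))     # (['4', '2'], 2)
print(optional(char_in("-"))("-7", 0))  # ('-', 1)
```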
# exact match
'hello' | parse2('hello') | collect # 'hello'
# pick one of
'yes' | parse2('yes' | 'no') | collect # 'yes'
# typed
42 | parse2(Int) | collect # 42
42 | parse2(String) | collect # parse error
Array mode (positional, unnamed):
[1, 'hello', 3.14] | parse2(Sequence[Int, String, Float]) | collect
# [1, 'hello', 3.14]
Dict mode (each element gets a name):
['Alice', 30] | parse2(Sequence[
    'name': String,
    'age': Int,
]) | collect
# {'name': 'Alice', 'age': 30}
Use : to post-process a matched value:
['alice', 30] | parse2(Sequence[
    'name': String : to_upper_case,
    'age': Int : add(1),
]) | collect
# {'name': 'ALICE', 'age': 31}
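The name, type, and transform on each line act together: check the element, transform it, then store it under the name. A rough plain-Python equivalent of that behavior, assuming nothing about Zef's internals (`parse_fields` and its tuple layout are hypothetical), could look like this:

```python
# Hypothetical sketch: match a list against (name, type, transform) fields.
def parse_fields(values, fields):
    if len(values) != len(fields):
        raise ValueError("wrong number of elements")
    out = {}
    for value, (name, expected_type, transform) in zip(values, fields):
        if not isinstance(value, expected_type):
            raise ValueError(f"{name}: expected {expected_type.__name__}")
        # The transform runs only after the element has matched its type.
        out[name] = transform(value)
    return out

fields = [
    ("name", str, str.upper),
    ("age", int, lambda n: n + 1),
]
print(parse_fields(["alice", 30], fields))  # {'name': 'ALICE', 'age': 31}
```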
# Forward declarations for recursive types
JsonValue = z.Forward
JsonArray = z.Forward
JsonObject = z.Forward
# Whitespace
WS = [Space]
# Primitives
JsonNull = String['null' : constant_func(None)]
JsonBool = String['true' : constant_func(True)] | String['false' : constant_func(False)]
JsonString = Type[String['"', [~Literal['"']], '"'] : second]
JsonNumber = Type[String[
    Optional['-'],
    Repeated[Digit],
    Optional['.', Repeated[Digit]],
    Optional[Literal['e', 'E'], Optional[Literal['+', '-']], Repeated[Digit]]
] : to_float]
# Compound
# The tail uses [...] (zero-or-more), not Repeated (one-or-more), so that
# single-element arrays and objects also parse.
JsonArray = Sequence['[', WS, Optional[JsonValue, [WS, ',', WS, JsonValue]], WS, ']']
JsonPair = Sequence[JsonString, WS, ':', WS, JsonValue]
JsonObject = Sequence['{', WS, Optional[JsonPair, [WS, ',', WS, JsonPair]], WS, '}']
# Union everything
JsonValue = JsonNull | JsonBool | JsonNumber | JsonString | JsonArray | JsonObject
# Use it:
'{"users": [{"name": "Alice", "score": 95.5}], "active": true}' | parse2(JsonValue) | collect
# {'users': [{'name': 'Alice', 'score': 95.5}], 'active': True}
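One way to sanity-check the expected result is to compare it against Python's standard `json` module, which parses the same input into the same structure. This is only a cross-check of the target output, not a demonstration of the Zef grammar itself.

```python
import json

text = '{"users": [{"name": "Alice", "score": 95.5}], "active": true}'
expected = {'users': [{'name': 'Alice', 'score': 95.5}], 'active': True}

# The standard library agrees with the result shown above.
assert json.loads(text) == expected
print(json.loads(text))
# {'users': [{'name': 'Alice', 'score': 95.5}], 'active': True}
```

One difference to keep in mind: `json.loads` returns an `int` for integer literals, whereas the grammar above applies `to_float` to every number, so an input such as `3` would presumably come back as a float there.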
Because the "grammar" is just type expressions, and type expressions compose with pipes/unions/intersections, you can assemble a real recursive parser from small pieces. No parser-generator, no PEG DSL, no regex nightmare. Types go in; structured values come out.
SExpr = z.Forward
Atom = String[Repeated[Alphanumeric | Literal['_', '-', '+', '*', '/']]]
Number = Type[String[Optional['-'], Repeated[Digit]] : int]
SExpr = Number | Atom | Sequence['(', [Space], [SExpr, [Space, SExpr]], [Space], ')']
'(+ 1 (* 2 3))' | parse2(SExpr) | collect
Field = String[[~Literal[',']]]
CsvLine = Sequence[Field, [',', Field]]  # zero-or-more tail, so a one-field line also parses
'alice,30,berlin' | parse2(CsvLine) | collect
Method = 'GET' | 'POST' | 'PUT' | 'DELETE'
Path = String[Repeated[~Space]]
Req = Sequence[
    'method': Method,
    '_': Space,
    'path': Path,
    '_': Space,
    'version': String['HTTP/1.', Digit],
]
'GET /users HTTP/1.1' | parse2(Req) | collect
# {'method': 'GET', 'path': '/users', 'version': 'HTTP/1.1'}
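Without forward declarations, you can't write mutually recursive definitions such as A = B; B = A | ..., because the first line refers to a name that does not exist yet.
z.Forward gives you the placeholder: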
Expr = z.Forward
Factor = Int | Sequence['(', Expr, ')']
Term = Sequence[Factor, [Literal['*', '/'], Factor]]  # zero-or-more, so a lone Factor is a Term
Expr = Sequence[Term, [Literal['+', '-'], Term]] # now defined
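The placeholder idea is independent of Zef. The sketch below reproduces it in plain Python for the same Expr/Term/Factor grammar: a hypothetical `Forward` class stands in for `z.Forward`, and `define` is an assumed name for the step that fills it in. Unlike the grammar above, this sketch evaluates the expression instead of returning a parse structure.

```python
import operator
import re

# Hypothetical sketch of a forward declaration: a placeholder that is
# filled in after the rules that refer to it have been built.
class Forward:
    def __init__(self):
        self.target = None

    def define(self, parser):
        self.target = parser

    def __call__(self, text, pos):
        return self.target(text, pos)

NUMBER = re.compile(r"\d+")

def number(text, pos):
    m = NUMBER.match(text, pos)
    return (int(m.group()), m.end()) if m else None

expr = Forward()  # used by factor before it is defined

def factor(text, pos):
    # Factor = Int | '(' Expr ')'
    if text.startswith("(", pos):
        result = expr(text, pos + 1)
        if result is not None and text.startswith(")", result[1]):
            return result[0], result[1] + 1
        return None
    return number(text, pos)

def chain(operand, ops):
    # operand followed by zero or more (operator operand), folded left to right.
    def run(text, pos):
        result = operand(text, pos)
        if result is None:
            return None
        value, pos = result
        while pos < len(text) and text[pos] in ops:
            nxt = operand(text, pos + 1)
            if nxt is None:
                return None
            value = ops[text[pos]](value, nxt[0])
            pos = nxt[1]
        return value, pos
    return run

term = chain(factor, {"*": operator.mul, "/": operator.truediv})
expr.define(chain(term, {"+": operator.add, "-": operator.sub}))  # now defined

print(expr("2*(3+4)-1", 0))  # (13, 9)
```

The key point is that `factor` captures the placeholder object, not its eventual contents, so filling it in later is enough to close the recursive loop.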
Your JsonValue is just a Python value. You can pass it around,
store it, compose it with another grammar, or generate it from configuration.
Because parsers are values, there is no separate build step of the kind a
parser-generator tool demands.
| goal | pattern |
|---|---|
| one of | A \| B \| C |
| and-then | Sequence[A, B, C] |
| zero or more | [A] |
| one or more | Repeated[A] |
| optional | Optional[A] |
| negate char | ~Literal['"'] |
| transform match | Type[A : fn] |
| forward-declare | X = z.Forward |
| invoke | input \| parse2(Pattern) \| collect |
Write a pattern that parses alice@example.com into {'user': 'alice', 'domain': 'example.com'}.
LocalPart = String[Repeated[~Literal['@']]]
Domain = String[Repeated[~Space]]
Email = Sequence[
    'user': LocalPart,
    '_': '@',
    'domain': Domain,
]
'alice@example.com' | parse2(Email) | collect
# {'user': 'alice', 'domain': 'example.com'}
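To check the target shape without Zef, the same split can be sketched in plain Python. The `split_email` helper is hypothetical; it mirrors the two rules in the pattern above (a non-empty run of non-`@` characters, then a non-empty run of non-space characters) and is not a full email validator.

```python
# Hypothetical cross-check of the exercise's expected output.
def split_email(address):
    user, sep, domain = address.partition("@")
    # Mirror the pattern: both parts non-empty, no whitespace in the domain.
    if not sep or not user or not domain or any(c.isspace() for c in domain):
        raise ValueError(f"not a simple user@domain address: {address!r}")
    return {"user": user, "domain": domain}

print(split_email("alice@example.com"))
# {'user': 'alice', 'domain': 'example.com'}
```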
Next up: worker processes (isolated, fault-tolerant compute).