Video #173: Parser Builders: The Problem

Episode: Video #173
Date: Jan 10, 2022
Access: Members Only 🔒
URL: https://www.pointfree.co/episodes/ep173-parser-builders-the-problem

Description

Let’s revisit a favorite topic: parsing! After a short recap, we will theorize and motivate the addition of result builder syntax to our parsing library, which will help unlock a new level of ergonomics and API design.

Transcript

— 0:05

It has been over a year since we last talked about parsers on Point-Free, and it was more than one year before that since we first introduced the idea of parsers. There’s still so much we want to discuss when it comes to parsing, and so we are reviving the topic this week to explore some really exciting advancements we’ve made recently.

— 0:22

The last time we talked about parsing we heavily focused on:

— 0:24

Generalization, which allows parsers to process any type of input into any type of output;

— 0:29

Ergonomics, which allows us to construct parsers in the most fluent way possible; and

— 0:34

Performance, which allows our parsers to be as performant as ad-hoc, hand-rolled parsers, and sometimes even more performant.

— 0:42

Starting this week we are going to continue with the ergonomics angle of parsing by seeing what Swift’s result builders have to offer. Result builders are a relatively new feature of Swift that were primarily created to facilitate a concise syntax for SwiftUI views, but there are many other applications for builders.

Parser recap

— 0:59

But, before diving straight into parser ergonomics, let us spend a few moments reminding everyone of the basics of parsing. This will be very brief, and we highly recommend everyone watch our past episodes, but it’s also the beginning of the year and so it might be nice to start the topic fresh.

— 1:19

We currently have the swift-parsing project open, which is our open source package with the parsing tools we covered in past episodes, and a lot more. There is a playground in the package for us to explore some things.

— 1:32

In previous episodes, after much exploration into what Swift and Apple’s frameworks offer in the way of parsing, we boiled down the essence of parsing to a single function signature: (inout Input) -> Output?

— 1:42

A function that takes some inout input and produces an optional output. The input is inout because we want to be able to construct small parsers that accomplish one specific thing, such as parsing an integer from the beginning of the string. Such parsers need to be able to consume a little bit from the input so that the rest of the input is left over for other parsers to work on.

— 2:03

And the output is optional because a parser may fail to do its job. If we are trying to parse an integer from the beginning of a string, but the string starts with a non-numeric character, then we have no choice but to fail.
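To make the signature concrete, here is a minimal hand-written function of the shape (inout Substring) -> Int?. The name parseLeadingInt is hypothetical and not part of the library, and this sketch only recognizes unsigned ASCII digits. It consumes input only when it succeeds; on failure it returns nil and leaves the input untouched.

```swift
// Hypothetical hand-rolled parser with the shape (inout Substring) -> Int?.
// Only unsigned ASCII digits are recognized in this sketch.
func parseLeadingInt(_ input: inout Substring) -> Int? {
  // Look at the leading run of digits without consuming anything yet.
  let digits = input.prefix(while: { $0.isASCII && $0.isNumber })
  // No digits (or a value too large for Int): fail without consuming.
  guard let value = Int(digits) else { return nil }
  // Success: consume exactly the digits that were used.
  input = input[digits.endIndex...]
  return value
}

var input = "123 Hello World!"[...]
let number = parseLeadingInt(&input)   // 123; input is now " Hello World!"

var other = "Hello"[...]
let missing = parseLeadingInt(&other)  // nil; other is still "Hello"
```

Because the input is inout, several functions like this can run one after another on the same Substring, each picking up where the previous one stopped.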

— 2:14

We model this parser signature in Swift as a protocol:

```swift
public protocol Parser {
  associatedtype Input
  associatedtype Output
  func parse(_ input: inout Input) -> Output?
}
```

— 2:19

Types conform to this protocol in order to describe how one can parse a certain type of input into a certain type of output.

— 2:26

We don’t often need to construct concrete conformances to this protocol. Instead, the library ships with many parsers as well as operators that allow us to build up complex parsers from simpler ones. This should feel familiar from how we deal with the standard library’s collections API or Apple’s Combine framework. We don’t often need to conform our own types to the Collection or Publisher protocols; rather, we use the framework’s base types and their operators to construct complex collections and publishers.

— 2:55

For example, the library ships with a parser that can parse an integer from the beginning of a string:

```swift
var input = "123 Hello World!"[...]
Int.parser().parse(&input)  // 123
input                       // " Hello World!"
```

— 3:27

Note that we are using a substring here instead of a string. Most of the parsers and operators that ship with the library work on substrings, or even lower-level string representations such as UTF-8 views. This is because such types expose very efficient ways to incrementally consume bits from the front and back of the string: it’s only a matter of moving some indices. Consuming a bit from the front of a String, on the other hand, requires constructing a whole new string with some characters left off, which is very inefficient.
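The difference is visible with the standard library alone. In the following sketch (variable names are illustrative), dropping characters from the front of a Substring yields another Substring that still refers to the original string’s storage; only its start index moves. Turning the remainder into a String, by contrast, copies the characters.

```swift
let line = "123 Hello World!"

// A Substring covering the whole string; no characters are copied.
var rest = line[...]

// "Consuming" four characters only moves the start index forward.
rest = rest.dropFirst(4)

print(rest)               // Hello World!
print(rest.base == line)  // true: still a view into the original string
print(rest.startIndex == line.index(line.startIndex, offsetBy: 4))  // true

// Converting to String creates a new, independent copy of the characters.
let copied = String(rest)
```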

— 4:02

When we run this code in the playground we will see that the .parse invocation returns the integer 123, and the input has been mutated so that it now only consists of " Hello World!".

— 4:16

Another parser the library ships with is the StartsWith parser, which checks whether the input starts with some exact characters, and if so consumes those characters and succeeds:

```swift
StartsWith(" Hello ").parse(&input)  // ()
input                                // "World!"
```

— 4:39

Note that the .parse method returns a Void value. This is because the parser doesn’t have anything meaningful to return when parsing succeeds. It only needs to say whether or not it was successful.

— 4:58

The library also gives a shortcut for this StartsWith parser. Strings and UTF-8 views themselves conform to the Parser protocol and perform the same logic as StartsWith under the hood:

```swift
// StartsWith(" Hello ").parse(&input)
" Hello ".parse(&input)  // ()
input                    // "World!"
```
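As a rough sketch of how these two pieces could fit together, the block below re-declares a minimal version of the Parser protocol so that it compiles without the swift-parsing package. The StartsWith type and the String conformance here are simplified stand-ins written for illustration, not the library’s actual implementations.

```swift
// Minimal re-declaration of the protocol from the episode, so this sketch
// compiles without the swift-parsing package.
protocol Parser {
  associatedtype Input
  associatedtype Output
  func parse(_ input: inout Input) -> Output?
}

// Simplified stand-in for StartsWith: succeeds with a Void value when the
// input begins with `expected`, consuming exactly those characters.
struct StartsWith: Parser {
  let expected: String

  init(_ expected: String) { self.expected = expected }

  func parse(_ input: inout Substring) -> Void? {
    guard input.starts(with: expected) else { return nil }  // no match: consume nothing
    input = input.dropFirst(expected.count)
    return ()
  }
}

// The shortcut: a String is itself a parser that delegates to StartsWith.
extension String: Parser {
  func parse(_ input: inout Substring) -> Void? {
    StartsWith(self).parse(&input)
  }
}

var input = " Hello World!"[...]
let viaType = StartsWith(" Hello ").parse(&input)  // succeeds; input is now "World!"

var again = " Hello World!"[...]
let viaLiteral = " Hello ".parse(&again)           // same result through String

let period = ".".parse(&again)                     // nil; again is still "World!"
```

A failed match returns nil without consuming anything, which is the behavior the episode relies on when it later tries to parse a period.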

— 5:15

We can run another parser to capture everything after the “Hello ” and before the “!” by using the Prefix(while:) parser:

```swift
Prefix(while: { $0 != "!" }).parse(&input)  // "World"
input                                        // "!"
```

— 5:55

If we run one last parser on what’s left of the input, an exclamation mark, but try to parse the wrong thing, say a period, we get nil back from the .parse method and nothing is consumed from the input:

```swift
".".parse(&input)  // nil
input              // "!"
```
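Here is a comparable sketch of Prefix(while:), again against a re-declared minimal Parser protocol and written for illustration only. One property worth noticing: with no minimum length, an empty match counts as success, so this version never fails. The library’s Prefix has more options than this; the sketch models only the predicate.

```swift
protocol Parser {
  associatedtype Input
  associatedtype Output
  func parse(_ input: inout Input) -> Output?
}

// Simplified stand-in for Prefix(while:): consumes characters from the front
// of the input for as long as `predicate` holds, and returns them.
struct Prefix: Parser {
  let predicate: (Character) -> Bool

  init(while predicate: @escaping (Character) -> Bool) {
    self.predicate = predicate
  }

  func parse(_ input: inout Substring) -> Substring? {
    let output = input.prefix(while: predicate)
    input = input[output.endIndex...]
    // No minimum length in this sketch, so an empty match is still a success.
    return output
  }
}

var input = "World!"[...]
let word = Prefix(while: { $0 != "!" }).parse(&input)  // "World"; input is now "!"

// Trailing-closure form; nothing matches, so the output is empty.
let spaces = Prefix { $0 == " " }.parse(&input)        // ""; input is still "!"
```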

— 6:12

Beyond individual parsers, the library also ships with operators that allow you to piece together multiple parsers to form more and more complex parsers.

— 6:19

For example, we could bundle up all of the parsers we’ve built so far into one big parser that processes the whole string at once. This is done with the .take and .skip operators, which run one parser after another and let you choose whether to keep the output of each parser or discard it. Discarding is helpful for Void parsers, such as the " Hello " parser, which we want to run in order to consume some input but whose output we don’t need.

— 6:46

Using these operators we can combine all of our previous parsers into a single one that produces a tuple of an integer and a string:

```swift
input = "123 Hello World!"[...]
Int.parser()
  .skip(" Hello ")
  .take(Prefix { $0 != "!" })
  .skip("!")
  .parse(&input)  // (123, "World")
input             // ""
```
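To show how .skip and .take could be built, here is a self-contained sketch using generic wrapper types. The names SkipSecond and Take2 echo types that appear in the library’s nested type signatures, but these bodies are simplified assumptions, as are the stand-in IntParser (unsigned digits only), StartsWith, and Prefix. This version also restores the input when any step fails, a choice made here for clarity.

```swift
// Minimal re-declaration of the episode's protocol so this compiles standalone.
protocol Parser {
  associatedtype Input
  associatedtype Output
  func parse(_ input: inout Input) -> Output?
}

// Simplified stand-ins for the library's parsers.
struct IntParser: Parser {
  func parse(_ input: inout Substring) -> Int? {
    let digits = input.prefix(while: { $0.isASCII && $0.isNumber })
    guard let value = Int(digits) else { return nil }
    input = input[digits.endIndex...]
    return value
  }
}

struct StartsWith: Parser {
  let expected: String
  init(_ expected: String) { self.expected = expected }
  func parse(_ input: inout Substring) -> Void? {
    guard input.starts(with: expected) else { return nil }
    input = input.dropFirst(expected.count)
    return ()
  }
}

struct Prefix: Parser {
  let predicate: (Character) -> Bool
  init(while predicate: @escaping (Character) -> Bool) { self.predicate = predicate }
  func parse(_ input: inout Substring) -> Substring? {
    let output = input.prefix(while: predicate)
    input = input[output.endIndex...]
    return output
  }
}

// Runs `a` then `b`, keeping only `a`'s output.
struct SkipSecond<A: Parser, B: Parser>: Parser where A.Input == B.Input {
  let a: A
  let b: B
  func parse(_ input: inout A.Input) -> A.Output? {
    let original = input
    guard let output = a.parse(&input), b.parse(&input) != nil else {
      input = original  // this sketch resets the input if either step fails
      return nil
    }
    return output
  }
}

// Runs `a` then `b`, keeping both outputs as a tuple.
struct Take2<A: Parser, B: Parser>: Parser where A.Input == B.Input {
  let a: A
  let b: B
  func parse(_ input: inout A.Input) -> (A.Output, B.Output)? {
    let original = input
    guard let first = a.parse(&input), let second = b.parse(&input) else {
      input = original
      return nil
    }
    return (first, second)
  }
}

extension Parser {
  func skip<P: Parser>(_ other: P) -> SkipSecond<Self, P> where P.Input == Input {
    SkipSecond(a: self, b: other)
  }
  func take<P: Parser>(_ other: P) -> Take2<Self, P> where P.Input == Input {
    Take2(a: self, b: other)
  }
}

let parser = IntParser()
  .skip(StartsWith(" Hello "))
  .take(Prefix { $0 != "!" })
  .skip(StartsWith("!"))

var input = "123 Hello World!"[...]
let result = parser.parse(&input)  // (123, "World"); input is now ""
```

Chaining a second .take onto this sketch would nest tuples. The library instead provides additional overloads (such as the Take3 that shows up in its type signatures) so that outputs stay flat.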

— 7:26

Let’s try out a more complex textual format. Suppose we had a string of comma-separated values that describe users. Each row contains an integer id, a string name, and a boolean that determines whether or not the user is an admin:

```swift
input = """
1,Blob,true
2,Blob Jr,false
3,Blob Sr,true
"""[...]
```

— 7:42

We’d like to parse this string into an array of users, which we could define as a first-class struct type in Swift:

```swift
struct User {
  var id: Int
  var name: String
  var isAdmin: Bool
}
```

— 7:55

We can tackle this parsing problem one step at a time. We can start by parsing the id from the first row of the input:

```swift
Int.parser()
```

— 8:04

If that succeeds we then want to parse the “,” from the beginning of the string, and since such a parser only returns a Void value we want to discard it, which we can do using the .skip operator:

```swift
Int.parser()
  .skip(",")
```

— 8:09

Next we want to parse the name, which means we want to consume everything until we reach the next comma. To do this we can once again use the Prefix parser, which allows you to specify a predicate on the characters of the string. The parser will consume every character from the beginning of the string until the predicate evaluates to false:

```swift
Int.parser()
  .skip(",")
  .take(Prefix { $0 != "," })
```

— 8:27

Now we have parsed everything up to the second comma, but we haven’t yet parsed and consumed the comma, so let’s do that and discard it:

```swift
Int.parser()
  .skip(",")
  .take(Prefix { $0 != "," })
  .skip(",")
```

— 8:31

After parsing all of that, what we have left to parse in a single line of the input is a boolean, which we can parse using another application of the .take operator along with our boolean parser:

```swift
Int.parser()
  .skip(",")
  .take(Prefix { $0 != "," })
  .skip(",")
  .take(Bool.parser())
```

— 8:41

This parser can now process one entire line of the input string. We can even give it a spin if we assign it to a variable and run it on the input string:

```swift
let user = Int.parser()
  .skip(",")
  .take(Prefix { $0 != "," })
  .skip(",")
  .take(Bool.parser())

user.parse(&input)  // (1, "Blob", true)
input               // "\n2,Blob Jr,false\n3,Blob Sr,true"
```

The parser succeeded, produced a tuple of an integer, string, and a boolean, and consumed everything up to the first newline in the input string.

— 9:05

We can take this a little further by transforming the parser in order to turn the tuple into a proper User value. We can do this by using the .map operator on parsers, which allows you to transform the output of a parser. In a very deep sense it is intimately related to the .map operation that we know and love from arrays, optionals, the result type, and more in the Swift standard library. We have an entire episode dedicated to that topic that we highly recommend all of our viewers watch.

— 9:33

If we autocomplete .map at the end of the parser we will see that it takes a transform closure whose argument is a tuple of three values: an integer for the id, a substring for the name, and a boolean for the admin flag:

```swift
.map(<#transform: ((Int, Substring, Bool)) -> NewOutput#>)
```

— 9:45

So, we can open up that closure and construct a User value, but we have to make sure to convert the Substring into a String since that is what the User model expects:

```swift
.map { User(id: $0, name: String($1), isAdmin: $2) }
```

— 10:07

It’d be cooler if we could just pass the User’s initializer directly to .map, which means we wouldn’t even have to refer to the arguments of the closure:

```swift
.map(User.init(id:name:isAdmin:))
```

— 10:19

This style is a lot more succinct, and is what is known as “point-free” style since we are not referring to any arguments. However, this doesn’t work because the parser produces a substring, but the User initializer only wants a string.

— 10:30

To fix that we can apply another .map operator to the Prefix parser in order to coerce its output into a string:

```swift
let user = Int.parser()
  .skip(",")
  .take(Prefix { $0 != "," }.map(String.init))
  .skip(",")
  .take(Bool.parser())
  .map(User.init(id:name:isAdmin:))
```

— 10:37

And now everything compiles, and when we run the parser we get a User value out the other end, rather than just a tuple.
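A map operator can be sketched the same way: a wrapper that runs the upstream parser and applies a function to its output, leaving consumption and failure exactly as the upstream parser decides. The Map type and the simplified Prefix below are illustrative assumptions written against a re-declared Parser protocol; the example mirrors the episode’s Prefix { $0 != "," }.map(String.init).

```swift
protocol Parser {
  associatedtype Input
  associatedtype Output
  func parse(_ input: inout Input) -> Output?
}

// Simplified stand-in for Prefix(while:).
struct Prefix: Parser {
  let predicate: (Character) -> Bool
  init(while predicate: @escaping (Character) -> Bool) { self.predicate = predicate }
  func parse(_ input: inout Substring) -> Substring? {
    let output = input.prefix(while: predicate)
    input = input[output.endIndex...]
    return output
  }
}

// Hypothetical Map: runs `upstream`, then transforms its output. It never
// consumes or fails on its own; that is entirely up to `upstream`.
struct Map<Upstream: Parser, NewOutput>: Parser {
  let upstream: Upstream
  let transform: (Upstream.Output) -> NewOutput

  func parse(_ input: inout Upstream.Input) -> NewOutput? {
    upstream.parse(&input).map(transform)  // Optional.map: nil stays nil
  }
}

extension Parser {
  func map<NewOutput>(_ transform: @escaping (Output) -> NewOutput) -> Map<Self, NewOutput> {
    Map(upstream: self, transform: transform)
  }
}

// The episode's Substring-to-String coercion for the name field.
let name = Prefix { $0 != "," }.map(String.init)

var input = "Blob,true"[...]
let parsed: String? = name.parse(&input)  // "Blob" as a String; input is now ",true"
```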

— 10:41

So, we can now consume an entire line from the input string in order to procure a single user. But we want to run this parser multiple times on the input string in order to extract as many users as possible. To do this we can turn to the Many parser, which runs a single parser multiple times, along with an optional separator parser, and builds up the outputs into an array:

```swift
let users = Many(user, separator: "\n")
users.parse(&input)
input  // ""
```
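Many can be sketched as a loop: run the element parser, then keep alternating separator and element until one of them fails. The generic parameter names and backtracking choices below are assumptions made for this illustration; in particular, if a separator is consumed but no element follows it, the sketch puts the separator back. IntParser and StartsWith are again simplified stand-ins.

```swift
protocol Parser {
  associatedtype Input
  associatedtype Output
  func parse(_ input: inout Input) -> Output?
}

// Simplified stand-ins (unsigned integers only).
struct IntParser: Parser {
  func parse(_ input: inout Substring) -> Int? {
    let digits = input.prefix(while: { $0.isASCII && $0.isNumber })
    guard let value = Int(digits) else { return nil }
    input = input[digits.endIndex...]
    return value
  }
}

struct StartsWith: Parser {
  let expected: String
  init(_ expected: String) { self.expected = expected }
  func parse(_ input: inout Substring) -> Void? {
    guard input.starts(with: expected) else { return nil }
    input = input.dropFirst(expected.count)
    return ()
  }
}

// Hypothetical Many: element (separator element)*, collected into an array.
// It always succeeds; zero matches produces an empty array.
struct Many<Element: Parser, Separator: Parser>: Parser
where Element.Input == Separator.Input {
  let element: Element
  let separator: Separator

  init(_ element: Element, separator: Separator) {
    self.element = element
    self.separator = separator
  }

  func parse(_ input: inout Element.Input) -> [Element.Output]? {
    var outputs: [Element.Output] = []
    while true {
      let checkpoint = input
      // Every element after the first must be preceded by a separator.
      if !outputs.isEmpty, separator.parse(&input) == nil {
        input = checkpoint
        break
      }
      guard let output = element.parse(&input) else {
        input = checkpoint  // don't leave a trailing separator consumed
        break
      }
      outputs.append(output)
    }
    return outputs
  }
}

let numbers = Many(IntParser(), separator: StartsWith(","))

var input = "1,2,3 rest"[...]
let parsed = numbers.parse(&input)  // [1, 2, 3]; input is now " rest"
```

A more robust version would also guard against an element and separator that both succeed without consuming anything, which would make this loop run forever.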

— 11:30

And now we can parse the entire input string.

— 11:36

This is pretty incredible. In just a handful of lines we have built a parser that can not only extract out the fields in a comma-separated string, but can further turn those strings into proper data types such as integers, booleans, and even users. And this is only scratching the surface. The parsing library has many more parsers and operators that can do some really powerful things.

— 11:57

Much like SwiftUI and Combine, our parsing library is designed with a protocol at its core, and many, many, many types that conform to the protocol. Each operator in the library returns a whole new type that conforms to the protocol, and those types can nest really deeply. For example, here is the type of our users parser:

```swift
Many<
  Parsers.Map<
    Parsers.Take3<
      Parsers.SkipSecond<
        Parsers.Take2<
          Parsers.SkipSecond<
            Parsers.UTF8ViewToSubstring<
              Parsers.IntParser<Substring.UTF8View, Int>
            >,
            String
          >,
          Parsers.Map<
            Prefix<Substring>,
            String
          >
        >,
        String
      >,
      Int,
      String,
      Parsers.UTF8ViewToSubstring<
        Parsers.BoolParser<Substring.UTF8View>
      >
    >,
    User
  >,
  [User],
  String
>
```

— 12:26

This is exactly how Combine and SwiftUI work. If you build up a complex publisher or view using operators, you are secretly building up a very complex type:

```swift
import Combine

let publisher = Just(1)
  .map { $0 + 1 }
  .flatMap { Just($0) }
  .filter { $0.isMultiple(of: 2) }
  .dropFirst()
  .ignoreOutput()
/*
Publishers.IgnoreOutput<
  Publishers.Drop<
    Publishers.Filter<
      Publishers.FlatMap<
        Just<Int>,
        Just<Int>
      >
    >
  >
>
*/
```

— 12:49

And similarly for SwiftUI:

```swift
import SwiftUI

let view = Group {
  ForEach(1...10, id: \.self) { index in
    Button(action: {}) {
      HStack {
        Text("Number")
        Text("\(index)")
      }
    }
  }
}
/*
Group<
  ForEach<
    ClosedRange<Int>,
    Int,
    Button<
      HStack<
        TupleView<(
          Text,
          Text
        )>
      >
    >
  >
>
*/
```
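The same nesting shows up in the standard library’s lazy collection operators, which makes for a quick self-contained illustration (an analogy chosen here; the episode itself uses Combine and SwiftUI). Each operator returns a new generic type wrapping the previous one, and the explicit type annotation spells out the result.

```swift
// Each lazy operator wraps the previous value in a new generic type,
// just as parser, publisher, and view operators do.
let evens: LazyFilterSequence<LazyMapSequence<[Int], Int>> = [1, 2, 3, 4].lazy
  .map { $0 + 1 }                   // LazyMapSequence<[Int], Int>
  .filter { $0.isMultiple(of: 2) }  // wraps the map in LazyFilterSequence

print(type(of: evens))
print(Array(evens))  // [2, 4]
```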

— 13:01

So, this is a pretty complex parser, but let’s kick things up a notch so that we can explore another very important operator. Right now we model the admin flag as a simple boolean, but what if we wanted to model three states instead of just two? Perhaps a user can be a guest, a member, or an admin:

```swift
input = """
1,Blob,member
2,Blob Jr,guest
3,Blob Sr,admin
"""
```

— 13:23

Then we need to do some extra work to extract out these values.

— 13:27

The simplest way to handle this would be to hold a string in the User struct for the user’s role:

```swift
struct User {
  var id: Int
  var name: String
  var role: String
}
```

— 13:33

And then we would update the user parser to take everything up to the trailing newline for the role. But, before doing that we can improve this a bit. It is not ideal to store the role of a user as a free form string because it would allow us to parse any kind of string for this property. We only want to recognize three different values: guest, member, and admin.

— 13:49

So, let’s introduce an enum to model this choice:

```swift
enum Role {
  case admin, guest, member
}

struct User {
  var id: Int
  var name: String
  var role: Role
}
```

— 14:00

And then we can cook up a parser for the Role type. For example, to recognize the “admin” string we could use the string parser to parse it from the beginning of the input, and then .map on it to turn it into a Role value:

```swift
"admin".map { Role.admin }
```

— 14:18

The .map operation allows you to take the output from a parser and transform it. In this case, the string parser “admin” doesn’t produce anything interesting for its output; it is just a Void parser. So mapping on it simply replaces that Void output with a value that we specify.

— 14:31

It’s worth noting that this .map is the operation defined on the Parser protocol, not the one that is defined on String as a collection. This may seem a little confusing, so if you do not like that you can be more explicit with your parsers:

```swift
StartsWith("admin").map { Role.admin }
```

— 14:46

But we like the shorter version personally.

— 14:49

We can do something similar for each case of the Role enum:

```swift
"guest".map { Role.guest }
"member".map { Role.member }
```

— 14:57

We want to somehow combine all of these parsers into a single one that simply runs each one, and takes the first one that succeeds. To do that we can use the .orElse operator:

```swift
let role = "admin".map { Role.admin }
  .orElse("guest".map { Role.guest })
  .orElse("member".map { Role.member })
```

— 15:18

This says try to parse “admin”, and if that fails try to parse “guest”, and if that fails try to parse “member”, and if that fails then the whole thing fails. But if anything succeeds along the way we will short circuit all later parsers.
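Here is a sketch of how .orElse might work, written against a re-declared minimal Parser protocol with simplified StartsWith and Map stand-ins. The OrElse type is a hypothetical name. It tries the first parser and returns its output if it succeeds; otherwise it resets the input and tries the second parser, which gives the short-circuiting behavior just described.

```swift
protocol Parser {
  associatedtype Input
  associatedtype Output
  func parse(_ input: inout Input) -> Output?
}

// Simplified stand-ins for StartsWith and map.
struct StartsWith: Parser {
  let expected: String
  init(_ expected: String) { self.expected = expected }
  func parse(_ input: inout Substring) -> Void? {
    guard input.starts(with: expected) else { return nil }
    input = input.dropFirst(expected.count)
    return ()
  }
}

struct Map<Upstream: Parser, NewOutput>: Parser {
  let upstream: Upstream
  let transform: (Upstream.Output) -> NewOutput
  func parse(_ input: inout Upstream.Input) -> NewOutput? {
    upstream.parse(&input).map(transform)
  }
}

// Hypothetical OrElse: try `first`; if it fails, reset the input and try `second`.
struct OrElse<First: Parser, Second: Parser>: Parser
where First.Input == Second.Input, First.Output == Second.Output {
  let first: First
  let second: Second

  func parse(_ input: inout First.Input) -> First.Output? {
    let original = input
    if let output = first.parse(&input) { return output }  // short-circuit
    input = original
    return second.parse(&input)
  }
}

extension Parser {
  func map<NewOutput>(_ transform: @escaping (Output) -> NewOutput) -> Map<Self, NewOutput> {
    Map(upstream: self, transform: transform)
  }

  func orElse<P: Parser>(_ other: P) -> OrElse<Self, P>
  where P.Input == Input, P.Output == Output {
    OrElse(first: self, second: other)
  }
}

enum Role { case admin, guest, member }

let role = StartsWith("admin").map { Role.admin }
  .orElse(StartsWith("guest").map { Role.guest })
  .orElse(StartsWith("member").map { Role.member })

var input = "guest,rest"[...]
let parsed = role.parse(&input)  // Role.guest; input is now ",rest"
```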

— 15:31

We can plug this role parser into the user parser, and everything should now parse correctly:

```swift
let user = Int.parser()
  .skip(",")
  .take(Prefix { $0 != "," }.map(String.init))
  .skip(",")
  .take(role)
  .map(User.init(id:name:role:))
```

The need for parser builders

— 15:44

So, that’s the very basics of parsers. There’s still a lot more to know, so we highly encourage watching our past episodes, but this short introduction should help you get through what we are about to discuss now.

— 15:56

In fact, building up this parser from scratch can help us show why we think result builders are necessary to unlock the next level of ergonomics and functionality.

— 16:10

Take, for example, the role parser:

```swift
let role = "admin".map { Role.admin }
  .orElse("guest".map { Role.guest })
  .orElse("member".map { Role.member })
```

— 16:14

This .orElse chaining can get pretty noisy. First, it gives the first parser higher visual standing because it is not wrapped inside an .orElse(…). Second, if we need to add more roles in the future, this list will get longer with a bunch of redundantly repeated .orElses:

```swift
let role = "admin".map { Role.admin }
  .orElse("guest".map { Role.guest })
  .orElse("member".map { Role.member })
  .orElse(...)
  .orElse(...)
  .orElse(...)
  .orElse(...)
```

— 16:33

What if instead we could use a syntax more similar to SwiftUI views, starting off with a single top-level parser to indicate we want to choose one of the parsers enclosed:

```swift
let role = OneOf {
  "admin".map { Role.admin }
  "guest".map { Role.guest }
  "member".map { Role.member }
}
```

— 16:54

That looks much better. We get a top-level description of what we plan on doing, the OneOf, and each parser listed in the builder has a prominence determined only by its position in the closure.

— 17:19

Even better, we can very simply add more roles in the future, without the noise we had before:

```swift
let role = OneOf {
  "admin".map { Role.admin }
  "guest".map { Role.guest }
  "member".map { Role.member }
  …
}
```

— 17:27

This syntax isn’t yet compiling, so let’s comment it out.
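For a sense of how a builder could make that syntax compile, here is a sketch under simplifying assumptions: every alternative is erased to a closure-based AnyParser with the same Output, and the builder’s buildBlock is variadic, trying each parser in order. The names AnyParser, OneOfBuilder, and string are hypothetical, and this is not how the library implements OneOf.

```swift
// Hypothetical type-erased parser over Substring input.
struct AnyParser<Output> {
  let run: (inout Substring) -> Output?
}

// Hypothetical literal parser: succeeds with Void when the input starts with `literal`.
func string(_ literal: String) -> AnyParser<Void> {
  AnyParser { input in
    guard input.starts(with: literal) else { return nil }
    input = input.dropFirst(literal.count)
    return ()
  }
}

extension AnyParser {
  func map<NewOutput>(_ transform: @escaping (Output) -> NewOutput) -> AnyParser<NewOutput> {
    AnyParser<NewOutput> { input in self.run(&input).map(transform) }
  }
}

@resultBuilder
enum OneOfBuilder {
  // Every listed parser must produce the same Output; try them in order.
  static func buildBlock<Output>(_ parsers: AnyParser<Output>...) -> AnyParser<Output> {
    AnyParser { input in
      for parser in parsers {
        let original = input
        if let output = parser.run(&input) { return output }
        input = original  // undo any partial consumption before the next attempt
      }
      return nil
    }
  }
}

struct OneOf<Output> {
  let parser: AnyParser<Output>
  init(@OneOfBuilder _ build: () -> AnyParser<Output>) { parser = build() }
  func parse(_ input: inout Substring) -> Output? { parser.run(&input) }
}

enum Role { case admin, guest, member }

let role = OneOf {
  string("admin").map { Role.admin }
  string("guest").map { Role.guest }
  string("member").map { Role.member }
}

var input = "member,rest"[...]
let parsed = role.parse(&input)  // Role.member; input is now ",rest"
```

Because all alternatives share one Output type, a single variadic buildBlock suffices here. The overload problem discussed below only arises when a builder must combine parsers whose outputs differ.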

— 17:31

The user parser could take advantage of this builder syntax too:

```swift
let user = Int.parser()
  .skip(",")
  .take(Prefix { $0 != "," }.map(String.init))
  .skip(",")
  .take(role)
  .map(User.init(id:name:role:))
```

— 17:35

Here we are using the .skip and .take operators to fluently describe the idea that we want to successively run multiple parsers on an input, but sometimes we want to discard the data parsed, such as void values, and sometimes we want to keep the data.

— 17:49

These .skip and .take operators were our solution to the ergonomics problem of parsers. They simultaneously make it easy for us to read a parser from top-to-bottom, and they minimize the number of overloads we need to define so that they work on most reasonably-sized parsing problems.

— 18:05

The .skip/.take style of parsing was a huge improvement over the first style we showed, using a zip operator, but that doesn’t mean we can’t make things even better. Using result builders we can hide the .skip/.take noise and just concentrate on the parsers we want to run:

```swift
let user = Parse {
  Int.parser()
  ","
  Prefix { $0 != "," }.map(String.init)
  ","
  role
}
.map(User.init(id:name:role:))
```

— 18:21

Here a new Parse type acts as an entry point into the builder syntax, and then we can just list our parsers one after the other. Further, it would be great if under the hood the builder could automatically discard any of the Void outputs from the parsers so we wouldn’t even need to explicitly say .skip.

— 19:03

Even better, we could make the .map operation more prominent. Right now you have to read the full parser from top to bottom to realize that at the end we are mapping on the parsed outputs in order to bundle them up into a User value. What if instead the Parse entry point could take the transformation we want to apply to the parsed outputs:

```swift
let user = Parse(User.init(id:name:role:)) {
  Int.parser()
  ","
  Prefix { $0 != "," }.map(String.init)
  ","
  role
}
```

— 19:28

This allows us to say upfront that we intend to parse a User, and then we describe all the steps necessary to parse the user. This would be pretty incredible.

— 19:41

But it gets even better. Once we start using parser builders, entirely new API designs open up that are not currently possible. In fact, we can take some inspiration from SwiftUI.

— 19:53

In SwiftUI, views that are created by providing secondary views are designed with view builders in mind. For example, a Section view for a form specifies the content, header, and footer all as view builders:

```swift
Section {
  Text("Content")
} header: {
  Text("Title")
} footer: {
  Text("Caption")
}
```

— 20:21

Similarly for navigation links:

```swift
NavigationLink {
  Text("Destination")
} label: {
  Text("Press me")
}
```

— 20:34

Many of Apple’s SwiftUI views are designed this way. Perhaps our Many parser could be designed this way too, where you specify the element and separator parsers in builder syntax:

```swift
let users = Many { user } separator: { "\n" }
```

— 21:00

Or if you prefer to have some newlines in your code you could write it as:

```swift
let users = Many {
  user
} separator: {
  "\n"
}
```

— 21:07

So, this is just a small preview of what would be possible if we could leverage builder syntax for our parsers.

— 21:14

You may be wondering: why didn’t we do this all along? If this syntax is so nice, why did we ever bother with the .skip / .take style of API?

— 21:22

Well, the main reason is that we didn’t think the builder syntax was feasible due to an explosion of overloads. The skip/take style of parsing allows us to minimize the number of overloads we need to define. If we want to parse 6 values from an input in a single parser, then we just need 6 overloads of .take.

— 21:41

Most importantly, we can skip as many parsers’ outputs as we want without defining a new overload. For example, in our user parser we could sprinkle in a whole bunch of skips in order to allow any amount of white space between fields:

```swift
let zeroOrMoreSpaces = Prefix { $0 == " " }

let user = Skip(zeroOrMoreSpaces)
  .take(Int.parser())
  .skip(zeroOrMoreSpaces)
  .skip(",")
  .skip(zeroOrMoreSpaces)
  .take(Prefix { $0 != "," }.map(String.init(_:)))
  .skip(zeroOrMoreSpaces)
  .skip(",")
  .skip(zeroOrMoreSpaces)
  .take(role)
  .skip(zeroOrMoreSpaces)
  .map(User.init(id:name:role:))
```

That change allows us to parse input strings like this:

```swift
input = """
1 , Blob ,member
2,Blob Jr , guest
3,Blob Sr ,admin
"""[...]
```

— 22:48

Now technically we are running 11 parsers on the input, but there are only 3 values being taken (id, name, and role), and so this really only involved the 3rd overload of .take that deals with triples.

— 23:07

On the flip side, if we tried writing this parser in the builder style:

```swift
let user = Parse(User.init(id:name:role:)) {
  zeroOrMoreSpaces
  Int.parser()
  zeroOrMoreSpaces
  ","
  zeroOrMoreSpaces
  Prefix { $0 != "," }.map(String.init)
  zeroOrMoreSpaces
  ","
  zeroOrMoreSpaces
  role
  zeroOrMoreSpaces
}
```

— 23:19

It would require an overload of 11 arguments. Even worse, we would need many, many overloads that take 11 arguments in order to support every combination of taking or skipping an output value.

— 23:33

We’d need one overload for when all the outputs are non-Void. Then one overload for when just the first output is Void. And another for when just the second is Void. And on and on and on until we finally define an overload that wants to run 11 Void parsers.
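To see where the exponential count comes from, here is a sketch for the smallest interesting case: a builder block containing two parsers. It assumes a closure-based AnyParser and uses hypothetical names (ParseBuilder, Parse, runBoth, int, string); it is not the library’s implementation. Even at this size, automatically discarding Void outputs takes 2^2 = 4 buildBlock overloads, one for each take/skip combination, and every additional parser in a block doubles that number.

```swift
// Hypothetical closure-based parser over Substring input.
struct AnyParser<Output> {
  let run: (inout Substring) -> Output?
}

// Stand-in parsers for the demo (unsigned integers only).
let int = AnyParser<Int> { input in
  let digits = input.prefix(while: { $0.isASCII && $0.isNumber })
  guard let value = Int(digits) else { return nil }
  input = input[digits.endIndex...]
  return value
}

func string(_ literal: String) -> AnyParser<Void> {
  AnyParser { input in
    guard input.starts(with: literal) else { return nil }
    input = input.dropFirst(literal.count)
    return ()
  }
}

// Runs `a` then `b` and combines their outputs; resets the input on failure.
func runBoth<A, B, C>(
  _ a: AnyParser<A>, _ b: AnyParser<B>, _ combine: @escaping (A, B) -> C
) -> AnyParser<C> {
  AnyParser { input in
    let original = input
    guard let x = a.run(&input), let y = b.run(&input) else {
      input = original
      return nil
    }
    return combine(x, y)
  }
}

@resultBuilder
enum ParseBuilder {
  // Take both outputs.
  static func buildBlock<A, B>(_ a: AnyParser<A>, _ b: AnyParser<B>) -> AnyParser<(A, B)> {
    runBoth(a, b) { ($0, $1) }
  }
  // Skip the first (Void) output, take the second.
  static func buildBlock<B>(_ a: AnyParser<Void>, _ b: AnyParser<B>) -> AnyParser<B> {
    runBoth(a, b) { _, second in second }
  }
  // Take the first output, skip the second (Void) output.
  static func buildBlock<A>(_ a: AnyParser<A>, _ b: AnyParser<Void>) -> AnyParser<A> {
    runBoth(a, b) { first, _ in first }
  }
  // Skip both (Void) outputs.
  static func buildBlock(_ a: AnyParser<Void>, _ b: AnyParser<Void>) -> AnyParser<Void> {
    runBoth(a, b) { _, _ in () }
  }
}

struct Parse<Output> {
  let parser: AnyParser<Output>
  init(@ParseBuilder _ build: () -> AnyParser<Output>) { parser = build() }
  func parse(_ input: inout Substring) -> Output? { parser.run(&input) }
}

// The type annotations pin down which overload applies.
let idThenComma: Parse<Int> = Parse {
  int
  string(",")
}

let pair: Parse<(Int, Int)> = Parse {
  idThenComma.parser
  int
}

var line = "1,2"[...]
let ids = pair.parse(&line)  // (1, 2); line is now ""
```

The annotations make exactly one overload match in each case, so neither call site needs an explicit .skip or .take.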

— 23:55

That means we would need $2^{11} = 2048$ overloads to support running 11 parsers and being able to automatically discard any of the ones that have Void values.

— 24:10

More generally, this means that in order to support running exactly $n$ parsers on an input string we need to define $2^n$ overloads, one for each combination of taking an output or skipping a Void. Further, to support running any number of parsers up to $n$ we need $2^n + 2^{n-1} + 2^{n-2} + \dots + 2^1 + 2^0$ overloads, which equals $2^{n+1} - 1$.

— 24:52

So if we wanted to support a conservative number of parsers, say just 6 or fewer, it would still require 127 overloads!
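The counting argument can be checked with a few lines of Swift. The function names are illustrative; the sum runs from 2^0 through 2^n, matching the formula above.

```swift
// Overloads needed for blocks of exactly n parsers: one per
// take/skip combination, i.e. 2^n.
func overloads(exactly n: Int) -> Int { 1 << n }

// Supporting every block size from 0 through n sums a geometric series.
func overloads(upTo n: Int) -> Int {
  (0...n).reduce(0) { total, k in total + overloads(exactly: k) }
}

print(overloads(exactly: 11))              // 2048
print(overloads(upTo: 6))                  // 127
print(overloads(upTo: 6) == (1 << 7) - 1)  // true
```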

— 25:08

It’s worth noting that this is somewhat similar to what SwiftUI does to support multiple views in a single view builder closure:

```swift
static func buildBlock<C0, C1, C2, C3, C4, C5, C6, C7, C8, C9>(
  _ c0: C0, _ c1: C1, _ c2: C2, _ c3: C3, _ c4: C4,
  _ c5: C5, _ c6: C6, _ c7: C7, _ c8: C8, _ c9: C9
) -> TupleView<(C0, C1, C2, C3, C4, C5, C6, C7, C8, C9)>
where C0: View, C1: View, C2: View, C3: View, C4: View,
      C5: View, C6: View, C7: View, C8: View, C9: View
```

— 25:21

These overloads are defined for up to 10 views, and after that you will get compiler errors. For example:

```swift
VStack {
  Text("")
  Text("")
  Text("")
  Text("")
  Text("")
  Text("")
  Text("")
  Text("")
  Text("")
  Text("")
  Text("")
}
// error: Extra argument in call
```

— 25:48

However, SwiftUI does not have to define multiple overloads for each arity that it supports. They simply define one overload for each arity and that’s it. We, on the other hand, need many, many overloads for each arity, which is why we thought this approach was so untenable.

— 26:18

Well, then something happened recently.

— 26:21

Apple started sharing some of their explorations for future directions of Swift’s “string processing” capabilities, including regular expressions. One of their examples shows off a fancy builder syntax for creating a regular expression. We were quite surprised when we saw this because we weren’t sure how they were going to get around the exponential explosion of overloads in order to avoid Void values.

— 26:49

Well, turns out they didn’t avoid it, they just generated a gigantic Swift source file to deal with it.

— 27:17

We think this generated file is just a stopgap until Swift gains the full power of variadic generics, but either way, it got us wondering whether generated source code would be a good fit for our parser library; we could then rip out the generated overloads once variadic generics arrive in Swift. We were most worried about compile times, but we do know that the Swift compiler team has greatly improved the performance of result builders. Our early experiments were extremely promising, so we went all in, and we think it is the future of the library.

Next time: result builders

— 27:48

Before diving in, let’s first discuss result builders from first principles so that we all know how to leverage them to their full potential, and then we will see how these ideas apply to parsers. We will show that result builders go far beyond just simple ergonomics for our parsing library. They unlock all new forms of APIs that can be really, really powerful.

— 28:09

So, let’s get started…next time!

References

Collection: Parsing
Brandon Williams & Stephen Celis
Parsing is a surprisingly ubiquitous problem in programming. Every time we construct an integer or a URL from a string, we are technically doing parsing. After demonstrating the many types of parsing that Apple gives us access to, we will take a step back and define the essence of parsing in a single type. That type supports many wonderful types of compositions, and allows us to break large, complex parsing problems into small, understandable units.
https://www.pointfree.co/collections/parsing

Swift Parsing
Brandon Williams & Stephen Celis • Dec 21, 2021
A library for turning nebulous data into well-structured data, with a focus on composition, performance, generality, and invertibility.
https://github.com/pointfreeco/swift-parsing

Declarative String Processing
Alex Alonso, Nate Cook, Michael Ilseman, Kyle Macomber, Becca Royal-Gordon, Tim Vermeulen, and Richard Wei • Sep 29, 2021
The Swift core team’s proposal and experimental repository for declarative string processing, which includes result builder syntax for creating regular expressions, and inspired us to explore result builders for parsing.
https://github.com/apple/swift-experimental-string-processing

The Many Faces of Map
Brandon Williams & Stephen Celis • Apr 23, 2018
Why does the map function appear in every programming language supporting “functional” concepts? And why does Swift have two map functions? We will answer these questions and show that map has many universal properties, and is in some sense unique.
https://www.pointfree.co/episodes/ep13-the-many-faces-of-map

Downloads

Sample code: 0173-parser-builders-pt1

Point-Free: A hub for advanced Swift programming. Brought to you by Brandon Williams and Stephen Celis.
© 2026 Point-Free, Inc. All rights are reserved for the videos and transcripts on this site. All other content is licensed under CC BY-NC-SA 4.0, and the underlying source code to run this site is licensed under the MIT License.