Video #62: Parser Combinators: Part 1

Episode: Video #62 Date: Jun 24, 2019 Access: Members Only 🔒 URL: https://www.pointfree.co/episodes/ep62-parser-combinators-part-1

Description

Even though map, flatMap and zip pack a punch, there are still many parsing operations that can’t be done using them alone. This is where “parser combinators” come into play. Let’s look at a few common parsing problems and solve them using parser combinators!

Video

Cloudflare Stream video ID: a4ce3da7d89f3d914a784b802f143e5a Local file: video_62_parser-combinators-part-1.mp4 *(download with --video 62)*

References

Transcript

— 0:05

In the last three episodes of Point-Free we explored the compositional properties of parsers by defining map , flatMap and zip operations on them. Each operation carried with it precise semantic meaning, and that meaning is shared with many other types such as arrays, optionals, results, promises and more, and each one was a little more powerful than the last.

— 0:22

With those three operations we were able to cook up complex parsers by piecing together lots of tiny, easy-to-understand parsers. The example we developed was that of a latitude/longitude coordinate parser, which was built from many small parsers. But even though these three operations pack a punch, there are still some things that they cannot accomplish. For example, we cannot currently parse any number of values from some input string, like say a string that has a bunch of coordinates that are separated by newlines. Nor can we attempt to run a bunch of parsers against an input string till one succeeds, say if we support more than one format for a data type. Doing so with map , flatMap , and zip alone is just not possible, so we need to figure out a way to take our parsers to the next level.

— 1:15

The key to this leveling up is none other than functions. Just plain functions. It’s an idea we’ve seen time and time again on Point-Free. By using functions that return parsers as output, or even better, take parsers as input and return parsers as output, we will unlock a whole new world of possibilities with our parsers. These functions are called “parser combinators”, but they could maybe even be called “higher-order parsers” to draw an analogy with “higher-order functions”. Recap

— 1:50

Let’s start by reminding ourselves what we accomplished last time. If you haven’t watched the rest of our series on parsing, or if it’s been awhile, we recommend going back to earlier episodes for a refresher.

— 2:07

The core part of our library so far is the Parser type, which wraps a function that concisely and efficiently describes how to parse a value off of a string. struct Parser<A> { let run: (inout Substring) -> A? }

— 2:22

With this definition we were able to define a bunch of parsers from scratch, including parsers of integers, doubles, characters, and even string literals. let int = Parser<Int> { str in let prefix = str.prefix(while: { $0.isNumber }) let match = Int(prefix) str.removeFirst(prefix.count) return match } let double = Parser<Double> { str in let prefix = str.prefix(while: { $0.isNumber || $0 == "." }) let match = Double(prefix) str.removeFirst(prefix.count) return match } let char = Parser<Character> { str in guard !str.isEmpty else { return nil } return str.removeFirst() } func literal(_ p: String) -> Parser<Void> { return Parser<Void> { str in guard str.hasPrefix(p) else { return nil } str.removeFirst(p.count) return () } }

— 2:31

We also defined always and never parsers, which always succeed or always fail. They seemed strange at first but we found some great uses for them later on when using flatMap . func always<A>(_ a: A) -> Parser<A> { return Parser<A> { _ in a } } extension Parser { static var never: Parser { return Parser { _ in nil } } }

— 2:48

Speaking of flatMap , the Parser type has the map , flatMap , and zip functions we know and love. extension Parser { func map(_ f: @escaping (A) -> B) -> Parser { return Parser { str -> B? in self.run(&str).map(f) } } func flatMap(_ f: @escaping (A) -> Parser) -> Parser { return Parser { str -> B? in let original = str let matchA = self.run(&str) let parserB = matchA.map(f) guard let matchB = parserB?.run(&str) else { str = original return nil } return matchB } } } func zip<A, B>(_ a: Parser<A>, _ b: Parser) -> Parser<(A, B)> { return Parser<(A, B)> { str -> (A, B)? in let original = str guard let matchA = a.run(&str) else { return nil } guard let matchB = b.run(&str) else { str = original return nil } return (matchA, matchB) } }

— 2:53

And we defined a lot of zip s to handle cases where we want to combine more and more parsers together. func zip<A, B, C>( _ a: Parser<A>, _ b: Parser, _ c: Parser<C> ) -> Parser<(A, B, C)> { return zip(a, zip(b, c)) .map { a, bc in (a, bc.0, bc.1) } } func zip<A, B, C, D>( _ a: Parser<A>, _ b: Parser, _ c: Parser<C>, _ d: Parser<D> ) -> Parser<(A, B, C, D)> { return zip(a, zip(b, c, d)) .map { a, bcd in (a, bcd.0, bcd.1, bcd.2) } } func zip<A, B, C, D, E>( _ a: Parser<A>, _ b: Parser, _ c: Parser<C>, _ d: Parser<D>, _ e: Parser<E> ) -> Parser<(A, B, C, D, E)> { return zip(a, zip(b, c, d, e)) .map { a, bcde in (a, bcde.0, bcde.1, bcde.2, bcde.3) } } func zip<A, B, C, D, E, F>( _ a: Parser<A>, _ b: Parser, _ c: Parser<C>, _ d: Parser<D>, _ e: Parser<E>, _ f: Parser<F> ) -> Parser<(A, B, C, D, E, F)> { return zip(a, zip(b, c, d, e, f)) .map { a, bcdef in (a, bcdef.0, bcdef.1, bcdef.2, bcdef.3, bcdef.4) } } func zip<A, B, C, D, E, F, G>( _ a: Parser<A>, _ b: Parser, _ c: Parser<C>, _ d: Parser<D>, _ e: Parser<E>, _ f: Parser<F>, _ g: Parser<G> ) -> Parser<(A, B, C, D, E, F, G)> { return zip(a, zip(b, c, d, e, f, g)) .map { a, bcdefg in ( a, bcdefg.0, bcdefg.1, bcdefg.2, bcdefg.3, bcdefg.4, bcdefg.5 ) } }

— 3:03

We also wrote a helper function on Parser that makes it a little bit nicer to use when working in our playground. extension Parser { func run(_ str: String) -> (match: A?, rest: Substring) { var str = str[...] let match = self.run(&str) return (match, str) } }

— 3:16

Finally, here’s that latitude/longitude coordinate parser that we built out of a bunch of smaller units. // 40.446° N, 79.982° W struct Coordinate { let latitude: Double let longitude: Double } let northSouth = char .flatMap { $0 == "N" ? always(1.0) : $0 == "S" ? always(-1) : .never } let eastWest = char .flatMap { $0 == "E" ? always(1.0) : $0 == "W" ? always(-1) : .never } let latitude = zip(double, literal("° "), northSouth) .map { lat, _, latSign in lat * latSign } let longitude = zip(double, literal("° "), eastWest) .map { long, _, longSign in long * longSign } let coord = zip(latitude, literal(", "), longitude) .map { lat, _, long in Coordinate( latitude: lat, longitude: long ) } coord.run("40.446° N, 79.982° W") // (match Coordinate(latitude: 40.446, longitude: -79.982), rest "") More forgiving parsing

— 3:48

Let’s explore one of the problems we mentioned earlier and see what the Parser type has to say about solving it: skipping characters. When parsing a string it’s quite common to ignore certain characters along the way, like spaces, because they don’t always add meaning to the format. A few extra (or even no) spaces shouldn’t impact our ability to parse out a value. Consider the Swift compiler, which parses Swift source code: could you imagine if it failed to parse code that was indented by 2 spaces instead of 4? 👀

— 4:30

Unfortunately, our current parsers are strict in that way. If we deviate just slightly from the expected format, parsing will fail. For example, if we introduce a bit of extra whitespace between the tokens. coord.run("40.446° N, 79.982° W") // (nil, rest "40.446° N, 79.982° W")

— 4:38

Parsing fails, which makes sense because we’re currently baking a single space into its degree and comma literal parsers.

— 4:52

And while this may be the behavior we want, it seems perfectly reasonable to prefer a more forgiving parser: one that doesn’t care about arbitrary whitespace. The information is all still there and valid, after all.

— 5:06

Parsing solutions typically address this very common problem. Many of Foundation’s domain-specific parsers, for example, ignore extra spaces: import Foundation let df = DateFormatter() df.dateStyle = .medium df.date(from: "Jan 29, 2018") // Optional(2018-01-19 05:00:00 +0000) df.date(from: "Jan 29, 2018") // Optional(2018-01-19 05:00:00 +0000)

— 6:03

Regular expressions also offer the ability to skip over multiple spaces succinctly using the * “glob” pattern. try NSRegularExpression(pattern: " *")

— 6:20

The Scanner type even has an aptly-named charactersToBeSkipped property, which makes it super simple to automatically skip over specific characters in a character set. Scanner().charactersToBeSkipped = .whitespaces

— 6:46

So let’s build a new parser that simply consumes one or more spaces from the front of our input string. We don’t really care about how many spaces we’re consuming, just that they’re consumed before we parse anything else, so the parsed value can be Void . let oneOrMoreSpaces = Parser<Void>

— 7:03

Now we just need to initialize our parser with a closure that does the work of consuming all of those spaces. let oneOrMoreSpaces = Parser<Void> { str in let prefix = str.prefix(while: { $0 == " " }) guard !prefix.isEmpty else { return nil } str.removeFirst(prefix.count) return () } First, we grab all of the space characters that we can find, then we make sure there’s at least one space, and finally we consume those spaces.

— 8:13

Let’s take it for a spin. oneOrMoreSpaces.run(" Hello, world!") // (match (), rest "Hello, world!")

— 8:25

Working as expected! It consumes all the leading spaces.

— 8:35

And if we feed it a string that doesn’t start with spaces: oneOrMoreSpaces.run("Hello World") // (nil, rest "Hello World")

— 8:40

It fails to parse.

— 8:42

How can we use this parser to make our coordinate parser more forgiving? Currently, the latitude parser bakes a space into its degree literal parser. We can instead split this work with the oneOrMoreSpaces parser. This adds another argument to zip , so we’ll need to update our call to map to ignore the additional Void value. let latitude = zip( double, literal("°"), oneOrMoreSpaces, northSouth ) .map { lat, _, _, latSign in lat * latSign }

— 9:10

This reads really nicely, too! To parse a latitude is to parse a double, followed by a literal degree sign, followed by one or more spaces, followed by a north or south cardinal direction.

— 9:22

Now let’s update longitude with the same changes. let longitude = zip( double, literal("°"), oneOrMoreSpaces, eastWest ) .map { long, _, _, longSign in long * longSign }

— 9:30

And we can update the coordinate parser’s comma literal parser in the same fashion. let coord = zip( latitude, literal(","), oneOrMoreSpaces, longitude ) .map { lat, _, _, long in Coordinate( latitude: lat, longitude: long ) }

— 9:36

When we take things for a spin: coord.run("40.446° N, 79.982° W") // (match Coordinate(latitude: 40.446, longitude: -79.982), rest "")

— 9:39

It’s much more lenient!

— 9:45

We can even introduce trailing whitespace. coord.run("40.446° N, 79.982° W ") // ( // match Coordinate(latitude: 40.446, longitude: -79.982), // rest " " // )

— 9:50

Parsing succeeds, and we end up with some additional spaces that the next parser could come along and parse.

— 9:58

But if we introduce some leading whitespace, parsing will fail. coord.run(" 40.446° N, 79.982° W ") // (nil, rest " 40.446° N, 79.982° W ")

— 10:09

Which, again, seems needlessly strict. DateFormatter ignores leading spaces just fine. df.date(from: " Jan 29, 2018") // Optional(2018-01-19 05:00:00 +0000)

— 10:32

We can’t merely reuse oneOrMoreSpaces because it would require at least one leading space. Instead we can define another parser that parses zero or more spaces off the beginning of a string: let zeroOrMoreSpaces = Parser<Void> { str in let prefix = str.prefix(while: { $0 == " " }) str.removeFirst(prefix.count) return () } It’s almost exactly the same as our oneOrMoreSpaces parser, except we don’t have to check to see if the match is empty with a guard . In fact, this parser will never fail.

— 10:53

Now we can introduce zeroOrMoreSpaces to the beginning of the parser. let coord = zip( zeroOrMoreSpaces, latitude, literal(","), zeroOrMoreSpaces, longitude ) .map { _, lat, _, _, long in Coordinate( latitude: lat, longitude: long ) }

— 11:07

And it succeeds, even with a bunch of leading spaces. coord.run(" 40.446° N, 79.982° W ") // ( // match Coordinate(latitude: 40.446, longitude: -79.982), // rest " " // ) Extracting common parsers

— 11:17

The zeroOrMoreSpaces and oneOrMoreSpaces parsers are super handy and let us be very precise and flexible in how we go about parsing complex string formats, but we built them in a pretty naive, clunky manner. We’ve talked about how transformable and composable the Parser type is, but right now we’ve built these parsers from scratch. Any time we invoke the Parser initializer it means we’re responsible for working with that substring and mutating it in just the right way. Whenever we find ourselves implementing parsers this way, we should ask ourselves: is there is a lower-level operation out there that could hide away this complexity?

— 12:19

If we look at zeroOrMoreSpaces and oneOrMoreSpaces next to each other, it’s pretty clear how much logic they share. let zeroOrMoreSpaces = Parser<Void> { str in let prefix = str.prefix(while: { $0 == " " }) str.removeFirst(prefix.count) return () } let oneOrMoreSpaces = Parser<Void> { str in let prefix = str.prefix(while: { $0 == " " }) guard !prefix.isEmpty else { return nil } str.removeFirst(prefix.count) return () } Each parser tests a string for a prefix and removes that prefix if it exists. Maybe we can come up with a common parser that we can use to derive all these other parsers from, minimizing surface area and eliminating potential mistakes in the process.

— 12:34

The common bit of code in each of these parsers is a call to prefix(while:) with a predicate function, so maybe we could cook up a parser that does this work.

— 12:53

We can call it prefix(while:) for some nice symmetry with the Swift standard library, and because we can think of it as producing parsers that parse a prefix of characters while the predicate function returns true. func prefix(while p: (Character) -> Bool) -> Parser<> { }

— 13:09

What kind of parser is it, though? We used Void for zeroOrMoreSpaces and oneOrMoreSpaces , but we may care about the string that was matched, so let’s not discard that information. func prefix(while p: (Character) -> Bool) -> Parser<String> { }

— 13:25

To implement this function we need to return a Parser and do that bit of work again. func prefix(while p: (Character) -> Bool) -> Parser<String> { return Parser<String> { str in let prefix = str.prefix(while: p) str.removeFirst(prefix.count) Closure use of non-escaping parameter ‘p’ may allow it to escape

— 13:49

We need to mark p as @escaping since our new parser captures it. func prefix( while p: @escaping (Character) -> Bool ) -> Parser<String> { return Parser<String> { str in let prefix = str.prefix(while: p) str.removeFirst(prefix.count) Missing return in a closure expected to return ‘String?’

— 13:55

We’re still not returning anything, so let’s return the prefix to let the parser know what was parsed. func prefix( while p: @escaping (Character) -> Bool ) -> Parser<String> { return Parser<String> { str in let prefix = str.prefix(while: p) str.removeFirst(prefix.count) return prefix } } Cannot convert return expression of type ‘Substring.SubSequence’ (aka ‘Substring’) to return type ‘String?’

— 14:06

Oh right, prefix returns a Substring , so we can’t return it directly.

— 14:13

Instead of converting it to a String , which makes a copy of the underlying memory, let’s keep things efficient and make this a Substring parser instead. func prefix( while p: @escaping (Character) -> Bool ) -> Parser<Substring> { return Parser<Substring> { str in let prefix = str.prefix(while: p) str.removeFirst(prefix.count) return prefix } }

— 14:53

Okay! We’ve now generalized this work in a single place and we should be able to rewrite our other parsers in terms of the prefix parser. For zeroOrMoreSpaces it’s a matter of checking if character are spaces. let zeroOrMoreSpaces = prefix(while: { $0 == " " }) // Parser<Substring>

— 15:08

While the parsing all still works, we’ve introduced a subtle change. This is now a Parser<Substring> instead of a Parser<Void> . Returning Void from a parser is a nice way of saying that we don’t care about what was parsed, and this comes in handy when we zip a bunch of parsers together: we can quickly ignore all Void values in a map and combine the remaining ones that we care about.

— 15:23

So let’s explicitly map this parser to Void . let zeroOrMoreSpaces = prefix(while: { $0 == " " }) .map { _ in () } // Parser<Void>

— 15:45

What about oneOrMoreSpaces ? We want to start the same way, by grabbing all the spaces off the prefix of a string. let oneOrMoreSpaces = prefix(while: { $0 == " " })

— 15:56

But this time we want to fail if nothing was parsed. To do so, we can use flatMap to inspect the result of the parsing to test if it’s empty. If it’s empty we can chain into the never parser and fail. Otherwise, we can always succeed with a Void value. let oneOrMoreSpaces = prefix(while: { $0 == " " }) .flatMap { $0.isEmpty ? .never : always(()) }

— 16:46

Now this parser has the exact same behavior as the one we defined manually, before, but we’ve baked the tricky logic of extracting and consuming the prefix of a string into its own reusable unit and now can focus on the more specific logic of the parser by map ping or flatMap ping the result.

— 17:15

And everything we defined before still runs and still behaves as before! Our coordinate parsers still behave a expected with their oneOrMoreSpaces and zeroOrMoreSpaces parsers, but we were able to redefine them using a smaller, more precise unit.

— 17:25

The prefix function is an example of a “parser combinator”: it’s a function that produces brand new parsers out of thin air given a little configuration. This prefix parser combinator unlocks some extra power because we can scan off as many characters as we want off the beginning of a string, based on a predicate, and then do something later with it.

— 17:54

Although this is the first time we’ve called something a “parser combinator,” it’s not the first parser combinator that we’ve seen: map , zip , and flatMap are all parser combinators, because they’re functions that take parsers as input and produce parsers as output. Even the literal parser is a parser combinator because it takes some configuration up front (a literal string to parse) and returns a brand new parser that uses that configuration.

— 18:19

We’re beginning to see how parser combinators can enter our code and clean up things in the process: they allow us to solve more and more precise parsing problems by moving the lower-level, trickier parsing logic to these combinators. And then we can use them to build more specific kinds of parsers.

— 18:37

And this prefix combinator was handy enough to clean up the oneOrMoreSpaces and zeroOrMoreSpaces parsers, but it’ll definitely help us build plenty more parsers in the future. In fact, that kind of prefix checking is being done in a bunch of the parsers we’ve already defined. Next time: parsing multiple values

— 18:57

Let’s explore another problem: so far we have no way of parsing any number of values off of a string. For example, we can parse an integer, or a double, or even a coordinate off a string, but we have no way of parsing a list of integers, or a list of doubles, or a list of coordinates. This is a common problem, and another one that can’t be solved using map , flatMap , and zip alone. References Combinators Daniel Steinberg • Sep 14, 2018 Daniel gives a wonderful overview of how the idea of “combinators” infiltrates many common programming tasks. Note Just as with OO, one of the keys to a functional style of programming is to write very small bits of functionality that can be combined to create powerful results. The glue that combines the small bits are called Combinators. In this talk we’ll motivate the topic with a look at Swift Sets before moving on to infinite sets, random number generators, parser combinators, and Peter Henderson’s Picture Language. Combinators allow you to provide APIs that are friendly to non-functional programmers. https://vimeo.com/290272240 Parser Combinators in Swift Yasuhiro Inami • May 2, 2016 In the first ever try! Swift conference, Yasuhiro Inami gives a broad overview of parsers and parser combinators, and shows how they can accomplish very complex parsing. Note Parser combinators are one of the most awesome functional techniques for parsing strings into trees, like constructing JSON. In this talk from try! Swift, Yasuhiro Inami describes how they work by combining small parsers together to form more complex and practical ones. https://academy.realm.io/posts/tryswift-yasuhiro-inami-parser-combinator/ Regex Alexander Grebenyuk • Aug 10, 2019 This library for parsing regular expression strings into a Swift data type uses many of the ideas developed in our series of episodes on parsers. It’s a great example of how to break a very large, complex problem into many tiny parsers that glue back together. https://github.com/kean/Regex Regexes vs Combinatorial Parsing Soroush Khanlou • Dec 3, 2019 In this article, Soroush Khanlou applies parser combinators to a real world problem: parsing notation for a music app. He found that parser combinators improved on regular expressions not only in readability, but in performance! http://khanlou.com/2019/12/regex-vs-combinatorial-parsing/ Learning Parser Combinators With Rust Bodil Stokke • Apr 18, 2019 A wonderful article that explains parser combinators from start to finish. The article assumes you are already familiar with Rust, but it is possible to look past the syntax and see that there are many shapes in the code that are similar to what we have covered in our episodes on parsers. https://bodil.lol/parser-combinators/ Sparse John Patrick Morgan • Jan 12, 2017 A parser library built in Swift that uses many of the concepts we cover in our series of episodes on parsers. Note Sparse is a simple parser-combinator library written in Swift. https://github.com/johnpatrickmorgan/Sparse parsec Daan Leijen, Paolo Martini, Antoine Latter Parsec is one of the first and most widely used parsing libraries, built in Haskell. It’s built on many of the same ideas we have covered in our series of episodes on parsers, but using some of Haskell’s most powerful type-level features. http://hackage.haskell.org/package/parsec Parse, don’t validate Alexis King • Nov 5, 2019 This article demonstrates that parsing can be a great alternative to validating. When validating you often check for certain requirements of your values, but don’t have any record of that check in your types. Whereas parsing allows you to upgrade the types to something more restrictive so that you cannot misuse the value later on. https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/ Ledger Mac App: Parsing Techniques Chris Eidhof & Florian Kugler • Aug 26, 2016 In this free episode of Swift talk, Chris and Florian discuss various techniques for parsing strings as a means to process a ledger file. It contains a good overview of various parsing techniques, including parser grammars. https://talk.objc.io/episodes/S01E13-parsing-techniques Downloads Sample code 0062-parser-combinators-pt1 Point-Free A hub for advanced Swift programming. Brought to you by Brandon Williams and Stephen Celis . Content Become a member The Point-Free Way Beta previews Gifts Videos Collections Free clips Blog More About Us Community Slack Mastodon Twitter BlueSky GitHub Contact Us Privacy Policy © 2026 Point-Free, Inc. All rights are reserved for the videos and transcripts on this site. All other content is licensed under CC BY-NC-SA 4.0 , and the underlying source code to run this site is licensed under the MIT License .