Video #63: Parser Combinators: Part 2

Episode: Video #63 Date: Jul 1, 2019 Access: Members Only 🔒 URL: https://www.pointfree.co/episodes/ep63-parser-combinators-part-2

Description

Let’s solve another common parsing problem using parser combinators! It’s common to want to parse multiple values off a string, and while zip gets us part of the way there, it doesn’t let us parse any number of values! Luckily there’s a parser combinator that can help, and it really packs a punch.

Video

Cloudflare Stream video ID: eef89877ff760670d690833c4c740fc2 Local file: video_63_parser-combinators-part-2.mp4 *(download with --video 63)*

References

Transcript

— 0:05

The prefix function is an example of a “parser combinator”: it’s a function that produces brand new parsers out of thin air given a little configuration. This prefix parser combinator unlocks some extra power because we can scan off as many characters as we want off the beginning of a string, based on a predicate, and then do something later with it.

— 0:34

Although this is the first time we’ve called something a “parser combinator,” it’s not the first parser combinator that we’ve seen: map , zip , and flatMap are all parser combinators, because they’re functions that take parsers as input and produce parsers as output. Even the literal parser is a parser combinator because it takes some configuration up front (a literal string to parse) and returns a brand new parser that uses that configuration.

— 0:59

We’re beginning to see how parser combinators can enter our code and clean up things in the process: they allow us to solve more and more precise parsing problems by moving the lower-level, trickier parsing logic to these combinators. And then we can use them to build more specific kinds of parsers.

— 1:17

And this prefix combinator was handy enough to clean up the oneOrMoreSpaces and zeroOrMoreSpaces parsers, but it’ll definitely help us build plenty more parsers in the future. In fact, that kind of prefix checking is being done in a bunch of the parsers we’ve already defined. Parsing multiple values

— 1:37

Let’s explore another problem: so far we have no way of parsing any number of values off of a string. For example, we can parse an integer, or a double, or even a coordinate off a string, but we have no way of parsing a list of integers, or a list of doubles, or a list of coordinates. This is a common problem, and another one that can’t be solved using map , flatMap , and zip alone.

— 2:12

Let’s say we have a comma-separated list of charges that we wanted to parse into first-class values: "€42,£42,$42"

— 2:23

We previously wrote a money parser that was capable of doing the underlying work of parsing a monetary value. We represented Currency in an enum and wrote a matching currency parser that could parse a currency symbol off a string. enum Currency { case eur, gbp, usd } let currency = char.flatMap { $0 == "€" ? always(Currency.eur) : $0 == "£" ? always(.gbp) : $0 == "$" ? always(.usd) : .never } // Parser<Currency>

— 2:47

And we had a Money struct that bundled this up with a Double value which we parsed using a money parser that simply zipped the currency and double parsers together before mapping them to the Money struct’s initializer. struct Money { let currency: Currency let value: Double } let money = zip(currency, double).map(Money.init) // Parser<Money>

— 3:00

And with this defined, we could parse monetary values off of strings. money.run("€42,£42,$42") // ({eur, value 42}, rest ",£42,$42")

— 3:09

But while we can parse a single value off a string, there’s no way for us to parse multiple values off a string.

— 3:23

zip is powerful enough to precisely parse a set number of monetary values: zip( money, literal(","), money, literal(","), money ) .run("€42,£42,$42") // ( // match ( // {eur, value 42}, (), {gbp, value 42}, (), {usd, value 42} // ), // rest "" // )

— 4:14

But there’s no way for us to use it to build a parser that can parse any number of monetary values. If another value were added to the string: zip( money, literal(","), money, literal(","), money ) .run("€42,£42,$42,$42") // ( // match ( // {eur, value 42}, (), // {gbp, value 42}, (), // {usd, value 42} // ), // rest ",$42" // )

— 4:27

It’s not going to be parsed. What’s worse is if we had one fewer value: zip( money, literal(","), money, literal(","), money ) .run("€42,£42") // (nil, rest "€42,£42")

— 4:33

Parsing fails entirely.

— 4:34

What we really want is a way of parsing zero or more values off a string given a parser of values, something that zip is incapable of.

— 4:43

Let’s try defining a combinator to do just that. func zeroOrMore<A>(_ p: Parser<A>) -> Parser<[A]> { } We can call it zeroOrMore because it will parse zero or more values using a given parser. It returns a whole new parser of arrays, which will collect all of the successful results of running the given parser.

— 5:09

In the body of the function we can start by returning a Parser<[A]> and opening up its initializer, where we have access to the in-out substring we’re parsing. func zeroOrMore<A>(_ p: Parser<A>) -> Parser<[A]> { return Parser<[A]> { str in } }

— 5:21

In this closure we want to return an array of A s representing zero or more matches as a result of running the given parser.

— 5:24

First, let’s introduce an empty, mutable array that will hold onto these matches. func zeroOrMore<A>(_ p: Parser<A>) -> Parser<[A]> { return Parser<[A]> { str in var matches: [A] = [] } }

— 5:34

Running our parser p returns an optional match. func zeroOrMore<A>(_ p: Parser<A>) -> Parser<[A]> { return Parser<[A]> { str in var matches: [A] = [] let match = p.run(&str) } }

— 5:45

Which needs to be unwrapped before appending it to our matches. func zeroOrMore<A>(_ p: Parser<A>) -> Parser<[A]> { return Parser<[A]> { str in var matches: [A] = [] if let match = p.run(&str) { matches.append(match) } } }

— 5:54

But we want to append as many successful matches as we can to this array, so instead of an if let we can use a while let . func zeroOrMore<A>(_ p: Parser<A>) -> Parser<[A]> { return Parser<[A]> { str in var matches: [A] = [] while let match = p.run(&str) { matches.append(match) } } }

— 6:07

Finally, we return that array of matches. func zeroOrMore<A>(_ p: Parser<A>) -> Parser<[A]> { return Parser<[A]> { str in var matches: [A] = [] while let match = p.run(&str) { matches.append(match) } return matches } }

— 6:12

Things are building, but how can we use this?

— 6:13

If we take our existing money parser and hand it to our zeroOrMore combinator, we get a brand new parser of arrays of money: zeroOrMore(money) // Parser<[Money]>

— 6:22

But this doesn’t do quite what we want. zeroOrMore(money).run("€42,£42,$42") // (match [{...}], rest ",£42,$42")

— 6:30

The playground makes this tough to see, so let’s access the match and rest of the string directly. zeroOrMore(money).run("€42,£42,$42").match // [{eur, value 42}] zeroOrMore(money).run("€42,£42,$42").rest // ",£42,$42"

— 6:41

It bundled up the first match into an array, but it doesn’t know what to do with the commas in between. As written, it expects an uninterrupted stream of money values. zeroOrMore(money).run("€42£42$42").match // [{eur, value 42}, {gbp, value 42}, {usd, value 42}] Parsing delimited values

— 7:05

A lot of the time when you want to parse multiple values out of a string there’s some kind of separator between each value. For example, a CSV not only separates each column with a comma, but separates each row with a newline.

— 7:15

Our list of charges is kind of like a CSV: a comma separates each amount. It’d be nice to bake this kind of separation directly into the zeroOrMore parser. Something like: zeroOrMore(money, separatedBy: ",")

— 7:31

Let’s introduce this new argument, separatedBy . We could give it a character, or maybe a string, that we want to skip between each match. func zeroOrMore<A>( _ p: Parser<A>, separatedBy s: String ) -> Parser<[A]> { return Parser<[A]> { str in var matches: [A] = [] while let match = p.run(&str) { matches.append(match) } return matches } }

— 7:47

How do we parse out this separator? After each successful match, we could manually check for the separator and then call removeFirst to consume it… func zeroOrMore<A>( _ p: Parser<A>, separatedBy s: String ) -> Parser<[A]> { return Parser<[A]> { str in var matches: [A] = [] while let match = p.run(&str) { matches.append(match) if str.hasPrefix(s) { str.removeFirst(s.count) } } return matches } }

— 8:11

Otherwise, we want to bail out early with the current list of matches. func zeroOrMore<A>( _ p: Parser<A>, separatedBy s: String ) -> Parser<[A]> { return Parser<[A]> { str in var matches: [A] = [] while let match = p.run(&str) { matches.append(match) if str.hasPrefix(s) { str.removeFirst(s.count) } else { return matches } } return matches } }

— 8:22

Alright this is getting a little complicated, so let’s maybe give it a spin to see where we’re at. zeroOrMore(money, separatedBy: ",").run("€42,£42,$42").match // [{eur, value 42}, {gbp, value 42}, {usd, value 42}] zeroOrMore(money, separatedBy: ",").run("€42,£42,$42").rest // "" A backtracking bug

— 8:50

Okay! This is looking really good! But we still have a bug to fix, though, and it’s a subtle one: in addition to consuming separators in between each value, we also consume a trailing separator, should one exist. zeroOrMore(money, separatedBy: ",").run("€42,£42,$42,").rest // ""

— 9:06

Every time the zeroOrMore parser successfully parses a value, it immediately consumes any separator that follows it, even when there are no more values left to be parsed. Our parser should only consume the characters that separate each parsed value. Otherwise we should leave the rest of the substring intact.

— 9:23

We encountered a similar problem when we implemented flatMap (and zip ): flatMap gave us the ability to chain the result of one parser into another parser to produce a brand new parser, but we needed to account for the case where the first parser succeeds and the second parser fails. In such a case, the entire parser should fail and we should not consume any of the substring, which meant we needed to roll back any work the first parser may have done. What we want to do here is the same: when we fail to parse an additional value, we want to roll back any work we’ve done since we parsed the last value.

— 9:56

We can introduce a new variable that keeps track of the rest of the substring after each successful match. We’ll introduce it outside of the loop. func zeroOrMore<A>( _ p: Parser<A>, separatedBy s: String ) -> Parser<[A]> { return Parser<[A]> { str in var rest = str

— 10:04

And we’ll reassign it after each successful match. func zeroOrMore<A>( _ p: Parser<A>, separatedBy s: String ) -> Parser<[A]> { return Parser<[A]> { str in var rest = str var matches: [A] = [] while let match = p.run(&str) { rest = str

— 10:13

Finally, we need to restore the mutable substring to this value outside of the while loop. This will effectively undo the parsing of any work done between a successful match and failure, including the accidental parsing of a trailing separator. func zeroOrMore<A>( _ p: Parser<A>, separatedBy s: String ) -> Parser<[A]> { return Parser<[A]> { str in var rest = str var matches: [A] = [] while let match = p.run(&str) { rest = str matches.append(match) if str.hasPrefix(s) { str.removeFirst(s.count) } else { return matches } } str = rest return matches } }

— 10:35

This parser is getting quite complicated, but let’s run it to make sure it’s working. zeroOrMore(money, separatedBy: ",").run("€42,£42,$42,").rest // "," Yup! We no longer consume the trailing separator.

— 10:54

We start with a parser of single values and, with a single line of code, promoted it to a parser of multiple values separated by a given string. That line is packing a punch! Parsers on parsers on parsers

— 11:09

Alright, I think we’re seeing just how powerful these functions that work on existing parsers are!

— 11:19

Yup, we have another “parser combinator” on our hands, but before we go any further, there’s a little bit of cleanup that would be nice to do. The body is getting pretty long and complex, and there’s some logic happening inside that isn’t super turned to just parsing using the given parser, but has to do a bunch of prefix logic and manual string index manipulation. I think we can maybe leverage other parsers that we have in order to clean up that work, and maybe we’ll get a nicer API out of it.

— 11:52

The zeroOrMore combinator manually checks the substring for a separator, and then manually removes it, which sure feels familiar. And it should, because it’s the exact same logic as our literal parser: func literal(_ p: String) -> Parser<()> { return Parser<Void> { str in guard str.hasPrefix(p) else { return nil } str.removeFirst(p.count) return () } }

— 12:15

We should be able to reuse this unit of work instead of manually testing and consuming separators. func zeroOrMore<A>( _ p: Parser<A>, separatedBy s: String ) -> Parser<[A]> { return Parser<[A]> { str in var original = str var matches: [A] = [] while let match = p.run(&str) { original = str matches.append(match) if literal(s).run(&str) == nil { return matches } } str = original return matches } }

— 12:51

This is nicer than before, but creating a literal parser inside zeroOrMore begs the question: why aren’t we passing a parser as input instead? That would, after all, allow us to be even more flexible in defining what a separator is. func zeroOrMore<A>( _ p: Parser<A>, separatedBy s: Parser<Void> ) -> Parser<[A]> { return Parser<[A]> { str in var original = str var matches: [A] = [] while let match = p.run(&str) { original = str matches.append(match) if s.run(&str) == nil else { return matches } } str = original return matches } } We want to pass in a parser of Void because we don’t care about the value being parsed, just that it is parsed.

— 14:08

And we’ve gotten rid of a bunch of prefix logic and removeFirst index logic in the process. This stuff is always very subtle and easy to get wrong, so the more we can lean on existing parsers, the better.

— 14:20

We broke our earlier call, but all we need to do is use the literal parser explicitly. zeroOrMore(money, separatedBy: literal(",")).run("€42,£42,$42,") // ( // match [ // {eur, value 42}, // {gbp, value 42}, // {usd, value 42} // ], // rest "," // )

— 14:38

It all still works exactly as before, and exactly as we expect.

— 14:45

But now that we can control the parser being passed to zeroOrMore , we’re no longer restricted to literal parser and we can create even more flexible, powerful parsers! For example, maybe the format we’re parsing delimits by either comma or newline. We could cook up a commaOrNewline parser. let commaOrNewline = char .flatMap { $0 == "," || $0 == "\n" ? always(()) : .never }) // Parser<Void>

— 15:19

This is a parser that consumes a single character, so long as that character is a comma or newline.

— 15:27

We can feed this alongside the money parser to zeroOrMore . zeroOrMore(money, separatedBy: commaOrNewline) // Parser<[Money]>

— 15:45

And we get a much more powerful parser that can run against a more complex format. For example, a bunch of newlines with additional values: zeroOrMore(money, separatedBy: commaOrNewline).run( """ €42,£42,$42 €42,£42,$42 €42,£42,$42 """ ) .match.count // 9 And we’ve parsed them all!

— 16:09

Further if we introduce an invalid format, like a Bitcoin value. zeroOrMore(money, separatedBy: commaOrNewline).run( """ €42,£42,$42 €42,£42,$42 €42,£42,$42,฿10 """ ) .rest // ",฿10" We still parse all nine valid values, but parsing stops short of consuming the Bitcoin.

— 16:36

We’re now seeing the ability to consume very, very large strings. zeroOrMore(money, separatedBy: commaOrNewline).run( """ €42,£42,$42,€42,£42,$42,€42,£42,$42,€42,£42,$42 €42,£42,$42,€42,£42,$42,€42,£42,$42,€42,£42,$42 €42,£42,$42,€42,£42,$42,€42,£42,$42,€42,£42,$42 €42,£42,$42,€42,£42,$42,€42,£42,$42,€42,£42,$42 €42,£42,$42,€42,£42,$42,€42,£42,$42,€42,£42,$42 €42,£42,$42,€42,£42,$42,€42,£42,$42,€42,£42,$42 €42,£42,$42,€42,£42,$42,€42,£42,$42,€42,£42,$42 €42,£42,$42,€42,£42,$42,€42,£42,$42,€42,£42,$42 €42,£42,$42,€42,£42,$42,€42,£42,$42,€42,£42,$42 €42,£42,$42,€42,£42,$42,€42,£42,$42,€42,£42,$42 €42,£42,$42,€42,£42,$42,€42,£42,$42,€42,£42,$42 €42,£42,$42,€42,£42,$42,€42,£42,$42,€42,£42,$42 €42,£42,$42,€42,£42,$42,€42,£42,$42,€42,£42,$42 €42,£42,$42,€42,£42,$42,€42,£42,$42,€42,£42,$42 €42,£42,$42,฿10 """ ) .count // 168

— 16:53

We were able to parse out 168 values using this single zeroOrMore combinator in a single line of code that asks for zero or more money values, separated by commas or newlines!

— 17:11

The zeroOrMore parser combinator is now a function that takes two parsers as input and returns a third parser as output. This higher-order parser is unlocking the ability to parse gigantic strings with very work. This is starting to seem extremely powerful and maybe we’re starting to see “the point” of parser combinators. Next time: parsing with multiple parsers

— 17:35

We have one more problem to solve that kind of flips the previous problem around. Instead of worrying about parsing multiple values with a given parser, we want to try to parse a single value given multiple parsers. References Combinators Daniel Steinberg • Sep 14, 2018 Daniel gives a wonderful overview of how the idea of “combinators” infiltrates many common programming tasks. Note Just as with OO, one of the keys to a functional style of programming is to write very small bits of functionality that can be combined to create powerful results. The glue that combines the small bits are called Combinators. In this talk we’ll motivate the topic with a look at Swift Sets before moving on to infinite sets, random number generators, parser combinators, and Peter Henderson’s Picture Language. Combinators allow you to provide APIs that are friendly to non-functional programmers. https://vimeo.com/290272240 Parser Combinators in Swift Yasuhiro Inami • May 2, 2016 In the first ever try! Swift conference, Yasuhiro Inami gives a broad overview of parsers and parser combinators, and shows how they can accomplish very complex parsing. Note Parser combinators are one of the most awesome functional techniques for parsing strings into trees, like constructing JSON. In this talk from try! Swift, Yasuhiro Inami describes how they work by combining small parsers together to form more complex and practical ones. https://academy.realm.io/posts/tryswift-yasuhiro-inami-parser-combinator/ Regex Alexander Grebenyuk • Aug 10, 2019 This library for parsing regular expression strings into a Swift data type uses many of the ideas developed in our series of episodes on parsers. It’s a great example of how to break a very large, complex problem into many tiny parsers that glue back together. https://github.com/kean/Regex Regexes vs Combinatorial Parsing Soroush Khanlou • Dec 3, 2019 In this article, Soroush Khanlou applies parser combinators to a real world problem: parsing notation for a music app. He found that parser combinators improved on regular expressions not only in readability, but in performance! http://khanlou.com/2019/12/regex-vs-combinatorial-parsing/ Learning Parser Combinators With Rust Bodil Stokke • Apr 18, 2019 A wonderful article that explains parser combinators from start to finish. The article assumes you are already familiar with Rust, but it is possible to look past the syntax and see that there are many shapes in the code that are similar to what we have covered in our episodes on parsers. https://bodil.lol/parser-combinators/ Sparse John Patrick Morgan • Jan 12, 2017 A parser library built in Swift that uses many of the concepts we cover in our series of episodes on parsers. Note Sparse is a simple parser-combinator library written in Swift. https://github.com/johnpatrickmorgan/Sparse parsec Daan Leijen, Paolo Martini, Antoine Latter Parsec is one of the first and most widely used parsing libraries, built in Haskell. It’s built on many of the same ideas we have covered in our series of episodes on parsers, but using some of Haskell’s most powerful type-level features. http://hackage.haskell.org/package/parsec Parse, don’t validate Alexis King • Nov 5, 2019 This article demonstrates that parsing can be a great alternative to validating. When validating you often check for certain requirements of your values, but don’t have any record of that check in your types. Whereas parsing allows you to upgrade the types to something more restrictive so that you cannot misuse the value later on. https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/ Ledger Mac App: Parsing Techniques Chris Eidhof & Florian Kugler • Aug 26, 2016 In this free episode of Swift talk, Chris and Florian discuss various techniques for parsing strings as a means to process a ledger file. It contains a good overview of various parsing techniques, including parser grammars. https://talk.objc.io/episodes/S01E13-parsing-techniques Downloads Sample code 0063-parser-combinators-pt2 Point-Free A hub for advanced Swift programming. Brought to you by Brandon Williams and Stephen Celis . Content Become a member The Point-Free Way Beta previews Gifts Videos Collections Free clips Blog More About Us Community Slack Mastodon Twitter BlueSky GitHub Contact Us Privacy Policy © 2026 Point-Free, Inc. All rights are reserved for the videos and transcripts on this site. All other content is licensed under CC BY-NC-SA 4.0 , and the underlying source code to run this site is licensed under the MIT License .