Video #64: Parser Combinators: Part 3

Episode: Video #64 Date: Jul 8, 2019 Access: Members Only 🔒 URL: https://www.pointfree.co/episodes/ep64-parser-combinators-part-3

Description

Now that we’ve looked at how to parse multiple values given a single parser, let’s try to parse a single value using multiple parsers! And after defining a bunch of these parser combinators we’ll finally be able to ask: “what’s the point!?”

Video

Cloudflare Stream video ID: e50e232ef1ea1678c635f65bef0e786b Local file: video_64_parser-combinators-part-3.mp4 *(download with --video 64)*

References

Transcript

— 0:05

The zeroOrMore parser combinator is now a function that takes two parsers as input and returns a third parser as output. This higher-order parser is unlocking the ability to parse gigantic strings with very work. This is starting to seem extremely powerful and maybe we’re starting to see “the point” of parser combinators. Values from multiple parsers

— 0:30

We have one more problem to solve that kind of flips the previous problem around. Instead of worrying about parsing multiple values with a given parser, we want to try to parse a single value given multiple parsers.

— 0:53

What does it mean to run several parsers against an input string? Why might this be useful? Well, right now we bake in a lot of logic determining whether a parser should succeed in a single parser. But there are many times where we just want to take a bunch of smaller, simpler parsers, and try to parse a value out of a string using each of them till one succeeds.

— 1:13

Let’s make our money example even more complex by introducing an accounting-based format, where negative numbers are wrapped with parentheses. "($19.99)"

— 1:29

Let’s define a brand new parser for this format called moneyLoss . But how are we going to parse this out? Well, we know we want to parse an open parentheses, followed by a money value, followed by a closed parentheses, which seems like the perfect job for zip ! let moneyLoss = zip(literal("("), money, literal(")")) // Parser<(), Money, ()>

— 2:09

For money loss we want to take the monetary value and negate it, which we can do using map ! let moneyLoss = zip(literal("("), money, literal(")")) .map { _, money, _ in Money(currency: money.currency, value: -money.value) } // Parser<Money>

— 2:48

Let’s take it for a spin! moneyLoss.run("($19.00)") // ({use, value -19, rest "")

— 3:08

The moneyLoss parser will only parse out losses. We also want to parse out gains and losses? We can start by writing it from scratch. We know we want it to be a parser of Money , so let’s initialize a brand new Parser<Money> . let accounting = Parser<Money> { str -> Money? in }

— 3:36

In here we want to run our existing parsers. First, we can run the money parser, unwrapping the result upon success, and returning the gain. If that fails, we can run the moneyLoss parser, unwrap the result upon success to return the loss. Otherwise, we can return nil . let accounting = Parser<Money> { str -> Money? in if let match = money.run(&str) { return match } if let match = moneyLoss.run(&str) { return match } return nil } // Parser<Money>

— 4:11

And when we run it, it accounts for both gains and losses. accounting.run("£19.00") // (match {gbp, value 19}, rest "") accounting.run("£19.00") // (match {gbp, value -19}, rest "")

— 4:30

Nice! We were able to combine multiple parsers into a single one and while we did so in an ad hoc way, we never manipulated the string directly: we always delegated to other parsers to do that work for us.

— 4:43

But we’re still defining this unit of work from scratch using the Parser type’s initializer. It’d be nice if we had another combinator that handled things for us. What we want is a combinator that tries to run a parser, and if it fails, tries to run another? We can call it oneOf , because it will attempt to parse a value using one of the given two parsers. func oneOf<A>(_ p1: Parser<A>, _ p2: Parser<A>) -> Parser<A> { }

— 5:37

The work is almost identical to the work we did in our accounting parser, so let’s copy and paste that work and make a few small changes. func oneOf<A>(_ p1: Parser<A>, _ p2: Parser<A>) -> Parser<A> { return Parser<A> { str -> A? in if let match = p1.run(&str) { return match } if let match = p2.run(&str) { return match } return nil } }

— 5:56

Let’s try redefining accounting in terms of oneOf . let accounting = oneOf(money, moneyLoss) // Parser<Money>

— 6:14

This single line packs a big punch! We previously defined this same logic over about nine lines of very imperative code. And everything behaves just as before. Values from even more parsers

— 6:30

So we’ve now defined another higher-order parser: the oneOf combinator. Just like the other ones we’ve introduced, we’ve been able to eliminate a lot of ad hoc work and dramatically simplify the creation of new parsers.

— 6:41

I think we can make it even better, though. Why stop at two parsers? Why not run any number of a given array of parsers instead?

— 7:06

If we revisit the signature of oneOf , there’s something interesting going on. func oneOf<A>( _ p1: Parser<A>, _ p2: Parser<A> ) -> Parser<A> { All of the parsers in the signature are of the same type, A . While zip had a similar signature, it had a parser of A and a parser of B , which pretty much forced the return parser to be a parser of (A, B) . But in the case of oneOf , we have the opportunity to work with any number of parsers instead. func oneOf<A>( _ ps: [Parser<A>] ) -> Parser<A> {

— 7:39

Instead of checking two parsers directly, we can iterate over our array of parsers and keep trying them till one succeeds. func oneOf<A>(_ ps: [Parser<A>]) -> Parser<A> { return Parser<A> { str in for p in ps { if let match = p.run(&str) { return match } } return nil } }

— 8:19

We broke our accounting parser because oneOf works with arrays now, but the fix is easy enough. let accounting = oneOf([money, moneyLoss]) // Parser<Money>

— 8:28

Now that oneOf can take any number of parsers, let’s look to a more compelling example than accounting , which only takes two: the currency parser! let currency = char.flatMap { $0 == "€" ? always(Currency.eur) : $0 == "£" ? always(.gbp) : $0 == "$" ? always(.usd) : .never }

— 8:59

While it’s nice we’re using flatMap and not exposing ourselves to the mutable substring and any weird string index juggling of building a parser from scratch, it’s still a bit messy. We have this nested ternary, that can be difficult to read, and we have to explicitly succeed and fail using always and never .

— 9:14

What if we try to use oneOf instead? Well, rather than manually checking a char , we could create a parser per currency by checking for the literal currency symbol, and then we could map each literal to its Currency value. let currency = oneOf([ literal("€").map { Currency.eur }, literal("£").map { .gbp }, literal("$").map { .usd } ]) // Parser<Currency>

— 9:58

This looks pretty nice because each building block is made from something we understand very well: we understand what literal does and we understand what it means to map on it. There’s no logic here. We’re just more declaratively telling the story of wanting to try a bunch of parsers till one succeeds.

— 10:25

Even though it’s roughly the same number of lines of code, it tells a much better story: we want one of the literal euro, pound, or US dollar parsers.

— 10:37

And everything still continues to work: we’re still parsing monetary values of these currencies and all the downstream work of parsing a bunch of money all at once, and of parsing gains and losses, it all still works. What’s the point?

— 10:57

Alright, we’ve now introduced three parser combinators: we’ve got oneOf , which can parse a value by running an array of given parsers till the first one succeeds. We’ve got zeroOrMore , which attempts to run a given parser against a string as many times as possible, and which also takes a secondary “separator” parser, which attempts to parse out separators between values. And then we had the prefix parser, which we’ve used a ton now, and it attempts to parse as much off the beginning of a string as possible while a predicate succeeds. These are the “higher-order parsers” (or “parser combinators”) we talked about, and we talked about how they unlock the next level of parsing.

— 11:41

However, on Point-Free, whenever we introduce a topic, by the end we like to ask: “what’s the point?” Here we are making a big claim that we can parse very complex string formats, yet so far we’ve parsed some pretty simple, homogeneous things: comma-separated monetary values and other very basic stuff. Do parser combinators really unlock the next level of parsing more complex string formats? Parsing a complex, ad hoc format

— 12:16

In order to answer this question, let’s take a look at a more complex string format and see what we can do.

— 12:25

Let’s look at a pretty complicated format. In our application we may want to parsing a string that represents a bunch of upcoming marathons that our users may want to run in. let upcomingRaces = """ New York City, $300 40.60248° N, 74.06433° W 40.61807° N, 74.02966° W 40.64953° N, 74.00929° W 40.67884° N, 73.98198° W 40.69894° N, 73.95701° W 40.72791° N, 73.95314° W 40.74882° N, 73.94221° W 40.75740° N, 73.95309° W 40.76149° N, 73.96142° W 40.77111° N, 73.95362° W 40.80260° N, 73.93061° W 40.80409° N, 73.92893° W 40.81432° N, 73.93292° W 40.80325° N, 73.94472° W 40.77392° N, 73.96917° W 40.77293° N, 73.97671° W --- Berlin, €100 13.36015° N, 52.51516° E 13.33999° N, 52.51381° E 13.32539° N, 52.51797° E 13.33696° N, 52.52507° E 13.36454° N, 52.52278° E 13.38152° N, 52.52295° E 13.40072° N, 52.52969° E 13.42555° N, 52.51508° E 13.41858° N, 52.49862° E 13.40929° N, 52.48882° E 13.37968° N, 52.49247° E 13.34898° N, 52.48942° E 13.34103° N, 52.47626° E 13.32851° N, 52.47122° E 13.30852° N, 52.46797° E 13.28742° N, 52.47214° E 13.29091° N, 52.48270° E 13.31084° N, 52.49275° E 13.32052° N, 52.50190° E 13.34577° N, 52.50134° E 13.36903° N, 52.50701° E 13.39155° N, 52.51046° E 13.37256° N, 52.51598° E --- London, £500 51.48205° N, 0.04283° E 51.47439° N, 0.02170° E 51.47618° N, 0.02199° E 51.49295° N, 0.05658° E 51.47542° N, 0.03019° E 51.47537° N, 0.03015° E 51.47435° N, 0.03733° E 51.47954° N, 0.04866° E 51.48604° N, 0.06293° E 51.49314° N, 0.06104° E 51.49248° N, 0.04740° E 51.48888° N, 0.03564° E 51.48655° N, 0.01830° E 51.48085° N, 0.02223° W 51.49210° N, 0.04510° W 51.49324° N, 0.04699° W 51.50959° N, 0.05491° W 51.50961° N, 0.05390° W 51.49950° N, 0.01356° W 51.50898° N, 0.02341° W 51.51069° N, 0.04225° W 51.51056° N, 0.04353° W 51.50946° N, 0.07810° W 51.51121° N, 0.09786° W 51.50964° N, 0.11870° W 51.50273° N, 0.13850° W 51.50095° N, 0.12411° W """

— 12:35

It includes a bunch of information: the city in which the marathon takes place, the entrance fee, and a newline-separated list of coordinates for the race’s path. In addition, each marathon is separated by a line of dashes. So let’s parse this string into an application data format.

— 13:08

To solve a problem this complex, it can be nice to work our way backwards from the finish line. In the end what we want is a parser of races. let races: Parser<[Race]>

— 13:27

But we don’t even have a Race data type defined yet, so let’s do just that. struct Race { let location: String let entranceFee: Money let path: [Coordinate] }

— 13:54

The location type is a bit lax. Our application only supports a few locations so far, so let’s define an enum representation enum Location { case nyc, berlin, london }

— 14:06

And we can update our Race struct accordingly. struct Race { let location: Location let entranceFee: Money let path: [Coordinate] }

— 14:08

So how do we build this parser of arrays of races? We’re not quite there yet, so let’s stub things out with the never parser for now. let races: Parser<[Race]> = .never

— 14:18

Instead, what does it mean to parse out a single race? let race: Parser<Race>

— 14:30

Each race is declared with a bunch of information, sequentially, so it seems like we should be able to use zip to list out the information we need using smaller parsers. let race: Parser<Race> = zip( )

— 14:37

The first thing that we want to parse is the location, but we don’t have a location parser yet. It’s going to look a lot like our currency parser. let location = oneOf([ literal("New York City").map { Location.nyc }, literal("Berlin").map { .berlin }, literal("London").map { .london } ]) let race: Parser<Race> = zip( location )

— 15:16

After we parse the location, we need to parse out a literal comma, followed by one or more spaces, before parsing the entrance fee with the money parser. let race: Parser<Race> = zip( location, literal(","), oneOrMoreSpaces, money )

— 15:39

And now we need to parse a newline, followed by a newline-separated list of coordinates, which we can do using the zeroOrMore parser. let race: Parser<Race> = zip( location, literal(","), oneOrMoreSpaces, money, literal("\n"), zeroOrMore(coord, separatedBy: literal("\n") )

— 15:57

This is a parser of a tuple that we need to coax into the shape of our Race struct, which we can do with map . let race: Parser<Race> = zip( location, literal(","), oneOrMoreSpaces, money, literal("\n"), zeroOrMore(coord, separatedBy: literal("\n") ).map { location, _, _, entranceFee, _, path in Race(location: location, entranceFee: entranceFee, path: path) } // Parser<Race>

— 16:36

And just like that we have a race parser, which we plucked out of thin air by describing each component in a row.

— 16:44

Now that we have a parser of single races, we should be able to define our parser of multiple races. let races: Parser<[Race]> = zeroOrMore( race, separatedBy: literal("\n---\n") ) // Parser<[Race]>

— 17:04

And with that, we can parse the raw format into a first-class Swift data type. races.run(upcomingRaces) // ([{...}, {...}, {...}], rest "")

— 17:20

And there we have it! We consumed the entire string and we have an array of three abbreviated values in the playground. Let’s expand the match to inspect things further. races.run(upcomingRaces).match // [ // {nyc, {...}, [...]}, // {berlin, {...}, [...]}, // {london, {...}, [...]} // ]

— 17:31

Those are the three marathons we expect! We can even dump things to the console to see all the information we parsed, because it’s a lot of data and we were able to write a parser for it in a matter of minutes!

— 17:56

We just parsed a ton of data in a completely ad hoc format, and we were able to do so in a very declarative fashion. We didn’t have to do any string index manipulation or mutation: all of that work was hidden away in our parser combinators. Conclusion

— 18:14

It certainly does look like parser combinators unlock the next level of parsing! And this isn’t even the end of the story. We defined just a few of the important parser combinators out there, but there are a ton more that are just waiting to be written and discovered. In fact, whenever you find yourself defining a parser from scratch, it’s worthwhile to consider if there are parser combinators out there that would simplify things and hide away the more complicated, error-prone string index parsing. There’s likely some reusable code hidden in those ad hoc parsers that could become a combinator instead!

— 18:49

Parser combinators help keep us from having to open up parsers directly and manipulate the in-out substring, which can be very subtle and difficult to get right, so what we can have instead is a library of parser combinators that do this work perfectly, and then our more domain-specific parsers can leverage all that work.

— 19:07

Parser combinators are a huge topic within the world of parsing, and we’ve been talking about parsing for awhile now, but we still have a few more things to say. We’re getting close to the point where we might even have an open-source-able library, but it’s not quite right yet. The library could be made a little nicer by leveraging some Swift features to make things more ergonomic, and there are a few things that could make these parsers even more powerful and flexible. Till next time! References Combinators Daniel Steinberg • Sep 14, 2018 Daniel gives a wonderful overview of how the idea of “combinators” infiltrates many common programming tasks. Note Just as with OO, one of the keys to a functional style of programming is to write very small bits of functionality that can be combined to create powerful results. The glue that combines the small bits are called Combinators. In this talk we’ll motivate the topic with a look at Swift Sets before moving on to infinite sets, random number generators, parser combinators, and Peter Henderson’s Picture Language. Combinators allow you to provide APIs that are friendly to non-functional programmers. https://vimeo.com/290272240 Parser Combinators in Swift Yasuhiro Inami • May 2, 2016 In the first ever try! Swift conference, Yasuhiro Inami gives a broad overview of parsers and parser combinators, and shows how they can accomplish very complex parsing. Note Parser combinators are one of the most awesome functional techniques for parsing strings into trees, like constructing JSON. In this talk from try! Swift, Yasuhiro Inami describes how they work by combining small parsers together to form more complex and practical ones. https://academy.realm.io/posts/tryswift-yasuhiro-inami-parser-combinator/ Regex Alexander Grebenyuk • Aug 10, 2019 This library for parsing regular expression strings into a Swift data type uses many of the ideas developed in our series of episodes on parsers. It’s a great example of how to break a very large, complex problem into many tiny parsers that glue back together. https://github.com/kean/Regex Regexes vs Combinatorial Parsing Soroush Khanlou • Dec 3, 2019 In this article, Soroush Khanlou applies parser combinators to a real world problem: parsing notation for a music app. He found that parser combinators improved on regular expressions not only in readability, but in performance! http://khanlou.com/2019/12/regex-vs-combinatorial-parsing/ Learning Parser Combinators With Rust Bodil Stokke • Apr 18, 2019 A wonderful article that explains parser combinators from start to finish. The article assumes you are already familiar with Rust, but it is possible to look past the syntax and see that there are many shapes in the code that are similar to what we have covered in our episodes on parsers. https://bodil.lol/parser-combinators/ Sparse John Patrick Morgan • Jan 12, 2017 A parser library built in Swift that uses many of the concepts we cover in our series of episodes on parsers. Note Sparse is a simple parser-combinator library written in Swift. https://github.com/johnpatrickmorgan/Sparse parsec Daan Leijen, Paolo Martini, Antoine Latter Parsec is one of the first and most widely used parsing libraries, built in Haskell. It’s built on many of the same ideas we have covered in our series of episodes on parsers, but using some of Haskell’s most powerful type-level features. http://hackage.haskell.org/package/parsec Parse, don’t validate Alexis King • Nov 5, 2019 This article demonstrates that parsing can be a great alternative to validating. When validating you often check for certain requirements of your values, but don’t have any record of that check in your types. Whereas parsing allows you to upgrade the types to something more restrictive so that you cannot misuse the value later on. https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/ Ledger Mac App: Parsing Techniques Chris Eidhof & Florian Kugler • Aug 26, 2016 In this free episode of Swift talk, Chris and Florian discuss various techniques for parsing strings as a means to process a ledger file. It contains a good overview of various parsing techniques, including parser grammars. https://talk.objc.io/episodes/S01E13-parsing-techniques Downloads Sample code 0064-parser-combinators-pt3 Point-Free A hub for advanced Swift programming. Brought to you by Brandon Williams and Stephen Celis . Content Become a member The Point-Free Way Beta previews Gifts Videos Collections Free clips Blog More About Us Community Slack Mastodon Twitter BlueSky GitHub Contact Us Privacy Policy © 2026 Point-Free, Inc. All rights are reserved for the videos and transcripts on this site. All other content is licensed under CC BY-NC-SA 4.0 , and the underlying source code to run this site is licensed under the MIT License .