EP 61 · Composable Parsing · Jun 10, 2019 ·Members

Video #61: Composable Parsing: Zip

smart_display

Loading stream…

Video #61: Composable Parsing: Zip

Episode: Video #61 Date: Jun 10, 2019 Access: Members Only 🔒 URL: https://www.pointfree.co/episodes/ep61-composable-parsing-zip

Episode thumbnail

Description

While flatMap allowed us to take our parser type to the next level, it introduced a nesting problem. Isn’t flatMap all about solving nesting problems!? Well, we have one more operation at our disposal: zip! Let’s define zip on the parser type, see what it brings to the table, and finally ask, “what’s the point?”

Video

Cloudflare Stream video ID: 3b2844d9639c7d7eb4d1e75e23b1a284 Local file: video_61_composable-parsing-zip.mp4 *(download with --video 61)*

References

Transcript

0:05

So, what are we to do? Parsing this string with map and flatMap gets the job done, but it isn’t super nice. Luckily, there’s a third operation that sits right in between map and flatMap . We talked about this very topic over the course of 3 entire episodes on Point-Free ( 1 , 2 , 3 ), and it naturally encompasses the idea of “context independence” computation, and it is none other than zip . It allows you to take multiple generic values and combine them into a single generic value. The Swift standard library defines zip on arrays, but in our previous episodes we showed that it makes sense to define zip on many more types, such as optionals, results, validated values, lazy values, and asynchronous values. Zip on Parser

1:15

Turns out there is yet another type that supports a zip operation, and it’s our parser type!

1:35

Let’s remember what the shape of zip looks like: zip: (F<A>, F<B>) -> F<(A, B)>

1:46

The defining quality of this signature is that it allows you to transform a tuple of F ’s into an F of tuples.

1:51

You can plug in many different generic types for F and you will get a valid statement. For example:

1:55

zip transforms a tuple of arrays into an array of tuples

2:00

zip transforms a tuple of optionals into an optional of tuples

2:05

zip transforms a tuple of results into a result of tuples

2:09

zip transforms a tuple of async values into an async value of tuples

2:14

And finally, in our case here, a tuple of parsers into a parser of tuples.

2:21

So, let’s try implementing this. The signature is simple enough: func zip<A, B>(_ a: Parser<A>, _ b: Parser<B>) -> Parser<(A, B)> { }

2:40

To implement this we know we want to return a parser of an (A, B) tuple, so we can start there: func zip<A, B>(_ a: Parser<A>, _ b: Parser<B>) -> Parser<(A, B)> { return Parser<(A, B)> { str -> (A, B)? in } }

2:56

Inside this parser we just wanna run each parser on the input string, and return the result if both succeed. we can start by running each parser, one after the other. func zip<A, B>(_ a: Parser<A>, _ b: Parser<B>) -> Parser<(A, B)> { return Parser<(A, B)> { str -> (A, B)? in let matchA = a.run(&str) let matchB = b.run(&str) } }

3:18

Both of these values is optional, so let’s use guard to unwrap them. func zip<A, B>(_ a: Parser<A>, _ b: Parser<B>) -> Parser<(A, B)> { return Parser<(A, B)> { str in guard let matchA = a.run(&str) else { return nil } guard let matchB = b.run(&str) else { return nil } } }

3:27

Finally, we can return our unwrapped pair of matches. func zip<A, B>(_ a: Parser<A>, _ b: Parser<B>) -> Parser<(A, B)> { return Parser<(A, B)> { str in guard let matchA = a.run(&str) else { return nil } guard let matchB = b.run(&str) else { return nil } return (matchA, matchB) } }

3:33

This isn’t quite right yet. As we saw with flatMap , we’re going to need to backtrack a bit to make sure not to consume any of the input if the first parser succeeds and the second parser fails.

3:42

So let’s keep track of the original substring at the beginning of parsing. func zip<A, B>(_ a: Parser<A>, _ b: Parser<B>) -> Parser<(A, B)> { return Parser<(A, B)> { str in let original = str

3:55

And if matchB fails to parse, we’ll restore the substring to its original state. func zip<A, B>(_ a: Parser<A>, _ b: Parser<B>) -> Parser<(A, B)> { return Parser<(A, B)> { str in let original = str guard let matchA = a.run(&str) else { return nil } guard let matchB = b.run(&str) else { str = original return nil } return (matchA, matchB) } }

4:15

And with this simple little operation we can combine two parsers into one, but in such a way that their results do not depend on each other, and thus we get to avoid nesting. Let’s take it for a quick spin on something simple before we go back to our latitude/longitude parser.

4:29

Suppose we wanted to parse some simple currency strings for a pre-determined list of currencies: "$10" "€10"

4:35

Say we had an enum for the currencies we support: enum Currency { case eur case gbp case usd }

4:40

We can build a currency symbol parser easily enough by simply flatMap ping on the char parser: let currency = char.flatMap { $0 == "€" ? always(Currency.eur) : $0 == "£" ? always(.gbp) : $0 == "$" ? always(.usd) : .never }

5:39

This parser is very similar to how we did the northSouth parser, and it’s a pretty common use case for the flatMap operation.

5:44

With this parser at hand we can zip it with our double parser to get what we want to build up a value of a Money struct. struct Money { let currency: Currency let value: Double }

5:54

We can build a money parser with a simple zip and a map . let money = zip(currency, double).map(Money.init) // Parser<Money>

6:07

Let’s take it for a spin. money.run("$10") // ({usd, value 10}, rest "") money.run("€10") // ({eur, value 10}, rest "") money.run("£10") // ({gbp, value 10}, rest "") money.run("฿10") // (nil, rest: "฿10")

6:41

So already zip is pretty handy since it made parsing this money format super easy. We were able to zip together two parsers in a readable, flat way, to produce a brand new, more complex parser. Zipped coordinate parsing

7:00

Let’s now try applying it to our coordinate parsing example.

7:22

We want to zip together a bunch of parsers to incrementally parse the input string, and then take all of their results to construct a Coordinate value.

7:28

For example, we can parse off the double and degree literal first: zip(double, literal("° ")) // Parser<(Double, ())> This produces a parser of a double and a Void value for the literal that comes along for the ride.

7:50

Then we want to parse off the northSouth sign for the latitude, but in order to do that right now we would have to nest our zips: zip(zip(double, literal("° ")), northSouth) // Parser<((Double, ()), Double)>

8:04

And now we have a parser of a nested tuple.

8:21

Now we turned to zip in the first place to get rid of this nesting, so instead, let’s define an overload of zip for zipping 3 things: func zip<A, B, C>( _ a: Parser<A>, _ b: Parser<B>, _ c: Parser<C> ) -> Parser<(A, B, C)> { return zip(a, zip(b, c)).map { a, bc in (a, bc.0, bc.1) } }

9:21

Luckily we can just define this in terms of the other zip . With this new overload we can now just do: zip(double, literal("° "), northSouth) // Parser<(Double, (), Double)>

9:40

But of course we have more to parse, next is the comma: zip(zip(double, literal("° "), northSouth), literal(", ")) // Parser<((Double, (), Double), ())>

9:56

We’re back to the nesting so I guess we need yet another overload: func zip<A, B, C, D>( _ a: Parser<A>, _ b: Parser<B>, _ c: Parser<C>, _ d: Parser<D> ) -> Parser<(A, B, C, D)> { return zip(a, zip(b, c, d)) .map { a, bcd in (a, bcd.0, bcd.1, bcd.2) } }

10:38

So now we can do: zip(double, literal("° "), northSouth, literal(", ")) // Parser<(Double, (), Double, ()>

10:49

But we have more parsing to do, which means more nesting. Looks like this is going to keep happening so let’s just define the zip with high enough arity for what we need. We are trying to piece together 7 parsers, so let’s make a zip up through arity-7: func zip<A, B, C, D, E>( _ a: Parser<A>, _ b: Parser<B>, _ c: Parser<C>, _ d: Parser<D>, _ e: Parser<E> ) -> Parser<(A, B, C, D, E)> { return zip(a, zip(b, c, d, e)) .map { a, bcde in (a, bcde.0, bcde.1, bcde.2, bcde.3) } } func zip<A, B, C, D, E, F>( _ a: Parser<A>, _ b: Parser<B>, _ c: Parser<C>, _ d: Parser<D>, _ e: Parser<E>, _ f: Parser<F> ) -> Parser<(A, B, C, D, E, F)> { return zip(a, zip(b, c, d, e, f)) .map { a, bcdef in (a, bcdef.0, bcdef.1, bcdef.2, bcdef.3, bcdef.4) } } func zip<A, B, C, D, E, F, G>( _ a: Parser<A>, _ b: Parser<B>, _ c: Parser<C>, _ d: Parser<D>, _ e: Parser<E>, _ f: Parser<F>, _ g: Parser<G> ) -> Parser<(A, B, C, D, E, F, G)> { return zip(a, zip(b, c, d, e, f, g)) .map { a, bcdefg in ( a, bcdefg.0, bcdefg.1, bcdefg.2, bcdefg.3, bcdefg.4, bcdefg.5 ) } }

11:31

Now we can glue all of our parsers together: zip( double, literal("° "), northSouth, literal(", "), double, literal("° "), eastWest ) // Coordinate<(Double, (), Double, (), Double, (), Double)>

12:02

There’s a lot going on with this parser, but we can map on it and construct the Coordinate value from them. let coord2 = zip( double, literal("° "), northSouth, literal(", "), double, literal("° "), eastWest ) .map { lat, _, latSign, _, long, _, longSign in Coordinate( latitude: lat * latSign, longitude: long * longSign ) } // Parser<Coordinate>

12:53

One thing to be careful of is properly accounting for the Void parsers that come from the literal parsers. Extracting reusable parsers

13:11

We will soon see a nice way of dealing those void parsers, but even before getting to that we can clean this up a bit by make a few more domain-specific parsers: let latitude = zip(double, literal("° "), northSouth) .map { lat, _, latSign in lat * latSign } // Parser<Double> We not only factored out a smaller parser, we were able to hide away the work of multiplication.

14:16

We can do the same work for the longitude. let longitude = zip(double, literal("° "), eastWest) .map { long, _, longSign in long * longSign } // Parser<Double>

14:37

We can piece those together to form a much simpler coordinate parser. let coord2 = zip(latitude, literal(", "), longitude) .map { lat, _, long in Coordinate( latitude: lat, longitude: long ) } // Parser<Coordinate>

15:09

This is nice because now we’ve gotten all the zips down to arity-3, which means as of now we don’t even need the higher arities. Now it’s still useful to have them around because we never know what else we are going to zip in the future, but it’s nice to know that often you can just do a small refactor and get smaller zips.

15:46

Now we are starting to see some real power in just a few lines of code. We were able to define a few simple, domain-specific parsers, like parsing a single character into a latitude or longitude multiplier, then built a parser off that one that could parse a latitude or longitude coordinate in its entirety, and then finally built the parser that can parse the pair of coordinates together.

16:24

So yet again zip has shown that it is really powerful, and an indispensable tool when defining transformations on our generic types. However, not everything is peaches-and-cream in the land of zip , there is a little weirdness that we are seeing with all these Void types appearing in our parsers. Turns out there is a really wonderful solution to that, but we are going to save that for a future episode. What’s the point?

16:38

It was nice to see our friends map , zip , and flatMap pop up once again, but we like to end every topic by asking “what’s the point?” This is our chance to bring everything down to earth, and make sure we aren’t getting into the theoretical weeds too much.

16:59

So, what’s the point of defining all of these transformations on the Parser type?

17:03

This is the first time we seen that the Parser type has features that none of the other parsers we’ve come across have. Everything from the failable initializers on Int , Double ,

UUID 17:30

To understand why this is so powerful, let’s compare the 3 styles of parsing we have explored to see how they compare. All of these styles are trying to parse the latitude and longitude coordinate format that we have been considered for this whole series. Comparing hand-rolled parsing

UUID 17:42

First we have the hand rolled parsing style: func parseLatLongHandRolled(_ string: String) -> Coordinate? { let parts = string.split(separator: " ") guard parts.count == 4 else { return nil } guard let lat = Double(parts[0].dropLast()), let long = Double(parts[2].dropLast()) else { return nil } let latCard = parts[1].dropLast() guard latCard == "N" || latCard == "S" else { return nil } let longCard = parts[3] guard longCard == "E" || longCard == "W" else { return nil } let latSign = latCard == "N" ? 1.0 : -1 let longSign = longCard == "E" ? 1.0 : -1 return Coordinate( latitude: lat * latSign, longitude: long * longSign ) }

UUID 17:54

This is simplest because:

UUID 17:58

it has no dependencies,

UUID 18:01

it uses just the functionality from the Swift standard library,

UUID 18:05

and it’s a really low tech solution to the problem. Almost anyone can just read this function line by line and mostly get what it is doing, it just may take awhile to fully grok.

UUID 18:26

However, it has some obvious drawbacks:

UUID 18:29

This is a one time, ad hoc solution to parsing. There is no guiding principle behind how this parsing was done, we just kinda chipped away at it with the tools we had.

UUID 18:39

We had to do a lot of messy work to handle the edge cases, and we didn’t even get all of them.

UUID 19:01

Nothing in this function is reusable. Sure we could extract out some helper functions, but then those helper functions would be responsible for consuming bits of the input, which would lead us to re-inventing the Parser type all over again.

UUID 19:15

So, while this style of parsing isn’t terrible at first, it’s not really sustainable to keep parsing in this fashion. It’s very difficult to build off of this foundation, and as time goes on it will be harder to maintain. Comparing Scanner parsing

UUID 19:32

So then we turned to the Scanner type, which is a first-class API that Apple gives us, and provides a more cohesive strategy for parsing. We didn’t code this one from scratch in the episode, but it’s not too hard to do, and if you clean things up a bit it might look something like this: import Foundation func parseLatLongWithScanner(_ string: String) -> Coordinate? { let scanner = Scanner(string: string) var lat: Double = 0 var northSouth: NSString? = "" var long: Double = 0 var eastWest: NSString? = "" guard scanner.scanDouble(&lat), scanner.scanString("° ", into: nil), scanner.scanCharacters(from: ["N", "S"], into: &northSouth), scanner.scanString(", ", into: nil), scanner.scanDouble(&long), scanner.scanString("° ", into: nil), scanner.scanCharacters(from: ["E", "W"], into: &eastWest) else { return nil } let latSign = northSouth == "N" ? 1.0 : -1 let longSign = eastWest == "E" ? 1.0 : -1 return Coordinate( latitude: lat * latSign, longitude: long * longSign ) }

UUID 20:08

This style has some benefits:

UUID 20:10

Well for one, we are using Apple’s APIs, and so in some sense this style has been blessed. So at the very least there are probably lots of people using these APIs and Apple is responsible for maintaining it.

UUID 20:21

Also the scanner can parse off many types of values from the front of a string. Here we are parsing doubles, specific characters and specific strings. There is a whole family of scan methods on the scanner type that can parse various types.

UUID 20:38

Also we are starting to get a bit of a better top-to-bottom story of parsing as we consume the input string. We can easily read from this that we are first parsing a double, then a string, then a character, then a string, then a double, etc…

UUID 20:59

But there are still some drawbacks:

UUID 21:01

The API is a bit cumbersome to use because it’s pretty old and pre-dates Swift. So it’s not taking advantage of value types, higher-order functions, optionals or generics.

UUID 21:11

This style doesn’t easily lead to reusing parsers. If you wanted to create a reusable parser you would need to extend the Scanner type to add your own scan methods, but you are pretty much forced to do it in the same style, where take a mutable pointer as an argument which you are responsible for mutating to hold the result of parsing. And even with that there’s no way to abstractly take two scanners and combine them into a single scanner that does the work of both. Comparing Parser parsing

UUID 21:53

So, with those two styles of parsing discussed and their drawbacks understood, we finally turned to defining our own parser type. It was just a simple function that took an inout Substring as input and return an optional first-class value. That little type ended up being able to support tons of operations that enhance its composability. You could map , zip and flatMap parsers just like you would arrays and optionals, and so with that type and those operators defined we came up with some parsers to parse the coordinate format.

UUID 22:26

Let’s quickly cut and paste them all down here: let northSouth = char .flatMap { $0 == "N" ? .always(1.0) : $0 == "S" ? .always(-1) : .never } let eastWest = char .flatMap { $0 == "E" ? .always(1.0) : $0 == "W" ? .always(-1) : .never } let latitude = zip(double, literal("° "), northSouth) .map { lat, _, sign in lat * sign } let longitude = zip(double, literal("° "), eastWest) .map { long, _, sign in long * sign } let coord2 = zip(latitude, literal(", "), longitude) .map { lat, _, long in Coordinate(latitude: lat, longitude: long) }

UUID 22:57

Amazingly this is about the same number of lines as the other forms of parsing, but some really nice things have come out of it:

UUID 23:03

First, it’s clearly built from reusable pieces. We defined 4 little parsers that can be understood and tested in isolation, and can be used anywhere, not just for this particular parser. And then we glued them together in various ways to get the final coordinate parser.

UUID 23:27

Also, just like the Scanner parsing it tells a story of how the parsing is happening. It’s easy to see exactly the order that the parsing is happening and how you get from start to finish.

UUID 23:51

It takes advantage of some of Swift’s most powerful features, such as generics and optionals.

UUID 23:56

It’s more composable. We haven’t even covered all of Parser ‘s composability properties, there’s even more to say.

UUID 24:03

And perhaps most amazingly, we have built up a little parser library with some helper functions that allows us to attack parsing problems in much the same way that we have attached other problems in the past, such as randomness . If you haven’t watched our episodes on randomness yet, we highly recommend you do, because in those episodes we define a proper randomness type, and then introduce map , flatMap and zip in order to show how truly complex forms of randomness can be built from smaller pieces. And I think it’s amazing that we now have a common set of language primitives that can be used to discuss two topics as different and disparate as parsing and randomness.

UUID 24:41

But it wouldn’t be fair to only talk about the pros without also talking a bit about the cons. Some things that come to mind are:

UUID 24:49

So we are specifically eschewing Apple’s APIs and some people may be uncomfortable with that. We think it’s ok because we have identified that Apple’s APIs are not composable and do not take advantage of Swift’s features, which are properties we hold to high regard, but others may not.

UUID 25:00

Also this may be a new style of parsing to you, and so it may take some time to learn and get comfortable with. It’s very powerful, and often packs a huge punch in a tiny package, and so it can take time to get fully comfortable with its capabilities.

UUID 25:49

So these 70 or so lines of code is the point of why we are studying the parser type. The third style improves the first two styles in almost every respect, and honestly we are only just getting started. We have a bunch more stuff we want to cover.

UUID 26:29

We want to generalize the parser type so that it can parse more than just strings. There are lots of things out there we want to parse.

UUID 26:38

Also we’d like to be able to open source this parser as a library so that people can use it, but there are some ergonomics issues to work through. There are some really simple things we can do to make this type much nicer to work with.

UUID 26:52

Also, it turns out that there are a few small problems with zip as it currently stands, and there’s a really wonderful idea we can explore to fix it.

UUID 27:13

So all of that is really exciting, but even that isn’t the end of it. The topic of parsers is truly wide and deep, and we want to explore it all. But we won’t spoil any of that, and we’ll leave it for now.

UUID 27:18

Until next time. References Combinators Daniel Steinberg • Sep 14, 2018 Daniel gives a wonderful overview of how the idea of “combinators” infiltrates many common programming tasks. Note Just as with OO, one of the keys to a functional style of programming is to write very small bits of functionality that can be combined to create powerful results. The glue that combines the small bits are called Combinators. In this talk we’ll motivate the topic with a look at Swift Sets before moving on to infinite sets, random number generators, parser combinators, and Peter Henderson’s Picture Language. Combinators allow you to provide APIs that are friendly to non-functional programmers. https://vimeo.com/290272240 Parser Combinators in Swift Yasuhiro Inami • May 2, 2016 In the first ever try! Swift conference, Yasuhiro Inami gives a broad overview of parsers and parser combinators, and shows how they can accomplish very complex parsing. Note Parser combinators are one of the most awesome functional techniques for parsing strings into trees, like constructing JSON. In this talk from try! Swift, Yasuhiro Inami describes how they work by combining small parsers together to form more complex and practical ones. https://academy.realm.io/posts/tryswift-yasuhiro-inami-parser-combinator/ Learning Parser Combinators With Rust Bodil Stokke • Apr 18, 2019 A wonderful article that explains parser combinators from start to finish. The article assumes you are already familiar with Rust, but it is possible to look past the syntax and see that there are many shapes in the code that are similar to what we have covered in our episodes on parsers. https://bodil.lol/parser-combinators/ Sparse John Patrick Morgan • Jan 12, 2017 A parser library built in Swift that uses many of the concepts we cover in our series of episodes on parsers. Note Sparse is a simple parser-combinator library written in Swift. https://github.com/johnpatrickmorgan/Sparse parsec Daan Leijen, Paolo Martini, Antoine Latter Parsec is one of the first and most widely used parsing libraries, built in Haskell. It’s built on many of the same ideas we have covered in our series of episodes on parsers, but using some of Haskell’s most powerful type-level features. http://hackage.haskell.org/package/parsec Parse, don’t validate Alexis King • Nov 5, 2019 This article demonstrates that parsing can be a great alternative to validating. When validating you often check for certain requirements of your values, but don’t have any record of that check in your types. Whereas parsing allows you to upgrade the types to something more restrictive so that you cannot misuse the value later on. https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/ Ledger Mac App: Parsing Techniques Chris Eidhof & Florian Kugler • Aug 26, 2016 In this free episode of Swift talk, Chris and Florian discuss various techniques for parsing strings as a means to process a ledger file. It contains a good overview of various parsing techniques, including parser grammars. https://talk.objc.io/episodes/S01E13-parsing-techniques Downloads Sample code 0061-composable-parsing-zip Point-Free A hub for advanced Swift programming. Brought to you by Brandon Williams and Stephen Celis . Content Become a member The Point-Free Way Beta previews Gifts Videos Collections Free clips Blog More About Us Community Slack Mastodon Twitter BlueSky GitHub Contact Us Privacy Policy © 2026 Point-Free, Inc. All rights are reserved for the videos and transcripts on this site. All other content is licensed under CC BY-NC-SA 4.0 , and the underlying source code to run this site is licensed under the MIT License .