Video #119: Parser Combinators Recap: Part 1

Episode: Video #119 Date: Oct 5, 2020 Access: Members Only 🔒 URL: https://www.pointfree.co/episodes/ep119-parser-combinators-recap-part-1

Description

It’s time to revisit one of our favorite topics: parsing! We want to discuss lots of new parsing topics, such as generalized parsing, performance, reversible parsing and more, but before all of that we will start with a recap of what we have covered previously, and make a few improvements along the way.

Video

Cloudflare Stream video ID: a92a932b4bacf0a7e200f62110627939 Local file: video_119_parser-combinators-recap-part-1.mp4 *(download with --video 119)*

References

Transcript

— 0:05

Today we will pick up a topic that we last covered over a year ago, and it’s one of the most popular series of episodes we have ever done. And that’s parsing.

— 0:14

We spent 9 episodes exploring the problem space of parsing. First we gave a concise definition of what it means to parse something in general, including what tools Apple gives us in their frameworks for parsing. Then we gave a proper functional programming definition of a parser, and of course it basically boils down to just one single function signature. And with that function signature we were able to define all the usual functional operators on it that we have grown to love, such as map , zip and flatMap , and each one had a very precise job of how to break down a large, complex problem into smaller ones. And then finally we discussed “parser combinators”, which are custom little functions that take parsers as input and produce parsers as output, and these things allowed us to take parsing to the next level.

— 0:58

However, even though the things we covered previously were quite powerful, it doesn’t even begin to scratch the surface of parsing. There is so much we want to cover.

— 1:10

First, we want to improve the ergonomics of the parsers that we have defined so far, because we haven’t given much attention to that yet and there are a few sharp edges right now.

— 1:19

Next, we want to show how to generalize parsing so that we can parse more than just strings. We’d like to be able to parse any kind of data type.

— 1:28

And then, amazingly, the very act of generalizing our parser will give just the tools we need to finely tune the performance of our parsers. If done correctly they can be nearly as efficient as hand rolled parsers, which is pretty amazing.

— 1:40

And then finally, we want to show how to flip parsing on its head by describe what it means to “unparse” something. That is, if you were to “unparse” a parsed result, and then re-parse it, you should get back to where you started. That may not sound very interesting, but trust us, it’s surprising and amazing to see, but we don’t want to give away any spoilers right now.

— 2:02

So, there’s still a ton to cover, and we’ll get there in the coming months, but right now we want to take a step back and give a little recap on everything we covered before. There are a lot of people out there that haven’t watched our parser episodes, though we highly encourage you to, and others that have watched may have gotten a bit rusty. This episode will quickly summarize everything we covered in the past 9 episodes to get us all on the same foundation from which we can build much bigger concepts. It won’t have a ton of new content for those who are already familiar with parsing, but we will make a few tweaks to the previous code in order to improve its ergonomics along the way, so hopefully everyone will get at least something from this episode.

— 2:44

So let’s get started! What is a parser?

— 2:46

We began our parser episodes last time by giving a broad definition of what parsing is, and then showed that Apple actually gives us quite a few tools for doing rudimentary parsing. Very roughly speaking, parsing is the process of turning a nebulous blob of data into a well-structured piece of data.

— 3:04

That’s a very rough definition, but there are some functions in the standard library that aim to do precisely that. For example, there’s an initializer on Int that takes a string as input: Int.init as (String) -> Int?

— 3:22

This initializer will attempt to parse an integer out of a string or return nil if it fails: Int("42") // 42 Int("Blob") // nil

— 3:38

So, this is taking a nebulous blob of data, in this case a string, and turning it into something more well-structured, an integer.

— 3:51

A similar Double parser exists: Double.init as (String) -> Double?

— 3:57

And it, too, will attempt to parse a double out of a string. Double("42.42") // 42.42 Double("Blob") // nil

— 4:08

Even Bool comes with a parser: Bool.init as (String) -> Bool? Bool("true") // true Bool("false") // false Bool("1") // nil Bool("tru") // nil Bool("verdad") // nil

— 4:34

Those are all parsers in the Swift standard library, but Foundation also has a bunch of parsers that we probably use almost every day. If you’ve ever constructed a

URL 4:55

When you give this function a well-formed string it will give you back a valid URL: URL(string: "https://www.pointfree.co")

URL 5:02

But when you pass it bad data it will return nil : URL(string: "bad^website")

URL 5:10

There are similar functions for parsing URL components and UUIDs: URLComponents.init(string:) as (String) -> URLComponents? UUID.init(uuidString:) as (String) -> UUID?

URL 5:39

There are even a bunch of parsers that are static functions on NSCoder for turning strings into various types, for example CGRect s: NSCoder.cgRect(for: "{{1,2},{3,4}}") // {x 1 y 2 w 3 h 4} NSCoder.cgRect(for: "") // {x 0 y 0 w 0 h 0}

URL 6:15

And UIEdgeInsets : NSCoder.uiEdgeInsets(for: "{1,2,3,4}") NSCoder.uiEdgeInsets(for: "")

URL 6:18

Further, there are also a bunch of formatting APIs in Foundation whose job is to parse dates and numbers from strings: DateFormatter().date(from:) as (String) -> Date? NumberFormatter().number(from:) as (String) -> NSNumber? Defining the problem of parsing

URL 6:39

So, parsing nebulous strings into well-structured types is definitely a ubiquitous problem, and Apple has given us lots of tools to do some basic parsing.

URL 6:48

All of these functions have more or less the same shape: they take a String as input and return an optional value as output. So, we may be tempted to define a parser as literally a function: (String) -> A?

URL 7:04

While this is a good start, it’s not quite enough. See, all of the parser examples we considered above have this shape, but all of those parsers also have a flaw: they are not composable. That is, we can’t take two parsers and somehow magically turn them into a single parser of two things. This fundamental operation is the basis of breaking down large parsing problems into smaller ones.

URL 7:28

So, to give parsers this kind of power we needed to tweak the signature a bit. It is too much to demand that our parser consume the entirety of the input string and produce an output value all at once. We want to be able to parse a little bit at a time. This means what we’d like is to return not only an optional value, representing whether or not we successful parsed, but also a string that represents what is left to parse after the parser ran: (String) -> (match: A?, rest: String)

URL 8:16

This allows us to parse a little bit at a time, and then we can run multiple parsers, one after the other.

URL 8:30

We further tweaked this signature by employing an equivalence between functions that have an input and output of the same type. Such functions can drop the output and simply take an inout version of the input: (inout String) -> A?

URL 8:48

This allows us to consume a bit of the string as we parse an A value off.

URL 8:56

And then we made one final tweak to the signature to work on substrings instead of strings, which helps us avoid needless copies of strings when parsers run: (inout Substring) -> A?

URL 9:33

This was the final signature for our parser type, and so we wrapped it up in a struct: struct Parser<A> { let run: (inout Substring) -> A? }

URL 9:48

We’re going to make one small change to this type before moving on: we are going to rename the generic: struct Parser<Output> { let run: (inout Substring) -> Output? }

URL 9:54

This makes it clear that the parser produces some output, and this distinction will be important in future episodes on parsers. Defining a parser

URL 10:04

And with that definition we can construct some parsers. Say we want a parser that will scan an integer off the beginning of a string. If it succeeds it will return that integer and consume all the characters from the string that went into building the integer. We can start this by constructing a new parser: let int = Parser<Int> { input in }

URL 10:31

The real work happens in the body of this closure. We have to figure out how to grab an integer off the beginning of the input parameter. One way to do this would be to use the prefix method on the substring in order to get all the numeric characters from the front of the string: let int = Parser<Int> { input in let intPrefix = input.prefix(while: \.isNumber) }

URL 11:10

And then we can try converting that prefix to an integer: let match = Int(intPrefix)

URL 11:20

And this match is what we want to return. return match

URL 11:30

But first, we have to make sure to consume a bit of the string since this is what allows other parsers to then operate on the rest of the string. In this case we just have to remove all the numeric characters we scanned from the beginning of the string: input.removeFirst(intPrefix.count)

URL 11:53

But because parsing could fail, we should bail out before attempting to consume any bit of the input: guard let match = Int(intPrefix) else { return nil } All together it’s quite simple: let int = Parser<Int> { input in let intPrefix = input.prefix(while: { $0.isNumber }) guard let match = Int(intPrefix) else { return nil } input.removeFirst(intPrefix.count) return match }

URL 12:27

And we can give this parser a spin by constructing a mutable substring and running the parser on it: var input = "123 Hello"[...] // or: var input: Substring = "123 Hello" // or: var input = "123 Hello" as Substring int.run(&input) // 123 input // " Hello"

URL 13:23

While testing these parsers it can get a little annoying to have to deal with the inout version of this interface. So, we also defined a little helper with a better API for exploring and debugging parsers: extension Parser { func run(_ input: String) -> (match: Output?, rest: Substring) { var input = input[...] let match = self.run(&input) return (match, input) } }

URL 14:00

And now we can test it like this: int.run("123 Hello") // (match: 123, rest: " Hello") int.run("Hello Blob") // (match: nil, rest: "Hello Blob")

URL 14:05

Now we see very clearly that the parser was able to parse an integer from the front of the string, and it consumed it, leaving behind only “ Hello”. And if we parse a string that does not begin with a number, we get nil and see that none of the string was consumed.

URL 14:23

This is the exact parser we developed in our past parser episodes, but it’s not entirely correct yet. We had exercises for fixing this, but let’s go ahead and do it now. The problem is that we aren’t handling negative numbers properly: int.run("-123 Hello") // (match: nil, rest: "-123 Hello")

URL 14:59

We need to further beef up this parser to handle a leading negative sign.

URL 15:04

Before scanning off all the leading numeric characters we can first check if the first character is a dash, and from that information we can compute a signed value that we will multiple our integer by later: let sign: Int // +1, -1 if input.first == "-" { sign = -1 input.removeFirst() } else { sign = 1 } Notice that when we detect the leading dash we have to further consume that single character so that the later code can start consuming the numeric characters.

URL 15:42

If we also wanted to support leading positive signs we can do that too: let sign: Int if input.first == "-" { sign = -1 input.removeFirst() } else if input.first == "+" { sign = 1 input.removeFirst() } else { sign = 1 }

URL 16:13

Now that we have the sign we can use it to multiply the matched value: return match * sign

URL 16:18

When we run this parser against our negative value we get the correct result: int.run("+123 Hello") // (match: 123, rest: "Hello") int.run("-123 Hello") // (match: -123, rest: "Hello")

URL 16:35

But this still isn’t quite right. If we try parsing a leading dash with no number after it we will find that we are accidentally consuming the dash: int.run("-Hello") // (match: nil, rest: "Hello")

URL 16:47

This isn’t right because it means that no other parsers will have the opportunity to parse this dash, and that may be important. There are many times we want to try a whole bunch of parsers on some input, and take the first one that succeeds. Usually when we do that we want each parser to have the chance to work on the original, unaltered string. But if this parser were used it would accidentally consume some of the string even though it did not succeed. In general, parsers that fail should not consume any of the input string.

URL 17:36

The fix is easy enough, we just need to keep track of the original input string so that we can reset the input if parsing the integer fails: let int = Parser<Int> { input in let original = input let sign: Int if input.first == "-" { sign = -1 input.removeFirst() } else if input.first == "+" { sign = 1 input.removeFirst() } else { sign = 1 } let prefix = input.prefix(while: { $0.isNumber }) guard let match = Int(prefix) else { input = original return nil } input.removeFirst(prefix.count) return match * sign }

URL 18:12

And now this is a correct integer parser. It may seem quite complex, and it kind of is, but this is exactly why we want composable parsing. If it’s this hard to properly parse a single integer from a string, what hope do we have to parse more complex formats, like dates, times, JSON, CSV, server logs, and all types of other string formats? So rather than trying to parse those formats all at once, we can instead define super small, focused parsers, like this int parser, and then use operators to plug them together into larger and larger parsers. Correction The int parser we developed above still isn’t quite right. Point-Free viewer Rhys Morgan pointed out that it will still not correctly parse Int.min , which is -9223372036854775808 . The reason is because we try converting the 9223372036854775808 fragment into an integer, and then multiply by -1 , however the maximum Int value is 9223372036854775807 , and so that conversion fails. To fix this we can forgo the logic for computing a sign, and instead be smarter about how we prefix the input to include an optional sign character as long as it is the first character: extension Parser where Output == Int { static let int = Self { input in let original = input var isFirstCharacter = true let intPrefix = input.prefix { character in defer { isFirstCharacter = false } return (character == "-" || character == "+") && isFirstCharacter || character.isNumber } guard let match = Int(intPrefix) else { input = original return nil } input.removeFirst(intPrefix.count) return match } } Defining a parser’s home

URL 18:48

Before moving on there’s a small thing we can do to improve this code. Right now this parser is a global variable at the file-scope. This variable isn’t a true global in the sense that it is available in every scope always. It lives only in its surrounding module, which right now is the playground, which mean it can’t ever accidentally leak into other people’s code.

URL 19:10

However, variables living in the file scope by still make some people feel a little uneasy. Instead we can simply define the variable as a static inside the type, which lets us not only use an abbreviated syntax wherever a parser is expected, but also lets us simplify references of the parser type to Self : // Parser<Int>.int // or simply: .int extension Parser where Output == Int { static let int = Self { input in … } }

URL 20:03

Now the parser is namespaced a bit more, and this is a style we have used in many of the libraries we have open sourced, such as snapshot testing, our random generator library, and even the Composable Architecture.

URL 20:22

The call site of using this parser got a little noisier now: Parser.int.run(&input) … Parser.int.run("123 Hello") Parser.int.run("+123 Hello") Parser.int.run("Hello Blob") Parser.int.run("-123 Hello") Parser.int.run("-Hello")

URL 20:31

But we will soon see that we don’t typically invoke the parser like this directly, and instead compose it into other parsers using combinators, and in that situation we get to keep the brevity of the previous style of parser.

URL 20:42

Once we defined the int parser we turned our attention to a double parser. I’ll just paste in what we came up with last time, but this time as a static: extension Parser where Output == Double { static let double = Self { input in let prefix = input.prefix(while: { $0.isNumber || $0 == "." }) let match = Double(prefix) input.removeFirst(prefix.count) return match } }

URL 20:55

This essentially scans all the numeric and decimal point characters from the beginning of the string and tries to convert that into a double. We can give it a spin: Parser.double.run("42 hello") // (match: 42, rest: " hello") Parser.double.run("4.2 hello") // (match: 4.2, rest: " hello") Parser.double.run("42. hello") // (match: 42, rest: " hello") Parser.double.run(".42 hello") // (match: 0.42, rest: " hello")

URL 21:36

So it seems to work. But, like the int parser it has a few problems too. Not only does it not handle plus and minus signs correctly, but it also tries to parse multiple decimal points: Parser.double.run("-42 hello") // (match: nil, rest: "-42 hello") Parser.double.run("+42 hello") // (match: nil, rest: "+42 hello") Parser.double.run("1.2.3 hello") // (match: nil, rest: " hello")

URL 21:54

While it fails to explicitly negative and positive doubles, even worse, it consumes “1.2.3” and still manages to fail! We’d like to scan off the 1.2 from the beginning of the string and not consume the trailing “.3”.

URL 22:09

We gave exercises to fix these problems, but let’s go ahead and paste the final solution now: extension Parser where Output == Double { static let double = Self { input in let original = input let sign: Double if input.first == "-" { sign = -1 input.removeFirst() } else if input.first == "+" { sign = 1 input.removeFirst() } else { sign = 1 } var decimalCount = 0 let prefix = input.prefix { char in if char == "." { decimalCount += 1 } return char.isNumber || (char == "." && decimalCount <= 1) } guard let match = Double(prefix) else { input = original return nil } input.removeFirst(prefix.count) return match * sign } }

URL 22:22

It’s a lot, but it also looks quite similar to the int parser. It first computes the sign of the number. It then scans all the numeric characters and possible decimal point from the beginning of the string, taking special care to not allow multiple decimal points. And then after that it’s basically the same as the int parser.

URL 22:51

If we run this parser on a wide variety of inputs we will see we get the results we expect: Parser.double.run(" hello") // (match: -42, rest: "-42 hello") Parser.double.run(" hello") // (match: 42, rest: "+42 hello") Parser.double.run(".3 hello") // (match: 1.2, rest: ".3 hello")

URL 23:00

Some simpler parsers

URL 23:00

Next we considered some considerably simpler parsers. For example, the char parser, which simply parses the first character off of a string: extension Parser where Output == Character { static let char = Self { input in guard !input.isEmpty else { return nil } return input.removeFirst() } }

URL 23:10

This can be used like so: Parser.char.run("Hello Blob") // (match: "H", rest: "ello Blob") Parser.char.run("") // (match: nil, rest: "")

URL 23:22

As well as the literal parser, which allows you to parse a precise string off the front of the input string: extension Parser where Output == Void { static func literal(_ p: String) -> Self { Self { input in guard input.hasPrefix(p) else { return nil } input.removeFirst(p.count) return () } } }

URL 23:29

And this can be used like so: Parser.literal("Hello").run("Hello Blob") // (match: (), rest: " Blob")

URL 23:40

Note that this is a parser of Void , which means it either succeeds with a value that doesn’t carry any meaning (because it’s Void ), or it fails. We do this because the only reasonable thing we could return is the string "Hello" , but that is exactly what you fed to the literal function in the first place, so doesn’t seem necessary.

URL 24:02

We can also improve the ergonomics of this parser before moving on. First of all, the name literal makes it clear we are parsing off the string passed it, but does not make it clear we are parsing it from the beginning. Now it’s true that the majority of parsers we will be building do their parsing on the front of the string, but it’s also not a hard requirement. So, instead we can take some inspiration from the collections API in the Swift standard library, and perhaps a better name would be prefix : extension Parser where Output == Void { static func prefix(_ p: String) -> Self { … } }

URL 24:28

Now it’s a bit clearer that we are only parsing off the beginning of the input string: Parser.prefix("Hello").run("Hello Blob") // (match: (), rest: " Blob") Next time: parser composition recap

URL 24:38

Now all of these parsers are cool and all, but the real power comes in the composability of parsers. The Parser type supports many forms of composition that unlock the ability to break large, complex problems into simpler ones. And ideally we can be very confident that the small parsers do their job correctly, like the int and double parsers, and then we can be confident that we glued all the parsers together correctly…next time! References Combinators Daniel Steinberg • Sep 14, 2018 Daniel gives a wonderful overview of how the idea of “combinators” infiltrates many common programming tasks. Note Just as with OO, one of the keys to a functional style of programming is to write very small bits of functionality that can be combined to create powerful results. The glue that combines the small bits are called Combinators. In this talk we’ll motivate the topic with a look at Swift Sets before moving on to infinite sets, random number generators, parser combinators, and Peter Henderson’s Picture Language. Combinators allow you to provide APIs that are friendly to non-functional programmers. https://vimeo.com/290272240 Parser Combinators in Swift Yasuhiro Inami • May 2, 2016 In the first ever try! Swift conference, Yasuhiro Inami gives a broad overview of parsers and parser combinators, and shows how they can accomplish very complex parsing. Note Parser combinators are one of the most awesome functional techniques for parsing strings into trees, like constructing JSON. In this talk from try! Swift, Yasuhiro Inami describes how they work by combining small parsers together to form more complex and practical ones. https://academy.realm.io/posts/tryswift-yasuhiro-inami-parser-combinator/ Regex Alexander Grebenyuk • Aug 10, 2019 This library for parsing regular expression strings into a Swift data type uses many of the ideas developed in our series of episodes on parsers. It’s a great example of how to break a very large, complex problem into many tiny parsers that glue back together. https://github.com/kean/Regex Regexes vs Combinatorial Parsing Soroush Khanlou • Dec 3, 2019 In this article, Soroush Khanlou applies parser combinators to a real world problem: parsing notation for a music app. He found that parser combinators improved on regular expressions not only in readability, but in performance! http://khanlou.com/2019/12/regex-vs-combinatorial-parsing/ Learning Parser Combinators With Rust Bodil Stokke • Apr 18, 2019 A wonderful article that explains parser combinators from start to finish. The article assumes you are already familiar with Rust, but it is possible to look past the syntax and see that there are many shapes in the code that are similar to what we have covered in our episodes on parsers. https://bodil.lol/parser-combinators/ Sparse John Patrick Morgan • Jan 12, 2017 A parser library built in Swift that uses many of the concepts we cover in our series of episodes on parsers. Note Sparse is a simple parser-combinator library written in Swift. https://github.com/johnpatrickmorgan/Sparse parsec Daan Leijen, Paolo Martini, Antoine Latter Parsec is one of the first and most widely used parsing libraries, built in Haskell. It’s built on many of the same ideas we have covered in our series of episodes on parsers, but using some of Haskell’s most powerful type-level features. http://hackage.haskell.org/package/parsec Parse, don’t validate Alexis King • Nov 5, 2019 This article demonstrates that parsing can be a great alternative to validating. When validating you often check for certain requirements of your values, but don’t have any record of that check in your types. Whereas parsing allows you to upgrade the types to something more restrictive so that you cannot misuse the value later on. https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/ Ledger Mac App: Parsing Techniques Chris Eidhof & Florian Kugler • Aug 26, 2016 In this free episode of Swift talk, Chris and Florian discuss various techniques for parsing strings as a means to process a ledger file. It contains a good overview of various parsing techniques, including parser grammars. https://talk.objc.io/episodes/S01E13-parsing-techniques Downloads Sample code 0119-parsers-recap-pt1 Point-Free A hub for advanced Swift programming. Brought to you by Brandon Williams and Stephen Celis . Content Become a member The Point-Free Way Beta previews Gifts Videos Collections Free clips Blog More About Us Community Slack Mastodon Twitter BlueSky GitHub Contact Us Privacy Policy © 2026 Point-Free, Inc. All rights are reserved for the videos and transcripts on this site. All other content is licensed under CC BY-NC-SA 4.0 , and the underlying source code to run this site is licensed under the MIT License .