EP 57 · What Is a Parser · May 13, 2019 ·Members

Video #57: What Is a Parser?: Part 2

smart_display

Loading stream…

Video #57: What Is a Parser?: Part 2

Episode: Video #57 Date: May 13, 2019 Access: Members Only 🔒 URL: https://www.pointfree.co/episodes/ep57-what-is-a-parser-part-2

Episode thumbnail

Description

Now that we’ve looked at a bunch of parsers that are at our disposal, let’s ask ourselves what a parser really is from the perspective of functional programming and functions. We’ll take a multi-step journey and optimize using Swift language features.

Video

Cloudflare Stream video ID: ad8a1fe006a82c6c76fd485015c6d3d0 Local file: video_57_what-is-a-parser-part-2.mp4 *(download with --video 57)*

References

Transcript

0:05

The manual, hand-rolled coordinate parsing function we made last time is already pretty complicated, and capturing every edge case will make it more complicated still.

0:13

But that’s not even the worst part. The worst part is that this parse function is a one-off, ad hoc solution to parsing the very specific format of latitude/longitude coordinates. There is nothing inside the body that is reusable outside of this example.

0:27

We’re definitely seeing how tricky and subtle parsing can be, and it’s easy to end up with a complex function that would need to be studied very carefully to understand what it’s doing, and even then it would be easy to overlook the potential bugs that lurk within.

0:47

Let’s back up and formally state what the problem of parsing is and analyze it from the perspective of functional programming and functions. What is a parser?

1:00

According to the work we just did, we could maybe define a parser as literally a function from a string into some other type. We could maybe even define it as a simple type alias: typealias Parser<A> = (String) -> A

1:18

It’s going to be handy to give this a proper type so that we can extend it with methods, so let’s wrap the function in a struct: struct Parser<A> { let run: (String) -> A }

1:44

However, the shape of Parser isn’t quite correct right now because parsing isn’t always guaranteed to succeed and produce a value. There will always be some blobs of data that are malformed and we just can’t parse it. So, let’s introduce some failability: struct Parser<A> { let run: (String) -> A? }

2:09

This simple type comes up quite a bit in application development. We often have unstructured data, like a bare string, and want to transform it into something structured. For example, deep-link routing in an iOS application could be thought of as a parser: let router = Parser<Route> { str in fatalError() }

2:53

Route represents a tree of routes that you support in your application, for example: enum Route { case home case profile case episodes case episode(id: Int) }

3:20

If we could construct that router value, then whenever a deep-link request comes through our app delegate, we can simply run the parser on it, get our first-class Route value of out it, switch on that value, and then do some basic logic to bring up the correct screen in our app: router.run("/") // .home router.run("/episodes/42") // .episode(42)

4:18

And once you have this apparatus in place, you could route incoming deep links using a switch statement. switch router.run("/") { case .none: case .some(.home): case .some(.profile): case .some(.episodes): case let .some(.episode(id)): }

5:11

We could also think of command line tools as parsers: let cli = Parser<EnumPropertyGenerator> { _ in fatalError() }

5:30

EnumPropertyGenerator represents all of the ways you can use the tool. enum EnumPropertyGenerator { case help case version case invoke(urls: [URL], dryRun: Bool) }

6:14

And just like with our router, should we construct one of these values we can switch on it to handle the input. cli.run("generate-enum-properties --version") // .version cli.run("generate-enum-properties --help") // .help cli.run("generate-enum-properties --dry-run path/to/file") // .invoke(urls: ["path/to/file"], dryRun: true) switch cli.run("generate-enum-properties --dry-run path/to/file") { case .none: case .some(.help): case .some(.version): case let .some(.invoke(urls, dryRun)): } Parsing as a multi-step process

7:33

So although we can see how useful it would be to have these parsers, the type we have defined so far is still not quite right. Creating a parser like this means we must definitely answer the question of how to turn a string into an A value, and so that is not conducive to being able to create small parsers that do a little bit of parsing, and piecing them together into parsers that parser a big thing.

8:14

A way to do this is to think of parsing as a multi-step process, and the Parser type should only describe one step in that process. This means changing the function signature to return a result alongside the rest of the string we want to parse.

8:46

We want to have this functionality for our Parser type. A step in parsing consists of consuming some subset of the input string, and then returning what is left of the string to parse along with the result of the parsing. struct Parser<A> { let run: (String) -> (match: A?, rest: String) }

9:42

As a very simple example, let’s build an integer parser. It will parse an integer off the beginning of a string, if possible, and return it along with the remainder of the string. Let’s start with the basic set up of the parser: let int = Parser<Int> { str in // (Int?, String) }

9:58

Somehow in here we have to return a new string that represents the input string after we have consumed an integer off the beginning, and the result of having converted that consumed prefix into a string. Let’s start by getting a prefix of all the numeric characters of the given string: let int = Parser<Int> { str in let prefix = str.prefix(while: { $0.isNumber })

10:34

We can then try to convert this prefix to an integer using the Int initializer. let int = Parser<Int> { str in let prefix = str.prefix(while: { $0.isNumber }) let int = Int(prefix)

10:39

This is a failable operation, though, and if the prefix of the string is not numeric, int will be nil , so we want to guard against that case and bail out by returning a nil match and the original string, since we don’t want to consume any of the string if parsing failed. let int = Parser<Int> { str in let prefix = str.prefix(while: { $0.isNumber }) guard let int = Int(prefix) else { return (nil, str) }

11:14

We’re getting close, we have the integer we want to return but we don’t have the “rest” of the string that we’ve consumed. Do do so, we can use its end index to return the characters we didn’t consume: let int = Parser<Int> { str in let prefix = str.prefix(while: { $0.isNumber }) guard let int = Int(prefix) else { return (nil, str) } let rest = String(str[prefix.endIndex...]) return (int, rest) }

12:15

And this is our first parser! It’s a little complicated and we’re working with string indices, which can be tricky, but it should encapsulate all of what it means to parse an integer off the beginning of a string.

12:39

Let’s take it for a spin: int.run("42") // (match 42, rest "") int.run("42 Hello World") // (match 42, rest " Hello World" int.run("Hello World") // (match nil, rest "Hello World"

13:12

And just like that we have already created a parser. It may not seem like much, but this little unit will be the basis of a lot of amazing things to come. Optimized parsing with inout

13:43

But before we continue developing the parser type, let’s address a few quick optimization opportunities. Right now a parser’s run function takes a string, and returns a whole new string that represents what is left of the input after some subset of it has been consumed in parsing. We could make this more efficient by instead mutating the input to represent that consumption, and in fact material covered in an old Point-Free episode can lead us to the exact refactor we need to do.

14:03

In just our second episode on Point-Free, we showed that there’s an equivalence between functions that take a value in A and return a value in A and functions that simply take an inout A and return Void : // (A) -> A // (inout A) -> Void

14:13

You can easily build a generic transformation that moves between these two styles of functions. And more generally, if you have a type in a function signature that appears once on each side of a function arrow, you can convert that into a function that takes an inout input and no longer returns that type.

14:28

And this is exactly what we want to do with our run function. It has a String type on both sides of the function, and so we will convert that to an inout : struct Parser<A> { // let run: (String) -> (match: A?, rest: String) let run: (inout String) -> A? }

14:44

And now we have to fix our int parser. Instead of returning a new string and result, we will in-place mutate the string to consume the prefix. let int = Parser<Int> { str in let prefix = str.prefix(while: { $0.isNumber }) guard let match = Int(prefix) else { return nil } str.removeFirst(prefix.count) return match }

15:31

Now our run s don’t work because they expect an inout argument, so for now let’s recreate the non-mutating version of run so that we can continue using it this way: extension Parser { func run(_ str: String) -> (match: A?, rest: String) { var str = str let match = self.run(&str) return (match, str) } }

16:07

And everything continues working just as it did before. Optimized parsing with substring

16:10

There’s one more optimization we can make, and it’s a big one. Right now we are operating on strings, and each time we consume a bit from the front of a string we are creating a whole new string to return from the run function. That could be very inefficient if we are trying to parse large input strings. But it doesn’t have to be that way. Since we are typically consuming from the front of the string, what if we could do something very lightweight, like change the startIndex of the string to point to characters inside the string. That would trick the string into thinking it’s shorter than it really is, but it will still be the same string that we started with, no copying necessary.

16:35

Turns out, this is exactly how the Substring type works in Swift. So if we were to work with a string’s substring instead, maybe we could move that index along and make things really performant.

16:58

Substring is a wrapper around some string storage, and it’s intended to share that storage with other Substring instances. Then, certain mutations of substrings can be done very efficiently because internally all it does is move around indices that indicate where the string begins and ends.

17:20

Every Swift collection has an underlying SubSequence type, and they’re all intended to make collection usage more efficient.

17:48

Let’s try updating our parser to work on Substring and see what goes right and what goes wrong. We’ll start by changing its mutating run function to work with inout Substring instead of a String . struct Parser<A> { let run: (inout Substring) -> A? }

18:00

This breaks our non-mutating version of run , which is still working with plain ole String : func run(_ str: String) -> (match: A?, rest: String) { var str = str let match = self.run(&str) Cannot invoke ‘run’ with an argument list of type ‘(inout String)’

18:11

We can now avoid the string copy by using its substring representation, which we can get by subscripting with what Swift calls an “unbounded range expression”: var str = str[...] This looks a lil funky but it’s effectively giving us a view into the string rather than creating a brand new one.

18:20

And finally, we should update the return value to return that substring instead. func run(_ str: String) -> (match: A?, rest: Substring) { var str = str[...] let match = self.run(&str) return (match, str) }

18:29

And that’s all that needed to change! The interface for mutating substrings is the same as the interface for mutating strings, but we’ll get a bit of a performance boost by working with a view into the string instead of a copy.

18:58

We still have the restriction that the entire String must be in memory, which means parsing a very large String isn’t going to be efficient, but the optimizations we’ve made so far were very low-hanging and already buy us a lot, so let’s kick that can a bit down the road. Till next time

19:08

We now have the “final form” of our parser: it’s a function that takes an in-out substring and produces an optional match. So let’s create a few more parsers so that we get a feel for how this goes. References Parse, don’t validate Alexis King • Nov 5, 2019 This article demonstrates that parsing can be a great alternative to validating. When validating you often check for certain requirements of your values, but don’t have any record of that check in your types. Whereas parsing allows you to upgrade the types to something more restrictive so that you cannot misuse the value later on. https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/ Ledger Mac App: Parsing Techniques Chris Eidhof & Florian Kugler • Aug 26, 2016 In this free episode of Swift talk, Chris and Florian discuss various techniques for parsing strings as a means to process a ledger file. It contains a good overview of various parsing techniques, including parser grammars. https://talk.objc.io/episodes/S01E13-parsing-techniques Swift Strings and Substrings Chris Eidhof & Florian Kugler • Dec 1, 2017 In this free episode of Swift talk, Chris and Florian discuss how to efficiently use Swift strings, and in particular how to use the Substring type to prevent unnecessary copies of large strings. Note We write a simple CSV parser as an example demonstrating how to work with Swift’s String and Substring types. https://talk.objc.io/episodes/S01E78-swift-strings-and-substrings Swift Pitch: String Consumption Michael Ilseman et al. • Mar 3, 2019 Swift contributor Michael Ilseman lays out some potential future directions for Swift’s string consumption API. This could be seen as a “Swiftier” way of doing what the Scanner type does today, but possibly even more powerful. https://forums.swift.org/t/string-consumption/21907 Difficulties With Efficient Large File Parsing Ezekiel Elin et al. • Apr 25, 2019 This question on the Swift forums brings up an interesting discussion on how to best handle large files (hundreds of megabytes and millions of lines) in Swift. The thread contains lots of interesting tips on how to improve performance, and contains some hope of future standard library changes that may help too. https://forums.swift.org/t/difficulties-with-efficient-large-file-parsing/23660 Downloads Sample code 0057-what-is-a-parser-pt2 Point-Free A hub for advanced Swift programming. Brought to you by Brandon Williams and Stephen Celis . Content Become a member The Point-Free Way Beta previews Gifts Videos Collections Free clips Blog More About Us Community Slack Mastodon Twitter BlueSky GitHub Contact Us Privacy Policy © 2026 Point-Free, Inc. All rights are reserved for the videos and transcripts on this site. All other content is licensed under CC BY-NC-SA 4.0 , and the underlying source code to run this site is licensed under the MIT License .