EP 58 · What Is a Parser · May 20, 2019 · Members

Video #58: What Is a Parser?: Part 3

Episode: Video #58
Date: May 20, 2019
Access: Members Only 🔒
URL: https://www.pointfree.co/episodes/ep58-what-is-a-parser-part-3

Description

It’s time to ask the all-important question: what’s the point? We now have a properly defined parser type, one that can parse efficiently and incrementally, but does it give us anything new over existing tools?

Transcript

0:06

The interface for mutating substrings is the same as the interface for mutating strings, but we’ll get a bit of a performance boost by working with a view into the string instead of a copy.
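
As a small illustration of that point, here is a sketch that uses only standard-library calls. The sample string and variable names are made up for this example and do not come from the episode. It takes a Substring view of a String, consumes a prefix from the view, and leaves the original String untouched.

```swift
// Illustrative values only; nothing here is taken from the episode's sample code.
let original = "42 Hello World"

// `original[...]` is a Substring: a view onto the string's characters, not a new String.
var view = original[...]

// Read the leading digits, then consume them from the view.
let digits = view.prefix(while: { $0.isNumber })
view.removeFirst(digits.count)

print(digits)    // 42
print(view)      // " Hello World" (printed without the quotes, leading space included)
print(original)  // 42 Hello World
```

Consuming from the view narrows what the view covers, while the original String keeps all of its characters.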

0:20

We still have the restriction that the entire String must be in memory, which means parsing a very large String isn’t going to be efficient, but the optimizations we’ve made so far were very low-hanging and already buy us a lot, so let’s kick that can a bit down the road.

Constructing more parsers

0:30

We now have the “final form” of our parser: it’s a function that takes an in-out substring and produces an optional match. So let’s create a few more parsers so that we get a feel for how this goes.
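
For reference, the type built up in the earlier parts can be reconstructed roughly as follows. This is a sketch inferred from how run is called in this transcript (once with a String, once with an in-out Substring), so the exact declarations are an assumption rather than the episode's verbatim code. The int parser is likewise rebuilt from its description later in this episode.

```swift
// Assumed shape of the parser type: a wrapper around a function that
// consumes from an in-out Substring and optionally produces a value.
struct Parser<A> {
  let run: (inout Substring) -> A?
}

extension Parser {
  // Convenience for experimenting: run on a whole String and report
  // both the match and whatever input was left unconsumed.
  func run(_ str: String) -> (match: A?, rest: Substring) {
    var str = str[...]
    let match = self.run(&str)
    return (match, str)
  }
}

// Reconstruction of the int parser mentioned in this episode.
let int = Parser<Int> { str in
  let prefix = str.prefix(while: { $0.isNumber })
  guard let match = Int(prefix) else { return nil }
  str.removeFirst(prefix.count)
  return match
}

print(int.run("123 Hello"))  // (match 123, rest " Hello")
print(int.run("Hello"))      // (match nil, rest "Hello")
```

Note that a failed parse leaves the input exactly as it was, because the prefix is only removed after the conversion succeeds.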

0:50

First let’s build a parser that will try to parse a double off the beginning of a string:

    let double = Parser<Double> { str in
      let prefix = str.prefix(while: { $0.isNumber || $0 == "." })
      guard let match = Double(prefix) else { return nil }
      str.removeFirst(prefix.count)
      return match
    }

2:32

Let’s take it for a spin!

    double.run("42")
    // (match 42.0, rest "")
    double.run("42.8743289247")
    // (match 42.8743289247, rest "")
    double.run("42.8743289247 Hello World")
    // (match 42.8743289247, rest " Hello World")

3:04

This double parser isn’t perfect because we can’t consume strings with multiple decimal points:

    double.run("42.4.1.4.6")
    // (nil, rest "42.4.1.4.6")

3:28

With a little more work we can make it right, but we are going to just leave it here for now.
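
One possible version of that extra work, as a sketch only: stop scanning at the second decimal point, so that a leading number is still parsed and the remainder is left in the input. This is our own guess at a fix, not the approach taken in the episode, and the Parser declarations are the assumed ones inferred from this transcript.

```swift
// Assumed parser type, reconstructed from how it is used in this transcript.
struct Parser<A> {
  let run: (inout Substring) -> A?
}

extension Parser {
  func run(_ str: String) -> (match: A?, rest: Substring) {
    var str = str[...]
    let match = self.run(&str)
    return (match, str)
  }
}

// Hypothetical variant of `double` that accepts at most one decimal point.
let double = Parser<Double> { str in
  var seenPoint = false
  let prefix = str.prefix(while: { char in
    if char == "." {
      // A second "." ends the number instead of poisoning the whole prefix.
      if seenPoint { return false }
      seenPoint = true
      return true
    }
    return char.isNumber
  })
  guard let match = Double(prefix) else { return nil }
  str.removeFirst(prefix.count)
  return match
}

print(double.run("42.4.1.4.6"))                 // (match 42.4, rest ".1.4.6")
print(double.run("42.8743289247 Hello World"))  // (match 42.8743289247, rest " Hello World")
```

Whether "42.4.1.4.6" should partially succeed like this or fail outright is a design choice; this sketch simply picks the more permissive option.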

3:34

We could also make a parser that parses a constant string off the beginning of a string. This is useful for making sure that certain tokens are present in a string. This parser is a little different from the other two, in that we don’t actually care about getting back the data we parsed off the string, but only care whether or not the parsing succeeded. Therefore the parser is of type Parser<Void>, which may seem a little weird.

4:11

One interesting thing about this parser is that it’s a function that produces a parser. This allows us to provide some upfront configuration for how our parser behaves. In this case we provide the string that we want to match at the beginning of the input string:

    func literal(_ literal: String) -> Parser<Void> {
      return Parser<Void> { str in
        guard str.hasPrefix(literal) else { return nil }
        str.removeFirst(literal.count)
        return ()
      }
    }

5:33

So, for example:

    literal("cat").run("cat dog")
    // ((), rest " dog")
    literal("cat").run("dog cat")
    // (nil, rest "dog cat")

6:13

We’ll encounter a bunch of other “parser generators” like literal as we go onward, where functions return parsers or take parsers as input to produce brand new parsers in a higher-order kind of way.

6:28

We could also cook up some seemingly pathological parsers, but they actually turn out to be pretty handy as we will later see. For example, we could cook up a parser that always succeeds and doesn’t consume anything from the input:

    func always<A>(_ a: A) -> Parser<A> {
      return Parser { _ in a }
    }

7:01

The always parser always succeeds.

    always("cat").run("dog")
    // (match "cat", rest "dog")

7:09

This may not seem like it makes a lot of sense, but the parser succeeded with “cat” and we still have “dog” left to parse.

7:22

We could also do the opposite: a parser that never succeeds but instead immediately fails and does not consume any of the input:

    func never<A>() -> Parser<A> {
      return Parser { _ in nil }
    }

7:42

So to use it, we can give an explicit generic and run our parser.

    (never() as Parser<Int>).run("dog")
    // (nil, rest "dog")

8:07

It’s a little awkward to use a generic function like this, but it’s necessary since Swift doesn’t support “generic variables”:

    // let never<A> = Parser<A> { … }

8:31

A way we can approximate this is through static computed properties, which is something we’ve used quite a lot on Point-Free:

    extension Parser {
      static var never: Parser {
        return Parser { _ in nil }
      }
    }

8:47

And now using never becomes a little bit nicer.

    Parser<Int>.never.run("dog")
    // (nil, rest "dog")

What’s the point?

9:04

We’ve now built five parsers: an int parser that scans an Int off the beginning of a string, a double parser that scans a Double, the literal parser that scans an exact string off the beginning of another string, and then an always parser and a never parser, which always or never succeed.

9:28

OK, now we are getting into some weird stuff. We are defining parsers that always succeed and never succeed? Before we go too much further, let’s slow down and ask “what’s the point?”. We started this episode by demoing some of the parsers that come with Swift and Foundation. They got the job done, but we claimed that they weren’t very extensible or composable. And so we went down this road building our own Parser type, and although we have parsed a few things, we still haven’t done anything too complicated. So, where is this going?

10:00

Well, the real point of this episode was for us to all get comfortable with the problem space of parsers, and to properly define what a parser is. As far as we are concerned, it’s just a function that takes a string and returns an optional value of some type, and it will possibly consume some subset of the input string.

10:26

And although the type we have defined so far doesn’t seem to be able to do too much, we promise that there is an entire world of composability lurking in that type; we just haven’t explored it yet. But even before we get to all of that, we think this type is already showing some promise.

Revisiting coordinate parsing

10:50

Let’s go back to the latitude/longitude coordinate parsing function and update it to use our new Parser type.

10:58

We previously defined this parser as a plain ole function, where we did a bunch of manual work using split, checking array counts, character equality checks, and so on.

    func parseLatLong(_ string: String) -> Coordinate? {
      let parts = string.split(separator: " ")
      guard parts.count == 4 else { return nil }
      guard
        let lat = Double(parts[0].dropLast()),
        let long = Double(parts[2].dropLast())
        else { return nil }
      let latCardinal = parts[1].dropLast()
      guard latCardinal == "N" || latCardinal == "S"
        else { return nil }
      let longCardinal = parts[3]
      guard longCardinal == "E" || longCardinal == "W"
        else { return nil }
      let latSign = latCardinal == "N" ? 1.0 : -1
      let longSign = longCardinal == "E" ? 1.0 : -1
      return Coordinate(
        latitude: lat * latSign,
        longitude: long * longSign
      )
    }

11:23

Let’s redo this function using the parsers we constructed.

11:33

The first thing we need to do is get our string input into the Substring format because that is what our parsers understand:

    func parseLatLong(_ coordString: String) -> Coordinate? {
      var str = coordString[...]
    }

11:53

Then we could first parse off a double from the front of the input:

    func parseLatLong(_ coordString: String) -> Coordinate? {
      var str = coordString[...]
      guard let lat = double.run(&str) else { return nil }
    }

12:21

And then we could parse off the degree symbol and whitespace from the input:

    func parseLatLong(_ coordString: String) -> Coordinate? {
      var str = coordString[...]
      guard let lat = double.run(&str) else { return nil }
      guard literal("° ").run(&str) != nil else { return nil }
    }

12:52

Before going further, let’s combine our guard statements to clean things up a bit.

    func parseLatLong(_ coordString: String) -> Coordinate? {
      var str = coordString[...]
      guard
        let lat = double.run(&str),
        literal("° ").run(&str) != nil
        else { return nil }
    }

13:11

Now we need to parse an “N” or “S” character from the string, and convert that to a +1 or -1. Previously we did that in multiple steps, but now we can build up a specialized parser that does just that:

    let northSouth = Parser<Double> { str in
      guard
        let cardinal = str.first,
        cardinal == "N" || cardinal == "S"
        else { return nil }
      str.removeFirst(1)
      return cardinal == "N" ? 1 : -1
    }

14:16

And we can use this self-contained, reusable unit of parsing rather simply:

    func parseLatLong(_ coordString: String) -> Coordinate? {
      var str = coordString[...]
      guard
        let lat = double.run(&str),
        literal("° ").run(&str) != nil,
        let latSign = northSouth.run(&str)
        else { return nil }
    }

14:37

Then we need to parse off the comma and whitespace:

    func parseLatLong(_ coordString: String) -> Coordinate? {
      var str = coordString[...]
      guard
        let lat = double.run(&str),
        literal("° ").run(&str) != nil,
        let latSign = northSouth.run(&str),
        literal(", ").run(&str) != nil
        else { return nil }
    }

14:55

And then we do the process all over again by parsing off a double, then the degree symbol, and then the cardinal direction.

    func parseLatLong(_ coordString: String) -> Coordinate? {
      var str = coordString[...]
      guard
        let lat = double.run(&str),
        literal("° ").run(&str) != nil,
        let latSign = northSouth.run(&str),
        literal(", ").run(&str) != nil,
        let long = double.run(&str),
        literal("° ").run(&str) != nil,
        let longSign = eastWest.run(&str)
        else { return nil }
    }

15:18

We’re going to need to define eastWest though:

    let eastWest = Parser<Double> { str in
      guard
        let cardinal = str.first,
        cardinal == "E" || cardinal == "W"
        else { return nil }
      str.removeFirst(1)
      return cardinal == "E" ? 1 : -1
    }
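
Since northSouth and eastWest differ only in which characters they accept, the two could in principle be produced by one function that returns a parser, in the same spirit as literal. The sketch below is a hypothetical refactoring, not something done in the episode: the name cardinalSign and its parameter labels are invented here, and the Parser declarations are the assumed ones inferred from this transcript.

```swift
// Assumed parser type, reconstructed from how it is used in this transcript.
struct Parser<A> {
  let run: (inout Substring) -> A?
}

extension Parser {
  func run(_ str: String) -> (match: A?, rest: Substring) {
    var str = str[...]
    let match = self.run(&str)
    return (match, str)
  }
}

// Hypothetical helper: consumes one character and maps it to +1 or -1.
func cardinalSign(positive: Character, negative: Character) -> Parser<Double> {
  return Parser<Double> { str in
    guard
      let cardinal = str.first,
      cardinal == positive || cardinal == negative
      else { return nil }
    str.removeFirst(1)
    return cardinal == positive ? 1 : -1
  }
}

let northSouth = cardinalSign(positive: "N", negative: "S")
let eastWest = cardinalSign(positive: "E", negative: "W")

print(northSouth.run("S, 73.9442° W"))  // (match -1.0, rest ", 73.9442° W")
print(eastWest.run("N"))                // (match nil, rest "N")
```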

15:35

And bringing it all together we have:

    func parseLatLong(_ coordString: String) -> Coordinate? {
      var str = coordString[...]
      guard
        let lat = double.run(&str),
        literal("° ").run(&str) != nil,
        let latSign = northSouth.run(&str),
        literal(", ").run(&str) != nil,
        let long = double.run(&str),
        literal("° ").run(&str) != nil,
        let longSign = eastWest.run(&str)
        else { return nil }
      return Coordinate(
        latitude: lat * latSign,
        longitude: long * longSign
      )
    }

16:05

And our previous bit of parsing, which was erroneously succeeding, now fails!

    print(parseLatLong("40.6782% N- 73.9442% W"))
    // nil

16:16

If we switch things to use the correct symbols, it passes.

    print(parseLatLong("40.6782° N, 73.9442° W"))
    // Coordinate(latitude: 40.6782, longitude: -73.9442)
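
One thing the function as written does not check is whether any input remains after the last cardinal direction, so a string with trailing characters would still parse. If that matters, a stricter variant could also require the substring to be empty at the end. The sketch below is self-contained and rests on assumptions: the Parser and Coordinate declarations are reconstructed from how they are used in this transcript, and parseLatLongStrict is a name invented for this example.

```swift
// Assumed declarations, reconstructed from usage in this transcript.
struct Parser<A> {
  let run: (inout Substring) -> A?
}

struct Coordinate {
  let latitude: Double
  let longitude: Double
}

let double = Parser<Double> { str in
  let prefix = str.prefix(while: { $0.isNumber || $0 == "." })
  guard let match = Double(prefix) else { return nil }
  str.removeFirst(prefix.count)
  return match
}

func literal(_ literal: String) -> Parser<Void> {
  return Parser<Void> { str in
    guard str.hasPrefix(literal) else { return nil }
    str.removeFirst(literal.count)
    return ()
  }
}

let northSouth = Parser<Double> { str in
  guard let cardinal = str.first, cardinal == "N" || cardinal == "S"
    else { return nil }
  str.removeFirst(1)
  return cardinal == "N" ? 1 : -1
}

let eastWest = Parser<Double> { str in
  guard let cardinal = str.first, cardinal == "E" || cardinal == "W"
    else { return nil }
  str.removeFirst(1)
  return cardinal == "E" ? 1 : -1
}

// Hypothetical strict variant: same steps as parseLatLong, plus an
// end-of-input check as the final guard condition.
func parseLatLongStrict(_ coordString: String) -> Coordinate? {
  var str = coordString[...]
  guard
    let lat = double.run(&str),
    literal("° ").run(&str) != nil,
    let latSign = northSouth.run(&str),
    literal(", ").run(&str) != nil,
    let long = double.run(&str),
    literal("° ").run(&str) != nil,
    let longSign = eastWest.run(&str),
    str.isEmpty  // reject anything left over after the final direction
    else { return nil }
  return Coordinate(latitude: lat * latSign, longitude: long * longSign)
}

print(parseLatLongStrict("40.6782° N, 73.9442° W") != nil)           // true
print(parseLatLongStrict("40.6782° N, 73.9442° W and more") != nil)  // false
```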

16:28

I think this is already looking quite a bit better than the hand-rolled parsing we were doing before. For one thing, all of the incremental consumption of the input is happening in a linear, line-by-line fashion and telling a very direct story. First we parse off a double, then the degree sign, then the cardinal direction, then a comma, then another double, then another degree sign, and then another cardinal direction. Once all the data is parsed from the string we bring it all together to create the actual Coordinate value.

17:07

Something else that is better about this style is that we got our first glimpse at code reuse. The northSouth and eastWest parsers we built can be used anywhere, not just for parsing this specific coordinate format. And we can peel off as many little helper parsers as we want. For example, right now we are repeating the literal("° ") parser twice, so maybe we should extract it out.
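
The code for that extraction did not survive in this transcript, so the following is only a sketch of what it might look like. The name degree is a hypothetical choice of ours, and the Parser declarations are the assumed ones inferred from how run is used above.

```swift
// Assumed parser type, reconstructed from how it is used in this transcript.
struct Parser<A> {
  let run: (inout Substring) -> A?
}

extension Parser {
  func run(_ str: String) -> (match: A?, rest: Substring) {
    var str = str[...]
    let match = self.run(&str)
    return (match, str)
  }
}

func literal(_ literal: String) -> Parser<Void> {
  return Parser<Void> { str in
    guard str.hasPrefix(literal) else { return nil }
    str.removeFirst(literal.count)
    return ()
  }
}

// Hypothetical name for the extracted helper: the degree sign followed by a space.
let degree = literal("° ")

print(degree.run("° N").rest)   // N
print(degree.run("% N").match == nil)  // true
```

Inside parseLatLong, both occurrences of literal("° ").run(&str) could then be written as degree.run(&str).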

17:32

The Scanner style of parsing has many of these benefits, but because its API hasn’t been updated for Swift it can be quite cumbersome to use. We’ll save you from the details of coding it up from scratch, but this is what it would look like:

    func parseLatLongWithScanner(_ string: String) -> Coordinate? {
      let scanner = Scanner(string: string)

      var lat: Double = 0
      guard scanner.scanDouble(&lat) else { return nil }

      guard scanner.scanString("° ", into: nil) else { return nil }

      var northSouth: NSString? = ""
      guard scanner.scanCharacters(from: ["N", "S"], into: &northSouth)
        else { return nil }
      let latSign = northSouth == "N" ? 1.0 : -1

      guard scanner.scanString(", ", into: nil) else { return nil }

      var long: Double = 0
      guard scanner.scanDouble(&long) else { return nil }

      guard scanner.scanString("° ", into: nil) else { return nil }

      var eastWest: NSString? = ""
      guard scanner.scanCharacters(from: ["E", "W"], into: &eastWest)
        else { return nil }
      let longSign = eastWest == "E" ? 1.0 : -1

      return Coordinate(
        latitude: lat * latSign,
        longitude: long * longSign
      )
    }

18:39

Each time we parse something we need two lines: one to set up a mutable variable, and another to do the scanning, which also requires a guard to check if it scanned successfully.

18:55

So at the very least our Parser type provides a more ergonomic interface to parsing that is more conducive to sharing parsers and embracing code reuse. That alone might be reason enough to use this type, but it’s only the beginning. Now that we’ve laid the foundation for the structure of what a parser is we can begin to define a whole bunch of useful, reusable parsers and build up far more complex ones. And we’ll be able to do so with all of those universal operators that we have covered many times on Point-Free, like map, zip and flatMap, which it turns out the Parser type has! And it’s those operators that unleash a whole world of composability, which allow us to piece together lots of small parsers to build up a huge, complex parser. This is what we will discuss next time!

References

Parse, don’t validate
Alexis King • Nov 5, 2019
This article demonstrates that parsing can be a great alternative to validating. When validating you often check for certain requirements of your values, but don’t have any record of that check in your types. Whereas parsing allows you to upgrade the types to something more restrictive so that you cannot misuse the value later on.
https://lexi-lambda.github.io/blog/2019/11/05/parse-don-t-validate/

Ledger Mac App: Parsing Techniques
Chris Eidhof & Florian Kugler • Aug 26, 2016
In this free episode of Swift Talk, Chris and Florian discuss various techniques for parsing strings as a means to process a ledger file. It contains a good overview of various parsing techniques, including parser grammars.
https://talk.objc.io/episodes/S01E13-parsing-techniques

Swift Strings and Substrings
Chris Eidhof & Florian Kugler • Dec 1, 2017
In this free episode of Swift Talk, Chris and Florian discuss how to efficiently use Swift strings, and in particular how to use the Substring type to prevent unnecessary copies of large strings.
Note: We write a simple CSV parser as an example demonstrating how to work with Swift’s String and Substring types.
https://talk.objc.io/episodes/S01E78-swift-strings-and-substrings

Swift Pitch: String Consumption
Michael Ilseman et al. • Mar 3, 2019
Swift contributor Michael Ilseman lays out some potential future directions for Swift’s string consumption API. This could be seen as a “Swiftier” way of doing what the Scanner type does today, but possibly even more powerful.
https://forums.swift.org/t/string-consumption/21907

Difficulties With Efficient Large File Parsing
Ezekiel Elin et al. • Apr 25, 2019
This question on the Swift forums brings up an interesting discussion on how to best handle large files (hundreds of megabytes and millions of lines) in Swift. The thread contains lots of interesting tips on how to improve performance, and contains some hope of future standard library changes that may help too.
https://forums.swift.org/t/difficulties-with-efficient-large-file-parsing/23660

Scanner
Apple
Official documentation for the Scanner type by Apple. Although the type hasn’t (yet) been updated to take advantage of Swift’s modern features, it is still a very powerful API that is capable of parsing complex text formats.
https://developer.apple.com/documentation/foundation/scanner

NSScanner
Nate Cook • Mar 2, 2015
A nice, concise article covering the Scanner type, including a tip of how to extend the Scanner so that it is a bit more “Swifty”. Take note that this article was written before NSScanner was renamed to just Scanner in Swift 3.
https://nshipster.com/nsscanner/