EP 184 · Invertible Parsing · Apr 4, 2022 · Members

Video #184: Invertible Parsing: The Point

Episode: Video #184 Date: Apr 4, 2022 Access: Members Only 🔒 URL: https://www.pointfree.co/episodes/ep184-invertible-parsing-the-point

Description

We conclude our series on invertible parsing by converting a more complex parser into a parser-printer, and even enhance its format. This will push us to think through a couple more fun parser-printer problems.

Transcript

0:05

Phew, ok. We didn’t plan on having to do this additional episode after the last one, but it just goes to show how truly bizarre parser-printers can be. We didn’t think it would take us 6 episodes to cover the foundations of parser-printers, yet here we are.

0:21

But, we don’t think it’s appropriate to end the series just yet. So far we’ve only built a single parser-printer, which is essentially just a CSV parser that transforms the data into an array of User structs. While we did encounter some mind trippy stuff along the way, real world parser-printers can have even more bizarre situations that need careful thinking. So, we want to end this series by building one more parser-printer that is a lot more complex.

0:50

Recall that the parser we used as an example when building up the parser library from scratch in past episodes was a “marathon race” parser. It worked on a textual format that described a collection of races, each of which had a city name, an entry fee with different currencies, and a list of geographic coordinates that described the race route.

Race parser-printing

1:12

Let’s try converting this parser to be a parser-printer, and along the way we are going to come across some really interesting challenges, and thinking through those challenges in detail should help you as you build out your own parser-printers.

1:28

Let’s copy and paste the entire race parser out of the benchmark target and into a new playground page in order to make it easier to work with.

2:30

While looking through these parsers the first thing we will notice is that the first two parsers are already printers:

    let northSouth = OneOf {
      "N".utf8.map { 1.0 }
      "S".utf8.map { -1.0 }
    }
    let eastWest = OneOf {
      "E".utf8.map { 1.0 }
      "W".utf8.map { -1.0 }
    }

2:51

One thing to remember about this parser is that it works on the level of UTF-8 rather than substring. That allows parsing to be more performant, and working on the lower-level UTF-8 representation does not make this parser any more complicated.

3:06

All the parsers involved in these lines of code have been made into printers, which means when composed together we get another printer.

3:28

We can take these printers for a spin in order to see that the only values they print are 1 and -1:

    var input = ""[...].utf8
    try northSouth.print(1, to: &input)
    Substring(input) // "N"

    input = ""[...].utf8
    try northSouth.print(-1, to: &input)
    Substring(input) // "S"

4:02

No other value prints due to how OneOf and .map on void parser-printers work:

    input = ""[...].utf8
    try eastWest.print(2, to: &input)
    Substring(input)

An error was thrown and was not caught
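To make that failure concrete, here is a minimal standalone sketch of the idea, not the library's implementation. It assumes each branch is just a literal paired with an equatable value; `printOneOf` and `PrintingError` are hypothetical names, and the sketch prints into a plain String rather than a UTF8View.

```swift
// Hypothetical error type for this sketch only.
struct PrintingError: Error {}

// Tries each branch in order. A branch can print only when its value equals
// the value being printed; if no branch matches, printing fails.
func printOneOf<Value: Equatable>(
  _ value: Value,
  from branches: [(literal: String, value: Value)],
  to input: inout String
) throws {
  for branch in branches where branch.value == value {
    input.append(branch.literal)
    return
  }
  throw PrintingError()
}

let northSouth: [(literal: String, value: Double)] = [("N", 1.0), ("S", -1.0)]

var north = ""
try printOneOf(1, from: northSouth, to: &north)

// Printing 2 matches neither branch, so an error is thrown.
var failed = false
var other = ""
do {
  try printOneOf(2, from: northSouth, to: &other)
} catch {
  failed = true
}
```

The equatable check is what lets a mapped void parser act as a printer at all: the only thing it knows how to print is the one value it would have produced when parsing.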

4:09

So we got pretty lucky that the first few parsers out of the gate happen to also be printers, but our luck is up with the next two parsers:

    let latitude = Parse(*) {
      Double.parser()
      "° ".utf8
      northSouth
    }
    let longitude = Parse(*) {
      Double.parser()
      "° ".utf8
      eastWest
    }

4:19

These aren’t parser-printers because we are just using a simple function to map the result, which multiplies the coordinate double with the north/south or east/west sign. To convince ourselves of this we can try upgrading the Parse entry point to be a ParsePrint and we will immediately get a compiler error:

    let latitude = ParsePrint(*) {
      Double.parser()
      "° ".utf8
      northSouth
    }
    let longitude = ParsePrint(*) {
      Double.parser()
      "° ".utf8
      eastWest
    }

5:13

The problem is that * transforms two doubles into a single one, and that works great for parsing, but for printing we need to go in the opposite direction. When printing a single double representing a latitude we need to split it into two doubles so that we can then feed those doubles to each of the double and north/south printers.

5:33

We do this kind of operation using conversions, which allow us to simultaneously describe how to transform from one type to another and back. We need to somehow cook up a conversion from two doubles to one:

    Conversion<(Double, Double), Double>(
      apply: <#((Double, Double)) throws -> Double#>,
      unapply: <#(Double) throws -> (Double, Double)#>
    )

5:56

The apply direction is easy, it’s just multiplication, which is the operation we want performed when parsing. The parser has already produced two doubles, one for the coordinate and one for the north/south sign, and so we just have to multiply them together:

    Conversion<(Double, Double), Double>(
      apply: *,
      unapply: <#(Double) throws -> (Double, Double)#>
    )

6:05

The other direction is trickier, as is usually the case when writing parser-printers. In this case we have a single double that we want to somehow separate into two. We want the magnitude of the coordinate, which is its absolute value, and separately we want the sign of the coordinate.

6:33

We can do this by checking if the value we are “unapplying” is positive or negative, and use that information to figure out how to split the value into its magnitude and sign:

    Conversion<(Double, Double), Double>(
      apply: *,
      unapply: { value in
        value < 0 ? (-value, -1) : (value, 1)
      }
    )

6:53

We just have to give this conversion a name. Since we are dealing with the magnitude and sign of a value we will name this magnitudeSign:

    let magnitudeSign = Conversion<(Double, Double), Double>(
      apply: *,
      unapply: { value in
        value < 0 ? (-value, -1) : (value, 1)
      }
    )
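The round trip can be checked in isolation. The sketch below uses a small stand-in `Conversion` struct so that it runs without the library; that struct is an assumption for illustration and may differ from the library's real type.

```swift
// Stand-in for the library's Conversion type (an assumption for illustration).
struct Conversion<A, B> {
  let apply: (A) throws -> B
  let unapply: (B) throws -> A
}

let magnitudeSign = Conversion<(Double, Double), Double>(
  // Parsing direction: multiply the magnitude by the sign.
  apply: { pair in pair.0 * pair.1 },
  // Printing direction: split a signed value into magnitude and sign.
  unapply: { value in value < 0 ? (-value, -1) : (value, 1) }
)

// Parsing "42.0° S" yields (42, -1), which combines into -42.
let signed = try magnitudeSign.apply((42, -1))

// Printing -42 first splits it back into (42, -1).
let (magnitude, sign) = try magnitudeSign.unapply(-42)

// Unapplying and then applying returns the original value.
let roundTrip = try magnitudeSign.apply(try magnitudeSign.unapply(-10.5))
```

Zero is treated as positive here, so a latitude of 0 would print with “N”.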

7:02

We can plug this conversion into the ParsePrint entry points:

    let latitude = ParsePrint(magnitudeSign) {
      Double.parser()
      "° ".utf8
      northSouth
    }
    let longitude = ParsePrint(magnitudeSign) {
      Double.parser()
      "° ".utf8
      eastWest
    }

7:51

And now, magically, we can print with the latitude and longitude parsers:

    input = ""[...].utf8
    try latitude.print(42, to: &input)
    Substring(input) // "42.0° N"

    input = ""[...].utf8
    try latitude.print(-42, to: &input)
    Substring(input) // "42.0° S"

    input = ""[...].utf8
    try longitude.print(42, to: &input)
    Substring(input) // "42.0° E"

    input = ""[...].utf8
    try longitude.print(-42, to: &input)
    Substring(input) // "42.0° W"

8:33

Notice that regardless of us printing positive or negative values, the printed value is always positive and instead the sign is represented as either “N”, “S”, “E” or “W”.

8:42

This is very cool to see. The parsing library is giving us the tools that allow us to concentrate on printing problems in small, hyper focused domains. To turn the latitude and longitude parsers into printers we literally only had to describe how it is we can split a double into its magnitude and sign. Once that is done the OneOf and Zip parsers take care of the rest by piecing everything together for you. It’s really incredible.

9:27

The next parser in the file is not a printer yet:

    let coord = Parse(Coordinate.init(latitude:longitude:)) {
      latitude
      Skip {
        ",".utf8
        zeroOrMoreSpaces
      }
      longitude
    }

9:58

This is not a printer primarily because we are trying to map the two doubles obtained from the latitude and longitude into a Coordinate struct. We now need to use the struct conversion that we cooked up last episode that can not only turn two doubles into a Coordinate but can also destructure a Coordinate into a tuple:

    let coord = ParsePrint(.struct(Coordinate.init(latitude:longitude:))) {
      latitude
      Skip {
        ",".utf8
        zeroOrMoreSpaces
      }
      longitude
    }

Initializer ‘init(_:with:)’ requires that ‘Parsers.ZipVO<String.UTF8View, Prefix<Substring.UTF8View>>’ conform to ‘Printer’

10:52

This isn’t compiling because we are trying to Skip a ZipVO parser, and it’s a ZipVO parser because although ",".utf8 is a Void parser, the zeroOrMoreSpaces parser is not. As we have seen in previous episodes it simply does not make sense to skip non-void printers. The whole point of Skip is to discard whatever output was produced by the parser, and so that makes it impossible for us to be able to turn around and undo that parsing when we don’t have the output that could be used to figure out what to print.

11:39

Luckily there’s a very simple solution to this, and it’s using a tool we’ve already cooked up in a previous episode. There is an operator on parsers called .printing that allows you to override printing so that you can explicitly say what you want printed:

    let zeroOrMoreSpaces = Skip {
      Prefix { $0 == .init(ascii: " ") }
    }
    .printing(" "[...].utf8)

12:11

So Skip turns the Prefix parser into a Void parser (but not a printer), and then .printing turns it back into a printer by providing an override. Now this is a Void parser-printer, which means we are now skipping void parsers, and so this is a printer:

    Skip {
      ",".utf8
      zeroOrMoreSpaces
    }
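The asymmetry between the two directions can be sketched without the library. In the sketch below `VoidParserPrinter` and `append(_:to:)` are hypothetical names, and a pair of closures stands in for the library's protocols: parsing accepts any run of spaces and throws it away, while printing always emits one canonical space.

```swift
// Hypothetical stand-in for a Void parser-printer over UTF-8 input.
struct VoidParserPrinter {
  let parse: (inout Substring.UTF8View) -> Void
  let print: (inout Substring.UTF8View) -> Void
}

// Helper for this sketch: appends text by rebuilding the UTF-8 view.
func append(_ text: String, to input: inout Substring.UTF8View) {
  input = Substring(String(Substring(input)) + text).utf8
}

let zeroOrMoreSpaces = VoidParserPrinter(
  parse: { input in
    // Consume every leading space and discard it.
    input = input.drop(while: { $0 == UInt8(ascii: " ") })
  },
  print: { input in
    // Nothing was kept from parsing, so print the overridden value.
    append(" ", to: &input)
  }
)

var parsed = "   rest"[...].utf8
zeroOrMoreSpaces.parse(&parsed)
let remaining = String(Substring(parsed))

var printed = ","[...].utf8
zeroOrMoreSpaces.print(&printed)
let output = String(Substring(printed))
```

This also shows that such a parser-printer is lossy by design: an input with three spaces after the comma parses fine but prints back with exactly one.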

12:29

And even better, the whole coordinate parser is now also a printer. We can now print a coordinate back to a string, and it will correctly handle the north/south/east/west logic, as well as separate the latitude and longitude coordinates by a comma and a single space:

    input = ""[...].utf8
    try coord.print(.init(latitude: 42, longitude: -10), to: &input)
    Substring(input) // "42.0° N, 10.0° W"

13:18

We’re making a lot of progress on converting the race parser into a parser-printer.

13:23

The next parser in the file is the currency parser, which is already a parser-printer by virtue of the fact that OneOf is a parser-printer, and we made .map into a parser-printer when mapping void parsers into equatable values:

    let currency = OneOf {
      "€".utf8.map { Currency.eur }
      "£".utf8.map { Currency.gbp }
      "$".utf8.map { Currency.usd }
    }

    input = ""[...].utf8
    try currency.print(.gbp, to: &input)
    Substring(input) // "£"

    input = ""[...].utf8
    try currency.print(.usd, to: &input)
    Substring(input) // "$"

13:59

Next we have the money parser. It is not a parser-printer because it is using a simple one-directional transformation for turning a tuple of currency and integer into a Money struct value. We can make this a bidirectional transformation by using the .struct conversion:

    let money = ParsePrint(.struct(Money.init(currency:dollars:))) {
      currency
      Int.parser()
    }

14:30

And now printing money works:

    input = ""[...].utf8
    try money.print(.init(currency: .eur, dollars: 100), to: &input)
    Substring(input) // "€100"

15:03

The next parser is the race parser, which consumes the name of the race, then a comma and space, then the money, then a new line, and then any number of newline-separated coordinates. This parser is not yet a parser-printer for a few reasons.

15:34

Most glaringly we are using the one-directional transformation when we should be using the bidirectional conversion: let race = ParsePrint(.struct(Race.init)) { … }

15:54

But even that isn’t enough because there’s another one-directional transformation being used: locationName.map { String(Substring($0)) }

16:41

The locationName parser outputs a UTF8View , and we want to massage that into a String because that’s what the location field on Race is. But when printing we need to be able to go in the other direction. That is, when printing a String we need to turn that string back into a UTF8View so that it can then be handed to the locationName printer.

17:07

While we already have a conversion from Substring to String, we need one that goes all the way from UTF8View. We can get one into place quickly enough:

    extension Conversion where A == Substring.UTF8View, B == String {
      static var string: Self {
        .init(
          apply: { String(Substring($0)) },
          unapply: { $0[...].utf8 }
        )
      }
    }
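Here is a standalone version of that conversion that can be run on its own. The `Conversion` struct below is a stand-in defined just for this sketch, not the library's type. The property is computed because Swift does not support static stored properties in extensions of generic types.

```swift
// Stand-in for the library's Conversion type (an assumption for illustration).
struct Conversion<A, B> {
  let apply: (A) throws -> B
  let unapply: (B) throws -> A
}

extension Conversion where A == Substring.UTF8View, B == String {
  static var string: Self {
    Self(
      // Parsing direction: UTF-8 view to String.
      apply: { String(Substring($0)) },
      // Printing direction: String back to a UTF-8 view.
      unapply: { $0[...].utf8 }
    )
  }
}

let string = Conversion<Substring.UTF8View, String>.string

let utf8 = "New York"[...].utf8
let name = try string.apply(utf8)
let back = try string.unapply(name)
```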

18:10

And now we can use it:

    let race = ParsePrint(.struct(Race.init)) {
      locationName.map(.string)
      Skip {
        ",".utf8
        zeroOrMoreSpaces
      }
      money
      "\n".utf8
      Many {
        coord
      } separator: {
        "\n".utf8
      }
    }

18:16

And now we can print a full race back into a string:

    input = ""[...].utf8
    try race.print(
      .init(
        location: "New York",
        entranceFee: .init(currency: .usd, dollars: 300),
        path: [
          .init(latitude: 42, longitude: -10),
          .init(latitude: -5, longitude: 5),
        ]
      ),
      to: &input
    )
    Substring(input)
    // New York, $300
    // 42.0° N, 10.0° W
    // 5.0° S, 5.0° E

This is pretty incredible.

19:33

And even more incredible, now that race is a parser-printer, the races parser automatically becomes a parser-printer thanks to how Many works:

    let races = Many {
      race
    } separator: {
      "\n---\n".utf8
    }

19:47

So without doing anything else we can already print an entire collection of races back into a string:

    input = ""[...].utf8
    try races.print(
      [
        .init(
          location: "New York",
          entranceFee: .init(currency: .usd, dollars: 300),
          path: [
            .init(latitude: 42, longitude: -10),
            .init(latitude: -5, longitude: 5),
          ]
        ),
        .init(
          location: "Berlin",
          entranceFee: .init(currency: .eur, dollars: 200),
          path: [
            .init(latitude: 42, longitude: -10),
            .init(latitude: -5, longitude: 5),
          ]
        )
      ],
      to: &input
    )
    Substring(input)
    // New York, $300
    // 42.0° N, 10.0° W
    // 5.0° S, 5.0° E
    // ---
    // Berlin, €200
    // 42.0° N, 10.0° W
    // 5.0° S, 5.0° E

20:38

It is absolutely incredible to see. We only made a few very minor tweaks and this moderately complex parser has been upgraded to a parser-printer so that we can parse a string into an array of races and print an array of races back into a string.

20:56

And there was only one mind trippy thing we had to deal with which was how to replace simple double multiplication by a bidirectional process that not only multiplies two doubles into a single double, but also separates a single double into its magnitude and sign. Now that subtlety of printing exists no matter what. Even if you completely abandoned using our library for the printing of these data types, and instead decided to use ad hoc printing, you would still be faced with similar problems. It’s just that the solutions to those problems would be muddied with all the other things happening in your printing code, and the solutions would be ad hoc and not generalizable at all.

21:42

For example, something as innocent as simply printing the coordinate to a string in an ad hoc manner might look like this:

    func print(coordinate: Coordinate) -> String {
      "\(abs(coordinate.latitude))° \(coordinate.latitude < 0 ? "S" : "N"), "
        + "\(abs(coordinate.longitude))° \(coordinate.longitude < 0 ? "W" : "E")"
    }

22:34

This is really messy, and we have technically reproduced the logic of the magnitudeSign conversion, just in an ad hoc and messier manner. It’s actually pretty cool that we can completely isolate the idea of combining magnitude and sign into a value as well as separating a value into its magnitude and sign in one simple unit:

    let magnitudeSign = Conversion<(Double, Double), Double>(
      apply: *,
      unapply: { value in
        value < 0 ? (-value, -1) : (value, 1)
      }
    )

23:03

And then get to use this unit anywhere we need for our parser-printers.

Difficulty

23:38

But now let’s introduce another piece of data that we want to track in our Race data type, and figuring out how to support it both from the perspective of parsers and printers is going to really stretch our minds. If you thought the magnitudeSign conversion was fun, wait till you see this.

24:00

We want to add a measure of difficulty for each race. We will put it after the entrance fee, and to make things more interesting and fun, we are going to measure difficulty by the number of exasperated emojis listed:

    New York City, $300, 🥵🥵🥵🥵
    …
    ---
    Berlin, €100, 🥵🥵🥵
    …
    ---
    London, £500, 🥵🥵
    …

24:24

And we’d like to parse this information into an integer field on the Race type:

    struct Race {
      let location: String
      let entranceFee: Money
      let difficulty: Int
      let path: [Coordinate]
    }

24:42

Let’s see what it takes to cook up a difficulty parser-printer that can consume any number of emojis.

24:47

If we didn’t care about printing then we may be tempted to write the difficulty parser simply as:

    let difficulty = Prefix { $0 == "🥵" }.map(\.count)

Here we consume as many emojis from the beginning as possible, and then count them. And this seems to work, at least as far as the parsing is concerned:

    try difficulty.parse("🥵🥵🥵🥵") // 4

25:17

However, this parser works on the level of Substring : difficulty.parse(<#&Substring#>)

25:21

And the rest of the race parser works on the level of UTF8View in order to be as performant as possible. Working on such a low-level string representation for this parser isn’t really a hindrance either because all of the literal strings we use to parse, such as “N”, “°” and even this emoji do not have multiple representations as UTF8 bytes, so we don’t have to worry about normalization of extended grapheme clusters.

25:56

So, we’d like the difficulty parser to work on UTF8View s, but that can be hard to do with Prefix because the predicate works on the level of bytes. We want to repeatedly check and consume if the beginning of the input starts with the bytes of the emoji character: Array("🥵".utf8) // [240, 159, 165, 181]
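Done by hand with only the standard library, that repeated check-and-consume loop might look like the following sketch. The function name `countLeading(_:in:)` is hypothetical, and this is not how the library implements it.

```swift
let emoji = Array("🥵".utf8)

// Repeatedly checks whether the input begins with the pattern's bytes,
// consuming one occurrence each time, and returns how many were consumed.
func countLeading(_ pattern: [UInt8], in input: inout Substring.UTF8View) -> Int {
  var count = 0
  while input.starts(with: pattern) {
    input = input.dropFirst(pattern.count)
    count += 1
  }
  return count
}

var input = "🥵🥵🥵🥵\n42.0° N"[...].utf8
let difficulty = countLeading(emoji, in: &input)

// Whatever follows the emojis is left in the input for later parsers.
let rest = String(Substring(input))
```

A byte-level predicate cannot express this directly, because a single emoji spans four bytes and the check has to look at all four at once.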

26:21

Well, there’s another way to do the work of this parser, and that’s using the Many parser: let difficulty = Many { "🥵".utf8 }.map(\.count)

26:44

It still works exactly as it did before: try difficulty.parse("🥵🥵🥵🥵".utf8) // 4 But now it is working on the level of UTF8View s.

26:51

Unfortunately the difficulty parser is not yet also a printer: difficulty.print

26:56

This is because we are using a one-directional map operation for turning the output of Many into an integer: Many { "🥵".utf8 }.map(\.count)

27:08

But what is the output type of the Many ? The element parser "🥵".utf8 is a Void parser, and Many produces an array of results, which must mean that it’s an array of voids parser. And indeed, it is: Many { "🥵".utf8 }.print(<#[()]#>, to: &<#Substring.UTF8View#>)

27:30

This array of voids is what the .map operation is acting on, and it is the array whose elements we are counting.

27:56

We need to somehow upgrade this one-directional mapping operation to a bidirectional conversion. When applying we will count the elements of an array of voids, and when unapplying we will… what?

    Conversion<[Void], Int>(
      apply: \.count,
      unapply: <#(Int) throws -> [Void]#>
    )

28:23

We somehow need to turn an integer back into an array of void values. Well, there’s one really easy way to do that. We can simply build an array that repeats Void a bunch of times:

    Conversion<[Void], Int>(
      apply: \.count,
      unapply: { count in
        Array(repeating: (), count: count)
      }
    )

28:38

It’s very strange, but it compiles!

28:42

And best of all we can use it when mapping the Many parser:

    let count = Conversion<[Void], Int>(
      apply: \.count,
      unapply: { count in
        Array(repeating: (), count: count)
      }
    )

    let difficulty = Many { "🥵".utf8 }.map(count)

29:07

We are calling it count because it is the closest we can get to doing an array count in the parser-printer world. We can’t count the elements of an arbitrary array and still expect to get a printer. How can we reasonably turn an integer into an array of some elements we know nothing about? So, we have no choice but to force the array element to be something specific that we know about, and what better type to choose than the type that only has a single value. And with these changes we can now print an integer into a string of emojis:

    input = ""[...].utf8
    try difficulty.print(5, to: &input)
    Substring(input) // "🥵🥵🥵🥵🥵"
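Both directions of the count conversion can be exercised on their own. The sketch below again uses a stand-in `Conversion` struct (an assumption for illustration), and mimics what Many does when printing by emitting one emoji per Void element.

```swift
// Stand-in for the library's Conversion type (an assumption for illustration).
struct Conversion<A, B> {
  let apply: (A) throws -> B
  let unapply: (B) throws -> A
}

let count = Conversion<[Void], Int>(
  // Parsing direction: count the voids that Many collected.
  apply: { $0.count },
  // Printing direction: rebuild an array with that many voids.
  // Note that Array(repeating:count:) traps on a negative count.
  unapply: { n in Array(repeating: (), count: n) }
)

let parsedCount = try count.apply([(), (), ()])
let voids = try count.unapply(5)

// Printing each Void element as the emoji reproduces the printed text.
let printed = voids.map { _ in "🥵" }.joined()
```

Because Void has exactly one value, the integer alone is enough information to reconstruct the array, which is what makes the conversion invertible.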

29:35

And we’d hope we could add the difficulty parser to our race parser and everything should just work:

    let race = ParsePrint(.struct(Race.init(location:entranceFee:difficulty:path:))) {
      locationName.map(.string)
      Skip {
        ",".utf8
        zeroOrMoreSpaces
      }
      money
      Skip {
        ",".utf8
        zeroOrMoreSpaces
      }
      difficulty
      "\n".utf8
      Many {
        coord
      } separator: {
        "\n".utf8
      }
    }

Extra argument in call

29:56

But we have run up against an unfortunate limitation of our library. Our parser builders only support 6 parsers in a builder context, and we now have 7. Luckily there’s an easy workaround because we can just group two of the parsers in a ParsePrint { } to reduce the arity by one:

    let race = ParsePrint(.struct(Race.init(location:entranceFee:difficulty:path:))) {
      locationName.map(.string)
      Skip {
        ",".utf8
        zeroOrMoreSpaces
      }
      money
      Skip {
        ",".utf8
        zeroOrMoreSpaces
      }
      ParsePrint {
        difficulty
        "\n".utf8
      }
      Many {
        coord
      } separator: {
        "\n".utf8
      }
    }

30:23

That gets more things compiling!

30:35

And now the only compiler errors we have are that we aren’t specifying a difficulty in some races we construct, so let’s do that.

Conclusion

30:54

So, we have done it! We have accomplished everything we set out to do when we started this series 7 episodes ago.

31:00

We’ve distilled the concept of printing down into a protocol with a single function requirement, one that describes how to print some output into an input via an inout argument. We then conformed a bunch of types to the Printer protocol, and cooked up operators that allow us to transform existing printers into all new printers, which allowed us to construct large, complex printers from smaller printers that just focus on one single problem.
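As a reminder of that shape, here is a minimal sketch of a printer protocol with a single requirement, plus one combining type. The names `IntPrinter`, `Literal`, and `Zip2` are hypothetical, the sketch prints by appending to a plain String, and the library's real protocol and types may differ in their details.

```swift
// A printer turns an output value back into input by mutating the input.
protocol Printer {
  associatedtype Input
  associatedtype Output
  func print(_ output: Output, to input: inout Input) throws
}

// Prints an integer by appending its digits.
struct IntPrinter: Printer {
  func print(_ output: Int, to input: inout String) throws {
    input.append(String(output))
  }
}

// A Void printer: it has nothing to inspect, so it always prints its literal.
struct Literal: Printer {
  let text: String
  func print(_ output: Void, to input: inout String) throws {
    input.append(text)
  }
}

// Builds a new printer from two smaller ones by running them in order.
struct Zip2<P1: Printer, P2: Printer>: Printer where P1.Input == P2.Input {
  let p1: P1
  let p2: P2
  func print(_ output: (P1.Output, P2.Output), to input: inout P1.Input) throws {
    try p1.print(output.0, to: &input)
    try p2.print(output.1, to: &input)
  }
}

var out = ""
try Zip2(p1: Literal(text: "$"), p2: IntPrinter()).print(((), 300), to: &out)
```

Each small printer focuses on a single problem, and the combining type only has to decide the order in which the pieces run.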

31:25

And then miraculously, it just so happens that all of the printers and operators we created are also parsers! That means if we are careful enough we can simultaneously build a printer and parser for our domain so that we can turn nebulous blobs of data into well-structured data and then turn that well-structured data back into a nebulous blob.

31:44

And we accomplished all of this without sacrificing any of the core tenets of the parsing library, such as ergonomics, generality, or performance. We can still build parser-printers with a nice, succinct builder syntax, we are free to parse-print any kind of input and output, and parsing and printing can both be performant because they can work on low-level string representations and they can work incrementally.

32:29

There are still some fun things we can explore with parser-printers, and we will do more in the future, but we think this is enough for right now.

32:40

Until next time.

References

Invertible syntax descriptions: Unifying parsing and pretty printing
Tillmann Rendel and Klaus Ostermann • Sep 30, 2010

Note
Parsers and pretty-printers for a language are often quite similar, yet both are typically implemented separately, leading to redundancy and potential inconsistency. We propose a new interface of syntactic descriptions, with which both parser and pretty-printer can be described as a single program using this interface. Whether a syntactic description is used as a parser or as a pretty-printer is determined by the implementation of the interface. Syntactic descriptions enable programmers to describe the connection between concrete and abstract syntax once and for all, and use these descriptions for parsing or pretty-printing as needed. We also discuss the generalization of our programming technique towards an algebra of partial isomorphisms.

This publication (from 2010!) was the initial inspiration for our parser-printer explorations, and a much less polished version of the code was employed on the Point-Free web site on day one of our launch!

https://www.informatik.uni-marburg.de/~rendel/unparse/

Unified Parsing and Printing with Prisms
Fraser Tweedale • Apr 29, 2016

Note
Parsers and pretty printers are commonly defined as separate values, however, the same essential information about how the structured data is represented in a stream must exist in both values. This is therefore a violation of the DRY principle – usually quite an obvious one (a cursory glance at any corresponding FromJSON and ToJSON instances suffices to support this fact). Various methods of unifying parsers and printers have been proposed, most notably Invertible Syntax Descriptions due to Rendel and Ostermann (several Haskell implementations of this approach exist). Another approach to the parsing-printing problem using a construct known as a “prism” (a construct Point-Free viewers and library users may better know as a “case path”).

https://skillsmatter.com/skillscasts/16594-unified-parsing-and-printing-with-prisms