Video #298: Back to Basics: Hashable

Episode: Video #298 Date: Oct 14, 2024 Access: Members Only 🔒 URL: https://www.pointfree.co/episodes/ep298-back-to-basics-hashable

Description

While the documentation for Equatable discusses the notions of “equivalence relation” and “substitutability”, there are conformances in the Standard Library that run afoul, but for pragmatic reasons. Let’s explore them and then dive deeper into a related protocol: Hashable.

Video

Cloudflare Stream video ID: 6e98abaeffcd5df8281b335002073b14 Local file: video_298_back-to-basics-hashable.mp4 *(download with --video 298)*

References

Transcript

— 0:05

So, that was a really deep dive into the Equatable protocol. I bet most of our viewers didn’t expect to get dragged into a abstract mathematical hole in order to understand something that seems to be so simple.

— 0:17

And here is where we try to bring things down to Earth a bit.

— 0:19

Everything we have said up to this point is technically correct, and as you can see there are a lot of weird things that can happen if you start to indiscriminately fudge with the concept of equality. You can very easily write reasonably looking code that gives very unreasonable results, and that completely breaks out intuition when it comes to understanding how code is supposed to work by reading it. Stephen

— 0:40

However, even in the standard library there are multiple examples of Equatable implementations that do not live up to the promise of the documentation of the protocol. And sometimes we need to strike a balance between an idealized mathematical world, and the real, pragmatic world.

— 0:55

So, let’s take a few looks at how these ideas can fall apart in practice, and why we have chosen to live with it. Pragmatism and Equatable

— 1:02

Probably the most common example of a type that has a slightly strange Equatable conformance is floating point numbers. Every mainstream language follows a standard for defining such numbers, called the IEEE standard, including Swift. And so Swift really has no choice but to support things like the “not-a-number” value in all floating point types, such as Double : Double.nan

— 1:21

To deviate from IEEE standards would mean that Swift has a numerics system that is unrecognizable to nearly every other programming language out there, including C and C++ which one of Swift’s big strengths is that it can interop with those languages.

— 1:35

And this special value was included into the IEEE specification to handle certain mathematical operations that do not have a well-defined result, such as dividing 0 by 0: Double.zero / .zero // NaN

— 1:48

…or taking the square root of -1: sqrt(-1) // NaN

— 1:54

…or multiplying infinity by 0: Double.infinity * .zero // NaN

— 1:59

And unfortunately this special value kind of exists outside the number system in certain ways. Because it is being used as a catch-all for “an invalid operation occurred”, such as the 3 cases above, it doesn’t carry any information with it to distinguish whether the NaN came from doing 0/0 or sqrt(-1) or some other invalid operation.

— 2:19

And because of that, the IEEE standard made the choice to have NaN be unequal to itself : var nanValue = Double.nan nanValue == nanValue // false

— 2:33

This right here flies in the face of the reflexivity contract that Equatable wants to uphold, which says that a value should always be equal to itself.

— 2:42

But this decision was made as a balance between being principled and being practical. At the end of the day people will expect floating point types in Swift to adhere to the IEEE standards so that it will be compatible across platforms, but also people will expect these types to be equatable because they are just numbers. It would be quite upsetting if you couldn’t ask if two doubles are equal, even though once you are familiar with the ins and outs of floating point numbers you realize that that is not such a simple question after all.

— 3:07

This isn’t the only strangeness. The bigger a double gets the less precision it has. Doubles seem to allow you to have arbitrarily large numbers in Swift. For example, check out this gigantic number: let veryLargeNumber: Double = 1_000_000_000_000_000_000_000_000_000_000_000_000_000_000_000_000_000_000_000_000_000_000_000_000_000_000_000_000 // 1E+84

— 3:19

That’s a 1 with 84 zeros! This number is bigger than some of the estimates of how many atoms there are in the observable universe, and yet Swift can seemingly hold this number.

— 3:28

However, that’s not entirely true, this number is actually a very crude approximation of what we have written down. Swift cannot distinguish between this number and all the numbers around it. It starts to lose precision as the numbers get bigger, as well as smaller, and at these extremes it will just have no choice but to say that numbers near each other are equal.

— 3:46

For example, this very large number is technically equal to its successor: veryLargeNumber == veryLargeNumber + 1 // true

— 3:53

That is of course non-sensical, but it’s also necessary for a numeric type that needs to be efficient. It can’t possibly represent the infinitude of real numbers out there.

— 4:01

And so because Double has such a strange Equatable conformance it makes it all that much more important for us to be careful writing functions that use Doubles so that we do not fall afoul of the “substitutability” principle. In fact, it is quite easy to write functions that do not satisfy the principle.

— 4:16

For example, in the IEEE standard there are technically two forms of zero, a positive number and negative number: Double.zero // 0 -Double.zero // -0

— 4:28

The standard specifically calls them out as two separate values, but they are equal: Double.zero == -.zero // true

— 4:39

However, using these two values in certain contexts can lead to very different results. For example, suppose we had an algorithm that determined if a number was special: func isSpecialNumber(_ value: Double) -> Bool { 1 / value == .infinity }

— 4:56

We don’t need to know anything about the implementation details of this algorithm. All we need to know is that given it a number it returns a boolean that determines if it is special or not.

— 5:03

That seems reasonable, but unfortunately it is possible to feed this function two seemingly “equal” values that somehow give very different results: Double.zero == -.zero // true isSpecialNumber(.zero) // true isSpecialNumber(-.zero) // false And so we are seeing that floating points in Swift unfortunately do not satisfy the expected substitutability rule that Equatable wants.

— 5:18

And floating point numbers are one of those things in the programming world that everyone knows is weird, and so none of this may be too surprising to our viewers. It is a bummer that we have to fudge the technical correctness of the Equatable protocol in order to accommodate them, but that is the pragmatic approach to take in a programming language.

— 5:34

But another foundational type of Swift has a really tricky Equatable conformance that can wreak havoc on our intuition, and this one may be more surprising. And that’s String . The String type in Swift is a highly abstracted view into a low level thing that is mired in gotchas and special cases. For the most part we get to create strings from simple characters: let name = "Blob"

— 5:57

…and even not so simple characters, such as letters with accents: let cafe = "Café"

— 6:03

And String comes with an Equatable conformance: name == name // true cafe == cafe // true name == cafe // false

— 6:14

…and this is a perfectly legitimate Equatable conformance. It is reflexive, symmetric and transitive, and so it is a proper equivalence relation as spelled out in the docs and as defined in the pure mathematical sense.

— 6:25

However, the characters that make up these strings are what are known as “extended grapheme clusters”, and they represent visually what we like to think a character is, even though there may be some subtleties under the hood.

— 6:37

For example, “café” can be written in two different ways that visually look the same. The “e” with the accent can be one single character, “é”, or it can be a plain “e” with a “combining character” that presents the accent in isolation: let cafe1 = "Caf\u{e9}" let cafe2 = "Cafe\u{301}" When that character is put next to another, they combine to add the accent to the first character.

— 7:33

Visually these strings look the same: cafe1 // "café" cafe2 // "café"

— 7:36

…and so Swift does the correct thing when we ask it if the strings are equal: cafe1 == cafe2 // true

— 7:41

We wouldn’t want these strings to unequal just because they are using a different internal representation. That would force us to handle these kinds of edge cases all over our code base.

— 7:49

What we are seeing here is an example of an Equatable conformance on a type in Swift whose equivalence classes are not just single elements. Remember that previously we defined an “equivalence class” as the set of values equal to some specific value.

— 8:01

When we are using the structural equality of a type, where all of the data and only the data of a type is considered for equality, then the equivalence classes consist of just a single value. For example, if we use the default synthesized Equatable conformance for a Flag type like so: struct Flag: Equatable { let id: UUID var isEnabled = false }

— 8:17

Then there is only one single flag value in the whole universe that equals a given one, and that of course is just the given one.

— 8:24

However, as we are seeing with the cafe1 and cafe2 variables, it is possible to have two different values that are equal: cafe1 == cafe2 // true

— 8:31

These values are not structurally identical. They have very different internal representations, but that is all being squashed into a kind of normalized form in order to compare for equality.

— 8:41

And of course it’s possible to have even more than 2 strings in an equivalence class. For example, the word “déjà-vu” has two accented characters, an acute and a grave: let dejavu = "déjà-vu"

— 9:07

And there is an extended grapheme cluster for each of those characters alone, but you can also represent them as a combination of an ASCII character with combining accent character. That means there are 4 possible choices: let dejavu1 = "d\u{e9}j\u{e0}-vu" let dejavu2 = "de\u{301}j\u{e0}-vu" let dejavu3 = "d\u{e9}ja\u{300}-vu" let dejavu4 = "de\u{301}ja\u{300}-vu" dejavu1 == dejavu2 // true dejavu2 == dejavu3 // true dejavu3 == dejavu4 // true dejavu4 == dejavu1 // true

— 9:26

And so this equivalence class has 4 values in it.

— 9:29

What we are seeing here is that although the Equatable conformance on String is perfectly valid as it satisfies the laws of an equivalence relation, the fact that is doing extra work to normalize the string representation under the hood for equality checking makes it easy for us to write algorithms that fall afoul of the substitutability principle. In more mathematical terms: we can write functions that are not “well-defined.”

— 9:49

To see this, let’s take a look under the hood of String to see what is being lost in the normalization process. There are lower level abstractions of strings that we can get access to, such as unicode scalars: Array(cafe1.unicodeScalars).description // "["C", "a", "f", "\\u{00E9}"]" Array(cafe2.unicodeScalars).description // "["C", "a", "f", "e", "\\u{0301}"]"

— 10:31

Here we can clearly see how the “e with accent” forms just a single unicode scalar, but the “e” and combining accent form two unicode scalars. And for this reason, these collections are not equal: cafe1.unicodeScalars.elementsEqual(cafe2.unicodeScalars) // false

— 10:44

And we can go down another layer of abstraction to the actual UTF-8 code units: Array(cafe1.utf8).description // "[67, 97, 102, 195, 169]" Array(cafe2.utf8).description // "[67, 97, 102, 101, 204, 129]"

— 11:00

And these representations look even more different. Here we see that the “e with an accent” is formed by two code units, where as the separate “e” and accent are represented by 3 code units.

— 11:08

And of course these collections are not equal: cafe1.utf8.elementsEqual(cafe2.utf8) // false

— 11:14

And what this is showing is that when we write algorithms on strings we can easily violate the substitutability principle if we start to access its lower-level abstractions.

— 11:23

For example, suppose you have a special algorithm defined on strings, and it is able to provide a fast of the algorithm when the string’s UTF8 representation is small enough: func specialAlgorithm(_ str: String) -> Bool { if str.utf8.count <= 5 { // Fast path return true } else { // Slow path return false } }

— 11:54

This can potentially lead you to a situation where you return different results for two equal strings: cafe1 == cafe2 // true specialAlgorithm(cafe1) // true specialAlgorithm(cafe2) // false

— 12:07

And so these strings are not substitutable for each other even though they are technically “equal”, as far as the Equatable protocol can say. Laws of Hashable

— 12:24

So this seems pretty dangerous! Any algorithm you have written that deals with strings is potentially susceptible to mysterious bugs in which you can plug in two equal strings yet get two very different results back.

— 12:36

However, this is also a very well known property of strings, and anyone who deals with strings at a low level should be aware of these facts. For example, we spent a ton of time discussing these very topics in our parsing series, because when writing a parser it is crucial that you understand how to safely travel between the various abstractions without falling on the wrong side of well-definedness. Brandon

— 12:57

So, this all seems pretty bad, but there are a few reasons why it’s also not the end of the world. The domains of floating point numbers and strings are notoriously tricky and difficult to get right in the programming language world.

— 13:09

Every language has given their best shot at solving these problems, and there is no one language that can claim to have the absolute best solution. There is a triumvirate of principles that guide any implementation of these concepts: you’ve got correctness, performance, and ergonomics. It’s impossible to get all 3, and very difficult to get even 2. Each choice you make to improve in one area will probably detract from another. Stephen

— 13:38

Most languages try to strike a balance between all 3, which is what Swift is doing. Even Haskell, the language that is often put up on a pedestal as the exemplary model of a programming language, also has trouble with this problem. Their foundational types for floating points and strings also have all of the exact same problems and subtleties that we are discussing here. But they do at least go the extra mile by documenting this fact, which I think it would be nice for Swift to do.

— 14:02

Some languages will put one principle above all others, such as C. The C language provides a very barebones concept for strings that is extremely performant, but also not ergonomic and dangerous to use. Brandon

— 14:15

And then there are languages out there used for formal mathematical proofs in which the oddities we saw in the Double type are a complete non-starter. In those languages, real numbers are axiomatized, and so would be completely unrecognizable to us working in Swift. That means those numbers are very difficult to perform actual computations with, but they are great for proving mathematical theorems.

— 14:38

And it’s because of this long history of floating points and strings being mired in complexity, that it makes it less surprising when we see them fall short of the pure mathematical aspirations of the Equatable protocol. When we add a Double to our domain we know that there are a lot of gotchas to be aware of, and when we reach down into the lower abstraction layers of strings, we know that we are giving up a lot of the benefits that the String type provides to us. Stephen

— 15:03

And with all of that said, even though there are a few exceptions to the Equatable rules even in the Swift standard library, that is no reason for you to go around purposely implementing Equatable conformances on your types that fly in the face of those rules. There is not an expansive history and corpus of deep research in your own types, and so there will be no reason whatsoever to expect to see the strange behavior we previously saw. The ability to have a Set of users, which should only contain unique users, suddenly contain multiple of the same user, makes you start to question everything about your app. If that can happen, then what other bizarre situations can happen? Brandon

— 15:37

There is another protocol closely related to Equatable in Swift that can be very useful for various constructions and algorithms, and it’s called Hashable . Its purpose is to allow one to distill a single integer from the data type that can be used as a kind of fast-path comparison of data types to help save expensive work.

— 15:57

Let’s take a look at this protocol and see what fundamental laws must be satisfied.

— 16:03

The Hashable protocol actually inherits from the Equatable protocol and adds two new requirements: public protocol Hashable: Equatable { var hashValue: Int { get } func hash(into hasher: inout Hasher) }

— 16:19

The hashValue property allows one to compute a single integer from a value, and the hash(into:) method allows one to combine the data inside the conforming type into a Hasher , which is the thing that ultimately computes the hashValue . In fact, when conforming this type you never need to implement hashValue yourself. That is automatically provided based on your implementation of hash(into:) .

— 16:46

So, what is the point of this protocol?

— 16:47

It allows a type to distill all of internal data down to just a single integer. And of course it is impossible for us to compute a unique integer for any value of our data type. After all, the User type we defined earlier holds onto an integer, boolean and string: struct User: Equatable { let id: Int var isAdmin = false var name: String }

— 17:17

There is no way to compute a single integer that uniquely identifies every possible value from this type.

— 17:21

However, uniqueness is not a requirement of this hashValue . It is only meant as a simple and cheap computation to produce an integer that can be used as a kind of fast path for certain kinds of computations. The most prototypical example of this is dictionaries. The hashValue provides a relatively coarse bucketing of values by an integer, and so when we need to search for a value in a dictionary we can first look it up by its hashValue , and then further compare all the values that have the same hashValue . This will allow us to skip searching nearly all values in the dictionary, which could be hundreds or thousands of elements.

— 18:00

However, in order for this to work there is a crucial property that must be satisfied by hashValue . We will find the following paragraph in the documentation for Hashable : Note Hashing a value means feeding its essential components into a hash function, represented by the Hasher type. Essential components are those that contribute to the type’s implementation of Equatable . Two instances that are equal must feed the same values to Hasher in hash(into:) , in the same order.

— 18:32

This last sentence is the really important part. It says that if we have two values of a Hashable type such that they are equal, then their hashes should be equal: let x: Hashable let y: Hashable if x == y { assert(x.hashValue == y.hashValue) }

— 18:56

Note that the inverse of this does not need to hold: let x: Hashable let y: Hashable if x.hashValue == y.hashValue { assert(x == y) }

— 19:08

Just because two values have the same hashValue it does not mean that they must be equal. And in fact, that is impossible. The Int type has a finite number of potential values, though it is a huge number of values, and there are types out there with a much bigger space of possible values, such as arrays of strings. So the only property we really want to hold is that equal values have equal hashes.

— 19:36

And this property should sound pretty familiar to us by now. This is precisely the wording used when describing the “substitutability” principle for functions on Equatable types, also known as the well-definedness property of functions.

— 19:52

The hashValue property can be thought of as a function: func hashValue(_ value: some Hashable) -> Int { value.hashValue }

— 19:56

And this law of the Hashable protocol says that hashValue must be well-defined: let x: Hashable let y: Hashable if x == y { assert(hashValue(x) == hashValue(y)) }

— 20:03

The type String is Hashable , and as we saw a moment ago, its Equatable conformance is tricky since it can squash many distinct representations down into a single, normalized representation. But luckily for us, String ’s Hashable conformance took all of that into account and is well-defined. We can check this by seeing that the hashValue on the 4 distinct, yet equal, representations of “déjà-vu” are indeed equal: dejavu1.hashValue // -287,754,916,349,372,141 dejavu2.hashValue // -287,754,916,349,372,141 dejavu3.hashValue // -287,754,916,349,372,141 dejavu4.hashValue // -287,754,916,349,372,141

— 20:51

It’s pretty cool to see this tie in with the documentation from Equatable , and given everything we know about Equatable and well-defined functions, this is honestly the kind of thing that doesn’t really need to be stated. We now know that when functions are defined on Equatable types we should always strive for well-definedness, and so implementing hashValue should be no different. But of course it’s sometimes useful to restate the obvious, especially when it comes to subtle mathematical laws.

— 21:36

So, we now see that the Hashable protocol merely means we must have some process to distill an integer from our data type, and further that process must be well-defined. But how do we actually conform to the protocol?

— 21:48

Well, just as with the Equatable protocol, Swift will automatically synthesize a conformance for structs that hold onto only Hashable data. That means our User type can become Hashable right away without any additional work: struct User: Hashable { let id: Int var isAdmin = false var name: String }

— 22:10

And that’s because the compiler is generating a conformance that walks each Hashable field to compute the final hash value: func hash(into hasher: inout Hasher) { hasher.combine(id) hasher.combine(isAdmin) hasher.combine(name) }

— 22:37

And once this is done we can use the User type as a key in dictionaries, for example a dictionary that maps users to their collection of friends: var friends: [User: [User]] = [:]

— 22:55

And we can populate this dictionary with the friends of Blob: let blob = User(id: 42, name: "Blob") friends[blob] = [ User(id: 43, name: "Blob Jr."), User(id: 44, name: "Blob Sr."), ]

— 23:05

And then we can ask this dictionary to give us all the friends of Blob like so: friends[blob]

— 23:27

And if we access the friends of a user that is not in the dictionary we will get nil : let blobJr = User(id: 43, name: "Blob Jr.") friends[blobJr] // nil

— 23:42

And the way this look up works is that the dictionary first computes the hashValue of Blob: blob.hashValue // -6,875,202,220,650,386,605

— 23:53

Internally the dictionary has a very efficient lookup mechanism for seeing if there are any values stored for this hash value. If there isn’t, then it can immediately return nil without doing any more work. And if there are values with this hash value, then it only needs to perform an equality check on just those values. And the better the Hashable implementation, the finer these buckets will be so that often you only need to compare the equality of just a few values, most often even just one.

— 24:29

This concept also works for the Set data type, which requires its element to be Hashable . We can add blob to a set: let users = Set([ blob, ])

— 24:41

And if we try adding more blob s we will see that they are skipped: let users = Set([ blob, blob, blob, blob, blobJr, blobJr, blobJr, ])

— 24:54

The Set can de-duplicate any values, and can do so in an efficient manner thanks to the Hashable conformance of User .

— 25:02

And it can also perform lookups quite efficiently: users.contains(blob) // true users.contains(User(id: 44, name: "Blob Sr.") // false

— 25:14

Similar to dictionaries, this will first perform the lookup based only on the hash value, which is very fast and efficient, and only if multiple values have the same hash value does it need to perform multiple equality checks.

— 25:45

So clearly Hashable conformances are quite handy, but this is only true if the conformance is legitimate. What bad things can happen if we mess up the conformance?

— 25:54

First of all, it possible to provide a quote unquote “bad” conformance in that it’s just not very efficient. For example, if we provided our own implementation of hash(into:) for the User type that does nothing: struct User: Hashable { … func hash(into hasher: inout Hasher) {} }

— 26:08

…then everything works exactly as it did before. And this technically is a legitimate conformance in that two equal values do have the same hash. In fact, all values have the same hash: blob.hashValue // 9,043,809,318,645,924,119 blobJr.hashValue // 9,043,809,318,645,924,119

— 27:03

However, although everything seems to work on the surface, under the hood our code has become way less efficient. When a dictionary or set tries to its fast path lookup it will find this one integer hash value, and then it will be forced to perform a linear scan through all elements in the collection to compare it to the given one.

— 27:24

We can even see this concretely. Let’s provide an explicit implementation of == so that we can print when it is called: struct User: Hashable { … static func == (lhs: Self, rhs: Self) -> Bool { print("==") return lhs.id == rhs.id && lhs.isAdmin == rhs.isAdmin && lhs.name == rhs.name } }

— 27:36

And then let’s see how many times this called when asking a set if it contains a user: print("----------------") users.contains(blob) print("----") users.contains(User(id: 44, name: "Blob Sr.")) print("----------------"

— 27:59

We see that == was called three times, once to check for Blob, and twice to check for Blob Sr., who isn’t even in the set: ---------------- == ---- == == ----------------

— 28:39

But, if we bring back the default, synthesized hash(into:) : // func hash(into hasher: inout Hasher) {}

— 28:48

Now == is only called a single time for Blob, and because Blob Sr. isn’t in the set, the hash value doesn’t return any other user to compare to, and so there are zero equality checks in this case: ---------------- == ---- ----------------

— 29:05

So, while this conformance of Hashable isn’t wrong , it just isn’t very good.

— 29:13

But it is possible to give a completely wrong implementation of Hashable . One that does not satisfy the substitutability rule that is mentioned in the docs. But in order to do this one needs to actually provide a custom implementation of hash(into:) that hashes different data than what’s used for equality.

— 29:33

Remember that by default, the synthesized conformances just checks every stored property for equality, and combines every stored property into the hasher : static func == (lhs: Self, rhs: Self) -> Bool { lhs.id == rhs.id && lhs.isAdmin == rhs.isAdmin && lhs.name == rhs.name } func hash(into hasher: inout Hasher) { hasher.combine(id) hasher.combine(isAdmin) hasher.combine(name) }

— 29:52

Suppose we keep this == fixed for a moment, and let’s tweak the hash(into:) . For example, we could stop hashing the id : func hash(into hasher: inout Hasher) { // hasher.combine(id) hasher.combine(isAdmin) hasher.combine(name) }

— 29:59

This is also a perfectly legitimate implementation of hash(into:) . If two users are equal, as determined by == , then of course their hashes will be equal because their isAdmin and name fields will be equal.

— 30:29

We could even just has the name alone: func hash(into hasher: inout Hasher) { // hasher.combine(id) // hasher.combine(isAdmin) hasher.combine(name) }

— 30:32

That too is perfectly legitimate.

— 30:33

However, each of these implementations are strictly less efficient than the one that hashes all stored properties. This is because it is possible to have many distinct users with the same hashValue , meaning the buckets inside a dictionary or set have the possibility of holding onto many values, meaning that when we check for the presences of a value there could be more elements to check for equality.

— 30:43

In general, it is perfectly fine to hash any subset of the fields that are being checked for equality in the implementation of == . You will only get a less efficient hashValue , but it’s still technically correct. What is not ok to do is to hash fields that are not checked for equality. And so where things can get tricky, and where it is easy to accidentally mess things up, is if you tweak the implementation of == without also tweaking the implementation of hash(into:) .

— 31:08

For example, suppose we wanted a more relaxed form of equality for users where we just check their unique ID: static func == (lhs: Self, rhs: Self) -> Bool { lhs.id == rhs.id } And let’s use the default hash(into:) method that incorporates all of the fields into the hashValue .

— 31:17

This allows us to violate the substitutability principle the docs mention. For example, we can create a user: let blob = User(id: 42, name: "Blob")

— 31:32

And then create a mutable copy of this user as a “draft” so that we can toggle the admin privileges: var blobDraft = blob blobDraft.isAdmin.toggle()

— 31:43

Since their IDs are equal, they are equal as users, even though their isAdmin flag differs: blob == blobDraft // true

— 31:53

However, because the hashValue incorporates the isAdmin flag, each of these values will have different hashes: blob.hashValue == blobDraft.hashValue // false

— 32:05

This is going to wreak havoc on our intuitions of how this type behaves in sets and dictionaries.

— 32:17

For example, we currently have this friends dictionary that maps blob to their two friends: var friends: [User: [User]] = [:] friends[blob] = [ User(id: 43, name: "Blob Jr."), User(id: 44, name: "Blob Sr."), ] friends[blob] // [User(…), User(…)]

— 32:26

Well, if we were to look up the friends of blobDraft we would hope to get these two friends back. After all, blobDraft ’s ID is the same, and that is all that matters when it comes to equality checking.

— 32:38

However, unfortunately, we will see that blobDraft has no friends: friends[blobDraft] // nil

— 32:44

The reason this is happening is because blobDraft ’s hashValue is different from blob ’s, and so the dictionary immediately sees that there is no data for blobDraft ’s hash, and short-circuits all other logic to return nil .

— 33:02

And getting nil back when we expect an array of two friends isn’t the worst thing that can happen. It is actually possible for this code to crash.

— 33:14

If we update the dictionary through blobDraft and run the preview a few times we will eventually get a crash on this line: friends[blobDraft] = []

— 33:33

…letting us know: Fatal error: Duplicate keys of type ‘User’ were found in a Dictionary. This usually means either that the type violates Hashable’s requirements, or that members of such a dictionary were mutated after insertion.

— 33:54

This is very clearly letting us know that something is wrong with our Hashable implementation. There is an internal check inside Dictionary that sees if it ever gets into a state where it is holding onto two equal keys in different buckets. The only way that can happen is if our Hashable implementation is allowing equal values to produce unequal hash values, meaning it is not well-defined.

— 34:22

We can even hop over to the open source Swift code to see where this is defined . // This function has a highly visible name to make it stand out in stack traces. @usableFromInline @inline(never) @_unavailableInEmbedded internal func KEY_TYPE_OF_DICTIONARY_VIOLATES_HASHABLE_REQUIREMENTS( _ keyType: Any.Type ) -> Never { _assertionFailure( "Fatal error", """ Duplicate keys of type '\(keyType)' were found in a Dictionary. This usually means either that the type violates Hashable's requirements, or that members of such a dictionary were mutated after insertion. """, flags: _fatalErrorFlags()) }

— 34:35

And then further searching for the usage of this function in this file shows the following place it is invoked . /// Insert a new element into uniquely held storage. /// Storage must be uniquely referenced with adequate capacity. /// The key must not be already present in the Dictionary. @inlinable internal func _unsafeInsertNew(key: __owned Key, value: __owned Value) { _internalInvariant(count + 1 <= capacity) let hashValue = self.hashValue(for: key) if _isDebugAssertConfiguration() { // In debug builds, perform a full lookup and trap if we detect duplicate // elements -- these imply that the Element type violates Hashable // requirements. This is generally more costly than a direct insertion, // because we'll need to compare elements in case of hash collisions. let (bucket, found) = find(key, hashValue: hashValue) guard !found else { #if !$Embedded KEY_TYPE_OF_DICTIONARY_VIOLATES_HASHABLE_REQUIREMENTS(Key.self) #else fatalError("duplicate keys in a Dictionary") #endif } hashTable.insert(bucket) uncheckedInitialize(at: bucket, toKey: key, value: value) } else { let bucket = hashTable.insertNew(hashValue: hashValue) uncheckedInitialize(at: bucket, toKey: key, value: value) } _storage._count &+= 1 }

— 34:45

And this makes it clear what is going wrong. The method is _unsafeInsertNew , which is called when it detects we are inserting a new key and value into the dictionary. And in debug builds, the code goes ahead and does a correctness check by searching the dictionary if the key already exists in the dictionary.

— 35:18

So, this is clearly really bad. A bad implementation of Hashable can actually lead to a crash in your code. And when you really think hard about how dictionaries work on the inside, this all makes sense. However, it should not be necessary to be intimately familiar with how dictionaries work, or dig around the source code of dictionaries, in order to understand why this is happening.

— 35:40

In fact, the Hashable requirement for Dictionary is purely for optimization. Hashability isn’t required in order to create a dictionary-like type, in general. It’s just required to make an efficient one. We could make our own dictionary type that drops that requirement, and under the hood just stores keys and values as an array of tuples: struct SlowDictionary<Key, Value> { var keyValues: [(Key, Value)] }

— 36:05

We could even implement a subscript on this to retrieve a value from a key by linearly searching for the key: subscript(key: Key) -> Value? where Key: Equatable { for (k, v) in keyValues { if key == k { return v } } return nil }

— 36:18

This is of course very inefficient. The whole point of dictionaries and the Hashable requirement is to make it so that you don’t have to search the entire collection to find a key. But, even so, it is of course possible to implement a dictionary with only an Equatable requirement, and no Hashable .

— 36:40

And this dictionary does behave exactly as we would hope. We can create a dictionary mapping blob to their two friends: var friends2 = SlowDictionary(keyValues: [ (blob, [ User(id: 43, name: "Blob Jr."), User(id: 44, name: "Blob Sr."), ]) ])

— 37:10

And then asking for the friends of blob or blobDraft returns those friends: friends2[blob] // [User(…), User(…)] friends2[blobDraft] // [User(…), User(…)]

— 37:35

And so the only reason our code dealing with a regular dictionary is leading to unreasonable results is because we violated the substitutability principle for its Hashable implementation. That is why it is called out in the docs. And we don’t need to know all the nitty-gritty details of how dictionaries are implemented to know why this is important, but we can rest assured that as long as we fulfill that principle things will work how we expect. Next time: Reference types

— 38:13

So, that is the basics of Hashable for data types in Swift. It is just a means to distill a simple integer from a potentially complex data structure in order to allow for a fast path in certain kinds of algorithms and data structures, such as dictionaries and sets. And thanks to all of the knowledge we have built up previously, we can summarize the Hashable protocol as simply a computed property on an equatable type that is well-defined. That’s all it is. And if you don’t fulfill that promise in your type, it will be very easy to write reasonable looking code that produces very unreasonable results. Stephen

— 38:48

So, we should all feel pretty comfortable with the Equatable and Hashable protocols now. We have gone deep into their theoretical foundations, and shown time and time again why the properties of “equivalence relations” and “well-definedness” must be upheld in order to write code that is easy to reason about. But there is one major part of the Swift programming language that we have purposely ignored each step of the way, and that is reference types. Brandon

— 39:11

Everything so far has used value types, and for good reason. Value types are just bags of data, and this is a concept that mathematics does very well with, which is where the root of all the concepts can be found. Stephen

— 39:24

But reference types throw a wrench in all of the wonderful mathematical properties we have explored. They are an amalgamation of data and behavior, and they can change their data over time all on their own. That seems quite complicated, and so how are we supposed to deal with equatability when it comes to reference types?

— 39:42

Well, let’s dive into that…next time! References Hashable Note A type that can be hashed into a Hasher to produce an integer hash value. Documentation for the Swift protocol. https://developer.apple.com/documentation/swift/equatable Equatable Note A type that can be compared for value equality. Documentation for the Swift protocol. https://developer.apple.com/documentation/swift/equatable Equivalence relation Note In mathematics, an equivalence relation is a binary relation that is reflexive, symmetric and transitive. The Wikipedia page defining an “equivalence relation,” a mathematical concept underpinning Swift’s Equatable protocol. https://en.wikipedia.org/wiki/Equivalence_relation Downloads Sample code 0298-back-to-basics-equatable-pt2 Point-Free A hub for advanced Swift programming. Brought to you by Brandon Williams and Stephen Celis . Content Become a member The Point-Free Way Beta previews Gifts Videos Collections Free clips Blog More About Us Community Slack Mastodon Twitter BlueSky GitHub Contact Us Privacy Policy © 2026 Point-Free, Inc. All rights are reserved for the videos and transcripts on this site. All other content is licensed under CC BY-NC-SA 4.0 , and the underlying source code to run this site is licensed under the MIT License .