Video #337: Modern Search: Relevance & Ranking

Episode: Video #337 Date: Sep 1, 2025 Access: Members Only 🔒 URL: https://www.pointfree.co/episodes/ep337-modern-search-relevance-ranking

Description

We’re ready to take advantage of some of the superpowers of full-text search, starting with relevancy. We will do a deep dive into the ranking algorithm of SQLite’s FTS5 module, explore how the text of a document affects its relevancy score, and how we can tweak these scores based on the column containing a match.

Video

Cloudflare Stream video ID: a96af6422257ef2bd946405dac023aa8 Local file: video_337_modern-search-relevance-ranking.mp4 *(download with --video 337)*

References

Transcript

— 0:05

OK, we now have the basics of full-text search in place for our reminders app. When reminders are created, updated or deleted, we update the full-text search index with the newest information, which gives us immediate and efficient access to searching through our reminders. And in the process we were able to simplify our queries, reduce the number of tables we have to join, and just generally offload the complexities of search to the FTS5 module in SQLite. Stephen

— 0:32

But we haven’t captured all of the behavior we had previously when searching with the simple SQL

LIKE 0:53

But, luckily for us it is quite easy to deal with using just a few more triggers. So, let’s take a look. Searching tags

LIKE 1:00

First let’s see concretely that we cannot search for tags. If we run the app in the simulator search for “week” we will find only a single result, “Send weekly emails”. But in reality there are many more reminders not being found that have been tagged with “#weekend”, and so we would like for those reminders to show too.

LIKE 1:15

Further, let’s open up our local SQLite database and inspect the rows in the reminderTexts virtual table to see that the tags column is empty for every single row. Once we can populate these rows with each reminder’s tags we will be able to search for tags right along side titles and notes, all in one query.

LIKE 1:36

So, we need another trigger, and again we will start with an AFTER INSERT , but this time the trigger will be on our ReminderTag join table: try ReminderTag.createTemporaryTrigger(after: .insert { new in }) .execute(db)

LIKE 1:49

And remember that ReminderTag is just a simple table with two foreign keys pointing to other tables: @Table struct ReminderTag { let reminderID: Reminder.ID let tagID: Tag.ID }

LIKE 1:55

This allows one to associate any number of tags with any reminder, and vice-versa.

LIKE 2:00

By listening for insertion events on this table we can be notified whenever a tag is freshly associated with a reminder, and so at that moment we can refresh the tags field of the reminderTexts virtual table.

LIKE 2:10

We will start similar to what we did for reminders, where we will construct an update statement for the reminderTexts table and limit it to just the reminder that is being modified: ReminderText .where { $0.reminderID.eq(new.reminderID) } .update { }

LIKE 2:28

Inside this update trailing closure we want to somehow modify the tags state to hold the info for all of the tags associated with the reminder being updated: ReminderText .where { $0.reminderID.eq(new.reminderID) } .update { $0.tags = }

LIKE 2:33

But what do we do here? We need to somehow load all of the tags associated with the reminder being modified, and concatenate their titles into a single string.

LIKE 2:41

Sounds complicated, but SQL excels at this kind of thing, and we are even allowed to construct a sub-query that will be run inside this greater

UPDATE 2:49

We can start by selecting from the tags table: $0.tags = Tag .select { }

UPDATE 2:53

We want to select the titles of all of these tags: $0.tags = Tag .select { $0.title }

UPDATE 2:56

But we want to do a little more. We actually want to concatenate all of these titles together to form a single string, and to make that a sensible thing we should probably have a delimiter between the tags. A space seems like a reasonable choice, and we can use the groupConcat function from SQL to do this: $0.tags = Tag .select { $0.title.groupConcat(" ") }

UPDATE 3:13

This now returns a single string formed by concatenating all tags together and separating them with a comma. We of course will need to ban commas from being used in tags, but that would be easy enough to do in whatever feature we build for creating and editing tags. This invariant can even be baked directly into the definition of the tags table by adding a

CHECK 3:42

The instr function is a shorthand for the phrase “in string” which checks if a substring is contained in a string. In this case we are checking if the substring “,” is contained within the value of the title column. With this check in place, if you ever try inserting a tag into the database that has a comma in its title you will instantly get a database error and the change will be rejected. That will give you a lot of confidence in the integrity of the data in your database.

CHECK 4:17

So, we are getting close, but right now we are going to assign to tags the concatenation of all tags in the database. Even ones that are not associated with the reminder that we are modifying. We have to further join the ReminderTag table to this so that we have a chance at filtering out tags not affiliated: $0.tags = Tag .join(ReminderTag.all) { $0.id.eq($1.tagID) } .select { $0.title.groupConcat(",") }

CHECK 4:43

And with that joined we can now filter out rows of the join that do not belong to the reminder: $0.tags = Tag .join(ReminderTag.all) { $0.id.eq($1.tagID) } .where { $1.reminderID.eq(new.reminderID) } .select { $0.title.groupConcat(",") }

CHECK 4:55

And now that there are potentially multiple rows associated with a single reminder we need to tell SQLite how to group these results together. If we group by the reminderID we will get just a single result of all tags concatenated: $0.tags = Tag .select { $0.title.groupConcat(",") } .join(ReminderTag.all) { $0.id.eq($1.tagID) } .where { $1.reminderID.eq(new.reminderID) } .group { $1.reminderID } And we can upfront the select clause needs to be updated now that we have two tables available to us: $0.tags = Tag .select { $0.title.groupConcat(",") } .join(ReminderTag.all) { $0.id.eq($1.tagID) } .where { $1.reminderID.eq(new.reminderID) } .group { $1.reminderID }

CHECK 5:00

And finally, groupConcat returns an optional string, so we must coalesce the result to an honest one: $0.tags = Tag .select { $0.title.groupConcat(",") } .join(ReminderTag.all) { $0.id.eq($1.tagID) } .where { $1.reminderID.eq(new.reminderID) } .group { $1.reminderID } ?? ""

CHECK 5:21

That is all it takes. When new tags are associated with a reminder we will load up all the tags associated with that reminder, join them together separated by a space, and update the full-text index with that info.

CHECK 5:32

To test let’s delete the app, re-install, and seed the database. Because we just created a whole bunch of reminders and tags from scratch we will have our full-text index fully populated. And now when we search for the term “week” we find that “Groceries”, “Haircut” and “Take a walk” appear because they are tagged with “weekend”, and “Send weekly emails” also appears because its title contains the substring “week”.

CHECK 5:49

However, there are still some holes to plug. If we navigate to “Haircut”, remove its “weekend” tag, and then search for “week” again we will find that it appears in the search results. The row itself does not show “#weekend”, so that’s good. But according to our full-text search index it still thinks that the “Haircut” reminder is associated with the “#weekend” tag.

CHECK 6:11

We need one last trigger that is invoked after a tag is removed from a reminder. When this event happens we need to execute the exact same update statement that we just wrote because we just want to update the ReminderText ’s tags column with the freshest set of tags separated by spaces.

CHECK 6:25

So, this really is just an exercise of copy-and-paste and making the trigger into an AFTER DELETE instead of AFTER INSERT : try ReminderTag.createTemporaryTrigger(after: .delete { old in ReminderText .where { $0.reminderID.eq(old.reminderID) } .update { $0.tags = ReminderTag .group(by: \.reminderID) .where { $0.reminderID.eq(old.reminderID) } .join(Tag.all) { $0.tagID.eq($1.id ) } .select { $1.title.groupConcat(",") } ?? "" } }) .execute(db)

CHECK 6:35

To test this let’s again delete and re-install the app, seed the database, and then remove the “#weekend” tag from “Haircut” and “Groceries”. Now when we search for “week” we only find the “Take a walk” and “Send weekly emails” reminders. We have properly reacted to tags being removed and refreshed the full-text search index.

CHECK 6:55

Before moving on, let’s make a small refactor of our tags triggers, because they mostly share the exact same code: the entire update query. We can extract this to a helper function: func updateTags( for reminderID: some QueryExpression<Reminder.ID> ) -> UpdateOf<ReminderText> { ReminderText .where { $0.reminderID.eq(reminderID) } .update { $0.tags = ReminderTag .group(by: \.reminderID) .where { $0.reminderID.eq(reminderID) } .join(Tag.all) { $0.tagID.eq($1.id ) } .select { $1.title.groupConcat(",") } ?? "" } }

CHECK 7:33

And invoke it from each trigger: try ReminderTag.createTemporaryTrigger(after: .insert { new in updateTags(for: new.reminderID) }) .execute(db) try ReminderTag.createTemporaryTrigger(after: .delete { old in updateTags(for: old.reminderID) }) .execute(db) Ranking

CHECK 7:51

We have now recreated the search functionality of our app using the FTS5 module from SQLite. In some ways it did complicate things, as we now have a new virtual table to contend with, and we need to use triggers to make sure the virtual table stays up-to-date with what is in our database. But, doing that bit of upfront work has allowed us to write a simpler query, with fewer joins, and now we are in a position to start taking advantage of the many superpowers of FTS. Brandon

CHECK 8:14

Let’s start with ranking. In our first, more naive attempt at adding search to our app we had no way to rank the results of searches. All we could do is show all reminders that matched the search term in their title, notes, or tags. But what if we wanted to sort the results by relevance? Or what if we wanted matches to the title to count more than matches to the notes? Or if we have entered multiple search terms, then we should rank reminders higher when they contain those terms in a similar cluster. For example, when searching “Get groceries” we should rank a reminder that is literally titled “Get groceries” over one that has “Get” in the title and “Groceries” in its notes.

CHECK 9:00

And this is where full-text search really starts to shine. It comes with all types of fun tools that we can use to better rank the results for our users. Let’s take a look.

CHECK 9:12

Let’s first take a look at why ranking can be important. If we search our current reminders for “Week” we will find the following: sqlite> select * from reminderTexts where reminderTexts MATCH 'week'; ┌────────────┬───────────────────────┬─────────┬──────────────────┐ │ reminderID │ title │ notes │ tags │ ├────────────┼───────────────────────┼─────────┼──────────────────┤ │ 1 │ Groceries │ Milk │ weekend │ │ │ │ Eggs │ │ │ │ │ Apples │ │ │ │ │ Oatmeal │ │ │ │ │ Spinach │ │ ├────────────┼───────────────────────┼─────────┼──────────────────┤ │ 2 │ Haircut │ │ weekend,easy-win │ ├────────────┼───────────────────────┼─────────┼──────────────────┤ │ 4 │ Take a walk this week │ │ weekend,fun │ ├────────────┼───────────────────────┼─────────┼──────────────────┤ │ 10 │ Send weekly emails │ │ │ └────────────┴───────────────────────┴─────────┴──────────────────┘

CHECK 9:35

Now, how would we personally expect these reminders to be sorted? The first three reminders match because they are tagged with “weekend” and the last one matches because the title has “weekly”. We might expect “weekly” to rank higher than “weekend” since it is closer to “week” in some sense. It has 1 less letter, after all.

CHECK 9:59

Further, let’s edit “Take a walk” to be “Take a walk this week”. sqlite> update reminderTexts ..> set title = 'Take a walk this week' ..> where reminderID = 4;

CHECK 10:22

Now when we search we get the exact same ordering as before, even though now we even have a reminder with the exact word “week”. I think it would be reasonable for “Take a walk this week” to be ranked first, then “Send weekly emails”, and finally “Groceries” and “Haircut” since they are tagged with “weekend”.

CHECK 10:58

Well, amazingly, full-text search comes with a tool that can smartly rank our reminders based on how relevant they are to the search terms we have entered.

CHECK 11:06

Now let’s select an additional column in this query called “rank”. It is a special function implemented by the FTS5 module that computes a relevancy score for each row given the search terms: sqlite> select rank, * from reminderTexts where reminderTexts MATCH 'week'; ┌────────────────────┬────────────┬───────────────────────┬─────────┬──────────────────┐ │ rank │ reminderID │ title │ notes │ tags │ ├────────────────────┼────────────┼───────────────────────┼─────────┼──────────────────┤ │ -0.311275160962883 │ 1 │ Groceries │ Milk │ weekend │ │ │ │ │ Eggs │ │ │ │ │ │ Apples │ │ │ │ │ │ Oatmeal │ │ │ │ │ │ Spinach │ │ ├────────────────────┼────────────┼───────────────────────┼─────────┼──────────────────┤ │ -0.428577105836935 │ 2 │ Haircut │ │ weekend,easy-win │ ├────────────────────┼────────────┼───────────────────────┼─────────┼──────────────────┤ │ -0.511054823046192 │ 4 │ Take a walk this week │ │ weekend,fun │ ├────────────────────┼────────────┼───────────────────────┼─────────┼──────────────────┤ │ -0.450732154386805 │ 10 │ Send weekly emails │ │ │ └────────────────────┴────────────┴───────────────────────┴─────────┴──────────────────┘

CHECK 11:31

OK, so what does these negative floating point numbers mean? Well, they measure the relevancy of the corresponding reminder as it relates to the search term provided. The larger the negative number, the more relevant.

CHECK 12:01

It may seem strange that the number is negative, but this is just a trick that FTS5 employs so that you get the expected behavior when sorting. If you sort by rank you would expect the most relevant result to be first: sqlite> select rank, title, notes, tags from reminderTexts ..> where reminderTexts MATCH 'week' ..> order by rank; ┌────────────────────┬───────────────────────┬─────────┬──────────────────┐ │ rank │ title │ notes │ tags │ ├────────────────────┼───────────────────────┼─────────┼──────────────────┤ │ -0.511054823046192 │ Take a walk this week │ │ weekend,fun │ ├────────────────────┼───────────────────────┼─────────┼──────────────────┤ │ -0.450732154386805 │ Send weekly emails │ │ │ ├────────────────────┼───────────────────────┼─────────┼──────────────────┤ │ -0.428577105836935 │ Haircut │ │ weekend,easy-win │ ├────────────────────┼───────────────────────┼─────────┼──────────────────┤ │ -0.311275160962883 │ Groceries │ Milk │ weekend │ │ │ │ Eggs │ │ │ │ │ Apples │ │ │ │ │ Oatmeal │ │ │ │ │ Spinach │ │ └────────────────────┴───────────────────────┴─────────┴──────────────────┘

CHECK 12:18

But by default sorting in SQLite goes in ascending order, and so that would put the least relevant results first unless you remembered to specifically order in a descending fashion. So, FTS5 sidesteps all of that and just multiplies the score by negative one so that higher ranked results are naturally sorted to the front.

CHECK 12:35

And looking at these sorted results we see that they are in the order we wanted. “Take a walk this week” is ranked highest, “Send weekly emails” is second, followed by “Haircut” and “Groceries”.

CHECK 12:47

And so that is pretty amazing. The full-text search rank function has correctly determined the relevance of the word “week” in each reminder, and has ranked the reminder with an exact match above all others, and the reminder with a closer partial match gets ranked above the reminders with a looser partial match.

CHECK 13:04

The ranking algorithm used by FTS5 is quite complex, and so it can seem pretty opaque from the outside. It takes into account a wide variety of factors, such as term frequency, position in the document, rarity of the terms being searched for, length of the documents relative other documents, among other factors.

CHECK 13:46

To see this concretely, let’s edit the “Haircut” reminder, which is currently ranked 3rd, to add a note that says “Ask if I can reschedule for next week”: sqlite> update reminderTexts ..> set notes = 'Ask if I can reschedule next week' ..> where reminderID = 2;

CHECK 14:57

Now when we search for “week” we see that “Haircut” did not move in the ranks: sqlite> select substr(rank,0,7) as r, title, notes, tags from reminderTexts where reminderTexts MATCH 'week' order by rank; ┌────────┬───────────────────────┬───────────────────────────────────┬──────────────────┐ │ r │ title │ notes │ tags │ ├────────┼───────────────────────┼───────────────────────────────────┼──────────────────┤ │ -0.524 │ Take a walk this week │ │ weekend,fun │ ├────────┼───────────────────────┼───────────────────────────────────┼──────────────────┤ │ -0.463 │ Send weekly emails │ │ │ ├────────┼───────────────────────┼───────────────────────────────────┼──────────────────┤ │ -0.437 │ Haircut │ Ask if I can reschedule next week │ weekend,easy-win │ ├────────┼───────────────────────┼───────────────────────────────────┼──────────────────┤ │ -0.327 │ Groceries │ Milk │ weekend │ │ │ │ Eggs │ │ │ │ │ Apples │ │ │ │ │ Oatmeal │ │ │ │ │ Spinach │ │ └────────┴───────────────────────┴───────────────────────────────────┴──────────────────┘ We can use the substr function to truncate rank to something more readable, as the table’s getting pretty wide, and we can drop the reminder ID from the query.

CHECK 15:02

It’s still 3rd in the list. It seems that even though there is now a direct match for “week” in its search able text, the fact that the term comes late in the document has cause it to not be ranked very high.

CHECK 15:22

But, if we were to rename the reminder to “Haircut next week”: sqlite> update reminderTexts ..> set title = 'Haircut next week' ..> where reminderID = 2;

CHECK 15:37

…then searching for “week” does cause the reminder to bump in the ranks. It is now ranked 2nd in relevancy: sqlite> select substr(rank,0,7) as r, title, notes, tags ..> from reminderTexts ..> where reminderTexts MATCH 'week' ..> order by rank; ┌────────┬───────────────────────┬───────────────────────────────────┬──────────────────┐ │ r │ title │ notes │ tags │ ├────────┼───────────────────────┼───────────────────────────────────┼──────────────────┤ │ -0.528 │ Take a walk this week │ │ weekend,fun │ ├────────┼───────────────────────┼───────────────────────────────────┼──────────────────┤ │ -0.492 │ Haircut next week │ Ask if I can reschedule next week │ weekend,easy-win │ ├────────┼───────────────────────┼───────────────────────────────────┼──────────────────┤ │ -0.466 │ Send weekly emails │ │ │ ├────────┼───────────────────────┼───────────────────────────────────┼──────────────────┤ │ -0.331 │ Groceries │ Milk │ weekend │ │ │ │ Eggs │ │ │ │ │ Apples │ │ │ │ │ Oatmeal │ │ │ │ │ Spinach │ │ └────────┴───────────────────────┴───────────────────────────────────┴──────────────────┘

CHECK 15:43

It may seem strange that “Take a walk this week” is still ranked a little higher, but this is probably because the “Haircut next week” reminder has a lot more words in its document than “Take a walk this week”. The more words in a document, the less it scores when it contains search terms.

CHECK 16:11

To see this in action in another situation, let’s alter the notes of the groceries to put “Check weekly specials” at the top of the groceries notes: sqlite> update reminderTexts ..> set notes = 'Check weekly specials' ..> || char(10) ..> || notes ..> where title = 'Groceries';

CHECK 16:56

Now groceries has a matching word right at the beginning of the notes, but the document text for the groceries reminder is quite big. And so we will see that the ranking did get a bump: sqlite> select substr(rank,0,7) as r, title, notes, tags ..> from reminderTexts ..> where reminderTexts MATCH 'week' ..> order by rank; ┌────────┬───────────────────────┬───────────────────────────────────┬──────────────────┐ │ r │ title │ notes │ tags │ ├────────┼───────────────────────┼───────────────────────────────────┼──────────────────┤ │ -0.535 │ Take a walk this week │ │ weekend,fun │ ├────────┼───────────────────────┼───────────────────────────────────┼──────────────────┤ │ -0.500 │ Haircut next week │ Ask if I can reschedule next week │ weekend,easy-win │ ├────────┼───────────────────────┼───────────────────────────────────┼──────────────────┤ │ -0.472 │ Send weekly emails │ │ │ ├────────┼───────────────────────┼───────────────────────────────────┼──────────────────┤ │ -0.423 │ Groceries │ Check weekly specials │ weekend │ │ │ │ Eggs │ │ │ │ │ Apples │ │ │ │ │ Oatmeal │ │ │ │ │ Spinach │ │ └────────┴───────────────────────┴───────────────────────────────────┴──────────────────┘

CHECK 17:02

But not enough for it to move ahead of any other documents.

CHECK 17:14

So, all of this shows that ranking search results can be quite complex and a lot of factors are taken into account. But now that we are a lot more familiar with the rank function in FTS5, let’s see what it takes to use it in our actual reminders app.

CHECK 17:28

And it may be hard to believe, but we only have to change one line of code. Right now we are sorting results only by their isCompleted state: .order { reminder, reminderText, _, _, _ in reminder.isCompleted }

CHECK 18:02

We still want to do that because we do want incomplete reminders up top and completed ones down below. But, after sorting into those two buckets we will further sort by the relevancy of the search terms, and we do this by using the rank function. Our StructuredQueries library has full support for rank, but if we didn’t know about that we could still make use of it by just using the simple #sql macro escape hatch: .order { reminder, reminderText, _, _, _ in ( reminder.isCompleted, #sql("rank"), ) }

CHECK 18:37

That is all it takes and our app is not sorting by rank.

CHECK 18:41

But of course we want to avoid using the #sql macro if our query builder has support for the SQL we need to write, and fortunately it does. There is a computed property on FTS5 virtual tables called rank and we can use it like so: .order { reminder, reminderText, _, _, _ in ( reminder.isCompleted, reminderText.rank, ) }

CHECK 19:11

That is all it takes to first sort by isCompleted , and then by the relevancy of search terms.

CHECK 19:22

Now when we run the app and search for “week” we will see that “Take a walk this week” is ranked first, followed by “Haircut next week”, and then finally “Send weekly emails” and “Groceries”. The BM25 function

CHECK 20:10

OK, we’ve now see the basics of how we can sort our search results by relevancy. The simple act of sorting by the rank function in full-text search offloads all the hard work to the FTS5 module in SQLite. Of course, the actual relevancy algorithm is a bit opaque to us, but that seems like a small price to pay to unlock such big machinery with such little work. Stephen

CHECK 20:34

But there is a way to slightly customize how documents are ranked. We previously mentioned that it takes into account many factors, such as term frequency, term position in the document, length of documents relative to other documents, and more. But currently the ranking weighs each column of our ReminderText virtual table equally. That is, a search term appearing in the title will count the same as it being found in notes or being found in tags.

CHECK 20:57

The FTS5 module gives us a tool to tweak the weight of these columns, and it’s called bm25 . Let’s take a look.

CHECK 21:06

In the docs for the FTS5 module there is a section on the bm25 function .

CHECK 21:13

Here we get to see the complex formula that computes the rank of a document given a query.

CHECK 21:17

And this bm25 function is what secretly computes the rank column that we just used. We can even hop over to the SQLite console for our database, and run a query to output the rank and bm25 of all rows matching “week”: sqlite> select rank, bm25(reminderTexts) ..> from reminderTexts ..> where reminderTexts MATCH 'week'; ┌────────────────────┬─────────────────────┐ │ rank │ bm25(reminderTexts) │ ├────────────────────┼─────────────────────┤ │ -0.423392196416313 │ -0.423392196416313 │ │ -0.500689991452773 │ -0.500689991452773 │ │ -0.535392218033979 │ -0.535392218033979 │ │ -0.472375917985977 │ -0.472375917985977 │ └────────────────────┴─────────────────────┘

CHECK 21:36

…to see that the values are all the same.

CHECK 21:40

However, this bm25 function takes extra arguments that the rank function does not. The docs call this out in the following passage: Note The first argument passed to bm25() following the table name is the weight assigned to the leftmost column of the FTS5 table. The second is the weight assigned to the second leftmost column, and so on. If there are not enough arguments for all table columns, remaining columns are assigned a weight of 1.0. If there are too many trailing arguments, the extras are ignored. So, the first argument is the name of the FTS virtual table. After that you can provide an additional argument for each column of the table that specifies the weight of that column in the search. By default the weight of every column is 1. If you want to override the weight of a particular column then you can provide extra arguments to this function, but they must go in the order that the columns are defined in the virtual table.

CHECK 22:11

For example, the first column of our virtual table is reminderID , which isn’t a column we even search on since it is marked as

UNINDEXED 22:24

The next column in the virtual table is the title, and so if we want to give it a different weight we can provide another argument. Say we provide 10: sqlite> select rank, bm25(reminderTexts, 0, 10) ..> from reminderTexts ..> where reminderTexts MATCH 'week'; ┌────────────────────┬────────────────────────────┐ │ rank │ bm25(reminderTexts, 0, 10) │ ├────────────────────┼────────────────────────────┤ │ -0.423392196416313 │ -0.423392196416313 │ │ -0.500689991452773 │ -0.701071664734543 │ │ -0.535392218033979 │ -0.740217353000073 │ │ -0.472375917985977 │ -0.755179838393609 │ └────────────────────┴────────────────────────────┘

UNINDEXED 22:35

This roughly means that a search hit in the title column counts the same as if we got ten search hits in the notes and tags column. And we can clearly see that the scores of this modified bm25 calculation differs from the default rank calculation. The last 3 entries got a noticeable bump in their scores.

UNINDEXED 22:50

To see which reminders got that bump, let’s also select the title of each reminder: sqlite> select title, rank, bm25(reminderTexts, 0, 10) ..> from reminderTexts ..> where reminderTexts MATCH 'week'; ┌───────────────────────┬────────────────────┬────────────────────────────┐ │ title │ rank │ bm25(reminderTexts, 0, 10) │ ├───────────────────────┼────────────────────┼────────────────────────────┤ │ Groceries │ -0.423392196416313 │ -0.423392196416313 │ │ Haircut next week │ -0.500689991452773 │ -0.701071664734543 │ │ Take a walk this week │ -0.535392218033979 │ -0.740217353000073 │ │ Send weekly emails │ -0.472375917985977 │ -0.755179838393609 │ └───────────────────────┴────────────────────┴────────────────────────────┘

UNINDEXED 22:55

And now we can clearly see that its precisely the reminders with “week” in their titles that got the bump. Even more interesting, the “Send weekly emails” reminder went from being ranked 3rd in the list to being ranked first! By weighing the title so heavily we have allowed the reminder that has “week” sooner in the title be ranked above reminders that have “week” later in the title.

UNINDEXED 23:12

So right now we are weighing search matches in titles quite high, and that makes sense because titles do tend to hold the most important information about a reminder. But let’s even things out a little bit by adding more weights. If we provide a weight of 5 for the notes column, we will have a situation where a single match in the notes column roughly counts as 5 matches in the tags. And as a consequence of this it will now be the case that a single search match in the title will only count as 2 matches in the notes.

UNINDEXED 23:41

And let’s also further select both the bm25 calculation with the old and new weights, that way we can see how things changed: sqlite> select title, ..> bm25(reminderTexts, 0, 10) as prev, ..> bm25(reminderTexts, 0, 10, 5) as next ..> from reminderTexts ..> where reminderTexts MATCH 'week'; ┌───────────────────────┬────────────────────┬────────────────────┐ │ title │ prev │ next │ ├───────────────────────┼────────────────────┼────────────────────┤ │ Groceries │ -0.423392196416313 │ -0.620593661237979 │ │ Haircut next week │ -0.701071664734543 │ -0.725259745835166 │ │ Take a walk this week │ -0.740217353000073 │ -0.740217353000073 │ │ Send weekly emails │ -0.755179838393609 │ -0.755179838393609 │ └───────────────────────┴────────────────────┴────────────────────┘

UNINDEXED 23:54

OK, we can now see that two reminders got bump in their scores: “Groceries“ got a large bump from -0.42 to -0.62, and “Haircut next week” went from roughly -0.70 to -0.73 and that’s thanks to the fact that their notes field contains the word “week”. So, they got a small bump, but not enough to push either ahead of the other reminders.

UNINDEXED 24:11

We’ve now provided weights for every column except for the tags column, and remember that we mentioned a moment ago that the default weight of an unspecified column is 1. And so we will get the same results if we provide one more argument of 1: sqlite> select title, ..> bm25(reminderTexts, 0, 10) as prev, ..> bm25(reminderTexts, 0, 10, 5, 1) as next ..> from reminderTexts ..> where reminderTexts MATCH 'week'; ┌───────────────────────┬────────────────────┬────────────────────┐ │ title │ prev │ next │ ├───────────────────────┼────────────────────┼────────────────────┤ │ Groceries │ -0.423392196416313 │ -0.620593661237979 │ │ Haircut next week │ -0.701071664734543 │ -0.725259745835166 │ │ Take a walk this week │ -0.740217353000073 │ -0.740217353000073 │ │ Send weekly emails │ -0.755179838393609 │ -0.755179838393609 │ └───────────────────────┴────────────────────┴────────────────────┘

UNINDEXED 24:28

OK, we have now learned quite a bit about the bm25 function in SQLite’s FTS5 module. Let’s try using it in our app. We will alter our order clause so that instead of sorting by rank we will sort by a custom bm25 .

UNINDEXED 24:43

Now our StructuredQueries library has first class support for bm25 , but let’s for a moment forget that fact. One of the nice thing about StructuredQueries is that if you ever find some esoteric functionality in SQLite that we do not fully support, you can always use the #sql macro to get access to it: .order { reminder, reminderText, _, _, _ in ( reminder.isCompleted, #sql("bm25(reminderTexts, 0, 10, 5)") // reminderText.rank ) }

UNINDEXED 25:11

It’s of course a bummer to have to resort to a SQL string here, but it’s still safe from injection attacks. And we can even add a bit of schema-safety by not hard coding the reminderTexts string, and instead interpolate the ReminderText type directly into the SQL fragment: .order { reminder, reminderText, _, _, _ in ( reminder.isCompleted, #sql("bm25(\(ReminderText.self), 0, 10, 5)") // reminderText.rank, ) }

UNINDEXED 25:30

That will protect us from typos.

UNINDEXED 25:33

With that done we would hope everything works, but sadly when we perform a search we get a strange SQL error that is reported as a purple runtime warning: Caught error: SQLite error 1: unable to use function bm25 in the requested context - while executing SELECT "remindersLists"."color" AS "color", (NOT (("reminders"."status" <> ?)) AND (coalesce("reminders"."dueDate", ?) < ?)) AS "isPastDue", "reminders"."id", "reminders"."createdAt", "reminders"."dueDate", "reminders"."isFlagged", "reminders"."notes", "reminders"."priority", "reminders"."remindersListID", "reminders"."status", "reminders"."title", "reminders"."updatedAt" AS "reminder", json_group_array(coalesce("tags"."title", ?)) FILTER (WHERE ("tags"."id" IS NOT NULL)) AS "tags" FROM "reminders" JOIN "reminderTexts" ON ("reminders"."id" = "reminderTexts"."reminderID") LEFT JOIN "reminderTags" ON ("reminders"."id" = "reminderTags"."reminderID") LEFT JOIN "tags" ON ("reminderTags"."tagID" = "tags"."id") JOIN "remindersLists" ON ("reminders"."remindersListID" = "remindersLists"."id") WHERE ("reminderTexts" MATCH ?) GROUP BY "reminders"."id" ORDER BY ("reminders"."status" <> ?), bm25(reminderTexts, 0, 10, 5)

UNINDEXED 25:51

The error message is not very good, but after searching we found that many of FTS5’s auxiliary functions, such as bm25 and we will soon see the same applies to highlighting and snippets, cannot be used in a query that performs aggregations, which is what we are doing to bundle up all the tag titles into a JSON array. There is a workaround where you use a tool called “common table expressions” to upfront compute all of the rows without aggregation, and in that context it is OK to use highlight . And then you tie the knot by aggregating that set of results.

UNINDEXED 26:26

That can definitely be a viable solution, but we have something even better in mind. Thanks to how our FTS index is set up, we soon will not even need to join the tags table in order to aggregate the reminders’ tags. We aren’t yet ready to tackle that refactor, and we will do it in a little bit, so for now let’s just stop joining these tables and hardcode the tags to an empty array: rows: query // .group { reminder, _ in reminder.id } // .leftJoin(ReminderTag.all) { $0.id.eq($2.reminderID) } // .leftJoin(Tag.all) { $2.tagID.eq($3.id) } .join(RemindersList.all) { $0.remindersListID.eq($2.id) } .order { reminder, _, _ in reminder.isCompleted } .select { reminder, reminderText, remindersList in Row.Columns( color: remindersList.color, isPastDue: reminder.isPastDue, reminder: reminder, // tags: tag.jsonTitles, tags: #sql("'[]'") ) }

UNINDEXED 27:34

OK, now when we run the app again and search it all works, and the results come back in the order we saw in the console. First “Send weekly emails”, followed by “Take a walk this week”, and then “Haircut next week”. We of course are no longer showing tags in these results, but we will be able to bring that back soon enough.

UNINDEXED 27:48

And just to really show this is sorting things differently than when we were using only the plain rank function, let’s bring back rank : .order { reminder, reminderText, _ in ( reminder.isCompleted, // #sql("bm25(reminderTexts, 0, 10, 5)"), reminderText.rank, ) }

UNINDEXED 27:55

Now when we search for “week” we see “Take a walk this week”, “Haircut next week” and “Send weekly emails”. So quite a different ordering.

UNINDEXED 28:03

But let’s go ahead and bring back the bm25 function: Type-safe BM25

UNINDEXED 28:12

We have now done a deep dive into weighted rankings of search results in FTS5 using the bm25 function, and we have seen how it can be used in our reminders app. The only bummer was that we had to resort to the #sql macro to use it. On the one hand it really isn’t so bad, and at the end of the day we have never said that 100% of all queries should be written with our query building tools, no exceptions allowed. Really, to the contrary, we have said that one should be intimately familiar with SQL before even using our tools, and any query you write with our tools you should be able to write by hand as a SQL string if asked. Brandon

UNINDEXED 28:46

But on the other hand, this bm25 is particularly difficult to use because it requires us to specify weights of columns in an order that corresponds to the order the columns were specified in the virtual table. That seems really precarious, and something we could easily get wrong in the future if we accidentally forget that notes come before tags.

UNINDEXED 29:06

Well, luckily for us our StructuredQueries library comes with a really nice tool for invoking the bm25 function and specifying the weights for each column using a static description of the column, rather than positional indices of the columns.

UNINDEXED 29:19

Let’s take a look.

UNINDEXED 29:22

All tables defined using the @Table macro and FTS5 protocol from StructuredQueries are endowed with a bm25 function reminderText.bm25()

UNINDEXED 29:37

And if we stop here and don’t provide any arguments then this is equivalent to just using rank as we have remarked before. But, this function does take arguments, and its not just a simple variadic list of integer as is the case in SQL. Instead, it takes a dictionary that maps key paths to integers. This allows you to specify the weights in a type-safe and schema-safe manner.

UNINDEXED 30:13

We can even open a dictionary literal, type the beginning of a key path, and use autocomplete to see what we have at our disposal: reminderText.bm25([\.<#⎋#>

UNINDEXED 30:18

We now see all of the columns of our ReminderText table. We can specify a column, such as \.title , and then map that to the weight of 10: reminderText.bm25([\.title: 10])

UNINDEXED 30:26

This specifies the weight of “title” to be 10, and all other columns will be weighted as their default 1. We don’t have to worry about positional indices or worry about getting the order mixed up in our heads.

UNINDEXED 30:33

And we can specify any subset of columns we want, such as both the title and notes: reminderText.bm25([\.title: 10, \.notes: 5]),

UNINDEXED 30:45

This now matches the query fragment we had before as a SQL string, but now using the static information from our schema. If we search for week we will see that “Send weekly emails” is still top ranked, followed by “Take a walk this week” and then “Haircut next week”.

UNINDEXED 31:20

And there’s just one last thing we want to show off. While it is cool to be able to customize the rank function using bm25 , and even cooler that we can do so in a type-safe and schema-safe way, you probably don’t often have to tweak these parameters. Instead you probably want to just hard code them a single time.

UNINDEXED 31:37

Well, we’ve already shown off one way to cook up little reusable SQL helpers like this. In the past we have extended a table’s TableColumns inner type in order to define helpers, like we did on the Reminder table: extension Reminder.TableColumns { var isCompleted: some QueryExpression<Bool> { status.neq(Reminder.Status.incomplete) } … }

UNINDEXED 32:12

We can do something similar with the ReminderText table, even though it is a virtual table: extension ReminderText.TableColumns { var defaultRank: some QueryExpression<Double> { bm25([\.title: 10, \.notes: 5]) } }

UNINDEXED 32:52

And then instead of using rank we can just use defaultRank : reminderText.defaultRank,

UNINDEXED 33:00

And it is even possible to set the default weights used by rank and bm25 directly in SQLite. There are special

INSERT 33:23

We can even open up the docs and look at the table of contents to see a bunch of configurations.

INSERT 33:32

The one we are interested in is the “ rank configuration option ”.

INSERT 33:35

And here we can see a sample of how this looks.

INSERT 34:03

We can perform an insert like that when our database is provisioned in order to see the default weights for bm25 : INSERT INTO reminderTexts (reminderTexts, rank) VALUES ('rank', 'bm25(0, 10, 5)');

INSERT 34:58

And it’s OK to execute this out of a migration. It can be done every single time the app starts up. Executing it a second time only overrides our previous configuration with the new value, and it’s no big deal if it’s the same value as last time.

INSERT 35:33

Now it would be great if our StructuredQueries library provided first class support for editing FTS5’s configuration, but at this time it does not. And that is totally OK. We still have full access to everything SQLite has to offer us thanks to the #sql macro.

INSERT 35:52

With that done we can even go back to using the plain rank function: reminderText.rank, And the results will still be sorted the same way. That’s because we now have default weights for FTS5 to use under the hood when they are not specified. Next time: highlights and snippets

INSERT 37:01

This is all looking incredible. We are now tapping into the power of full-text search by allowing it to rank our documents based on search terms and a variety of other factors. We can even provide a custom set of weights so that a search term appearing in the title of a reminder scores a little more than if it appears in the notes or tags. And we are doing it all in a type-safe and schema-safe manner. Stephen

INSERT 37:24

And it may be hard to believe, but there is even more power lurking in the shadows of the FTS5 module in SQLite. Let’s start with search term highlighting. The FTS5 module comes with a function called highlight that allows you to highlight the search terms inside the fields you are searching. This makes it possible to create very friendly search UIs where not only are the search results being displayed to the user, but the actual bits of text that were matched from the query are highlighted in the view.

INSERT 37:48

Sounds complex, but FTS5 makes this a breeze. Let’s check it out…next time! References SQLite FTS5 Extension FTS5 is an SQLite virtual table module that provides full-text search functionality to database applications. https://www.sqlite.org/fts5.html SQLiteData Brandon Williams & Stephen Celis A fast, lightweight replacement for SwiftData, powered by SQL. https://github.com/pointfreeco/sqlite-data StructuredQueries A library for building SQL in a safe, expressive, and composable manner. https://github.com/pointfreeco/swift-structured-queries Downloads Sample code 0337-fts-pt4 Point-Free A hub for advanced Swift programming. Brought to you by Brandon Williams and Stephen Celis . Content Become a member The Point-Free Way Beta previews Gifts Videos Collections Free clips Blog More About Us Community Slack Mastodon Twitter BlueSky GitHub Contact Us Privacy Policy © 2026 Point-Free, Inc. All rights are reserved for the videos and transcripts on this site. All other content is licensed under CC BY-NC-SA 4.0 , and the underlying source code to run this site is licensed under the MIT License .