Tries

Notes

A trie is another type of tree structure. The word “trie” comes from the word “retrieval,” but is usually pronounced like “try.” For our purposes, the nodes in a trie are arrays. We might use a trie to store a dictionary of words, as this diagram suggests.

In this trie, each index in the array stands for a letter of the alphabet. Each of those indices also points to another array of letters. In the above, we’re storing the names of famous scientists, e.g. Mendel, Mendeleev, Pavlov, Pasteur, and Turing. The Δ symbol denotes the end of a name. We have to keep track of where words end so that if one word actually contains another word (e.g. Mendeleev and Mendel), we know that both words exist. In code, the Δ symbol could be a Boolean flag in each node.

Here’s a definition of a node.

Note that we’re not actually storing words in the trie, we’re simply storing a series of pointers and a Boolean value.

This simple trie contains two words: “bat” and “zoom”.

Let’s walk through an example of looking up the key “bat” in this trie.

We’ll begin our lookup at the root node.

We’ll take the first letter of our key, “b,” and find the corresponding spot in our children array. We’ll look at the second index -- index 1 -- for “b”. In general, if we have an alphabetic character c, we could determine the corresponding spot in the children array using a calculation like this:

int index = tolower(c) - ‘a’;

In this case, the pointer in our children array at index 1 is not NULL, so we’ll continue looking up the key “bat”. If we ever encountered a NULL pointer at the proper spot in the children array while looking up a key, we’d have to say that we couldn’t find anything for that key.

Now we’ll take the second letter of our key, “a”, and continue following pointers in this way, until we reach the end of our key.

Now we’ll take the third letter of our key, “t”, and continue following pointers in this way, until we reach the end of our key.

If we reach the end of the key without hitting any dead ends (NULL pointers), as is the case here, then we only have to check one more thing: was that key actually stored in the trie? In our diagram, this is represented by a check mark signifying that bool is_word is true.

Note how the key “zoo” is not in the dictionary, even though we could reach the end of this key without hitting a dead end?

Similarly, if we tried to look up the key “bath”, the second-to-last node’s array index corresponding to the letter H would have held a NULL pointer, so “bath” isn’t in the dictionary.

So how do we insert something into a trie? Let’s insert “zoo”.

In some ways, the process to insert something is similar to looking something up. We’ll start with the root node again, following pointers corresponding to the letters of our key.

Luckily, we’re able to follow pointers all the way until we reach the end of the key.

Luckily, we’re able to follow pointers all the way until we reach the end of the key.

Since “zoo” is a prefix of the word “zoom”, which is stored in our dictionary, we don’t need to allocate any new nodes. We can just set is_word to true at the appropriate node.

If we wanted to insert the key “bath” into the trie, we’d start at the root node and follow the pointers again. In this situation, we hit a dead end before we’re able to get to the end of the key.

So, we’ll need to allocate one new node for each remaining letter of our key. In this case, we’ll just need to allocate one new node.

Then we’ll need to make the h index of the previous node reference this new node.

And lastly, we set the new node’s is_word bool to true.

So, if we assume that key length is bounded by a fixed constant, both insertion and lookup are constant time operations for a trie. Notice that the number of items stored in the trie doesn’t affect the lookup or insertion time; it’s only impacted by the length of the key. By contrast, adding entries to a hash table tends to make future lookups slower.

While this may sound appealing at first, we should keep in mind that a favorable asymptotic complexity doesn’t mean that in practice, the data structure is necessarily beyond reproach. We must also consider that to store a word in a trie, we need, in the worst case, a number of nodes proportional to the length of the word itself. Tries tend to use a lot of space. That’s in contrast to a data structure like a hash table, where we only need one new node to store some key-value pair.

Slides ( / )

study50 slide
study50 slide
study50 slide
study50 slide
study50 slide
study50 slide
study50 slide
study50 slide
study50 slide
study50 slide
study50 slide
study50 slide
study50 slide
study50 slide
study50 slide
study50 slide
study50 slide

Videos

study50 video thumbnail

Wednesday, Week 6

An introduction to data structures
study50 video thumbnail

Kevin's Tries Short

Kevin gives a general overview