Over the past several weeks I’ve been writing about the design and implementation of an Elixir / OTP application. This week, I’m going to step back from the global design and focus on a local design feature for Domain Scrapper by adding a cache to the URL Unshortener using ETS.
Erlang Term Storage
ETS, or Erlang Term Storage, is a key value store for the Erlang runtime system. We can make use of it to build a cache in Elixir.
I have not used ETS before, though I have read about building a cache with it in Saša Jurić’s ‘Elixir in Action’. So, given that knowledge but lack of hands-on experience, I want to do two things while approaching this problem:
- I want to TDD my solution.
- I want to focus first on the interface and integration of the cache into the system and second on the details of ETS.
Since ETS is a key / value store it is similar, in interface, to Elixir’s Map. I’m familiar with Map, so I want to start by using it as a stand-in for ETS: get a cache working, with good tests and integration into my system, using a Map, and then switch my implementation over to ETS.
Writing the tests and designing the interfaces
I think of unit tests as the way to design the interface to my module. Then passing the tests becomes an exercise in writing a working implementation of that interface.
The tests I started with describe an interface for a server started with start_link/0. Once started, the server can check for an existing cache entry or add a cache entry. add can either add a new entry or replace an existing one. The server also registers itself under the name UnshorteningPool.Cache. I chose to use a server to serialize access to the cache itself.
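A minimal ExUnit sketch matching that description might look like this (the test names and URLs are illustrative, and each test still calls start_link at this stage, which becomes relevant later in the post):

```elixir
ExUnit.start()

defmodule UnshorteningPool.CacheTest do
  use ExUnit.Case
  alias UnshorteningPool.Cache

  test "check returns false when no entry exists" do
    Cache.start_link()
    assert Cache.check("http://bit.ly/miss") == false
  end

  test "check returns an added entry" do
    Cache.start_link()
    Cache.add("http://bit.ly/abc", "http://www.example.com/long")
    assert Cache.check("http://bit.ly/abc") == "http://www.example.com/long"
  end

  test "add replaces an existing entry" do
    Cache.start_link()
    Cache.add("http://bit.ly/abc", "http://www.example.com/old")
    Cache.add("http://bit.ly/abc", "http://www.example.com/new")
    assert Cache.check("http://bit.ly/abc") == "http://www.example.com/new"
  end

  test "the server registers itself under the module name" do
    Cache.start_link()
    assert Process.whereis(UnshorteningPool.Cache) != nil
  end
end
```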
Based on these tests I wrote up a simple GenServer implementation.
Using Saša Jurić’s ExActor helps keep our code nice and focused. And we can see that this server is just a simple wrapper over Map:

- start_link/0 sets the server’s initial state to an empty Map.
- check/1 looks up an entry from the Map, or returns false if no entry exists.
- add/2 is literally a wrapper around Map.put.
This implementation is simple and passes the tests.
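The original listing used ExActor’s macros; an equivalent plain-GenServer sketch of that Map-backed server (the convention that add/2 returns the stored value is my assumption) looks like:

```elixir
defmodule UnshorteningPool.Cache do
  use GenServer

  # Client API; the server registers itself under the module name.
  def start_link do
    GenServer.start_link(__MODULE__, :ok, name: __MODULE__)
  end

  # Returns the cached long URL, or false on a miss.
  def check(short_url), do: GenServer.call(__MODULE__, {:check, short_url})

  # Adds or replaces an entry; returns the stored value.
  def add(short_url, long_url) do
    GenServer.call(__MODULE__, {:add, short_url, long_url})
  end

  # Server callbacks: the state is just a Map.
  def init(:ok), do: {:ok, Map.new()}

  def handle_call({:check, key}, _from, map) do
    {:reply, Map.get(map, key, false), map}
  end

  def handle_call({:add, key, value}, _from, map) do
    {:reply, value, Map.put(map, key, value)}
  end
end
```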
Integrating the cache into the unshortener
The next step is to integrate our new cache into the unshortening process. The cache check belongs in the worker module. It is the worker’s responsibility to unshorten and return results. So it can do the new work of checking the cache. Let’s take a look at what’s currently in the worker and then figure out how to add in a cache check.
The worker is simply:
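As a rough sketch (the helper names and the result-reporting step are assumptions on my part), a worker whose work cast unshortens a URL might look like:

```elixir
defmodule UnshorteningPool.Worker do
  use GenServer

  def start_link, do: GenServer.start_link(__MODULE__, :ok)

  # Asynchronously unshorten a URL on this worker.
  def work(pid, short_url), do: GenServer.cast(pid, {:work, short_url})

  def init(:ok), do: {:ok, nil}

  def handle_cast({:work, short_url}, state) do
    long_url = unshorten(short_url)   # follow redirects (elided here)
    report(short_url, long_url)       # hand the result back to the pool
    {:noreply, state}
  end

  # Placeholders for the real HTTP and reporting code.
  defp unshorten(short_url), do: short_url
  defp report(_short_url, _long_url), do: :ok
end
```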
It is the work/1 cast function that’s important. We can simply add a cache check to it like this:
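A sketch of that change, following the same assumed worker shape (check_cache/2 and the Cache calls match the post’s description; the other helpers are placeholders):

```elixir
defmodule UnshorteningPool.Worker do
  use GenServer
  alias UnshorteningPool.Cache

  def start_link, do: GenServer.start_link(__MODULE__, :ok)
  def work(pid, short_url), do: GenServer.cast(pid, {:work, short_url})

  def init(:ok), do: {:ok, nil}

  def handle_cast({:work, short_url}, state) do
    # Consult the cache; only unshorten on a miss.
    long_url = check_cache(short_url, fn -> unshorten(short_url) end)
    report(short_url, long_url)
    {:noreply, state}
  end

  # The if/then version: return the hit, or compute, store, and
  # return the long URL on a miss.
  defp check_cache(short_url, miss_fun) do
    cached = Cache.check(short_url)

    if cached do
      cached
    else
      long_url = miss_fun.()
      Cache.add(short_url, long_url)
      long_url
    end
  end

  defp unshorten(short_url), do: short_url  # placeholder
  defp report(_short, _long), do: :ok       # placeholder
end
```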
Here we add a check_cache/2 function which takes the cache key (the short URL) to be checked and a function. The function computes the value (the long URL) to store in the cache in the case of a miss. Hit or miss, check_cache/2 returns the long URL.
That said, I didn’t really like this if/then imperative style when I wrote it. I ended up rewriting it this way:
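One possible shape for that rewrite pattern-matches on the cache result with case (this assumes add/2 returns the value it stores; the wrapping module here is only so the sketch compiles on its own, since in the post the helper lives inside the worker):

```elixir
defmodule UnshorteningPool.Worker.CacheCheck do
  alias UnshorteningPool.Cache

  # check/1 returns false on a miss, so false selects the miss clause;
  # anything else is the cached long URL.
  def check_cache(short_url, miss_fun) do
    case Cache.check(short_url) do
      false    -> Cache.add(short_url, miss_fun.())
      long_url -> long_url
    end
  end
end
```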
There was one more step required to integrate the cache into my system: I had to start up the Cache server. I did this by adding it as a worker in my supervision tree in unshortening_pool.ex:
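The post predates today’s child-spec maps, so the original likely used Supervisor.Spec.worker/2; a sketch of the equivalent change in current syntax (with a minimal stand-in Cache so the snippet runs on its own) might be:

```elixir
# Minimal stand-in for the Cache server; in the app this is the
# real UnshorteningPool.Cache module.
defmodule UnshorteningPool.Cache do
  use GenServer
  def start_link, do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)
  def init(:ok), do: {:ok, Map.new()}
end

defmodule UnshorteningPool do
  def start do
    children = [
      # The new cache worker, supervised alongside the pool.
      %{id: UnshorteningPool.Cache,
        start: {UnshorteningPool.Cache, :start_link, []}}
      # ...the pool's existing worker and supervisor specs...
    ]

    Supervisor.start_link(children, strategy: :one_for_one)
  end
end
```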
This change actually broke my tests, and it broke them in two different ways. The first was that each test called Cache.start_link to start a fresh server. This would now fail because the cache was already registered and started with the application. To fix this I simply removed the calls to Cache.start_link from the tests. This was a satisfying change in that it actually makes the tests more focused: they test the things they say they test, and none of them claims to test start_link.
The second way that the tests broke was that the first test continued to fail after removing the call to start_link. This was due to a race: because ExUnit runs all my tests in parallel, one of the other tests could add the URL “http://www.example.com” before this first test looked it up. To fix this I just used a unique URL in this test:
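The fixed first test might read something like this (the exact URL is illustrative):

```elixir
ExUnit.start()

defmodule UnshorteningPool.CacheFirstTest do
  use ExUnit.Case

  # No Cache.start_link here; the application's supervisor owns the server.
  test "check returns false when no entry exists" do
    # A URL no other (parallel) test adds, so the lookup can't race.
    assert UnshorteningPool.Cache.check("http://www.example.com/unique-to-this-test") == false
  end
end
```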
Arguably, I could change the name of the test as well but I haven’t done that.
With these changes the unit tests pass and when I run the system end-to-end everything works!
Only at this stage did I start working with ETS.
This was very straightforward, just replacing Map calls with ETS calls:
The changes are:

- start_link/0, instead of setting the state to a new Map, sets it to a new ETS table created with :ets.new/2. I’ve taken the default options by passing []. By default the table acts as a set, which means each key exists in the table only once. Also by default, the access allowed is protected, which means that only the cache server can write to the table. Other processes can read from the table, though I don’t make use of this here.
- check/1, instead of looking up an entry from the Map, looks up an entry from the ETS table. The format of the return value is a little different, so we use the case special form to parse it and conform to the Cache’s return semantics.
- add/2, instead of wrapping Map.put, wraps :ets.insert/2 to insert a new value into the table.
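Pulling those changes together, a plain-GenServer reconstruction of the ETS-backed server (the post used ExActor, and the table name here is my invention) could be:

```elixir
defmodule UnshorteningPool.Cache do
  use GenServer

  def start_link, do: GenServer.start_link(__MODULE__, :ok, name: __MODULE__)

  def check(short_url), do: GenServer.call(__MODULE__, {:check, short_url})
  def add(short_url, long_url), do: GenServer.call(__MODULE__, {:add, short_url, long_url})

  # The state is now an ETS table instead of a Map. Default options
  # ([]) give a :set (each key at most once) with :protected access.
  def init(:ok), do: {:ok, :ets.new(:unshortening_cache, [])}

  def handle_call({:check, key}, _from, table) do
    # :ets.lookup/2 returns a list of {key, value} tuples; conform
    # that to the cache's value-or-false semantics.
    reply =
      case :ets.lookup(table, key) do
        [{^key, value}] -> value
        []              -> false
      end

    {:reply, reply, table}
  end

  def handle_call({:add, key, value}, _from, table) do
    :ets.insert(table, {key, value})
    {:reply, value, table}
  end
end
```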
And that’s it. That’s the extent of the changes to switch from a Map to ETS. I didn’t need to change any of the tests or the integration into the system itself. The tests still pass, and when I run the system end-to-end it still works.
There are a few things I could choose to do differently here with respect to the access model for the ETS table. First, I could set the access to private instead, to prevent other processes from reading directly from the table. This would force all access to go through the server. Alternatively, I could change my interface and allow workers to read directly from the ETS table. This would save them a round trip of messages through the Cache server, and should also reduce the load on it. This might be a good change to explore if the Cache server’s performance needs improvement.
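To sketch that direct-read alternative: if the server created the table as a :named_table, other processes could look entries up by name without messaging the server at all, while writes would still have to go through the owning process. The names here are illustrative:

```elixir
# The server would create the table as :named_table so readers can
# reference it by name; :protected already permits reads from other
# processes.
:ets.new(:unshortening_cache, [:named_table, :protected])
:ets.insert(:unshortening_cache, {"http://bit.ly/x", "http://www.example.com"})

# A worker-side lookup that skips the Cache server round trip.
lookup = fn short_url ->
  case :ets.lookup(:unshortening_cache, short_url) do
    [{_key, long_url}] -> long_url
    []                 -> false
  end
end

lookup.("http://bit.ly/x")
```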
I was very happy with the results of this iteration of the project. The ETS-specific part was actually quite small, and I’m happy with the way things worked out with my plan to start with a Map and plug in ETS at the end.
In my next post I’ll continue refining the local design.