Last week I built a process for comparing benchmark results. This week I applied that process to my Game of Life implementation. I must admit that I had already done some of the optimization work before putting this tool together. In fact, not having a process for deciding whether an optimization was really a win was the motivation for going back and building the tools. So, over the last week I automated my benchmark-comparison process and started testing my so-called optimizations.
Automating the Collection Process With Mix
If I was going to start optimizing, I needed a convenient way to run the benchmark and collect the results. I decided to add a Mix task. I started by writing a simple task named lifebench to run my benchmark 10 times and write the results to a file. You can read over the code on github. I didn’t use the simpler name bench as it conflicts with the wonderful benchfella task, which I already include in my project.
At the time I included my lifebench task directly in my Game of Life project. This certainly wasn’t the right place for it, but I did it for simplicity while working out the details. The next step will be to extract the code and generalize it into either a new package or, potentially, integrate it into benchfella. I treated this work as a development spike: once I’ve worked out the details I can go back and write a cleaner version.
Writing the lifebench task was a good learning experience. The actual mechanics of running a function 10 times and writing the data to a file were straightforward, but I learned a lot about Mix tasks, including how to compile the project before starting the task.
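To make the shape of this concrete, here is a minimal sketch of what a task like lifebench might look like; the real task is in the github repo, and Life.run_benchmark/0 is a hypothetical stand-in for the project’s actual benchmark entry point.

```elixir
defmodule Mix.Tasks.Lifebench do
  use Mix.Task

  @shortdoc "Runs the benchmark 10 times and writes the timings to a file"

  def run([path]) do
    # Ensure the project is compiled before timing anything.
    Mix.Task.run("compile")

    # :timer.tc/1 returns {microseconds, result}; we keep only the time.
    timings =
      for _ <- 1..10 do
        {usec, _result} = :timer.tc(fn -> Life.run_benchmark() end)
        usec
      end

    File.write!(path, Enum.map_join(timings, "\n", &Integer.to_string/1) <> "\n")
  end
end
```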
When I was done I could collect simple results files like this:
14920470
15010309
14921889
15186957
14811559
14783809
15121291
14931534
14929569
15176096
The file simply contains the time for each run of the benchmark. Later I may need more data, but for my purposes this was enough.
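A results file in this format is also trivial to read back, which keeps the comparison side simple. A sketch of the parsing, assuming one integer timing per line:

```elixir
# Read a results file into a list of integer timings (microseconds).
defp read_results(path) do
  path
  |> File.read!()
  |> String.split("\n", trim: true)
  |> Enum.map(&String.to_integer/1)
end
```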
Automating the Comparison Process With Mix
After automating collection I could gather results, but I still needed to automate the comparison process. Last week I built the process for comparing benchmark results as a spreadsheet, but using it was a little cumbersome: it meant collecting results and pasting them into a spreadsheet. Including the results here in a blog post meant capturing a screenshot. Including the results in my notes was an even bigger hassle. I wanted an easy-to-use, plain-text solution to lower the overhead of benchmarking, comparing results, and writing about the process.
I wrote another Mix task, this time named lifebench.cmp, which you can find in the github repo. It takes two of the result files produced by lifebench, compares them, and reports on the statistical significance. This task implemented the process I described last week.
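The heart of that process is a two-sample t-test on the two lists of timings. The sketch below shows one way the statistic could be computed (the pooled-variance form, which gives n1 + n2 - 2 = 18 degrees of freedom for two files of 10 runs each); the module and function names here are illustrative, not the actual implementation.

```elixir
defmodule BenchCompare do
  # t statistic for two independent samples using pooled variance.
  def t_statistic(a, b) do
    {na, nb} = {length(a), length(b)}

    pooled_variance =
      ((na - 1) * variance(a) + (nb - 1) * variance(b)) / (na + nb - 2)

    (mean(a) - mean(b)) / :math.sqrt(pooled_variance * (1 / na + 1 / nb))
  end

  defp mean(list), do: Enum.sum(list) / length(list)

  # Corrected sample variance (divides by n - 1).
  defp variance(list) do
    m = mean(list)
    Enum.sum(Enum.map(list, fn x -> (x - m) * (x - m) end)) / (length(list) - 1)
  end
end
```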
This was a fun task to write, as it was challenging to pose it as a parse-and-transform style of execution, but I was very happy with the end result.
I used the statistics package for some of the stats, though I found that I had to reimplement a few things in order to get accurate results.
First, Statistics.stdev, when used on samples from a population, computes the uncorrected sample standard deviation, while I needed the corrected version. I wrote the following implementation of the corrected sample standard deviation in Elixir:
defp corrected_sample_stdev(list) do
  mean = Statistics.mean(list)

  # Sum the squared deviations, divide by n - 1 (Bessel's correction),
  # and take the square root.
  (Enum.map(list, fn(x) -> (mean - x) * (mean - x) end) |> Enum.sum) / (Enum.count(list) - 1)
  |> :math.sqrt
end
I’m still trying to decide if I like this style of using |> inside longer expressions.
Next, the Statistics.Distributions module includes the t-distribution; however, the comments say that it is an approximation, and I found it a bit inaccurate for my needs. For example:
Statistics.Distributions.T.cdf(3.057950289696126, 18)
returned 1.0044. But this should have been the probability that a random sample from the t-distribution with 18 degrees of freedom is less than 3.057950289696126. A probability of 1.0044? That just can’t be.
Now, I can’t do any better at implementing the CDF for the t-distribution than Max Sharples has in the Statistics module. As he points out in the comments, better math support is required to implement the CDF properly.
However, I could simply automate the lookup of critical t values from a table. The tables are reasonably accurate, and the critical values are all I really need for my purposes. I quickly put this together by importing a table and writing a set of functions to search through it. When this was done it worked just fine for lifebench.cmp.
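As an illustration of the table-lookup approach (the actual table and search functions are in the repo), a handful of standard one-tailed critical values for 18 degrees of freedom can be scanned from most significant to least:

```elixir
defmodule TTable do
  # A small excerpt of one-tailed critical values at 18 degrees of
  # freedom; a full table would cover more levels and more df values.
  @critical_18 [{0.005, 2.878}, {0.01, 2.552}, {0.025, 2.101}, {0.05, 1.734}]

  # Returns the smallest p whose critical value |t| exceeds, or nil
  # if the result is not significant at any tabled level.
  def significance(t) do
    case Enum.find(@critical_18, fn {_p, crit} -> abs(t) > crit end) do
      {p, _crit} -> p
      nil -> nil
    end
  end
end
```

With a table like this, a t statistic of about 2.90 at 18 degrees of freedom comes out as p < 0.005.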
Comparing Benchmark Results
With my tasks put together I could finally compare benchmark results. I collected a couple of different result sets based on some optimizations I was working on. (More details later.) Here’s an example of a successful optimization.
base:      consolidated protocols:
14920470   14693408
15010309   14795711
14921889   14649470
15186957   14683630
14811559   14904153
14783809   14985803
15121291   14870837
14931534   14933960
14929569   14834048
15176096   14776376
$ mix lifebench.cmp notes/base notes/consolidated
14979348.3 -> 14812739.6 (-1.11%) with p < 0.005
t = 2.9010263661034448, 18 degrees of freedom
Consolidating protocols produced a statistically significant 1.11% improvement in runtime. That is, it’s a win.
I tested another potential optimization:
base:      precomputed:
14920470   15451593
15010309   15004750
14921889   14722077
15186957   14619587
14811559   14962463
14783809   14758489
15121291   14815883
14931534   15061693
14929569   14875596
15176096   14731563
$ mix lifebench.cmp notes/base-ce19ac6 notes/precomputed
14979348.3 -> 14900369.4 (-0.53%) with p < 1
t = 0.9003947139724158, 18 degrees of freedom
Here the statistics do not support a significant change in performance from this code change, so I discarded this optimization.
Next Week’s Post
Next week I’ll discuss the optimizations I performed in more detail. I’ll continue to use the lifebench and lifebench.cmp tasks to measure and compare the results accurately.
I’m also working on extracting lifebench and lifebench.cmp into standalone Mix tasks that you and I can use with other projects. I’ll post an announcement and a link once they are ready.