Last week I built a process for comparing benchmark results. This week I applied that system to my Game of Life implementation. I must admit that I had already done some of the optimization work before putting this tool together. Not having a process for deciding whether an optimization was really a win was the motivation for going back and building the tools. So, over the last week I automated my process for benchmark comparison and started testing my so-called optimizations.
Automating the Collection Process With Mix
If I was going to start optimizing I needed a convenient way to run the benchmark and collect the results. I decided to add a Mix task. I started by writing a simple task named lifebench to run my benchmark 10 times and write the results to a file. You can read over the code on github. I didn’t use the simpler name bench as it conflicts with the wonderful benchfella task, which I already include in my project.
At the time I included my lifebench task directly in my Game of Life project. This certainly wasn’t the right place for it; I did it for simplicity while working out the details. The next step will be to extract the code and generalize it into either a new package, or potentially integrate it into benchfella. I treated this work as a development spike. Once I’ve worked out the details I can go back and write a cleaner version.
Writing the lifebench task was a good learning experience. The actual mechanics of running a function 10 times and writing the data to a file were straightforward, but I learned a few things about Mix tasks along the way, including how to compile the project before starting the task.
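As a rough sketch of the shape such a task takes (the Life.run_benchmark function and the module internals here are illustrative assumptions, not the actual code — see the github repo for that):

```elixir
defmodule Mix.Tasks.Lifebench do
  use Mix.Task

  @shortdoc "Run the Game of Life benchmark and record the timings"
  @runs 10

  def run([path]) do
    # Compile and start the project so we benchmark the latest code.
    Mix.Task.run("app.start")

    times =
      for _ <- 1..@runs do
        # :timer.tc returns {microseconds, result}; keep only the time.
        {usec, _result} = :timer.tc(fn -> Life.run_benchmark() end)
        usec
      end

    # One line of space-separated run times, as in the results files below.
    File.write!(path, Enum.join(times, " ") <> "\n")
  end
end
```

Invoking `mix lifebench notes/base` would then append a results file like the one shown below.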
When I was done I could collect simple results files like this:
14920470 15010309 14921889 15186957 14811559 14783809 15121291 14931534 14929569 15176096
The file simply contains the time for each run of the benchmark. Later I may need more data, but for my purposes this was enough.
Automating the Comparison Process With Mix
After automating collection I was able to gather results, but I still needed to automate the comparison process. Last week I built the process for comparing benchmark results as a spreadsheet, but using it was a little cumbersome. It meant collecting results and pasting them into a spreadsheet. Including the results here in a blog post meant capturing a screenshot. Including the results in my notes was an even bigger hassle. I wanted an easy-to-use, plain-text solution to lower the overhead of benchmarking, comparing results, and writing about the process.
I wrote another Mix task, this time named lifebench.cmp, which you can find in the github repo. It takes two of the result files produced by lifebench, compares them, and reports on the statistical significance. This task implemented the process I described last week.
This was a fun task to write, as it was a challenge to pose the comparison as a parse-and-transform style of execution. But I was very happy with the end result.
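The core computation can be sketched as follows — this is a simplified illustration, not the actual lifebench.cmp code: parse each results file, then compute a pooled two-sample t statistic with n1 + n2 - 2 degrees of freedom (18 for two runs of 10):

```elixir
defmodule BenchCompare do
  # Parse a results file produced by lifebench:
  # whitespace-separated run times in microseconds.
  def read_times(path) do
    path
    |> File.read!()
    |> String.split()
    |> Enum.map(&String.to_integer/1)
  end

  def mean(xs), do: Enum.sum(xs) / length(xs)

  # Corrected (n - 1) sample variance.
  def variance(xs) do
    m = mean(xs)
    Enum.sum(Enum.map(xs, fn x -> (x - m) * (x - m) end)) / (length(xs) - 1)
  end

  # Pooled two-sample t statistic; degrees of freedom = n1 + n2 - 2.
  def t_statistic(xs, ys) do
    {n1, n2} = {length(xs), length(ys)}
    pooled =
      ((n1 - 1) * variance(xs) + (n2 - 1) * variance(ys)) / (n1 + n2 - 2)

    (mean(xs) - mean(ys)) / :math.sqrt(pooled * (1 / n1 + 1 / n2))
  end
end
```

Run against the two sample sets shown below, this produces the same t statistic the task reports.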
I used the statistics package for some of the stats, though I found that I had to reimplement some things in order to get accurate results.
Statistics.stdev, when used on samples from a population, computes the uncorrected sample standard deviation, while I needed the corrected version. I was able to write the following implementation of the corrected sample standard deviation in Elixir:
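A sketch of that computation — dividing the summed squared deviations by n - 1 rather than n (Bessel's correction) — written pipeline-style:

```elixir
defmodule Stats do
  # Corrected sample standard deviation: divide by n - 1, not n.
  def corrected_stdev(samples) do
    n = length(samples)
    mean = Enum.sum(samples) / n

    samples
    |> Enum.map(fn x -> (x - mean) * (x - mean) end)
    |> Enum.sum()
    |> Kernel./(n - 1)
    |> :math.sqrt()
  end
end
```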
I’m still trying to decide if I like this style of using |> inside longer expressions.
The Statistics.Distributions module includes the t-distribution; however, the comments say that it is an approximation. I found that it was a bit too inaccurate for my needs. For example, one call to the CDF returned 1.0044. But this should have been the probability that a random sample from the t-distribution with 18 degrees of freedom is less than 3.057950289696126. A probability of 1.0044? That just can’t be.
Now, I can’t do any better at implementing the CDF for the t-distribution than Max Sharples has done in the Statistics module. As he points out in the comments, better math support is required to implement the CDF properly.
However, I could simply automate the lookup of critical t values from a table. The tables are reasonably accurate, and the critical values are all I really need for my purposes. I quickly put this together by importing a table and writing a set of functions to search through it. When this was done, it worked just fine for
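A sketch of how such a lookup can work, using a few rows of the standard one-tailed critical-value table (the module name and the subset of entries here are illustrative; the real task embeds a fuller table):

```elixir
defmodule TTable do
  # degrees of freedom => %{p value => critical t} (one-tailed).
  @table %{
    17 => %{0.05 => 1.740, 0.01 => 2.567, 0.005 => 2.898},
    18 => %{0.05 => 1.734, 0.01 => 2.552, 0.005 => 2.878},
    19 => %{0.05 => 1.729, 0.01 => 2.539, 0.005 => 2.861}
  }

  # Return the smallest tabulated p whose critical value the observed
  # t statistic exceeds, or nil if even p = 0.05 is not reached.
  def p_bound(t, df) do
    @table
    |> Map.fetch!(df)
    |> Enum.sort_by(fn {p, _crit} -> p end)
    |> Enum.find_value(fn {p, crit} -> if abs(t) > crit, do: p end)
  end
end
```

With t = 2.901 at 18 degrees of freedom this reports p < 0.005, matching the first comparison below; the 0.900 from the second comparison exceeds no critical value at all.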
Comparing Benchmark Results
With my tasks put together I could finally compare benchmark results. I collected a couple of different results based on some optimizations I was working on (more details later). Here’s an example of a successful optimization.
base:       consolidated protocols:
14920470    14693408
15010309    14795711
14921889    14649470
15186957    14683630
14811559    14904153
14783809    14985803
15121291    14870837
14931534    14933960
14929569    14834048
15176096    14776376

$ mix lifebench.cmp notes/base notes/consolidated
14979348.3 -> 14812739.6 (-1.11%) with p < 0.005
t = 2.9010263661034448, 18 degrees of freedom
Consolidated protocols are a statistically significant 1.11% improvement in runtime. That is, it’s a win.
I tested another potential optimization:
base:       precomputed:
14920470    15451593
15010309    15004750
14921889    14722077
15186957    14619587
14811559    14962463
14783809    14758489
15121291    14815883
14931534    15061693
14929569    14875596
15176096    14731563

$ mix lifebench.cmp notes/base-ce19ac6 notes/precomputed
14979348.3 -> 14900369.4 (-0.53%) with p < 1
t = 0.9003947139724158, 18 degrees of freedom
Here the statistics do not support a significant change in performance from this code change, so I discarded the optimization.
Next Week’s Post
Next week I’ll discuss the optimizations I performed in more detail. I’ll continue to use the lifebench and lifebench.cmp tasks in order to measure and compare the results accurately.
I’m also working on extracting lifebench and lifebench.cmp into standalone Mix tasks that you and I can use with other projects. I’ll post an announcement and link once they are ready.