Last week I built a process for comparing benchmark results. This week I applied that process to my Game of Life implementation. I must admit that I had already done some of the optimization work before putting this tool together. Not having a process for deciding whether an optimization was really a win was the motivation for going back and building the tools. So, over the last week I automated my benchmark comparison process and started testing my so-called optimizations.

## Automating the Collection Process With Mix

If I was going to start optimizing, I needed a convenient way to run the benchmark and collect the results. I decided to add a Mix task. I started by writing a simple task named `lifebench` to run my benchmark 10 times and write the results to a file. You can read over the code on github. I didn’t use the simpler name `bench` as it conflicts with the wonderful benchfella task, which I already have included in my project.

At the time I included my lifebench task directly in my Game of Life project. This certainly wasn’t the right place for it; I did it for simplicity while working out the details. The next step will be to extract the code and generalize it into either a new package, or potentially integrate it into benchfella. I treated this work as a development spike. Once I’ve worked out the details I can go back and write a cleaner version.

Writing the lifebench task was a good learning experience. The actual mechanics of running a function 10 times and writing the data to a file were straightforward, but I learned a lot about Mix tasks, including how to compile the project before starting the task.
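
The overall shape is simple. Here’s a minimal sketch of what a task like this can look like; `Life.Benchmark.run/0`, the output-path handling, and the module details are placeholders rather than the actual code from the repo:

```elixir
defmodule Mix.Tasks.Lifebench do
  use Mix.Task

  @shortdoc "Runs the Game of Life benchmark and records the raw timings"

  @runs 10

  def run([out_path]) do
    # Compile the project before timing anything.
    Mix.Task.run("compile")

    results =
      for _ <- 1..@runs do
        # Life.Benchmark.run/0 stands in for the real benchmark entry point.
        {time, _result} = :timer.tc(fn -> Life.Benchmark.run() end)
        time
      end

    File.write!(out_path, Enum.map_join(results, "\n", &Integer.to_string/1) <> "\n")
  end
end
```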

When I was done I could collect simple results files like this:

```
14920470
15010309
14921889
15186957
14811559
14783809
15121291
14931534
14929569
15176096
```

The file simply contains the time for each run of the benchmark. Later I may need more data, but for my purposes this was enough.

## Automating the Comparison Process With Mix

After automating collection I was able to gather results, but I still needed to automate the comparison process. Last week I built the process for comparing benchmark results as a spreadsheet, but using it was a little cumbersome. It meant collecting results and pasting them into a spreadsheet. Including the results here in a blog post meant capturing a screenshot, and including them in my notes was an even bigger hassle. I wanted an easy-to-use, plain-text solution to lower the overhead of benchmarking, comparing results, and writing about the process.

I wrote another Mix task, this time named `lifebench.cmp`, which you can find in the github repo. It takes two of the result files produced by `lifebench`, compares them, and reports on the statistical significance. This task implemented the process I described last week.

This was a fun task to write, as it was challenging to pose the comparison as a parse-and-transform style of execution. But I was very happy with the end result.
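
The core of it is a small parse-and-transform pipeline: read each file, turn it into a list of integers, and transform the two lists into a report. Here’s a rough sketch of that shape with the significance reporting left out (the statistical pieces are covered below); the names and details are illustrative, not the exact code from the repo:

```elixir
defmodule Mix.Tasks.Lifebench.Cmp do
  use Mix.Task

  @shortdoc "Compares two lifebench result files"

  def run([base_path, new_path]) do
    base = parse(base_path)
    new = parse(new_path)

    mean_base = Enum.sum(base) / length(base)
    mean_new = Enum.sum(new) / length(new)
    change = (mean_new - mean_base) / mean_base * 100

    # The t statistic, degrees of freedom, and critical-value lookup
    # would slot in here to produce the significance line.
    Mix.shell().info("#{mean_base} -> #{mean_new} (#{Float.round(change, 2)}%)")
  end

  # Parse step: one integer timing per line.
  defp parse(path) do
    path
    |> File.read!()
    |> String.split("\n", trim: true)
    |> Enum.map(&String.to_integer/1)
  end
end
```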

I used the statistics package to do some of the stats, though I found that I had to reimplement some things in order to get accurate results.

First, `Statistics.stdev`, when used on samples from a population, computes the uncorrected sample standard deviation, while I needed the corrected version. I was able to write up an implementation of the corrected sample standard deviation in Elixir, along the lines of this sketch:
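
```elixir
defmodule LifeBench.Stats do
  # A sketch of the corrected sample standard deviation: Bessel's
  # correction divides by n - 1 rather than n. The module name and
  # mean/1 helper are placeholders.
  def corrected_stdev(samples) do
    n = length(samples)
    m = mean(samples)

    :math.sqrt(
      (samples |> Enum.map(fn x -> (x - m) * (x - m) end) |> Enum.sum()) / (n - 1)
    )
  end

  defp mean(samples), do: Enum.sum(samples) / length(samples)
end
```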

I’m still trying to decide if I like this style of using `|>` inside longer expressions.

Next, the `Statistics.Distributions` module includes the t-distribution; however, the comments say that it is an approximation, and I found it was a bit inaccurate for my needs. For example, asking for the CDF of the t-distribution with 18 degrees of freedom at 3.057950289696126 returned 1.0044. But this should be the probability that a random sample from the *t* distribution with 18 degrees of freedom is less than 3.057950289696126. A probability of 1.0044? That just can’t be.

Now, I can’t do any better at implementing the CDF for the t-distribution than Max Sharples has created in the `Statistics` module. As he points out in the comments, better math support is required to implement the CDF properly.

However, I could simply automate the lookup of critical t values from a table. The tables are reasonably accurate, and the critical values are all I really need for my purposes. I quickly put this together by importing a table and writing a set of functions to search through it. When this was done it worked just fine for `lifebench.cmp`.
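
Here’s a rough sketch of that idea. The table is trimmed to a few one-tailed entries for 18 degrees of freedom (standard t-table values); the real task carries a much fuller table, and the names here are placeholders:

```elixir
defmodule LifeBench.TTable do
  # Critical t values keyed by {degrees_of_freedom, significance_level}.
  @table %{
    {18, 0.05} => 1.734,
    {18, 0.01} => 2.552,
    {18, 0.005} => 2.878
  }

  @doc "Critical t value for the given degrees of freedom and level."
  def critical(df, alpha), do: Map.fetch!(@table, {df, alpha})

  @doc "Smallest tabulated level whose critical value t clears, or nil."
  def significance(df, t) do
    cleared =
      @table
      |> Enum.filter(fn {{d, _level}, crit} -> d == df and abs(t) >= crit end)
      |> Enum.map(fn {{_d, level}, _crit} -> level end)

    case cleared do
      [] -> nil
      levels -> Enum.min(levels)
    end
  end
end
```

A lookup like this only gives bounds at the tabulated levels, but bounds are all the comparison needs in order to report something like `p < 0.005`.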

## Comparing Benchmark Results

With my tasks put together I could finally compare benchmark results. I collected a couple of different results based on some optimizations I was working on. (More details later). Here’s an example of a successful optimization.

```
base:        consolidated protocols:
14920470     14693408
15010309     14795711
14921889     14649470
15186957     14683630
14811559     14904153
14783809     14985803
15121291     14870837
14931534     14933960
14929569     14834048
15176096     14776376
$ mix lifebench.cmp notes/base notes/consolidated
14979348.3 -> 14812739.6 (-1.11%) with p < 0.005
t = 2.9010263661034448, 18 degrees of freedom
```

Consolidating the protocols gives a statistically significant improvement of 1.11% in runtime. That is, it’s a win.

I tested another potential optimization:

```
base:        precomputed:
14920470     15451593
15010309     15004750
14921889     14722077
15186957     14619587
14811559     14962463
14783809     14758489
15121291     14815883
14931534     15061693
14929569     14875596
15176096     14731563
$ mix lifebench.cmp notes/base-ce19ac6 notes/precomputed
14979348.3 -> 14900369.4 (-0.53%) with p < 1
t = 0.9003947139724158, 18 degrees of freedom
```

Here the statistics do not support a significant change in performance from this code change. I discarded this optimization.

## Next Week’s Post

Next week I’ll discuss the optimizations I performed in more detail. I’ll continue to use the `lifebench` and `lifebench.cmp` tasks in order to measure and compare the results accurately.

I’m also working on extracting `lifebench` and `lifebench.cmp` into standalone Mix tasks that you and I can use with other projects. I’ll post an announcement and link once they are ready.