I’ve been writing a series of posts on optimizing Elixir. In this post I’ll show how to use detailed work analysis to verify that my code isn’t doing more than I think it should be.
Last week I implemented a parallel map to process the Game of Life board. I can compare profiles between the serial and parallel versions:
FUNCTION CALLS % TIME [uS / CALLS] -------- ----- --- ---- [----------] 'Elixir.Access':get/2 419073 0.53 85973 [ 0.21] 'Elixir.Enumerable.List':reduce/3 657360 0.54 86879 [ 0.13] 'Elixir.Access':'impl_for!'/1 419073 0.57 91691 [ 0.22] 'Elixir.Life':'-list_of_neighbors/3-fun-0-'/6 369765 0.60 96884 [ 0.26] code_server:call/2 419077 1.59 256773 [ 0.61] maps:put/3 49302 12.09 1949883 [ 39.55] 'Elixir.Access':impl_for/1 419073 14.65 2363776 [ 5.64] maps:find/2 419067 63.10 10178202 [ 24.29]
FUNCTION CALLS % TIME [uS / CALLS] -------- ----- --- ---- [----------] 'Elixir.Access':get/2 419073 0.53 147563 [ 0.35] 'Elixir.Code':ensure_compiled/1 419073 0.57 158133 [ 0.38] code:ensure_loaded/1 419143 0.61 168440 [ 0.40] 'Elixir.Enumerable.List':reduce/3 698450 0.62 170647 [ 0.24] erlang:spawn_link/3 325 0.65 179508 [ 552.33] 'Elixir.Life':'-list_of_neighbors/3-fun-0-'/6 369765 0.65 179930 [ 0.49] 'Elixir.Access':'impl_for!'/1 419073 0.76 210256 [ 0.50] code_server:call/2 419143 1.99 552844 [ 1.32] maps:put/3 49302 6.95 1928181 [ 39.11] 'Elixir.Access':impl_for/1 419073 12.32 3415162 [ 8.15] maps:find/2 419067 69.05 19148639 [ 45.69]
Profiling Measures Work Done
If you look at the
CALLS column in the profiles then you can see that the amount of work hasn’t changed between the serial and parallel implementations even though the runtime has changed. For example, both versions show 419067 calls to
maps:find/2 and 419073 calls to
'Elixir.Access':get/2. This is expected as I didn’t change the amount of work that was done in the Game of Life I only distributed it across multiple CPUs. All the same calls are still made.
The number of calls listed in the
CALLS column gives a measure of work done by the program. And you can use the data shown in the profile to help confirm your understanding of your code. That is, you can ask the question
Do I expect my program to do the amount of work shown in the profile?
You can answer this question though a process called Work Analysis.
The process of work analysis consists of two steps
- Read your program and come up with an expectation, or hypothesis, about work done by the program.
- Run your program using a profiler and measure the actual work done.
In an ideal situation step 2 confirms your expectation from step 1. But, if the expected and actual work done don’t match then you have made an incorrect assumption about your program. This means that either your analysis was flawed, or your program has a bug that causes it to do the wrong amount of work. Following the process gives you an oppourtunity to improve your understanding of your program’s behavior.
In terms of performance analysis, work analysis can be used to find sources of unexpected work. For example, you may find a bug in your code that leads to duplicated or unnecessary work. Once found this excess work can be removed. It is also possible to find that your program has a bug that causes it not to do enough work. Fixing this kind of bug may lead to more being done and worse performance.
Analysis of Game of Life
I want to try to answer a specific question: Do I expect my Game of Life to make 419067 calls to
maps:find/2? To answer this I first go through some back-of-the-envelope-calculations
- 99x84 cells on the board played for 5 generations implies 99 * 84 * 5 = 41,580 cells updated
- 8 lookups per cell for neighbors during update update implies 42,500 * 8 = 332,640 neightbor lookups
- 1 additional lookup per cell (the cell itself) during update implies 340,000 + 42,500 = 374,220 total lookups
Based on this calculation I expect 374,220 lookups but have 419,067. I need to refine my expectation.
Thinking about my program more, I realaize that I have not accounted for any calls to
Board.display. I know that the bulk of the work is done updating the board to compute each generation and believe the display is much less work. In fact, the current work analysis tells me it can account for 8.7% of the work at most.
At this point I don’t need to do detailed work analysis on
Board.display. Instead I should remove that work from the analysis. The easiest way to disregard the work for
Board.display is to just remove it from the program. I comment out the calls to
Board.display and profile again:
pmap: FUNCTION CALLS % TIME [uS / CALLS] -------- ----- --- ---- [----------] maps:find/2 369765 69.03 17808538 [ 48.16]
Note that this number, 369,765 matches exactly the number of calls to
'Elixir.Life':'-list_of_neighbors/3-fun-0-'/6 in the previous profiles which means that this is exactly the work for counting neighbors.
With that confirmation I feel confident in comparing the expected and actual work. I expect 374,220 calls to
maps:find/2 but measure only 369,765 calls. I’m missing work! Where is the missing work?
- 382,500 - 369,765 = 12,735 missing calls.
Over 5 generations implies 12,735 / 5 = 2,547 missing calls per generation.
- 374,220 - 369,765 = 4,455 missing calls
- Over 5 generations implies 4,455 / 5 = 891 missing calls per generation.
- 891 missing calls per generation = 99 * 9 which is exactly one row in board.
Based on this it seems very likely that my game of life processed only 83 rows of the glider1.dat file. Looking back at the data file I see that my data file isn’t what I expected. It has only 83 rows of data and 2 empty lines at the end of the file.
This Work Analysis has shown me exactly how much work is being done by game of life implementation. It uncovered some suprises along the way and has helped me to understand exactly what I have been measuring all this time. In terms of performance the amount of work done is reasonable. So work analysis hasn’t helped me, in this case, to improve the performance of my program. However, it was valuable to go through the analysis in order to confirm that there was no extra work being done.