Last week I started the Yamlix project - a YAML generator in Elixir. This week I’ll continue working on Yamlix and will write out YAML for Maps and Lists.
I’ve put the code that’s been developed so far on GitHub as joekain/yamlix.
Adding Maps to the Representation Graph
We want to support maps, so we write a test:
test "it dumps maps of strings" do
map = %{"key1" => "value1", "key2" => "value2"}
assert Yamlix.dump(map) == """
---
key1: value1
key2: value2
...
"""
end
It fails:
1) test it dumps maps of strings (YamlixTest)
test/yamlix_test.exs:12
** (Protocol.UndefinedError) protocol String.Chars not implemented for %{"key1" => "value1", "key2" => "value2"}
stacktrace:
(elixir) lib/string/chars.ex:3: String.Chars.impl_for!/1
(elixir) lib/string/chars.ex:17: String.Chars.to_string/1
(yamlix) lib/yamlix.ex:7: Yamlix.dump/1
test/yamlix_test.exs:14
Finished in 0.03 seconds (0.03s on load, 0.00s on tests)
3 tests, 1 failures
The problem is that Yamlix only supports scalars, and we sent the Map through the scalar path. We need to add a new Map subtype for Node. Actually, we just need to build nodes for maps. While writing the code I could not think of any real difference between Map nodes and Scalar nodes in terms of the struct used to hold their internal representation. So I need only one Node type, but I need to construct the node for a scalar and for a map differently. Like this:
defmodule RepresentationGraph do
  defmodule Node do
    defstruct value: "", tag: ""

    # Map: wrap every key and every value in its own Node.
    def new(%{} = map) do
      new_map = Map.keys(map) |> List.foldl(%{}, fn key, acc ->
        value = Map.get(map, key)
        Map.put(acc, Node.new(key), Node.new(value))
      end)
      %Node{value: new_map, tag: ""}
    end

    # Scalar: store the value as is.
    def new(scalar) do
      %Node{value: scalar, tag: ""}
    end
    # ...
  end
end
Now there is only a Node struct, and we have two function heads for new/1: one that creates a Node for a Map and the one from the last post that creates a Node for a scalar.
new/1 for a map needs to create a second Map that will be stored in the Node. This new map has the keys and values from the old map, each wrapped in a RepresentationGraph.Node struct. This way the keys and the values can be serialized out as YAML.
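As a standalone illustration of this wrapping, here is a minimal sketch. The WrapSketch module and its wrap/1 function are hypothetical names made up for this example, not part of Yamlix, and the sketch uses Map.new/2 instead of a fold, which is just one possible way to write it.

```elixir
# Hypothetical stand-in for RepresentationGraph, for illustration only.
defmodule WrapSketch do
  defmodule Node do
    defstruct value: "", tag: ""
  end

  # Map: build a new map whose keys and values are both wrapped in Nodes.
  def wrap(map) when is_map(map) do
    wrapped = Map.new(map, fn {key, value} -> {wrap(key), wrap(value)} end)
    %Node{value: wrapped}
  end

  # Anything else is treated as a scalar and stored as is.
  def wrap(scalar), do: %Node{value: scalar}
end

wrapped = WrapSketch.wrap(%{"key1" => "value1"})
IO.inspect(wrapped)
```

Inspecting the result should show a Node whose value is a map from Node to Node, the same shape as in the error message that follows.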
The test still fails but the error message lets us confirm that we did what we set out to do:
1) test it dumps maps of strings (YamlixTest)
test/yamlix_test.exs:12
** (Protocol.UndefinedError) protocol String.Chars not implemented for %{ %RepresentationGraph.Node{tag: "", value: "key1"} => %RepresentationGraph.Node{tag: "", value: "value1"}, %RepresentationGraph.Node{tag: "", value: "key2"} => %RepresentationGraph.Node{tag: "", value: "value2"}}
stacktrace:
(elixir) lib/string/chars.ex:3: String.Chars.impl_for!/1
(elixir) lib/string/chars.ex:17: String.Chars.to_string/1
(yamlix) lib/yamlix.ex:7: Yamlix.dump/1
test/yamlix_test.exs:14
So we have:
%{
# key => value
%RepresentationGraph.Node{tag: "", value: "key1"} => %RepresentationGraph.Node{tag: "", value: "value1"},
%RepresentationGraph.Node{tag: "", value: "key2"} => %RepresentationGraph.Node{tag: "", value: "value2"}
}
To get the test to pass we need to work on Yamlix.serialize/1.
Canonical Strings
Looking more closely at the spec, I see that construction of the canonical string is part of building the representation graph. I think the right way to do this is to just implement the String.Chars protocol for Node. This will allow me to call to_string/1 on any Node. This is also a good chance for me to learn Elixir Protocols; I’ve yet to implement one. After reading a little bit on Protocols I wrote the following at the end of RepresentationGraph:
defimpl String.Chars, for: Node do
  def to_string(%Node{value: %{} = map, tag: _}) do
    Map.keys(map) |> List.foldl("", fn key, acc ->
      acc <> "\n#{key}: #{Map.get(map, key)}"
    end)
  end

  def to_string(%Node{value: v, tag: _}) do
    Kernel.to_string(v)
  end
end
To implement the String.Chars protocol we use defimpl String.Chars, for: followed by the type we are implementing the protocol for. In our case, it’s for Node.
Then we implement the to_string/1 functions for Node. We have separate function heads for the map and scalar cases. The scalar case simply calls Kernel.to_string/1 to convert the scalar value in the Node to a string using the function from the standard library.
to_string/1 for a map Node converts each key and value to a string and then builds up a string with the YAML representation.
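To see the protocol dispatch in isolation, here is a minimal self-contained sketch. Sketch.Node and its wrap/1 helper are hypothetical stand-ins for RepresentationGraph.Node and are not part of Yamlix. The sketch assumes the keys and values of a map Node are themselves Nodes, so string interpolation ends up calling the same protocol implementation recursively.

```elixir
# Hypothetical stand-in for RepresentationGraph.Node, for illustration only.
defmodule Sketch.Node do
  defstruct value: "", tag: ""

  # Small helper so callers do not need the struct literal.
  def wrap(value), do: %__MODULE__{value: value}
end

defimpl String.Chars, for: Sketch.Node do
  # Map case: one "key: value" line per entry. Interpolation calls
  # to_string/1 on each wrapped key and value.
  def to_string(%Sketch.Node{value: map}) when is_map(map) do
    Enum.reduce(map, "", fn {key, value}, acc ->
      acc <> "\n#{key}: #{value}"
    end)
  end

  # Scalar case: defer to the standard library.
  def to_string(%Sketch.Node{value: value}) do
    Kernel.to_string(value)
  end
end

scalar = Sketch.Node.wrap(42)
mapping = Sketch.Node.wrap(%{Sketch.Node.wrap("k") => Sketch.Node.wrap("v")})

IO.puts("scalar: #{scalar}")
IO.inspect(to_string(mapping))
```

Once the implementation exists, both to_string/1 and "#{...}" interpolation work on the struct, which is what lets serialize hand over the whole Node.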
The only change to the Yamlix module is to simplify serialize:
@@ -8,7 +8,7 @@ defmodule Yamlix do
end
defp serialize(node) do
- to_string(Node.value(node))
+ to_string(node)
end
It can just call to_string/1 on the whole Node instead of having to extract the value from the Node.
Let’s write another test. Do we handle empty maps?
test "it dumps empty maps" do
assert Yamlix.dump(%{}) == """
---
...
"""
end
And this test passes.
Elixir Lists as YAML Sequences
Handling lists should be similar to maps. I’ll add support for Lists next.
First, a new test. We should handle empty lists:
test "it dumps empty lists" do
assert Yamlix.dump([]) == """
---
...
"""
end
This already passes. This must hit the scalar case, though I’m a little surprised to_string works.
Next, a non-empty list:
test "it dumps lists of strings" do
list = ["one", "two"]
assert Yamlix.dump(list) == """
---
- one
- two
...
"""
end
And now I have a failing test:
1) test it dumps lists of strings (YamlixTest)
test/yamlix_test.exs:36
Assertion with == failed
code: Yamlix.dump(list) == "--- \n- one\n- two\n...\n"
lhs: "--- onetwo\n...\n"
rhs: "--- \n- one\n- two\n...\n"
stacktrace:
test/yamlix_test.exs:38
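Both list results so far come from how the standard library converts lists to strings. This is a quick check using only Kernel.to_string/1, with no Yamlix code involved:

```elixir
# Kernel.to_string/1 treats a list as character data: an empty list is an
# empty charlist, and a list of strings is concatenated without separators.
empty = to_string([])
joined = to_string(["one", "two"])

IO.inspect(empty)
IO.inspect(joined)
```

The first call explains why the empty-list test passed by accident, and the second matches the "onetwo" in the failure above.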
So it looks like to_string works on lists, just not the way I need it to. First, I’ll write a constructor for the list Node:
def new([_x | _xs] = list) do
  new_list = list |> Enum.map(fn val ->
    Node.new(val)
  end)
  %Node{value: new_list, tag: ""}
end
and also a head for to_string in the String.Chars implementation:
def to_string(%Node{value: [_x | _xs] = list, tag: _}) do
  list |> List.foldl("", fn val, acc ->
    acc <> "\n- #{val}"
  end)
end
With this the tests pass.
Now, a little refactoring.
Refactor on Green
I keep writing things like this:
def new([_x | _xs] = list) do end
def to_string(%Node{value: %{} = map, tag: _}) do end
where I use both a structure like %{} and a bound name like map. This isn’t necessary; I don’t need the structure. I used the structure to force the match to be against the right type, but I can do this with a guard clause, and it is much clearer. I can rewrite the two example functions like this:
def new(list) when is_list(list) do end
def to_string(%Node{value: map, tag: _}) when is_map(map) do end
The tests still pass.
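Here is a small sketch of guard-based dispatch on its own. GuardSketch and kind/1 are hypothetical names used only for this illustration. One difference from the structural patterns is worth noting: is_list([]) is true, while [_x | _xs] does not match the empty list, so with guards an empty list is routed to the list clause instead of the scalar one.

```elixir
# Hypothetical helper, for illustration only: classify a value the same way
# the guarded function heads do.
defmodule GuardSketch do
  def kind(value) when is_list(value), do: :list
  def kind(value) when is_map(value), do: :map
  def kind(_value), do: :scalar
end

IO.inspect(GuardSketch.kind(["one", "two"]))
IO.inspect(GuardSketch.kind([]))
IO.inspect(GuardSketch.kind(%{"key1" => "value1"}))
IO.inspect(GuardSketch.kind("value1"))
```

Clauses are tried from top to bottom, so the catch-all scalar clause has to come last.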
Other scalars
Our list of types to support continues with:
1. Integers
2. Floats
3. Bool
4. Atom
5. Strings
1 and 5 are already done. 2 should work already but needs a test.
test "it dumps floats" do
assert Yamlix.dump(5.0) == "--- 5.0\n...\n"
end
So, floats are working. What about bools?
test "it dumps bools (true)" do
assert Yamlix.dump(true) == "--- true\n...\n"
end
test "it dumps bools (false)" do
assert Yamlix.dump(false) == "--- false\n...\n"
end
These two tests also pass.
The remaining type is atom, and I think I need to support tags to handle these. yamerl defines the following tag:
-define(TAG, "tag:yamerl,2012:atom")
So I think this test should do what I need:
test "it dumps atoms with yamerl tag" do
assert Yamlix.dump(:a) == "--- !<tag:yamerl,2012:atom> a\n...\n"
end
To handle this we need a custom constructor for Node for atoms:
def new(scalar) when is_atom(scalar) do
%Node{value: scalar, tag: "!<tag:yamerl,2012:atom>"}
end
and we need to have the scalar Node’s to_string/1 print out the tag if one is present:
def to_string(%Node{value: v, tag: t}) do
tag_and_space(t) <> Kernel.to_string(v)
end
defp tag_and_space(t) do
case t do
"" -> ""
tag -> tag <> " "
end
end
This allows the new test to pass but breaks the tests for booleans because true and false are also atoms. I don’t want to use the atom constructor, with its tag, for booleans:
def new(scalar) when is_atom(scalar) and not is_boolean(scalar) do
%Node{value: scalar, tag: "!<tag:yamerl,2012:atom>"}
end
and now all the tests pass. But I want stronger tests. I just made up the YAML output to compare against. How do I know it’s correct? I should really verify that yamerl can parse this back in. To do this, I’ll need to add yamerl as a dependency in the test environment. In mix.exs I added
{:yamerl, github: "yakaz/yamerl", only: :test}
to deps, and in test_helper.exs I added the line
Application.start(:yamerl)
to start the yamerl server.
Do you remember back in the last Yamlix post that I mentioned that yamerl accepts a list of extension modules to extend the parser? Well, it turns out that parsing atoms requires such an extension. yamerl includes the extension module yamerl_node_erlang_atom for this purpose. So, to verify my YAML output is loadable I have the following test:
test "it dumps atoms that can be read by yamerl" do
  assert [:a] == Yamlix.dump(:a)
    |> String.to_charlist()
    |> :yamerl_constr.string([{:node_mods, [:yamerl_node_erlang_atom]}])
end
The proplist passed to :yamerl_constr.string/2 loads the extension module.
This test passes!
Time to refactor again
I want to clean up the tests. The new :yamerl-based test is too verbose and obscures the intention of the test. I expect to write more tests of this form, so I think it would be valuable to clean it up.
def parse_with_yamerl(str) do
  str
  |> String.to_charlist()
  |> :yamerl_constr.string([{:node_mods, [:yamerl_node_erlang_atom]}])
end

test "it dumps atoms that can be read by yamerl" do
  assert [:a] == Yamlix.dump(:a) |> parse_with_yamerl()
end
Next Steps
In this post we implemented rudimentary support for lists, maps and basic Elixir types in Yamlix. This means we have basic support for all three kinds of nodes in YAML: sequences, mappings, and scalars. On top of this we will be able to build a complete YAML generator. Future posts will work through the YAML spec to add full support for YAML 1.2.