Yamlix - Lists and Maps

Last week I started the Yamlix project - a YAML generator in Elixir. This week I’ll continue working on Yamlix and will write out YAML for Maps and Lists.

I’ve put the code that’s been developed so far on GitHub as joekain/yamlix.

Adding Maps to the Representation Graph

We want to support maps, so we start with a test:

test "it dumps maps of strings" do
  map = %{"key1" => "value1", "key2" => "value2"}
  assert Yamlix.dump(map) == """
  --- 
  key1: value1
  key2: value2
  ...
  """
end

It fails:

  1) test it dumps maps of strings (YamlixTest)
     test/yamlix_test.exs:12
     ** (Protocol.UndefinedError) protocol String.Chars not implemented for %{"key1" => "value1", "key2" => "value2"}
     stacktrace:
       (elixir) lib/string/chars.ex:3: String.Chars.impl_for!/1
       (elixir) lib/string/chars.ex:17: String.Chars.to_string/1
       (yamlix) lib/yamlix.ex:7: Yamlix.dump/1
       test/yamlix_test.exs:14



Finished in 0.03 seconds (0.03s on load, 0.00s on tests)
3 tests, 1 failures

The problem is that Yamlix only supports scalars, so the Map went down the scalar path. My first thought was to add a new Map subtype for Node, but what we actually need is a way to build nodes for maps. While writing the code I could not think of any real difference between Map nodes and Scalar nodes in terms of the struct used to hold their internal representation. So I need only one Node type, but I need to construct the node for a Scalar and for a Map differently. Like this:

defmodule RepresentationGraph do
  defmodule Node do
    defstruct value: "", tag: ""

    # Maps: wrap every key and every value in a Node of its own.
    def new(%{} = map) do
      new_map = Map.keys(map) |> List.foldl(%{}, fn key, acc ->
        value = Map.get(map, key)
        Map.put(acc, Node.new(key), Node.new(value))
      end)
      %Node{value: new_map, tag: ""}
    end

    # Anything else is treated as a scalar.
    def new(scalar) do
      %Node{value: scalar, tag: ""}
    end

    # ...
  end
end

Now there is only a single Node struct, and we have two function heads for new/1: one that creates a Node for a Map and the one from the last post that creates a Node for a scalar.

new/1 for a map needs to create a second Map that will be stored in the Node. This new map has the keys and values from the old map, each wrapped in a RepresentationGraph.Node struct. This way both the keys and the values can be serialized out as YAML.
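To make the wrapping concrete, here is a small self-contained sketch. The module name GraphSketch is hypothetical (it stands in for RepresentationGraph.Node so the snippet runs on its own), and it uses Map.new/2 where the post uses a fold, so treat it as an illustration of the idea rather than the code in the repository.

```elixir
defmodule GraphSketch do
  # Hypothetical stand-in for RepresentationGraph.Node.
  defstruct value: "", tag: ""

  # Maps: wrap each key and each value in a node of its own.
  def new(%{} = map) do
    wrapped = Map.new(map, fn {key, value} -> {new(key), new(value)} end)
    %__MODULE__{value: wrapped, tag: ""}
  end

  # Anything else is stored as a scalar.
  def new(scalar), do: %__MODULE__{value: scalar, tag: ""}
end

graph = GraphSketch.new(%{"key1" => "value1"})

# Both the key and the value of the inner map are now nodes.
[{key_node, value_node}] = Map.to_list(graph.value)
IO.inspect(key_node.value)
IO.inspect(value_node.value)
```

Because the keys are nodes too, looking up an entry in the inner map means building a node for the key first, for example Map.get(graph.value, GraphSketch.new("key1")).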

The test still fails but the error message lets us confirm that we did what we set out to do:

    1) test it dumps maps of strings (YamlixTest)
       test/yamlix_test.exs:12
       ** (Protocol.UndefinedError) protocol String.Chars not implemented for %{ %RepresentationGraph.Node{tag: "", value: "key1"} => %RepresentationGraph.Node{tag: "", value: "value1"}, %RepresentationGraph.Node{tag: "", value: "key2"} => %RepresentationGraph.Node{tag: "", value: "value2"}}
       stacktrace:
         (elixir) lib/string/chars.ex:3: String.Chars.impl_for!/1
         (elixir) lib/string/chars.ex:17: String.Chars.to_string/1
         (yamlix) lib/yamlix.ex:7: Yamlix.dump/1
         test/yamlix_test.exs:14

So we have

%{
  # key                                             => value
  %RepresentationGraph.Node{tag: "", value: "key1"} => %RepresentationGraph.Node{tag: "", value: "value1"}, 
  %RepresentationGraph.Node{tag: "", value: "key2"} => %RepresentationGraph.Node{tag: "", value: "value2"}
}

To get the test to pass we need to work on Yamlix.serialize/1.

Canonical Strings

Looking more closely at the spec, I see that construction of the canonical string is part of building the representation graph. I think the right way to do this is to just implement the String.Chars protocol for Node. This will allow me to call to_string/1 on any Node. This is also a good chance for me to learn Elixir Protocols, since I’ve yet to implement one. After reading a little bit on Protocols I wrote the following at the end of RepresentationGraph:

defimpl String.Chars, for: Node do
  def to_string(%Node{value: %{} = map, tag: _}) do
    Map.keys(map) |> List.foldl("", fn key, acc ->
      acc <> "\n#{key}: #{Map.get(map, key)}"
    end)
  end

  def to_string(%Node{value: v, tag: _}) do
    Kernel.to_string(v)
  end
end

To implement the String.Chars protocol we use defimpl String.Chars, for: followed by the type the implementation is for. In our case, that’s Node.

Then we implement the to_string/1 function for Node. We have separate function heads for the map and scalar cases. The scalar case simply calls Kernel.to_string/1 to convert the scalar value in the Node to a string using the function from the standard library.

to_string/1 for a map Node converts each key and value to a string and then builds up a string with the YAML representation.
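Here is a stripped-down, standalone sketch of the same protocol implementation. ProtoSketch is a hypothetical struct standing in for Node, and I build the structs with Kernel.struct/2 only to keep the snippet independent of the constructors above.

```elixir
defmodule ProtoSketch do
  # Hypothetical stand-in for RepresentationGraph.Node.
  defstruct value: "", tag: ""
end

defimpl String.Chars, for: ProtoSketch do
  # Map nodes: one "key: value" line per entry. Interpolation calls
  # to_string/1 on the key and value nodes, so this recurses.
  def to_string(%ProtoSketch{value: %{} = map}) do
    Enum.map_join(map, "", fn {key, value} -> "\n#{key}: #{value}" end)
  end

  # Everything else is treated as a scalar.
  def to_string(%ProtoSketch{value: value}), do: Kernel.to_string(value)
end

key_node = struct(ProtoSketch, value: "key1")
value_node = struct(ProtoSketch, value: "value1")
map_node = struct(ProtoSketch, value: %{key_node => value_node})

IO.inspect(to_string(map_node))
# prints "\nkey1: value1"
```

Once the protocol is implemented, string interpolation works on the struct as well, which is what lets the map case format its nested key and value nodes without any extra code.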

The only change to the Yamlix module is to simplify serialize:

@@ -8,7 +8,7 @@ defmodule Yamlix do
   end

   defp serialize(node) do
-    to_string(Node.value(node))
+    to_string(node)
   end

It can just call to_string/1 on the whole Node instead of having to extract the value from the Node.

Let’s write another test. Do we handle empty maps?

test "it dumps empty maps" do
  assert Yamlix.dump(%{}) == """
  --- 
  ...
  """
end

And this test passes.

Elixir Lists as YAML Sequences

Handling lists should be similar to maps. I’ll add support for Lists next.

First, a new test. We should handle empty lists:

test "it dumps empty lists" do
  assert Yamlix.dump([]) == """
  --- 
  ...
  """
end

This already passes. It must be hitting the scalar case, though I’m a little surprised to_string works.
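The reason it works is that Elixir ships a String.Chars implementation for lists, which treats them as character data. A quick sketch of what that means for the inputs in these tests:

```elixir
# An empty list converts to the empty string.
empty = to_string([])

# A list of strings is concatenated with no separators.
joined = to_string(["one", "two"])

# A list of code points becomes the corresponding text.
chars = to_string([?a, ?b, ?c])

IO.inspect({empty, joined, chars})
# prints {"", "onetwo", "abc"}
```

So an empty list serializes to nothing at all, which happens to be what the test expects, while a non-empty list gets flattened rather than formatted as a YAML sequence.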

Next, a non-empty list:

test "it dumps lists of strings" do
  list = ["one", "two"]
  assert Yamlix.dump(list) == """
  --- 
  - one
  - two
  ...
  """
end

And now I have a failing test:

  1) test it dumps lists of strings (YamlixTest)
     test/yamlix_test.exs:36
     Assertion with == failed
     code: Yamlix.dump(list) == "--- \n- one\n- two\n...\n"
     lhs:  "--- onetwo\n...\n"
     rhs:  "--- \n- one\n- two\n...\n"
     stacktrace:
       test/yamlix_test.exs:38

So it looks like to_string works on lists, just not the way I need it to. First, I’ll write a constructor for the list Node:

def new([_x | _xs] = list) do
  new_list = Enum.map(list, fn val ->
    Node.new(val)
  end)
  %Node{value: new_list, tag: ""}
end

and also a head for to_string in the String.Chars implementation:

def to_string(%Node{value: [_x | _xs] = list, tag: _}) do
  List.foldl(list, "", fn val, acc ->
    acc <> "\n- #{val}"
  end)
end

With this the tests pass.

Now, a little refactoring.

Refactor on Green

I keep writing things like this:

def new([_x | _xs] = list) do end
def to_string(%Node{value: %{} = map, tag: _}) do end

where I use both a structure like %{} and a bound name like map. I don’t need the structure; I only used it to force the match against the right type. A guard clause does the same job and is much clearer. I can rewrite the two example functions like this:

def new(list) when is_list(list) do end
def to_string(%Node{value: map, tag: _}) when is_map(map) do end

The tests still pass.
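One thing worth knowing about this refactoring is that the two forms are not exactly equivalent for lists. The sketch below uses a hypothetical GuardSketch module to show the difference: the [_x | _xs] pattern skips the empty list, while is_list/1 accepts it.

```elixir
defmodule GuardSketch do
  # Pattern version: only matches lists with at least one element.
  def by_pattern([_x | _xs]), do: :list
  def by_pattern(_other), do: :scalar

  # Guard version: matches every list, including the empty one.
  def by_guard(list) when is_list(list), do: :list
  def by_guard(_other), do: :scalar
end

IO.inspect(GuardSketch.by_pattern(["one"]))  # prints :list
IO.inspect(GuardSketch.by_pattern([]))       # prints :scalar
IO.inspect(GuardSketch.by_guard(["one"]))    # prints :list
IO.inspect(GuardSketch.by_guard([]))         # prints :list
```

In Yamlix this presumably makes no visible difference, since an empty list produces an empty string on either path, but it does mean empty lists now go through the list constructor rather than the scalar one.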

Other scalars

Our list of types to support continues with:

1. Integers
2. Floats
3. Bool
4. Atom
5. Strings

Items 1 and 5 are already done. Item 2 should already work but needs a test:

test "it dumps floats" do
  assert Yamlix.dump(5.0) == "--- 5.0\n...\n"
end

So, floats are working. What about bools?

test "it dumps bools (true)" do
  assert Yamlix.dump(true) == "--- true\n...\n"
end

test "it dumps bools (false)" do
  assert Yamlix.dump(false) == "--- false\n...\n"
end

These two tests also pass.

The remaining type is atom, and I think I need to support tags to handle these. yamerl defines the following tag:

-define(TAG, "tag:yamerl,2012:atom").

So I think this test should do what I need:

test "it dumps atoms with yamerl tag" do
  assert Yamlix.dump(:a) == "--- !<tag:yamerl,2012:atom> a\n...\n"
end

To handle this we need a custom constructor for Node for atoms:

def new(scalar) when is_atom(scalar) do
  %Node{value: scalar, tag: "!<tag:yamerl,2012:atom>"}
end

and we need to have scalar Node’s to_string/1 print out any tags if present:

def to_string(%Node{value: v, tag: t}) do
  tag_and_space(t) <> Kernel.to_string(v)
end

defp tag_and_space(t) do
  case t do
    "" -> ""
    tag -> tag <> " "
  end
end

This allows the new test to pass but breaks the tests for booleans, because true and false are also atoms. I don’t want to use the atom constructor, with its tag, for booleans:

def new(scalar) when is_atom(scalar) and not is_boolean(scalar) do
  %Node{value: scalar, tag: "!<tag:yamerl,2012:atom>"}
end
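To see which values that guard lets through, here is a standalone sketch. TagSketch and tag_for/1 are hypothetical names used only for illustration; the guard itself is the one from the constructor above.

```elixir
defmodule TagSketch do
  @atom_tag "!<tag:yamerl,2012:atom>"

  # Same guard as the atom constructor: atoms, but not true or false.
  def tag_for(value) when is_atom(value) and not is_boolean(value), do: @atom_tag

  # Everything else gets no tag.
  def tag_for(_value), do: ""
end

IO.inspect(TagSketch.tag_for(:a))    # prints "!<tag:yamerl,2012:atom>"
IO.inspect(TagSketch.tag_for(true))  # prints ""
IO.inspect(TagSketch.tag_for("a"))   # prints ""
IO.inspect(TagSketch.tag_for(nil))   # prints "!<tag:yamerl,2012:atom>"
```

Note the last line: nil is an atom but not a boolean, so this guard tags it like any other atom. The post does not cover nil, so take that as an observation about the guard rather than a statement of intended behaviour.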

With that guard in place, all the tests pass. But I want stronger tests. I just made up the YAML output to compare against. How do I know it’s correct? I should really verify that yamerl can parse this back in. To do this, I’ll need to add yamerl as a dependency in the test environment. In mix.exs I added

{:yamerl, github: "yakaz/yamerl", only: :test}

to deps and in test_helper.exs I added the line

Application.start(:yamerl)

to start the yamerl server.

Do you remember that in the last Yamlix post I mentioned that yamerl accepts a list of extension modules to extend the parser? Well, it turns out that parsing atoms requires such an extension. yamerl includes the extension module yamerl_node_erlang_atom for this purpose. So, to verify my YAML output is loadable I have the following test:

test "it dumps atoms that can be read by yamerl" do
  assert [:a] == Yamlix.dump(:a) 
  |> String.to_char_list 
  |> :yamerl_constr.string([{:node_mods, [:yamerl_node_erlang_atom]}])
end

The proplist passed to :yamerl_constr.string/2 loads the extension module.

This test passes!

Time to refactor again

I want to clean up the tests. The new :yamerl based test is too verbose and obscures the intention of the test. I expect to write more tests of this form, so I pulled the yamerl plumbing into a helper:

  def parse_with_yamerl(str) do
    str
    |> String.to_char_list
    |> :yamerl_constr.string([{:node_mods, [:yamerl_node_erlang_atom]}])
  end

  test "it dumps atoms that can be read by yamerl" do
    assert [:a] == Yamlix.dump(:a) |> parse_with_yamerl
  end

Next Steps

In this post we implemented rudimentary support for lists, maps and basic Elixir types in Yamlix. This means we have basic support for all three kinds of node in YAML: sequences, mappings, and scalars. On top of this we will be able to build a complete YAML generator. Future posts will work through the YAML spec to add full support for YAML 1.2.

Joseph Kain