I have to admit that I almost didn’t get a post out this week. I’ve been trying to write about my Blocking Queue and ways of using it in a larger application. But I’ve been blocked on this topic for some time. Then @Lectrick asked about generating YAML with Elixir in a comment on my recent post on parsing YAML in Elixir. I wasn’t able to find a module for generating YAML, so I decided to write one.
Going through the YAML specification (more on this later) showed me that writing a YAML generator is more involved than I thought. But I was able to get a good start on it in this post, and hopefully I will be able to continue developing it over several more posts.
Anyway, I’ve decided to call the module Yamlix. Here’s how I went about starting it.
First, I put together a basic Mix project:
Then I added a LICENSE file for MIT license.
There’s a specification for YAML 1.2 at yaml.org. I’ll do my best to follow this specification.
I’m going to need lots of tests. I’ll start with a simple integration test and work my way down to unit tests.
Of course, this test fails because I haven’t written the dump function yet.
Yamlix.dump/1 will be the public interface to Yamlix, so let’s write it. Section 3.1 of the YAML spec describes a process for dumping YAML. Following this process, dump/1 should look something like this:
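A minimal sketch of that shape follows. It assumes the three stages are private functions named represent/1, serialize/1, and present/1. The stage bodies are placeholders that only pass data along, so the point here is the pipeline structure, not the output:

```elixir
defmodule Yamlix do
  # Public interface: native data in, YAML out.
  # Mirrors the dump process in section 3.1 of the YAML 1.2 spec.
  def dump(value) do
    value
    |> represent()
    |> serialize()
    |> present()
  end

  # Placeholder: native data -> representation graph.
  defp represent(value), do: value

  # Placeholder: representation graph -> serialization (a linear stream).
  defp serialize(graph), do: graph

  # Placeholder: serialization -> presentation (YAML text).
  defp present(stream), do: stream
end
```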
The scalar (or any input) is converted to an internal representation in the form of a graph. The graph, in turn, is serialized and turned into a linear stream. That stream is presented as text forming the YAML output.
Now, I just need to write the functions represent/1, serialize/1, and present/1.
Of course, this test still doesn’t pass:
So, let’s do something simple to make it pass. This will ensure we keep it passing as we write a more complete implementation:
Note that I still get the warning “lib/yamlix.ex:7: warning: variable scalar is unused”, which is great. Warnings remind me that I still have work to do. In this case, I need to actually dump YAML for the passed-in scalar rather than just hard-coding a result of “--- 5”.
To work through this I need another test:
To pass both this test and the previous test, we will convert scalar to a string and then insert it into the document. Conversion to a string should happen in serialize, so we end up with:
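Here is a sketch of where that leaves the module. It assumes the presentation stage wraps the content in a "---" document start marker and a "..." document end marker. That exact header and footer format is my assumption, not necessarily Yamlix’s. The conversion relies on Kernel.to_string/1, which covers integers, floats, strings, and atoms:

```elixir
defmodule Yamlix do
  def dump(scalar) do
    scalar
    |> represent()
    |> serialize()
    |> present()
  end

  # Still a pass-through: a lone scalar needs no graph yet.
  defp represent(scalar), do: scalar

  # Canonical string for the scalar, via the String.Chars protocol.
  defp serialize(scalar), do: to_string(scalar)

  # Assumed format: document start marker, content, document end marker.
  defp present(content), do: "--- " <> content <> "\n...\n"
end
```

With this version, Yamlix.dump(5) returns "--- 5\n...\n" instead of a hard-coded string, so the unused-variable warning goes away.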
But at this point we have violated the type specification. We really need automated type checking to catch this.
I’ve been carrying this script from project to project:
I’ll add this to the project along with Dialyxir and Inch. For more information on working with these tools, see their documentation.
Strangely, Dialyzer passes my code:
I realize now that Dialyzer doesn’t analyze the tests, which is where the type violation is. I’ll have to fix the problem by hand anyway. We’ll do our best to handle any type, so we have this spec:
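The widened spec might look something like the sketch below. It keeps the assumed "---" / "..." document markers. Note that the spec currently promises more than the body delivers. Kernel.to_string/1 only works for terms that implement String.Chars, so a map still raises Protocol.UndefinedError until the representation stage learns about collections:

```elixir
defmodule Yamlix do
  # Accept any Elixir term; always return the YAML document as a string.
  @spec dump(any) :: String.t()
  def dump(value) do
    value |> represent() |> serialize() |> present()
  end

  @spec represent(any) :: any
  defp represent(value), do: value

  # Only terms implementing String.Chars succeed here for now.
  @spec serialize(any) :: String.t()
  defp serialize(value), do: to_string(value)

  @spec present(String.t()) :: String.t()
  defp present(content), do: "--- " <> content <> "\n...\n"
end
```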
OK, so at this point we have a couple of basic end-to-end tests and the basic structure for our YAML generator, plus lots of test infrastructure to help us. The next step is to start serializing more types and to work through the features listed in the YAML spec.
Stepping back and looking at the design
At first I dove into the YAML spec. Section 3.2 describes the YAML information model and starts with the Representation Graph. The graph consists of Nodes and Tags. The Nodes simply represent data to be serialized. The Tag for a node contains metadata that describes the type of the data in the Node. The Tags will allow Yamlix to serialize different data types like Structs and Tuples. However, for the Tags to be useful the YAML parser has to recognize them.
I need to take a step back and think about this project. My original intent was that yamerl would parse the YAML generated by yamlix. But if yamlix generates Tags for Elixir-specific structures, yamerl won’t recognize them.
I need to do a little research into yamerl to understand what Tags it does recognize and if it provides a way to extend the Tag support.
Based on the yamerl reference, there does seem to be a way to provide a “[l]ist of Erlang modules to extend supported node types”. So, part of the yamlix project may need to include writing node modules for yamerl.
Based on the source files, yamerl contains support for:
- bool_ext (accepts more values for true (“y”, “Y”, etc.) and false)
- Erlang atoms
- Erlang functions
- int_ext (accepts more bases)
- IP Address
From Elixir, the types that yamlix will serialize, initially, will be:
- Streams ?
I’m not sure how to handle Streams just yet. I think Yamlix.dump/1 should dump something for streams, but the best it can do is dump a list, and it must evaluate the entire stream in order to generate YAML. What happens when the YAML is read back in? It might make sense for a new stream to be created so that the types match what was passed to Yamlix.dump/1 originally. I think it will take some thought and experimentation to design the right API for handling Streams.
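The following sketch makes that trade-off concrete. It shows what dumping a Stream would have to do first: force the lazy enumerable into a list with Enum.to_list/1. The helper name is hypothetical. Forcing the stream evaluates every element, which never terminates for an infinite stream. Whatever reads the YAML back would also see an ordinary list rather than a Stream:

```elixir
defmodule StreamSketch do
  # Hypothetical helper: force any enumerable, including a lazy Stream,
  # into a list that could then be dumped as a YAML sequence.
  def to_sequence(enumerable), do: Enum.to_list(enumerable)
end

# A Stream is only a recipe; no elements have been computed yet.
doubled = Stream.map(1..3, fn x -> x * 2 end)
false = is_list(doubled)

# Dumping it means evaluating every element.
[2, 4, 6] = StreamSketch.to_sequence(doubled)

# An infinite stream must be cut off explicitly, or to_sequence/1 never returns.
[1, 2, 3] = Stream.iterate(1, &(&1 + 1)) |> Stream.take(3) |> StreamSketch.to_sequence()
```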
I’m going to prioritize the following order for supporting types:
Items 1-7 are all supported by yamerl. Items 9-10 will require Elixir or Yamlix specific tags and extensions to yamerl for parsing. So, I’ll save them for the end.
Even though I am using a BDD style of development, I think it will pay off to think through a design. The YAML spec recommends a design, which I have already started trying to follow. The steps should be:
- represent - Convert native Elixir data types into a graph of nodes representing the same data types. This means, for example, iterating over all pairs in a Map and recursively generating graph nodes for the data stored in the Map. Imagine a Map of Lists of more Maps.
- serialize - This step linearizes the graph and generates canonical string values. It arranges the nodes into linear order that could be written out as YAML. If there are loops in the graph then aliases have to be built (to refer back to prior nodes).
- present - This is the process of writing out the serialized graph as a formatted string.
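The represent step for nested data is naturally recursive. As an illustration only, here is a sketch that walks a Map of Lists of more Maps. It uses hypothetical tagged tuples ({:scalar, v}, {:sequence, nodes}, {:mapping, pairs}) rather than Yamlix’s real node modules:

```elixir
defmodule RepresentSketch do
  # Hypothetical node shapes:
  #   {:scalar, value}
  #   {:sequence, [node]}
  #   {:mapping, [{key_node, value_node}]}

  # Maps: represent every key and every value, recursively.
  # Structs are excluded; they would need their own tags.
  def represent(map) when is_map(map) and not is_struct(map) do
    {:mapping, Enum.map(map, fn {k, v} -> {represent(k), represent(v)} end)}
  end

  # Lists: represent each element, recursively.
  def represent(list) when is_list(list) do
    {:sequence, Enum.map(list, &represent/1)}
  end

  # Anything else is treated as a scalar leaf.
  def represent(value), do: {:scalar, value}
end
```

For example, RepresentSketch.represent(%{"a" => [1, %{}]}) yields a mapping whose single value is a sequence containing a scalar and an empty mapping. Serialization would then walk this tree to produce the linear stream.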
In our current implementation, the steps work like this:
- represent - This stage does nothing. It accepts only scalars and passes them through to the next step.
- serialize - This stage accepts only scalars and converts them to canonical strings.
- present - This stage writes out the YAML header and footer, and it writes the string content received from serialize in between them.
I think the next step should be to focus on building the graph representation of the input structure. Based on the spec’s figure 3.3, Representation Model, we can build a series of node types:
- Scalar node - has the canonical string value
- Sequence node - has a list of nodes, one for each value
- Map node - has a map of key -> node
Since our test suite consists of two tests of scalars, I’ll start with ScalarNode. I wrote up this module as a start:
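Here is a minimal sketch of these modules. The names RepresentationGraph, Node.new/1, Node.value/1, and Node.Scalar with value and tag fields follow the description below. The tag strings are an assumption borrowed from the YAML 1.2 core schema (tag:yaml.org,2002:int and friends), and tag_for/1 is a hypothetical helper:

```elixir
defmodule RepresentationGraph.Node.Scalar do
  # A scalar node: the native value plus a tag describing its YAML type.
  defstruct value: nil, tag: nil
end

defmodule RepresentationGraph.Node do
  alias RepresentationGraph.Node.Scalar

  # Only scalar nodes exist so far; sequence and map nodes come later.
  def new(value), do: %Scalar{value: value, tag: tag_for(value)}

  # Accessor for the data stored in a node.
  def value(%Scalar{value: value}), do: value

  # Hypothetical helper mapping Elixir types to YAML 1.2 core-schema tags.
  # Other types (atoms, maps, lists, ...) raise FunctionClauseError for now.
  defp tag_for(nil), do: "tag:yaml.org,2002:null"
  defp tag_for(v) when is_boolean(v), do: "tag:yaml.org,2002:bool"
  defp tag_for(v) when is_integer(v), do: "tag:yaml.org,2002:int"
  defp tag_for(v) when is_float(v), do: "tag:yaml.org,2002:float"
  defp tag_for(v) when is_binary(v), do: "tag:yaml.org,2002:str"
end

defmodule RepresentationGraph do
  alias RepresentationGraph.Node

  # Build the representation graph; for now that is a single scalar node.
  def represent(scalar), do: Node.new(scalar)
end
```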
The RepresentationGraph.Node module will be an abstraction over the different specializations of nodes: scalar, sequence, and map. It provides the function Node.new/1 to create a new node and an accessor, Node.value/1, to query the value from the Node. Currently, I support only one type of node, Node.Scalar, which holds a value and a tag.
The RepresentationGraph module takes over the function represent/1 (formerly in Yamlix). Currently, it just creates a new Node.Scalar and returns it.
I use the RepresentationGraph set of modules like this:
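Put together, the updated Yamlix might look like the sketch below. So that it runs on its own, it includes stripped-down stand-ins for the RepresentationGraph modules described above, with tags omitted. It also keeps the assumed "---" / "..." document markers:

```elixir
# Stand-ins for the RepresentationGraph modules (tags omitted for brevity).
defmodule RepresentationGraph.Node.Scalar do
  defstruct value: nil, tag: nil
end

defmodule RepresentationGraph.Node do
  alias RepresentationGraph.Node.Scalar
  def new(value), do: %Scalar{value: value}
  def value(%Scalar{value: value}), do: value
end

defmodule RepresentationGraph do
  def represent(scalar), do: RepresentationGraph.Node.new(scalar)
end

defmodule Yamlix do
  alias RepresentationGraph.Node

  @spec dump(any) :: String.t()
  def dump(value) do
    value
    |> RepresentationGraph.represent()
    |> serialize()
    |> present()
  end

  # Pull the value out of the node and build its canonical string.
  defp serialize(node), do: node |> Node.value() |> to_string()

  # Unchanged: wrap the content in the assumed document markers.
  defp present(content), do: "--- " <> content <> "\n...\n"
end
```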
I now call RepresentationGraph.represent/1 to build a representation. Then, in serialize/1, I use Node.value/1 to extract the value when building the canonical string. present/1 is unchanged.
This has turned into a pretty long post and there is a lot more to do in Yamlix. The next steps are
- Get this on GitHub
- Add support for maps - maps will require a recursive traversal through the Map, which will help flesh out the design for Yamlix
- Work through the YAML spec
These are things I’ll work on in next week’s post.