Joseph Kain

Professional Software Engineer learning Elixir.

It’s been a few weeks since I last wrote about Domain Scrapper. I’ve covered fetching data from services like Reddit and Twitter, and for a few posts I’ve been promising to start looking at aggregating the collected data. In this post I want to start on that task.

Ecto

As I’m sure you’ve heard already, Ecto is the database library to use with Elixir. I’m going to set it up within Domain Scrapper.

I’ve used Ecto in a number of Phoenix-based projects before, and Phoenix graciously sets up Ecto as part of its initial project. But recently I wrote a simple script to fetch Elixir GitHub repos as part of my Idiomatic Elixir research. This was the first time I had to set up Ecto by hand and it was a learning experience.

In this post I’m going to repeat the process with Domain Scrapper.

Create a New Aggregator Application Under the Umbrella

Domain Scrapper is an umbrella application and aggregation will be handled in a new application. So first we’ll create the aggregator from the umbrella’s apps directory:

$ mix new aggregator
* creating README.md
* creating .gitignore
* creating mix.exs
* creating config
* creating config/config.exs
* creating lib
* creating lib/aggregator.ex
* creating test
* creating test/test_helper.exs
* creating test/aggregator_test.exs

Your Mix project was created successfully.
You can use "mix" to compile it, test it, and more:

cd aggregator
mix test

Run "mix help" for more commands.

With this, we can begin to set up and configure Ecto within the aggregator application.

Install and Start Ecto

We’ll be using Ecto with PostgreSQL, so I’ll use Ecto’s Postgres adapter, which relies on the postgrex driver. To set this up we have to install Ecto and postgrex by adding these deps to apps/aggregator/mix.exs:

defp deps do
  [
    {:postgrex, ">= 0.0.0"},
    {:ecto, "~> 1.0"}
  ]
end

Also, we have to start both the Ecto and postgrex applications:

def application do
  [applications: [:logger, :postgrex, :ecto]]
end

Then run mix deps.get from the project root.

Configure the Database

The next step is to configure the database in apps/aggregator/config/config.exs.

I’ll start with the development configuration. At some point I’ll need to come back to this and add a production configuration.

I’ve added the following to config.exs:

config :aggregator, Aggregator.Repo,
  adapter: Ecto.Adapters.Postgres,
  database: "domain_scrapper",
  username: "postgres",
  password: "postgres"

You will need to use a valid username / password for your postgres installation.
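When the time comes to add a production configuration, one common arrangement is to keep shared settings in config.exs and load a per-environment file from it. The following is only a sketch of that layout, assuming the Mix.Config style of this era; the dev.exs file and the decision to move the Repo settings into it are my assumptions, not something Domain Scrapper does yet.

```elixir
# apps/aggregator/config/config.exs
use Mix.Config

# Settings shared by every environment would go here.

# Load config/dev.exs, config/test.exs, or config/prod.exs depending on
# the current Mix environment. The matching file must exist for each
# environment you actually run in.
import_config "#{Mix.env}.exs"
```

```elixir
# apps/aggregator/config/dev.exs (hypothetical file)
use Mix.Config

config :aggregator, Aggregator.Repo,
  adapter: Ecto.Adapters.Postgres,
  database: "domain_scrapper",
  username: "postgres",
  password: "postgres"
```

A prod.exs would then carry the production database name and credentials without touching the development settings.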

Write the Repo Module

Next, we have to create a Repo module which names the OTP application it belongs to. The otp_app option tells Ecto which application’s configuration to read, which is how it finds the settings we wrote above:

defmodule Aggregator.Repo do
  use Ecto.Repo, otp_app: :aggregator
end
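To make the connection between otp_app and the configuration a little more concrete, here is a small sketch that runs without Ecto or a database. It assumes the lookup boils down to reading the application environment under the application name and the Repo module name; Ecto’s real implementation does more than this, so treat it as an illustration of the keying only.

```elixir
# Stand-in for the `config :aggregator, Aggregator.Repo, ...` entry.
# The values mirror the configuration shown earlier; module names are
# plain atoms here, so neither Ecto nor the Repo needs to be compiled.
Application.put_env(:aggregator, Aggregator.Repo, [
  adapter: Ecto.Adapters.Postgres,
  database: "domain_scrapper",
  username: "postgres",
  password: "postgres"
])

# The application name (otp_app) plus the Repo module name form the key
# under which the settings are stored.
repo_config = Application.get_env(:aggregator, Aggregator.Repo)
database = Keyword.fetch!(repo_config, :database)

IO.puts("database: #{database}")
```

If the otp_app named a different application, the lookup would simply return nil, which is why the Repo and the config entry have to agree on :aggregator.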

We can now run

mix ecto.create -r Aggregator.Repo

from the root of the project. This will create the database. The “-r” option was necessary to specify which Repo to create. In an umbrella app we could easily end up with multiple repos across a number of applications, though for Domain Scrapper this is our first.

I verified the database was created using PG Commander to see the newly created database.

Create a Model and a Migration

Next, we need to create a model but I haven’t fully thought through what’s going into the aggregator yet. I know at the very least that we have a URL to save. So, let’s start with that much by creating apps/aggregator/lib/aggregator/domain.ex:

defmodule Aggregator.Domain do
  use Ecto.Schema

  schema "domain" do
    field :url

    timestamps
  end
end

We use Ecto.Schema to pull in all the Ecto code for defining a model. Then we define our “domain” schema with a single url field.

Now, before we can use this we need to write and run a migration to build the associated table in the database.

We can use Ecto’s migration generator to help us get started. I ran the following from within the aggregator application:

$ mix ecto.gen.migration add_domain
* creating priv/repo/migrations
* creating priv/repo/migrations/20151215005708_add_domain.exs

This gives us an empty migration in priv/repo/migrations/20151215005708_add_domain.exs. Of course your filename will differ because of the timestamp. We start with:

defmodule Aggregator.Repo.Migrations.AddDomain do
  use Ecto.Migration

  def change do
  end
end

We need to fill this in so that it contains:

defmodule Aggregator.Repo.Migrations.AddDomain do
  use Ecto.Migration

  def change do
    create table(:domain) do
      add :url, :string

      timestamps
    end
  end
end

This creates the table and configures the columns to match our schema. With this we can run the migration to update the database:

$ mix ecto.migrate

06:33:47.015 [info]  == Running Aggregator.Repo.Migrations.AddDomain.change/0 forward

06:33:47.015 [info]  create table domain

06:33:47.024 [info]  == Migrated in 0.0s

I can use PG Commander again to verify the new table looks as I expect.

Use iex to Build a Model

With this, let’s try out the model using iex:

iex(1)> Aggregator.Repo.insert! %Aggregator.Domain{url: "http://about.joekain.com"}
** (ArgumentError) repo Aggregator.Repo is not started, please ensure it is part of your supervision tree
    (ecto) lib/ecto/adapters/sql.ex:250: Ecto.Adapters.SQL.query/6
    (ecto) lib/ecto/adapters/sql.ex:222: Ecto.Adapters.SQL.query/5
    (ecto) lib/ecto/adapters/sql.ex:484: Ecto.Adapters.SQL.model/6
    (ecto) lib/ecto/repo/model.ex:253: Ecto.Repo.Model.apply/4
    (ecto) lib/ecto/repo/model.ex:83: anonymous fn/10 in Ecto.Repo.Model.do_insert/4
    (ecto) lib/ecto/repo/model.ex:14: Ecto.Repo.Model.insert!/4

Well, that didn’t work.

But, Ecto is kind enough to tell us that we have to start the Repo and suggests we do this in our supervision tree. This should be easily done except that when I went to do it I realized that I had not created the aggregator application as supervised using mix new --sup.

Supervise an Elixir Application from Scratch

At first I thought to start over with a supervised application and go through all the steps again. But then I thought this would be a good learning experience to understand how to set up a supervised application by hand.

To figure out what I needed to do I created a dummy supervised app in my umbrella project:

$ mix new --sup foo
* creating README.md
* creating .gitignore
* creating mix.exs
* creating config
* creating config/config.exs
* creating lib
* creating lib/foo.ex
* creating test
* creating test/test_helper.exs
* creating test/foo_test.exs

Your Mix project was created successfully.
You can use "mix" to compile it, test it, and more:

    cd foo
    mix test

Run "mix help" for more commands.

Looking at lib/foo.ex I see

defmodule Foo do
  use Application

  # See http://elixir-lang.org/docs/stable/elixir/Application.html
  # for more information on OTP Applications
  def start(_type, _args) do
    import Supervisor.Spec, warn: false

    children = [
      # Define workers and child supervisors to be supervised
      # worker(Foo.Worker, [arg1, arg2, arg3]),
    ]

    # See http://elixir-lang.org/docs/stable/elixir/Supervisor.html
    # for other strategies and supported options
    opts = [strategy: :one_for_one, name: Foo.Supervisor]
    Supervisor.start_link(children, opts)
  end
end

Contrast this with aggregator/lib/aggregator.ex:

defmodule Aggregator do
end

So, we need to use Application to pull in the Application boilerplate and then write a start/2 function to start up the aggregator’s supervision tree. Something like this should work:

defmodule Aggregator do
  use Application

  def start(_type, _args) do
    import Supervisor.Spec, warn: false

    children = [
      worker(Aggregator.Repo, [])
    ]

    opts = [strategy: :one_for_one, name: Aggregator.Supervisor]
    Supervisor.start_link(children, opts)
  end
end

Our start/2 function defines a single child worker for Aggregator.Repo since this is the module we wanted to start up in the first place. Then it starts up a Supervisor with the same :one_for_one strategy that mix would generate by default. Given that we have only one child, the choice of strategy shouldn’t make much of a difference.
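To see what supervision buys us, here is a self-contained sketch that runs without Ecto. It assumes a recent Elixir (child specs written as maps rather than Supervisor.Spec), and it uses an Agent named :stand_in_repo as a hypothetical stand-in for Aggregator.Repo; the SupervisionSketch module and its wait_for_restart helper are likewise made up for the illustration.

```elixir
defmodule SupervisionSketch do
  # Poll until a replacement process is registered under `name`.
  # Returns the new pid, or :timeout if no restart is seen in time.
  def wait_for_restart(name, old_pid, attempts \\ 100)

  def wait_for_restart(_name, _old_pid, 0), do: :timeout

  def wait_for_restart(name, old_pid, attempts) do
    case Process.whereis(name) do
      pid when is_pid(pid) and pid != old_pid ->
        pid

      _ ->
        Process.sleep(10)
        wait_for_restart(name, old_pid, attempts - 1)
    end
  end
end

# One child, as in the aggregator. The Agent only stands in for the Repo.
children = [
  %{
    id: :stand_in_repo,
    start: {Agent, :start_link, [fn -> :connected end, [name: :stand_in_repo]]}
  }
]

{:ok, _supervisor} = Supervisor.start_link(children, strategy: :one_for_one)

first_pid = Process.whereis(:stand_in_repo)

# Simulate a crash of the worker.
Process.exit(first_pid, :kill)

# The supervisor should start a fresh process under the same name.
new_pid = SupervisionSketch.wait_for_restart(:stand_in_repo, first_pid)
IO.puts("restarted: #{is_pid(new_pid) and new_pid != first_pid}")
```

With :one_for_one only the crashed child is restarted, which is all we need while the Repo is the only child.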

Continuing to follow the dummy app’s lead, we need to add the Aggregator module to the application definition in apps/aggregator/mix.exs like this:

def application do
  [applications: [:logger, :postgrex, :ecto],
   mod: {Aggregator, []}]
end

We added the mod: {Aggregator, []} element to name the application’s callback module. When the :aggregator application starts, our Aggregator.start/2 function is called to start up the application’s supervision tree.

Finally, we need to start the :aggregator application itself from the main application. We do this by adding it to the application list in apps/main/mix.exs:

def application do
  [applications: [:logger, :unshortening_pool, :producer, :aggregator]]
end

Use iex to Build a Model (again)

With our supervision tree set up, we again try using iex to insert a record:

iex(1)> Aggregator.Repo.insert! %Aggregator.Domain{url: "http://about.joekain.com"}

07:01:21.704 [debug] INSERT INTO "domain" ("inserted_at", "updated_at", "url") VALUES ($1, $2, $3) RETURNING "id" [nil, nil, "http://about.joekain.com"] ERROR query=86.9ms queue=4.9ms
** (Postgrex.Error) ERROR (not_null_violation): null value in column "inserted_at" violates not-null constraint
    (ecto) lib/ecto/adapters/sql.ex:493: Ecto.Adapters.SQL.model/6
    (ecto) lib/ecto/repo/model.ex:253: Ecto.Repo.Model.apply/4
    (ecto) lib/ecto/repo/model.ex:83: anonymous fn/10 in Ecto.Repo.Model.do_insert/4
    (ecto) lib/ecto/repo/model.ex:14: Ecto.Repo.Model.insert!/4
iex(1)>
07:01:25.020 [error] GenServer #PID<0.315.0> terminating
** (FunctionClauseError) no function clause matching in BlockingQueue.handle_call/3
    lib/blocking_queue.ex:53: BlockingQueue.handle_call(:pop, {#PID<0.337.0>, #Reference<0.0.4.2>}, {1, [], :pop, {#PID<0.325.0>, #Reference<0.0.6.437>}})
    (stdlib) gen_server.erl:629: :gen_server.try_handle_call/4
    (stdlib) gen_server.erl:661: :gen_server.handle_msg/5
    (stdlib) proc_lib.erl:240: :proc_lib.init_p_do_apply/3
Last message: :pop
State: {1, [], :pop, {#PID<0.325.0>, #Reference<0.0.6.437>}}

This gets a little further but fails to insert because the “inserted_at” column is null. It looks like we have to set this manually:

iex(1)> Aggregator.Repo.insert! %Aggregator.Domain{url: "http://about.joekain.com", inserted_at: Ecto.DateTime.local, updated_at: Ecto.DateTime.local}
07:05:01.034 [debug] INSERT INTO "domain" ("inserted_at", "updated_at", "url") VALUES ($1, $2, $3) RETURNING "id" [{ {2015, 12, 15}, {7, 5, 0, 0}}, { {2015, 12, 15}, {7, 5, 0, 0}}, "http://about.joekain.com"] OK query=74.2ms queue=2.9ms
%Aggregator.Domain{__meta__: #Ecto.Schema.Metadata<:loaded>, id: 2,
 inserted_at: #Ecto.DateTime<2015-12-15T07:05:00Z>,
 updated_at: #Ecto.DateTime<2015-12-15T07:05:00Z>,
 url: "http://about.joekain.com"}

Success!
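Typing both timestamps by hand gets old quickly. Until they are generated automatically, a tiny helper could cut down on the repetition. This is only a sketch: TimestampSketch, its nested Domain struct, and put_timestamps are hypothetical names, and the struct is a plain stand-in for Aggregator.Domain so that the example runs without Ecto.

```elixir
defmodule TimestampSketch do
  # Hypothetical stand-in for Aggregator.Domain so the sketch runs without Ecto.
  defmodule Domain do
    defstruct url: nil, inserted_at: nil, updated_at: nil
  end

  # Set both timestamp fields to the same value. The update syntax raises
  # if the record lacks either field.
  def put_timestamps(record, now) do
    %{record | inserted_at: now, updated_at: now}
  end
end

# In the real application `now` would be Ecto.DateTime.local, as in the
# iex session above; an Erlang datetime tuple stands in for it here.
now = :calendar.local_time()

domain =
  TimestampSketch.put_timestamps(
    %TimestampSketch.Domain{url: "http://about.joekain.com"},
    now
  )

IO.inspect(domain.inserted_at == domain.updated_at)
```

Passing a single value for both fields also guarantees that inserted_at and updated_at match on a new record, which two separate calls to get the current time would not.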

Next steps

We have our basic setup for Ecto finished. Next we need to make it a little easier to work with and then start doing real work with the model.

In the next post I hope to handle making things easier to work with. This means making those timestamps automatic, and setting up the right aliases and imports to make working with Ecto a little less verbose.