It’s been a few weeks since I last wrote about Domain Scrapper. I’ve covered fetching data from services like Reddit and Twitter, and for a few posts I’ve been promising to start looking at aggregating the collected data. In this post I want to start on that task.
Ecto
As I’m sure you’ve heard already, Ecto is the database library to use with Elixir. I’m going to set it up within Domain Scrapper.
I’ve used Ecto in a number of Phoenix-based projects before, and Phoenix graciously sets up Ecto as part of its initial project. But recently I wrote a simple script to fetch Elixir GitHub repos as part of my Idiomatic Elixir research. This was the first time I had to set up Ecto by hand, and it was a learning experience.
In this post I’m going to repeat the process with Domain Scrapper.
Create a New Aggregator Application Under the Umbrella
Domain Scrapper is an umbrella application and aggregation will be handled in a new application. So first we’ll create the aggregator:
$ mix new aggregator
* creating README.md
* creating .gitignore
* creating mix.exs
* creating config
* creating config/config.exs
* creating lib
* creating lib/aggregator.ex
* creating test
* creating test/test_helper.exs
* creating test/aggregator_test.exs
Your Mix project was created successfully.
You can use "mix" to compile it, test it, and more:
cd aggregator
mix test
Run "mix help" for more commands.
With this, we can begin to set up and configure Ecto within the aggregator application.
Install and Start Ecto
We’ll be using Ecto with postgres so I’ll use the postgrex adapter. To set this up we have to install Ecto and its dependencies by adding these deps to apps/aggregator/mix.exs:
defp deps do
  [
    {:postgrex, ">= 0.0.0"},
    {:ecto, "~> 1.0"},
  ]
end
Also, we have to start both the Ecto and postgrex applications:
def application do
  [applications: [:logger, :postgrex, :ecto]]
end
Then run mix deps.get from the project root.
Configure the Database
The next step is to configure the database in apps/aggregator/config/config.exs.
I’ll start with the development configuration. At some point I’ll need to come back to this and add a production configuration.
I’ve added the following to config.exs:
config :aggregator, Aggregator.Repo,
  adapter: Ecto.Adapters.Postgres,
  database: "domain_scrapper",
  username: "postgres",
  password: "postgres"
You will need to use a valid username / password for your postgres installation.
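One way to avoid hard-coding credentials is to read them from the environment. This is only a minimal sketch and not what Domain Scrapper does today: the PGUSER and PGPASSWORD variable names are assumptions chosen for illustration, and the fallbacks simply mirror the values above.

```elixir
# apps/aggregator/config/config.exs (sketch)
use Mix.Config # newer Elixir releases use `import Config` instead

config :aggregator, Aggregator.Repo,
  adapter: Ecto.Adapters.Postgres,
  database: "domain_scrapper",
  # PGUSER / PGPASSWORD are assumed variable names, not something Ecto requires.
  # When they are unset we fall back to the defaults used in this post.
  username: System.get_env("PGUSER") || "postgres",
  password: System.get_env("PGPASSWORD") || "postgres"
```

Because mix evaluates this file when it loads the project, the variables have to be set in the shell that runs mix.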
Write the Repo Module
Next, we have to create a Repo module that names the OTP application it belongs to. I believe Ecto uses this to find the configuration we wrote above:
defmodule Aggregator.Repo do
  use Ecto.Repo,
    otp_app: :aggregator
end
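To make the connection between otp_app and the configuration concrete, here is a small sketch of the kind of lookup involved. It is not Ecto's actual code; it only shows that the config entry is stored under the application name and keyed by the Repo module. Application.put_env stands in for what mix does when it loads config.exs, so the snippet runs on its own without Ecto installed.

```elixir
# Stand-in for the entry in config.exs. In the real project mix loads this for us.
# Aggregator.Repo and Ecto.Adapters.Postgres are used here purely as atoms.
Application.put_env(:aggregator, Aggregator.Repo, [
  adapter: Ecto.Adapters.Postgres,
  database: "domain_scrapper",
  username: "postgres",
  password: "postgres"
])

# Roughly the lookup that `otp_app: :aggregator` makes possible:
# application name first, then the Repo module as the key.
config = Application.get_env(:aggregator, Aggregator.Repo)

IO.inspect(Keyword.fetch!(config, :database))
```

If the otp_app name and the first argument to config do not match, this lookup comes back empty, which is a good first thing to check when a Repo cannot find its settings.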
We can now run mix ecto.create -r Aggregator.Repo from the root of the project. This will create the database. The “-r” option was necessary to specify which Repo to create. In an umbrella app we could easily end up with multiple repos across a number of applications, though for Domain Scrapper this is our first.
I used PG Commander to verify that the database was created.
Create a Model and a Migration
Next, we need to create a model but I haven’t fully thought through what’s going into the aggregator yet. I know at the very least that we have a URL to save. So, let’s start with that much by creating apps/aggregator/lib/aggregator/domain.ex:
defmodule Aggregator.Domain do
  use Ecto.Schema

  schema "domain" do
    field :url
    timestamps
  end
end
We use Ecto.Schema to pull in all the Ecto code for defining a model. Then we define our “domain” schema with a single url field.
Now, before we can use this we need to write and run a migration to build the associated table in the database.
We can use Ecto’s migration generator to help us get started. I ran the following from within the aggregator application:
$ mix ecto.gen.migration add_domain
* creating priv/repo/migrations
* creating priv/repo/migrations/20151215005708_add_domain.exs
This gives us an empty migration in priv/repo/migrations/20151215005708_add_domain.exs. Of course, your filename may differ somewhat. We start with:
defmodule Aggregator.Repo.Migrations.AddDomain do
  use Ecto.Migration

  def change do
  end
end
We need to fill this in so that it contains:
defmodule Aggregator.Repo.Migrations.AddDomain do
  use Ecto.Migration

  def change do
    create table(:domain) do
      add :url, :string
      timestamps
    end
  end
end
This creates the table and configures the columns to match our schema. With this we can run the migration to update the database:
$ mix ecto.migrate
06:33:47.015 [info] == Running Aggregator.Repo.Migrations.AddDomain.change/0 forward
06:33:47.015 [info] create table domain
06:33:47.024 [info] == Migrated in 0.0s
I can use PG Commander again to verify the new table looks as I expect.
Use iex to Build a Model
With this, let’s try out the model using iex:
iex(1)> Aggregator.Repo.insert! %Aggregator.Domain{url: "http://about.joekain.com"}
** (ArgumentError) repo Aggregator.Repo is not started, please ensure it is part of your supervision tree
(ecto) lib/ecto/adapters/sql.ex:250: Ecto.Adapters.SQL.query/6
(ecto) lib/ecto/adapters/sql.ex:222: Ecto.Adapters.SQL.query/5
(ecto) lib/ecto/adapters/sql.ex:484: Ecto.Adapters.SQL.model/6
(ecto) lib/ecto/repo/model.ex:253: Ecto.Repo.Model.apply/4
(ecto) lib/ecto/repo/model.ex:83: anonymous fn/10 in Ecto.Repo.Model.do_insert/4
(ecto) lib/ecto/repo/model.ex:14: Ecto.Repo.Model.insert!/4
Well, that didn’t work.
But Ecto is kind enough to tell us that we have to start the Repo, and it suggests we do this in our supervision tree. This should be easy, except that when I went to do it I realized that I had not created the aggregator application as a supervised application using mix new --sup.
Supervise an Elixir Application from Scratch
At first I thought I would start over with a supervised application and go through all the steps again. But then I thought this would be a good opportunity to learn how to set up a supervised application by hand.
To figure out what I needed to do I created a dummy supervised app in my umbrella project:
$ mix new --sup foo
* creating README.md
* creating .gitignore
* creating mix.exs
* creating config
* creating config/config.exs
* creating lib
* creating lib/foo.ex
* creating test
* creating test/test_helper.exs
* creating test/foo_test.exs
Your Mix project was created successfully.
You can use "mix" to compile it, test it, and more:
cd foo
mix test
Run "mix help" for more commands.
Looking at lib/foo.ex I see:
defmodule Foo do
  use Application

  # See http://elixir-lang.org/docs/stable/elixir/Application.html
  # for more information on OTP Applications
  def start(_type, _args) do
    import Supervisor.Spec, warn: false

    children = [
      # Define workers and child supervisors to be supervised
      # worker(Foo.Worker, [arg1, arg2, arg3]),
    ]

    # See http://elixir-lang.org/docs/stable/elixir/Supervisor.html
    # for other strategies and supported options
    opts = [strategy: :one_for_one, name: Foo.Supervisor]
    Supervisor.start_link(children, opts)
  end
end
Contrast this with apps/aggregator/lib/aggregator.ex:
defmodule Aggregator do
end
So, we need to use Application to pull in the Application boilerplate and then write a start/2 function to start up the aggregator’s supervision tree. Something like this should work:
defmodule Aggregator do
  use Application

  def start(_type, _args) do
    import Supervisor.Spec, warn: false

    children = [
      worker(Aggregator.Repo, [])
    ]

    opts = [strategy: :one_for_one, name: Aggregator.Supervisor]
    Supervisor.start_link(children, opts)
  end
end
Our start/2 function defines a single child worker for Aggregator.Repo, since this is the module we wanted to start up in the first place. Then it starts up a Supervisor with the same :one_for_one strategy that mix would generate by default. Given that we have only one child, the choice of strategy shouldn’t make much of a difference.
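To see what :one_for_one buys us, here is a self-contained sketch that supervises a single worker, kills it, and waits for the supervisor to start a replacement. It is an illustration under a few assumptions: an Agent registered as :demo_worker (a made-up name) stands in for Aggregator.Repo, and the child is described with a child-spec map, the form used by newer Elixir releases, instead of the Supervisor.Spec worker/2 helper shown above.

```elixir
# Hypothetical stand-in for Aggregator.Repo: an Agent registered under a name.
children = [
  %{
    id: :demo_worker,
    start: {Agent, :start_link, [fn -> 0 end, [name: :demo_worker]]}
  }
]

{:ok, _sup} = Supervisor.start_link(children, strategy: :one_for_one)

old_pid = Process.whereis(:demo_worker)

# Kill the worker and wait until it is really gone.
ref = Process.monitor(old_pid)
Process.exit(old_pid, :kill)

receive do
  {:DOWN, ^ref, :process, ^old_pid, _reason} -> :ok
end

# Poll briefly until the supervisor has registered a replacement process.
wait_for_restart = fn retry, attempts ->
  case Process.whereis(:demo_worker) do
    pid when is_pid(pid) and pid != old_pid ->
      pid

    _ when attempts > 0 ->
      Process.sleep(10)
      retry.(retry, attempts - 1)

    _ ->
      nil
  end
end

new_pid = wait_for_restart.(wait_for_restart, 100)

IO.puts("worker restarted: #{inspect(old_pid)} -> #{inspect(new_pid)}")
```

With only one child, :one_for_one just means "restart that child when it dies". The other strategies only start to matter once there are siblings whose lifetimes depend on each other.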
Continuing to follow the dummy app’s lead, we need to add the Aggregator module to the application definition in apps/aggregator/mix.exs like this:
def application do
  [applications: [:logger, :postgrex, :ecto],
   mod: {Aggregator, []}]
end
We added the mod: {Aggregator, []} element to name the application’s callback module. This tells mix to use our Aggregator.start/2 function to start up our application’s supervision tree.
Finally, we need to start the :aggregator application itself from the main application. We do this by adding it to the application list in apps/main/mix.exs:
def application do
  [applications: [:logger, :unshortening_pool, :producer, :aggregator]]
end
Use iex to Build a Model (again)
With our supervision tree set up, we again try using iex to insert a record:
iex(1)> Aggregator.Repo.insert! %Aggregator.Domain{url: "http://about.joekain.com"}
07:01:21.704 [debug] INSERT INTO "domain" ("inserted_at", "updated_at", "url") VALUES ($1, $2, $3) RETURNING "id" [nil, nil, "http://about.joekain.com"] ERROR query=86.9ms queue=4.9ms
** (Postgrex.Error) ERROR (not_null_violation): null value in column "inserted_at" violates not-null constraint
(ecto) lib/ecto/adapters/sql.ex:493: Ecto.Adapters.SQL.model/6
(ecto) lib/ecto/repo/model.ex:253: Ecto.Repo.Model.apply/4
(ecto) lib/ecto/repo/model.ex:83: anonymous fn/10 in Ecto.Repo.Model.do_insert/4
(ecto) lib/ecto/repo/model.ex:14: Ecto.Repo.Model.insert!/4
iex(1)>
07:01:25.020 [error] GenServer #PID<0.315.0> terminating
** (FunctionClauseError) no function clause matching in BlockingQueue.handle_call/3
lib/blocking_queue.ex:53: BlockingQueue.handle_call(:pop, {#PID<0.337.0>, #Reference<0.0.4.2>}, {1, [], :pop, {#PID<0.325.0>, #Reference<0.0.6.437>}})
(stdlib) gen_server.erl:629: :gen_server.try_handle_call/4
(stdlib) gen_server.erl:661: :gen_server.handle_msg/5
(stdlib) proc_lib.erl:240: :proc_lib.init_p_do_apply/3
Last message: :pop
State: {1, [], :pop, {#PID<0.325.0>, #Reference<0.0.6.437>}}
This gets a little further but fails to insert because the “inserted_at” column is null. It looks like we have to set this manually:
iex(1)> Aggregator.Repo.insert! %Aggregator.Domain{url: "http://about.joekain.com", inserted_at: Ecto.DateTime.local, updated_at: Ecto.DateTime.local}
07:05:01.034 [debug] INSERT INTO "domain" ("inserted_at", "updated_at", "url") VALUES ($1, $2, $3) RETURNING "id" [{ {2015, 12, 15}, {7, 5, 0, 0}}, { {2015, 12, 15}, {7, 5, 0, 0}}, "http://about.joekain.com"] OK query=74.2ms queue=2.9ms
%Aggregator.Domain{__meta__: #Ecto.Schema.Metadata<:loaded>, id: 2,
inserted_at: #Ecto.DateTime<2015-12-15T07:05:00Z>,
updated_at: #Ecto.DateTime<2015-12-15T07:05:00Z>,
url: "http://about.joekain.com"}
Success!
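Typing both timestamps by hand gets old quickly. Until they are automatic, a small helper can take care of it. This is only a sketch: TimestampSketch and stamp/2 are hypothetical names, not part of Ecto, and the demo uses a plain map in place of %Aggregator.Domain{} so that it runs without a database.

```elixir
defmodule TimestampSketch do
  # Hypothetical helper, not part of Ecto. Works on any map or struct
  # that already has :inserted_at and :updated_at keys.
  def stamp(record, now) do
    # Keep an existing inserted_at; always refresh updated_at.
    %{record | inserted_at: record.inserted_at || now, updated_at: now}
  end
end

# Plain-map stand-in for %Aggregator.Domain{}.
record = %{url: "http://about.joekain.com", inserted_at: nil, updated_at: nil}

# Any timestamp value works here; inside the project it would be Ecto.DateTime.local.
now = {{2015, 12, 15}, {7, 5, 0}}

stamped = TimestampSketch.stamp(record, now)
IO.inspect(stamped)
```

Inside the project the same helper could sit in front of the insert, for example by piping %Aggregator.Domain{url: url} through TimestampSketch.stamp(Ecto.DateTime.local) and then into Aggregator.Repo.insert!.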
Next steps
We have our basic setup for Ecto finished. Next we need to make it a little easier to work with and then start doing real work with the model.
In the next post I hope to handle making things easier to work with. This means making those timestamps automatic and setting up the right aliases and imports to make working with Ecto a little less verbose.