On Docker as a development environment Posted on 18 Mar 2015

A few weeks ago I got a new laptop and, even though I’ve been doing quite a bit of development work with it, I haven’t bothered to install virtualenv, rvm, bundler, or any other dev environment management tool… besides Docker and Compose.

Compose (formerly known as Fig) is a layer on top of Docker that, in a very simple way, allows for both runtime configuration of containers and linking between them. Runtime configuration is the kind of stuff you would otherwise pass on the command line when running a Docker image: port forwarding, volume mounting, entrypoints… Linking allows containers to see each other without the need to expose any ports. And all of this is done with a really simple YAML configuration file that you can check into your repository to make the whole environment replicable.
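
As a point of reference, here is roughly the kind of command line Compose saves you from typing for a single container (the image name, port and mount point are made up for illustration):

# by hand: detach, forward a port, mount the source and link to an
# already-running container named mongo
docker run -d -p 3000:3000 -v "$(pwd)":/app --link mongo:mongo myapp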

So, if you have a Ruby web app that uses MongoDB for persistence and Memcache as a caching service, instead of installing every single dependency on your dev machine you can just write a Dockerfile for the app, and then create a Compose configuration that mounts the app’s source as a volume (so that changes you make to your code are immediately available in the container), sets up the port forwarding so you can see the app from your browser, and links it with containers for Memcache and MongoDB. You probably won’t even need to do anything special for these two, since they are already built on the Docker Hub. Then, just by running docker-compose up, you’ll have your whole environment up and ready to use. Make sure to check Compose’s quickstart for an example of all this in action.
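
As a sketch of what such a Compose file could look like (the service names, port and /app mount point are illustrative, not taken from a real project):

web:
  build: .
  volumes:
    - .:/app        # mount the source so edits show up immediately
  ports:
    - "3000:3000"   # forward the app's port to the host
  links:
    - mongo
    - memcached
mongo:
  image: mongo      # stock images from the Docker Hub
memcached:
  image: memcached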

Impressive as this is (and it is, not only in what it does but in how easy and painless the whole process is), it isn’t enough for a development machine. The first issue I encountered was doing stuff inside the container besides the entrypoint declared in the Dockerfile, like running a REPL, a debugger or a db console. Compose’s default way of doing this is docker-compose run SERVICE COMMAND, which will start a new container for the same image and run COMMAND. This is not that great, since it takes a while for the container to start (a simple echo in the container I use for this blog takes about 1.5 seconds). Docker already has exec, which runs the command in the same container (the same test using docker exec takes 0.33 seconds). If I understand correctly there’s already work underway to integrate it into Compose, so hopefully this won’t be an issue for much longer. In the meantime I’ve been playing around with an idea to tie commands to containers so that they can then be run using docker exec, in Fede (keep in mind this is extremely alpha software).
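
For reference, the timing comparison above boils down to something like this (the service and container names are placeholders):

# starts a brand new container for the service, then runs the command
time docker-compose run web echo hi              # ~1.5 seconds

# runs the command inside the already-running container
time docker exec my_running_container echo hi    # ~0.33 seconds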

The next problem I found is one of permissions. Even though I use Docker as a non-root user, files created in mounted volumes from inside the container are owned by root (since I just use root as the user for the container), so checking logs or uploaded files sometimes means chmodding them first. Luckily this was solved in Docker a couple of weeks ago, so the fix should be included in Docker’s next release.

And finally, the last issue I’ve been having is long build times when using external package managers like bundler or pip. Since Docker’s cache works per command, adding a new dependency to your requirements.txt means that when rebuilding the image, once it gets to RUN pip install -r requirements.txt there won’t be any packages cached, so if you have enough packages this will take a long time. In one project where I was using scipy it got so bad that I just had to add a RUN pip install scipy before the requirements line so that at least this package was cached. While I’m not the first one to notice this, I haven’t yet seen any good solution to the problem. I’d like to explore the possibility of having these tools emit a series of Docker commands that could be run sequentially and that Docker could cache (a kind of Docker meta command), so pip install -r requirements.txt --docker would return a series of RUN pip install <package> lines that Docker could then run and cache independently for future builds.
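
The workaround from that scipy project looks something like this in the Dockerfile (the /app path is illustrative):

# pre-install the heavyweight package on its own layer, so it stays
# cached even when requirements.txt changes
RUN pip install scipy

# any edit to requirements.txt invalidates the cache from here on
ADD requirements.txt /app/requirements.txt
RUN pip install -r /app/requirements.txt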

All things considered, I’m quite happy using Docker for the kind of hobby stuff I do at home. The UX can certainly be improved, but being able to tap into the Docker Hub for all kinds of services and base images for tons of programming languages, which I can link together and bring up and down with a single command without any discernible performance penalty, is just so great that I’m willing to put up with it while the kinks are ironed out, which given my experience with Docker will probably be pretty soon.


Templating in bash Posted on 06 Apr 2014

A templating language or library should be in the toolbox of every programming environment. While mostly used in web development to output HTML, templates are useful in plenty of other situations, from creating dynamic configuration files to metaprogramming and preprocessing.

Today we are going to take a look at a couple of ways to do templates in bash, half for fun and half because it’s actually useful: creating a configuration file with dynamic fields from a simple script can be a pain without one.

m4

If you google for bash templates, you’ll eventually find m4. m4 is a pretty old-school macro preprocessor that has found its way into the standard Unix toolset, mostly due to its use in autotools. Its mode of operation can be quite complex, with directives to control output, rule rewriting, recursive looping… Have a look at the example on Wikipedia for a small taste. This complexity mostly kills it for me; it seems that once you start using m4 you are going to be in one of those “Now you have 2 problems” scenarios.

However, if our templating needs are limited to some string substitutions, m4 can be a good option. With a template like this:

GREET, USER, your home is HOME

We can simply run m4 -DUSER=$USER -DHOME=$HOME -DGREET=Hi template.m4 (careful here, the order of the parameters is important: -D definitions always go before the template file) and get

Hi, diego, your home is /home/diego

sed

If we are limiting ourselves to simple substitutions, we at least have to make a passing mention of sed. Let’s replicate the simple m4 example with sed. The template is the same, and our command line would be something like this:

sed template.m4 -e "s/USER/$USER/g" -e "s/HOME/$HOME/g" -e "s/GREET/Hi/g" 

However, using double quotes to allow for string evaluation of $HOME makes the expression invalid, as after substitution it’ll be s/HOME//home/diego/g. We can avoid this error by changing the separator, like this:

sed template.m4 -e "s/USER/$USER/g" -e "s|HOME|$HOME|g" -e "s/GREET/Hi/g" 

This works, but we’ve arrived at a solution that is both more verbose and more brittle than the m4 one, with no obvious benefits, so I think we can safely ignore sed for this use case.

cat, heredocs, and string interpolation

For now we are stuck with m4, which is very powerful but complex, and involves learning a new language. Can’t we simply use bash and have it work like PHP, where plain strings are output as-is and code is executed and its output inserted?

The best idea I’ve come up with for this is hackish, and not as pretty as a pure templating language, but it works. The idea is to use bash’s string interpolation to output environment variables that are passed to the template. To output we’ll use cat, and for multi-line strings, heredocs. Let’s have a look:

cat <<EOF
$GREET, $USER, your home is $HOME
EOF

This is our template. While there’s a little bit of noise at the beginning and end of it, it’s mostly plain strings with the vars marked with a $ in front. To run this template we can simply do this:

GREET=Hi . template.sh
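
And since this is just a script writing to stdout, rendering an actual file is a plain redirect (the file name here is arbitrary):

GREET=Hi . template.sh > greeting.conf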

More complex stuff is possible, but it may not be very pretty. For example, loops:

for i in {1..5}
do 
cat <<EOF
$GREET $USER
EOF
done

cat <<EOF
your home is $HOME
EOF

Which outputs

Hi diego
Hi diego
Hi diego
Hi diego
Hi diego
your home is /home/diego

As I said, it’s not the prettiest, and it’s not quite the same as a pure templating language, since the primary mode of operation is execution plus output instead of plain output, but for simple use cases it can work pretty well.

Other options

So far I’ve tried to stay true to the standard Unix toolbox, but since we are in bash nothing prevents us from using PHP, Perl, Ruby or whatever. Setting up the environment for each template can be more complex, but the power and simplicity they provide probably can’t be matched by any of the options we’ve talked about.


And this piece for you Posted on 16 Mar 2014

If you remember, last time we were in the spooky mansion of the evil toymaker, cutting equivalent pieces of cake. We already have a way to generate the states adjacent to a given one and to check for solutions, but we haven’t yet explored how to actually use these functions to iterate and solve the problem.

We first used breadth-first search to solve the sliding tiles puzzle with great results, so let’s try it here:

(let [initial (build-initial-state 6 cake)
      solution? (build-solution-checker 6 cake)
      get-next-states (build-get-next-states 6 cake)]
  (bfser/solve initial solution? get-next-states))

Which arrives at the following solution in almost no time:

[[3 8 8 8 7 6] 
 [3 3 3 8 7 6] 
 [4 4 3 8 7 6] 
 [4 4 4 7 7 6] 
 [5 5 5 5 5 6]]

However, let’s remember for a moment something we said when we first introduced breadth-first search:

Since we know that our solution must lie somewhat close to the initial node (It’s unlikely that any game designer would be so evil to have us make hundreds of moves to solve the puzzle), our best bet is to use a breadth-first approach.

Does the solution for our current problem lie closer to the root node, or is it nearer the leaves? Think about the tree we are generating: it starts with all chunks available for taking, and at each level we add one chunk to the active piece, until in the last level there are no available chunks. That is, in the last level all the pieces are complete for every branch, or, put yet another way, all the solutions are in the last level!

Thinking about it this way, exploring a full level before moving on to the next seems like a waste. If we explore the tree in a depth-first manner, always moving down before moving sideways, we’ll probably find a solution sooner, since we’ll be reaching the last level of the tree regularly, while a breadth-first search only gets to the last level once it has explored all the previous ones.

As for the implementation of a depth-first search, thanks to the magic arts of recursion it’s as simple as:

(defn solve [state solution? get-next-states]
  (if (solution? state)
    [state]
    ;; depth-first: recurse into each child and keep the first branch
    ;; that finds a solution, appending our own state to its history
    (if-let [states (some identity (map #(solve % solution? get-next-states) (get-next-states state)))]
      (conj states state))))

For consistency with the other solvers we are returning a list (a vector here) with the history of all the states that lead to the solution. We don’t need it this time, but who knows what future puzzles we’ll find in our quest?
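
Wiring it up looks just like the breadth-first call from before, assuming the same builder helpers:

(let [initial (build-initial-state 6 cake)
      solution? (build-solution-checker 6 cake)
      get-next-states (build-get-next-states 6 cake)]
  (solve initial solution? get-next-states))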

The solution this solver comes up with is the same one the breadth-first search provided, and since it’s a pretty simple problem the difference in execution time is negligible, but for bigger cakes the breadth-first solver would probably get stuck much sooner than the backtracker.


Piece of cake Posted on 08 Mar 2014

What would you expect if an evil toymaker invited you and six other random third-rate actors to spend a night in a campy early-nineties CGI haunted house for the chance to win a fortune? The answer is obvious: horror-themed logic puzzles!

Let’s delve into 1993’s The 7th Guest (now playable almost anywhere with ScummVM), a series of puzzles barely held together by a “horror” storyline. One of the first puzzles we’ll have to solve involves a cake with skull, tombstone and plain toppings, which we will have to cut into six pieces, each with the same number of each type of topping.

Let’s start by getting a representation for the cake. I don’t think anyone will be surprised when we turn to our trusty matrix, with 1s being skulls, 2s tombstones and 0s plain:

(def cake [[1 2 2 1 2 1]
           [0 2 1 1 2 1] 
           [0 2 2 0 1 0]
           [1 1 2 0 1 2]
           [2 2 1 0 1 2]])

For the solution, we can use numbers from 3 to 8 to represent each of the pieces, so this would be one of the multiple solutions:

(def solution [[3 8 8 8 7 6] 
               [3 3 3 8 7 6] 
               [4 4 3 8 7 6] 
               [4 4 4 7 7 6] 
               [5 5 5 5 5 6]])

As you can see, each piece has 2 skulls, 2 tombstones and a plain topping. For starters, we need to know the number of each type of topping that a piece must have, something like this:

{0 1, 1 2, 2 2}

Starting from the cake and the number of pieces we want to make, here’s a way of getting this piece spec:

(defn get-spec [pieces cake]
  (into {} (map #(vector (first %) (-> % last count (/ pieces))) 
                (group-by identity (flatten cake)))))
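
For our six-piece cake this returns exactly the spec above (map entry order may vary):

(get-spec 6 cake)
; => {0 1, 1 2, 2 2}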

Basically, we group the toppings, count them and divide each count by the number of pieces, collecting the results into a hash. With this spec it now becomes easier to think about how we are going to solve the puzzle. Starting from a fixed position, in each iteration we are going to add a chunk to a piece, making sure that it conforms to the spec.

For the next iteration the cake will have one less available chunk (the one we added to the piece) and the spec will reflect that it needs one less of the topping we added. Once a piece is complete, we’ll reset the spec to start creating a new piece.

The following piece of code updates all these parts. Note that we are using a map to store the different pieces of information and move them around, merging it with the updated versions of each:

  • The cake with the piece updated.
  • The spec with one less of the taken topping.
  • The position in the cake we have just updated, so that in the next iteration we know where to start from.

(defn update-position [pos state]
  (let [{:keys [cake piece-num spec]} state]
    (merge state {:cake (assoc-in cake pos piece-num)
                  :last-pos pos
                  :spec (update-in spec [(get-in cake pos)] dec)})))

With this, the next part should be pretty familiar to us: from a position, we gather all the adjacent chunks that could be added to the current piece and return a list of these new states:

(defn valid-next [{:keys [last-pos spec cake]}]
  (filter #(valid-val? (spec (get-in cake %)))
          (for [coord [[-1 0] [0 -1] [1 0] [0 1]]] 
            (map + last-pos coord))))

(defn get-next-states [original-state state]
  (let [state (update-spec original-state state)]
    (map #(update-position % state) (valid-next state))))

Next up, we need a way of checking for solutions. One way would be to check that each piece conforms to the spec; however, since we ensure that each piece conforms while we are creating it, checking for a solution is as simple as ensuring that there are no toppings left on the cake:

(defn solution? [{piece-num :piece-num} state]
  (every? #(>= % piece-num) (flatten (:cake state))))

We are almost done. If you check the repo you’ll see some more functions that, given a cake and a number of pieces, create the map we use to iterate and curry the solution? and get-next-states functions so that they can be used in our solvers.
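
As a rough sketch of what that currying looks like (the real versions live in the repo; here build-initial-state stands in for the repo helper that creates the iteration map):

(defn build-solution-checker [pieces cake]
  (partial solution? (build-initial-state pieces cake)))

(defn build-get-next-states [pieces cake]
  (partial get-next-states (build-initial-state pieces cake)))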

I think we’ve had enough cake for one sitting, come back next time to see how our generic solvers fare when confronted with this sweet problem.


Do you even proc? Posted on 04 Mar 2014

One neat little feature in Clojure is that sets and hashes can be used as functions. While weird at first, it’s actually pretty useful, more so when combined with higher order functions like filter:

(#{:a :b} :b); => :b
(#{:a :b} :c); => nil

({:a 1 :b 2} :a); => 1
({:a 1 :b 2} :c); => nil

(filter #{1 3} [1 2 3]); => (1 3)

I hadn’t seen this data-structures-as-functions construct before, but I really like it. It exploits the fact that there’s a clear default action associated with the type (looking up keys in a hash or elements in a set in the examples above) and uses it to make code both terser and more self-explanatory.

Actually, come to think of it, this is not as alien as it may seem. In Ruby we use Symbol#to_proc probably once every 5 lines. In fact we are so used to it that we probably don’t think much about what it really means:

[1, 2, 3].inject(&:+)

What this code does is take the symbol +, use the unary & operator to call to_proc on it and ‘turn’ it into a block, which is then passed as the block for inject. The key thing to realize here is that we can provide a default way for a type (Symbol in this case) to become a block, by using & and to_proc. This does sound similar to what Clojure is doing, doesn’t it? It’s really just a matter of some monkeypatching to create hashes and sets that work like functions:

require 'set'

class Hash
  def to_proc
    Proc.new { |x| self[x] }
  end
end

class Set
  def to_proc
    Proc.new { |x| include?(x) }
  end
end

Let’s see how it works:

[1, 2, 3].select(&Set.new([2])) # => [2]

h = {:a => 1, :b => 2, :c => 3}

[:a, :b, :c].map(&h) # => [1, 2, 3]
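
Worth noting: with this to_proc, keys missing from the hash simply map to nil:

[:a, :x].map(&h) # => [1, nil]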

Again, while unfamiliar at first, the more I look at it the more I like it, especially the hash example. Another type that has a well-defined default action is the regular expression:

class Regexp
  def to_proc
    Proc.new { |x| x.match(self) }
  end
end

Which makes code like this possible:

["a", "b", "c"].detect(&/a/) # => "a"

["a", "b", "c"].select(&/a/) # => ["a"]
# Note that this is the same as ["a", "b", "c"].grep(/a/)

["a", "b", "c"].reject(&/a/) # => ["b", "c"]

This one I like even better: it’s much easier to spot when using literal regexes and, as evidenced by the existence of Enumerable#grep, it’s a pretty useful feature.

Array may seem like another candidate, but what should the action be: looking up elements by position, or checking for element presence? Having to choose probably means it’s better to leave it alone. As for others, I can’t really think of more default types with action semantics clear enough to merit a to_proc, but I’d gladly hear any suggestions.