Dockerfile Basics
Goal
The goal is to understand how Dockerfiles and the build process work and how to create a custom image from scratch. To that end, a minimal web server application written in Go will be containerized using a Dockerfile. Go was chosen because it makes it easy to build a native binary and already includes “net/http” in the standard library.
First switch to an empty directory and run:
go mod init goapp
In the same directory create the following file and name it goapp.go:
package main

import (
	"fmt"
	"log"
	"net/http"
)

func main() {
	// Echo host, method, path and user agent of every request.
	http.HandleFunc("/", func(w http.ResponseWriter, r *http.Request) {
		fmt.Fprintf(w, "%s %s %s\n%s", r.Host, r.Method, r.URL.Path, r.UserAgent())
	})
	fmt.Println("Starting server on :8080 ...")
	if err := http.ListenAndServe(":8080", nil); err != nil {
		log.Fatal(err)
	}
}
Run the application using:
go run goapp.go
To build a native binary:
go build goapp.go
BuildKit
A Dockerfile contains the instructions that docker build uses to build a Docker image, which can then be uploaded to a registry or shared directly as a file. BuildKit replaces the legacy builder. It has been included since Docker 18.09 (18.06 shipped it as an experimental feature) and is the default builder as of Docker Engine 23.0. It is much faster and comes with many new features to improve security and flexibility.
BuildKit is enabled by default on Docker Desktop. Where it is not already the default, it can be enabled by either:
- Setting the environment variable DOCKER_BUILDKIT=1
- Adding { "features": { "buildkit": true } } to /etc/docker/daemon.json
- Using docker buildx build instead of docker build
#syntax
The first thing to do in a Dockerfile for BuildKit is to specify the syntax being used. At the core of BuildKit is the Low-Level Build (LLB) format, a binary format that allows developers to implement a custom frontend/syntax. It’s an interesting topic but also complex enough to warrant its own post.
The definition looks like this and yes, docker/dockerfile:1.5 is a Docker image. Its job is to translate the human-readable text to LLB.
# syntax=docker/dockerfile:1.5
FROM
After defining the syntax it’s time to define the base image using the FROM instruction. The only instruction that may precede FROM is ARG. ARG defines build-time variables; they are not available in the final image or container.
Example:
FROM alpine:3.17
When choosing a tag, be careful with :latest since it’s nothing more than just another tag. It is not dynamic, it does not guarantee anything, and it is simply the default value if no tag is provided. If an error occurs later, it can be hard to figure out which version was actually used if the tag was just “:latest”.
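As a minimal sketch of how ARG can precede FROM, the base image tag can be made configurable at build time. The variable name ALPINE_VERSION is an assumption chosen for illustration and does not come from the original example.

```dockerfile
# syntax=docker/dockerfile:1.5

# Declared before FROM, so it can only be used in FROM lines.
# ALPINE_VERSION is a hypothetical name; 3.17 is the pinned default.
ARG ALPINE_VERSION=3.17
FROM alpine:${ALPINE_VERSION}

# An ARG declared before FROM is out of scope inside the build stage.
# Redeclaring it without a value makes it visible again.
ARG ALPINE_VERSION
RUN echo "Built from alpine ${ALPINE_VERSION}"
```

Building with docker build --build-arg ALPINE_VERSION=3.18 . would then select a different base image tag without editing the Dockerfile, while the pinned default keeps builds reproducible.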
COPY, ADD
Both ADD and COPY serve the purpose of copying local files into the image. ADD provides some extra functionality, such as extraction of local tar archives and remote URL support. In general COPY is preferred since its reduced feature set makes it immediately obvious what happens.
Example:
ADD rootfs.tar.xz /
COPY requirements.txt /app/
Layers
Docker images are built up of layers stacked on top of each other. A new layer is created whenever an instruction changes the filesystem (copy-on-write strategy). Unchanged layers are cached and reused, which makes repeated builds faster.
Consider the following Dockerfile:
# syntax=docker/dockerfile:1.5
FROM alpine:3.17
COPY goapp /
The resulting image consists of all layers from the alpine:3.17 image and one additional layer caused by the changes made to the filesystem by the COPY instruction.
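To illustrate how layer caching affects build speed, here is a sketch that orders instructions from rarely changing to frequently changing. The ca-certificates package is only an assumed example of a dependency; it is not required by the application in this post.

```dockerfile
# syntax=docker/dockerfile:1.5
FROM alpine:3.17

# Changes rarely: this layer is reused from the cache as long as
# the base image and this line stay the same.
# (ca-certificates is an illustrative assumption.)
RUN apk add --no-cache ca-certificates

# Changes often: when the goapp binary changes, only this layer
# (and anything after it) is rebuilt.
COPY goapp /
```

If the two instructions were swapped, every new binary would invalidate the cache for the package installation as well.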
WORKDIR
Sets the working directory for any following ADD, COPY, CMD, RUN or ENTRYPOINT instructions.
Example:
WORKDIR /app
COPY . .
This copies all files and folders from the build context (here, the local directory where the Dockerfile resides) to the container path /app.
RUN
RUN does just that: it runs a command. It can be used for various tasks such as installing dependencies, configuration or file manipulation.
Example:
RUN apt-get update && apt-get install -y python3
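Because each RUN instruction produces its own layer, related commands are often chained into a single instruction. The following is a sketch under the assumption of a Debian base image; the image tag and the user name app are chosen for illustration only.

```dockerfile
# syntax=docker/dockerfile:1.5
# Assumed base image for illustration
FROM debian:bookworm-slim

# Update the package index, install, and clean up in ONE layer.
# Removing the package lists in a later RUN would not shrink the image,
# because the earlier layer would still contain them.
RUN apt-get update \
 && apt-get install -y --no-install-recommends python3 \
 && rm -rf /var/lib/apt/lists/*

# RUN is also used for configuration, e.g. creating a hypothetical user.
RUN useradd --create-home app
```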
CMD, ENTRYPOINT
Both CMD and ENTRYPOINT define what runs when the container starts. ENTRYPOINT sets the executable, which always runs. CMD specifies default arguments that are passed to the ENTRYPOINT; if no ENTRYPOINT is set, CMD is the command itself. Neither has a built-in default (both are inherited from the base image if it sets them); /bin/sh -c is only prepended when an instruction is written in shell form. Since this can be a bit confusing here is an example:
FROM alpine:3.17
ENTRYPOINT [ "/bin/echo" ]
CMD ["Hello, World!"]
Build the image:
docker build -t dockertest .
When the container is run without any arguments, “Hello, World!” is printed:
docker run --rm dockertest
If arguments are passed, they replace CMD and are handed to the ENTRYPOINT, so “arg” is printed instead:
docker run --rm dockertest arg
Above, CMD was used to pass default parameters to the ENTRYPOINT. CMD can also specify the complete command on its own; the preferred way to write it is the exec form, which looks like this:
CMD ["/bin/echo", "Hello, World!"]
Finally there is a third form called the shell form, which runs the command through /bin/sh -c:
CMD echo Hello, World!
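One practical difference between the exec form and the shell form is variable expansion. The sketch below assumes an environment variable named GREETING, which is introduced here purely for illustration.

```dockerfile
FROM alpine:3.17

# GREETING is a hypothetical variable used only for this example.
ENV GREETING="Hello, World!"

# Exec form: no shell is involved, so the argument is passed literally
# and the container would print the text $GREETING unexpanded.
# CMD ["/bin/echo", "$GREETING"]

# Exec form with an explicit shell: the shell expands the variable.
# CMD ["/bin/sh", "-c", "echo $GREETING"]

# Shell form: equivalent to the explicit /bin/sh -c variant above.
# Only the last CMD in a Dockerfile takes effect, hence the comments.
CMD echo $GREETING
```

The trade-off is that in shell form the main process is /bin/sh rather than the application, which is one reason the exec form is generally preferred for long-running services.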
.dockerignore
Whenever you build an image, Docker creates a build context, which by default includes everything in the directory passed to docker build (usually the directory of the Dockerfile). This build context is sent to the Docker daemon, which can quickly become a performance issue, especially when connected to a remote Docker instance over a slow connection.
Consider the following Dockerfile for a Node application:
FROM node:19.8
WORKDIR /app
COPY package.json package.json
COPY package-lock.json package-lock.json
COPY src/ .
RUN npm install
Even though node_modules is not copied explicitly in the Dockerfile, it is still part of the build context. This folder usually contains dependencies installed by npm and can easily contain tens of thousands of files that are not needed, since RUN npm install will fetch them anyway. To mitigate this issue, a .dockerignore file can be created to exclude certain files and directories from the build context. The syntax is similar to .gitignore:
node_modules/
A complete example
Time to dockerize the Go application mentioned at the start of this post.
# syntax=docker/dockerfile:1.5
FROM golang:1.17-alpine AS builder
WORKDIR /app
COPY go.mod ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -a -o /goapp
FROM scratch
COPY --from=builder /goapp /
CMD ["/goapp"]
This Dockerfile introduces another new concept: multi-stage builds. Remember how each modification to the filesystem creates a new layer? Compilers and intermediate files end up in those layers too, which makes it difficult to write efficient Dockerfiles and often caused developers to write separate Dockerfiles for development and production.
With multi-stage builds multiple FROM instructions can be used and artifacts can be selectively copied from one stage to another leaving behind anything unwanted.
Here the first stage takes care of compiling the Go application to a native binary, which is then copied to a scratch image using COPY --from=builder /goapp /. The result is an extremely minimal image containing nothing but “goapp”.
Build and run the image:
docker build . -t goapp
docker run --rm -p 8081:8080 goapp
The webserver should be available at http://localhost:8081/.
A last word on FROM scratch: it’s efficient, but I prefer to add an extra 1-5 MB and use FROM busybox to include some basic utilities like a shell. Also, there is nothing wrong with using bigger base images like FROM debian. Due to Docker’s layer architecture, images that share a base reuse its layers and use space efficiently.
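As a sketch of the busybox variant mentioned above, only the final stage of the complete example needs to change. The tag busybox:1.36 and the EXPOSE line are assumptions added for illustration; the builder stage is repeated unchanged from the example earlier in this post.

```dockerfile
# syntax=docker/dockerfile:1.5
FROM golang:1.17-alpine AS builder
WORKDIR /app
COPY go.mod ./
RUN go mod download
COPY . .
RUN CGO_ENABLED=0 GOOS=linux GOARCH=amd64 go build -a -o /goapp

# busybox instead of scratch: a few MB larger, but includes a shell
# and basic utilities. The tag is an assumed example.
FROM busybox:1.36
COPY --from=builder /goapp /goapp
# Documents the port the server listens on; it does not publish it.
EXPOSE 8080
CMD ["/goapp"]
```

With this image, docker run --rm -it goapp sh opens a shell inside the container for debugging, which is not possible with the scratch-based image.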
Conclusion
With the ability to write Dockerfiles you can deploy and run your applications in an isolated and predictable environment. To get the most out of Docker you should look into Docker Compose next. It enables you to define and run multi-container Docker applications, mount volumes, expose ports, define networks, set up dependencies and configure restart behaviour.