Published on

Containerized Database Development Workflow

Authors
  • avatar
    Name
    David Gordon
    Twitter

Resources:

  1. Postgres Docker Container Documentation
  2. Postgres SQL Dump
  3. Postgres pg_dump
  4. Podman Command Documentation
    • though I will be using Podman, many of the same commands work identically in Docker. You'll want to refer to Docker's documentation to be sure however.

In this post I will be covering a basic workflow for developing a database using containers. The workflow is essentially just starting a container of our chosen database sever, developing as we need to, then dumping the schema to a file on the host machine that is suitable for a version control system like git. I am using PostgreSQL here and while other SQL-like databases will likely involve many of the same processes there will be some differences. As always, refer to your documentation. Document or Key-Value databases may work in a completely different manner in terms of schema development (or lack thereof) and file backup options (or lack thereof).

Starting a Container

podman run --name testdb -d -e POSTGRES_PASSWORD=root -p 5432:5432 postgres

Take note of the following options:

  • d: start the container in detatched mode, this allows us to reuse the shell we are working in.
  • --name testdb: Podman (and other OCI runtimes like Docker) give containers randomized names when they are created, we can choose now to give our container a name. This name can be used to refer to the container in future operations (like starting, stopping, removing, etc.).
  • -e POSTGRES_PASSWORD=root: We are setting an environment variable for the container. You'll need to refer back to the container's documentation to see what environment variables are need, are supported, and what they do. In this case, this environment variable sets a password for the database server's admin user.
  • -p 5432:5432: Postgres, by default, listens to port 5432. We can tell our container runtime to allow this port to be open. Optionally we can map any host port to any internal container port using the following syntax: <HOST>:<CONTAINER>. This is nice in the case that we are running many instances of containers that internally listen to the same port.

Open a client shell (or make any other admin connection)

If you prefer or need something other than a shell (like SSRS or something simillar), you can connect to this running instance in the same way as any other given over localhost or network. In this case, I am just using psql, a PostgreSQL shell client. PGAdmin is a nice alternative if you need a gui DBMS for PostgreSQL.

psql -U postgres -d postgres -p 5432 -h localhost

  • -U postgres: This option sets the desired username to connect as. By default, the PostgreSQL container sets a postgres username (this is the default admin username).
  • -d postgres: This option sets the desired database to connect to. By default, PosgreSQL has a postgres database created.
  • -p 5432: This sets the port to connect through.
  • -h localhost: This is the hostname used in the connection.

At this point the database can be developed and used as normal.

Restarting a Container

The state of the filesystem in our development container is saved unless we specifically remove the container. What this means is that if we stop the running container and/or shutdown the machine we're working on, the container will still be in the same state. We will simply need to restart the container to continue where we left off:

podman start testdb

Backup and Recreating the Database

We will likely want to take snapshots of the database as we develop. What I'm proposing here is a manual workflow, which has the primary disadvantage of needing to be done manually to get any benefit from it. There are many ways to automate this process and how you choose to automate will likely depend on many other factors.

Backing up the database is as simple as running a pg_dump command from the host machine. Note that pg_dump is essentially a PostgreSQL client that can connect over a network, and can be used in much the same way as psql.

pg_dump -s -h localhost -p 5432 -U postgres -d postgres > my_db_dump.sql

Many of the options and commands are the same as psql, with the following changes:

  • -s: This runs the command in schema only mode, i.e. it does not include data. By default, pg_dump will dump both schema and data. To include only the schema use the -s option, and to include only data use the --data-only option.
  • > my_db_dump.sql: By default pg_dump will output the file over STDOUT. This allows you to use the output in any number of ways on a unix system (tee, |, <(), etc.). Here I am simple creating (or overwriting if it exists) a file called my_db_dump.sql.

Example Workflow: A Workout Database

Imagine we are designing a simple system to log a workout plan. We wish to keep track of sets of exercises -- like 'cardio', 'benchpress', 'deadlift' -- and link them with a set of workouts -- we can use days of the week like 'tuesday workout' or 'friday weightlifting'. At this stage, we know we need the following schema:

workouts

rowtype
iduuid
namevarchar(50)

exercises

rowtype
iduuid
namevarchar(50)

workout_exercises

rowtype
idserial
workoutuuid references exercises(id)
exerciseuuid references workouts(id)

We can spin up our development container:

podman start testdb (I've created this container already)

Then shell into the container and being issuing commands and schema creations:

host$: psql -U postgres -h localhost -p 5432

psql$: CREATE DATABASE workoutdb;

psql$: \c workoutdb

psql$: CREATE TABLE workouts(
	id UUID PRIMARY KEY, 
	name VARCHAR(50));

psql$: CREATE TABLE exercises(
	id UUID PRIMARY KEY, 
	name VARCHAR(50));

psql$: CREATE TABLE workout_exercises(
	id UUID PRIMARY KEY, 
	workout UUID REFERENCES workouts(id), 
	exercise UUID REFERENCES exercises(id));

psql$: CREATE VIEW workout_exercises_v AS
	SELECT
		workouts.name as Workout,
		exercises.name as Exercise
	FROM workouts_exercises
	JOIN	workouts
		ON workouts_exercises.workouts=workouts.id
	JOIN exercises
		on workout_exercises.exercise=exercises.id;

psql$: CREATE PROCEDURE workout_exercises_insert
	(workoutName varchar, exerciseName varchar)
	LANGUAGE SQL
	AS $$
		INSERT INTO
			workout_exercises (workout, exercise)
		SELECT workouts.id, exercises.id
			FROM workouts
			JOIN exercises
			ON workouts.name=$1 AND excersises.name=$2;
	$$;

Once we've put our simple database together, we simple exit the psql client (or open a separate host shell), and issue a pg_dump command. After this the file can be added to a git repository like any other bit of source code.

psql$ \q

host$ pg_dump -U postgres -d workoutdb -h localhost -p 5432 -s > workout_db_schema.sql