Prune Your Monorepo’s Docker Build

If your JavaScript monorepo ships node_modules in its deploy artifacts, you may be deploying more than you need. Here’s how my software team and I slimmed ours down.

Monorepo and Workspaces

Though the term “monorepo” technically has a more general definition, in the JavaScript ecosystem it typically refers to a git repository holding a small number of related projects, usually organized with the “workspaces” feature of a popular package manager. This post (and my current project) use pnpm, but the same concepts apply to other package managers (yarn, npm).

Here’s an example workspace structure, holding a shared library and two apps:

├── package.json
├── pnpm-lock.yaml
├── shared-lib
│   └── package.json
├── cloud-app
│   └── package.json
└── client-app
    └── package.json
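
One file the tree above doesn’t show: with pnpm, the workspace members are declared in a pnpm-workspace.yaml at the repository root. For this layout it would be:

packages:
  - shared-lib
  - cloud-app
  - client-app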

Note that the projects depend on each other: both client-app and cloud-app reference shared-lib.
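
With pnpm, such a dependency is typically declared using the workspace: protocol, which tells the package manager to link the local project rather than fetch it from a registry. In client-app/package.json:

{
  "name": "client-app",
  "dependencies": {
    "shared-lib": "workspace:*"
  }
}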

Shared node_modules

Each project’s package.json file expresses its own set of third-party dependencies. For efficiency, package managers produce a single lockfile at the repository root and share installed packages across projects. Typically, running pnpm install pulls down all the dependencies for all the projects. This is usually what you want as a developer.
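
With pnpm, the installed layout looks roughly like this: one central .pnpm store at the root, with each project’s node_modules symlinking into it:

├── node_modules
│   └── .pnpm          (every package, hard-linked from pnpm’s global store)
├── shared-lib
│   └── node_modules   (symlinks into ../node_modules/.pnpm)
├── cloud-app
│   └── node_modules
└── client-app
    └── node_modules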

For distribution, though, this behavior is likely not what you want. Say you’re publishing client-app in the example above: there’s no need to bring along the third-party dependencies of cloud-app.

Side note: bundling. This post is about optimizing the node_modules you’re publishing. If your distributed artifacts don’t contain any, because a bundler rolls everything into a few self-contained files, good for you! You don’t need this.

Pruning Projects

So, given a multi-workspace pnpm project, and an intent to build and publish a single project, how can we narrow this down? I’ve often wanted a command like package-manager install --subtree-beginning-with client-app, that would pull down dependencies for client-app and shared-lib, but exclude cloud-app. I’m sure this will exist someday, but as far as I know, it doesn’t yet.

Thankfully, Turbo has filled this need with a prune command:

> pnpm turbo prune client-app
Generating pruned monorepo for client-app in /Users/jrr/example-project/out
 - Added client-app
 - Added shared-lib

This produces a reduced copy of your repo, excluding unneeded projects. It even filters down the lockfile!
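
For the example repo above, out is itself a complete, installable workspace. Showing just the manifest files, as in the earlier tree, it looks roughly like:

out
├── package.json
├── pnpm-lock.yaml        (filtered down to the retained projects)
├── pnpm-workspace.yaml
├── shared-lib
│   └── package.json
└── client-app
    └── package.json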

Pruning devDependencies

A second kind of pruning has been around longer: separating development dependencies from production dependencies. I’ve seen many projects that never bothered with this distinction, but if you’re at all sensitive to the size of your deployed node_modules, you should look into it. In short, package managers can be asked to skip installing devDependencies to avoid deploying development tools like compilers, test frameworks, etc.
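
The split lives in each package.json; the package names here are just examples:

{
  "dependencies": {
    "express": "^4.19.0"
  },
  "devDependencies": {
    "typescript": "^5.4.0",
    "vitest": "^1.5.0"
  }
}

A production-only install (pnpm install --prod) skips the devDependencies section entirely.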

I should warn you to be careful when you make this change in an existing project. The first time you slice off what you think are development dependencies, you may learn that some of them are, in fact, needed by the deployed app at runtime :).
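
One cheap way to flush these out before they reach production: do a production-only install in a clean checkout and try to boot the app. Anything that fails with a module-not-found error belongs in dependencies:

pnpm install --prod
pnpm --filter client-app start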

Docker

On my current project, our Git repository contains two applications: one deployed to the cloud, and another running on devices. The device deployment artifacts are Docker images, with conspicuous file sizes and deploy times.

We recently applied both of the above pruning techniques to great effect, using a multistage Dockerfile based on Turbo’s guidance.

In hopes that this may be useful for your project, here’s a walk through the many stages of our Dockerfile structure:

Base

There’s not much here. This is the lowest common denominator of all the stages.


FROM node:20-bookworm-slim as base

WORKDIR /usr/src/app

RUN npm i -g pnpm@<version>

Project Pruning

First, we copy in the project sources from disk and run turbo prune --docker, which produces two pruned subtrees of the project. We then copy each subtree into its own named stage:


FROM base as prune

RUN npm i -g turbo@<version>

COPY . .

RUN turbo prune --docker device-app

FROM base as pruned_project_files

COPY --from=prune /usr/src/app/out/json/ .

FROM base as pruned_sources

COPY --from=prune /usr/src/app/out/full/ .
COPY --from=prune /usr/src/app/tsconfig.json .
COPY --from=prune /usr/src/app/out/pnpm-lock.yaml .
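
The --docker flag is what splits the output in two: out/json holds only the package.json files (enough to install dependencies, and unchanged by most source edits, so Docker can cache the install layers), while out/full holds the complete sources of the retained projects. For our repo, the shape is roughly:

out
├── json
│   ├── package.json
│   ├── apps/device/package.json
│   └── modules/shared/package.json
├── full
│   ├── package.json
│   ├── apps/device/        (full sources)
│   └── modules/shared/     (full sources)
└── pnpm-lock.yaml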

node_modules

This is where the devDependencies vs dependencies installation happens. First, we start with the smaller set (--prod true):


FROM pruned_project_files as node_modules_for_run

# native build toolchain for packages compiled during install (node-gyp)
RUN apt-get update && apt-get install -y --no-install-recommends wget python3 make g++ && rm -rf /var/lib/apt/lists/*

RUN pnpm install --prod true

Then we derive another stage from that and install the rest (--prod false):


FROM node_modules_for_run as node_modules_for_build

RUN pnpm install --prod false --force

Build

To build the application, we take all the sources and copy in the needed node_modules.


FROM pruned_sources as build

COPY --from=node_modules_for_build /usr/src/app/node_modules /usr/src/app/node_modules
COPY --from=node_modules_for_build /usr/src/app/modules/shared/node_modules /usr/src/app/modules/shared/node_modules
COPY --from=node_modules_for_build /usr/src/app/apps/device/node_modules /usr/src/app/apps/device/node_modules

ARG example_build_input
ENV EXAMPLE_BUILD_INPUT=${example_build_input}

RUN pnpm turbo --filter device-app build
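
For that last command to do anything, the repo also needs a turbo.json describing the build task. A minimal sketch, assuming the task writes to the build/ directory copied in the next stage (note: Turborepo 2.x calls this key tasks; 1.x called it pipeline):

{
  "$schema": "https://turbo.build/schema.json",
  "tasks": {
    "build": {
      "dependsOn": ["^build"],
      "outputs": ["build/**"]
    }
  }
}

The ^build entry tells Turbo to build workspace dependencies (here, the shared module) before the app itself.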

Run

The final stage copies in the smaller set of node_modules and the built artifacts.


FROM pruned_sources as run

# curl is required by the HEALTHCHECK below; vim is handy for debugging on the device
RUN apt-get update && apt-get install -y --no-install-recommends curl vim && rm -rf /var/lib/apt/lists/*

ENV PORT=80

HEALTHCHECK --start-period=30s --timeout=30s --interval=30s --retries=3 \
    CMD curl --silent --fail localhost:80

COPY --from=node_modules_for_run /usr/src/app/node_modules /usr/src/app/node_modules
COPY --from=node_modules_for_run /usr/src/app/apps/device/node_modules /usr/src/app/apps/device/node_modules
COPY --from=node_modules_for_run /usr/src/app/modules/shared/node_modules /usr/src/app/modules/shared/node_modules

COPY --from=build /usr/src/app/apps/device/build /usr/src/app/apps/device/build

CMD ["pnpm", "--filter", "device-app", "start"]

So many stages

In case you lost track, here’s the stage graph:

base
├── prune
├── pruned_project_files        (copies out/json from prune)
│   └── node_modules_for_run
│       └── node_modules_for_build
└── pruned_sources              (copies out/full from prune)
    ├── build                   (node_modules from node_modules_for_build)
    └── run                     (node_modules from node_modules_for_run, built app from build)

Yes, it’s a lot of stages, but they’re cheap. I value being able to name intermediate states, and they work nicely with Docker’s caching.

Results

Applying these optimizations to our Docker build reduced the image size by half (!). A working example of this structure can be found on GitHub at jrr/prune-monorepo-example.
