
A Docker Pipeline for Mirroring & Archiving a Website

I monitor a handful of websites with some critical information on them. I also have a Synology NAS. Here’s how I created a pipeline for mirroring and archiving these key sites onto my NAS.

My goal is to have a zipped archive of each site created once a week. The Synology task scheduler can take care of the weekly jobs. I’ve also chosen to build my backup process on Docker, since Synology’s Docker plugin manages containers nicely.

The Process

For any given site, the backup and archival pipeline goes like this:

  1. Use HTTrack to mirror the website to the local filesystem.
  2. Bundle the mirrored content into a zip with 7-Zip.
  3. Move the zip artifact to my backup directory.

A visual representation of the mirror and archive workflow. Image credit: me.

To facilitate the above, I’ve created a shell script for the Synology task scheduler to run. The script:

  1. Uses the ralfbs/httrack image to mirror the website to a local, private Docker volume.
  2. Uses the crazy-max/7zip image to compress the website content into a single artifact.
  3. Uses the Alpine Linux 3.8 image to move the artifact to the host filesystem.

The script itself is not particularly interesting: it’s essentially three docker run commands, one for each of the steps above, plus a few lines of supporting code for configuration.
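For illustration, here is a minimal sketch of what a script along those lines might look like. It is not the actual script. The image names are the ones listed above; the site URL, volume name, backup path, archive name, and the commands passed to each image (httrack, 7za, mv) are assumptions that may need adjusting to each image's entrypoint. The run helper and the DRY_RUN switch are hypothetical additions so the commands can be inspected before anything is executed.

```shell
#!/bin/sh
# Sketch only: image names are from the text above; everything else
# (URL, volume name, paths, commands inside the images) is assumed.

SITE_URL="${SITE_URL:-https://example.com}"        # hypothetical site
VOLUME="${VOLUME:-site-mirror}"                    # hypothetical volume name
BACKUP_DIR="${BACKUP_DIR:-/volume1/backups}"       # hypothetical NAS path
ARCHIVE="${ARCHIVE:-site-$(date +%Y-%m-%d).zip}"   # dated archive name
DRY_RUN="${DRY_RUN:-1}"                            # 1 = print, 0 = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

# Step 1: mirror the site into the Docker volume.
mirror_site() {
    run docker run --rm -v "$VOLUME:/data" \
        ralfbs/httrack httrack "$SITE_URL" -O /data/mirror
}

# Step 2: compress the mirrored content into a single zip.
zip_site() {
    run docker run --rm -v "$VOLUME:/data" -w /data \
        crazy-max/7zip 7za a -tzip "/data/$ARCHIVE" mirror
}

# Step 3: move the zip from the volume to the host backup directory.
export_archive() {
    run docker run --rm -v "$VOLUME:/data" -v "$BACKUP_DIR:/backup" \
        alpine:3.8 mv "/data/$ARCHIVE" /backup/
}

# Run the three steps in order, stopping at the first failure.
pipeline() {
    mirror_site && zip_site && export_archive
}
```

With DRY_RUN left at 1, calling pipeline only prints the three docker run commands; setting DRY_RUN=0 before calling it would execute them.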

Tradeoffs and Considerations

No tool like this is perfect, and I’ve chosen to make some compromises and arbitrary decisions.

Using Docker

My use of Docker for this purpose is arguably inappropriate; a script running local utilities would be just fine.

Ultimately, I chose to go with Docker for two reasons:

  1. It’s a self-contained way of running arbitrary packages on the Synology NAS.
  2. It’s given me a chance to fiddle with Docker.

Use of a private volume instead of the host filesystem

Private volumes are probably overkill for this application, but I like using them for two reasons:

  1. I like the way the private volume helps keep intermediate artifacts self-contained within the environment of the script. There are no leftover intermediate files sitting in my way.
  2. It’s given me a chance to fiddle with private volumes in Docker.
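As a rough sketch of the self-contained idea (an assumption about how it could be done, not a description of my script), a named volume can be created before the pipeline steps and removed afterwards, so intermediate files never touch the host. The volume name, the helper names, and the DRY_RUN switch below are all hypothetical; only docker volume create and docker volume rm are real Docker commands.

```shell
#!/bin/sh
# Sketch only: a throwaway named volume wrapped around an arbitrary command.

VOLUME="${VOLUME:-site-mirror-$$}"   # hypothetical name, unique per run
DRY_RUN="${DRY_RUN:-1}"              # 1 = print, 0 = execute

# Print the command in dry-run mode, otherwise execute it.
run() {
    if [ "$DRY_RUN" = "1" ]; then
        echo "$*"
    else
        "$@"
    fi
}

create_volume() { run docker volume create "$VOLUME"; }
remove_volume() { run docker volume rm "$VOLUME"; }

# Create the volume, run the given command, then always remove the
# volume, returning the command's own exit status.
with_private_volume() {
    create_volume || return 1
    status=0
    "$@" || status=$?
    remove_volume
    return "$status"
}
```

Anything that mounts $VOLUME can be passed as the command, and the volume is removed even if that command fails.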

In both cases, I feel like reason #2 justifies my decisions.

Conclusion

Overall, I’m pretty happy with the way my mirroring and archival pipeline turned out. Not only does it serve an important need, but it gave me some experience with Docker.

More importantly, the experience gives me confidence in speaking with my colleagues about Docker day-to-day.