18 Sep 23

Introduction to DevOps for experimental research

I recently gave a full-day training session called Introduction to DevOps for experimental research with several colleagues.

Why DevOps in research?

At first glance, DevOps and experimental research in computer science have widely different goals: most of the time, researchers perform experiments using software but they don't have to maintain production systems in the long run!

However, we do have to build, deploy and configure complex software ecosystems to be able to do experimental research. In this process, we often need very fine control over the deployment, and we need to be extra sure about our results. Imagine: you measure the impact of a specific parameter of a PostgreSQL database on the performance of various applications, and you happily write and publish a paper about it. Two years later, somebody writes you an email stating that they obtain completely different performance results with this parameter. What happened? Are you sure that you didn't mess up your deployment or forget to restart the database after changing its configuration? Do you still have your deployment scripts around, and can you still get them to run in the same conditions? Maybe a newer version of PostgreSQL changes the performance results completely?

I believe that these kinds of problems in research can (partially) be solved by adapting ideas and tools from the DevOps ecosystem. In particular, automation is a core concept in DevOps and is definitely necessary in research: it is the first step towards improving the quality of research work that is based on deploying complex software.

DevOps tools in research

Researchers increasingly use classical DevOps tools: Git, Docker, Terraform, Kubernetes, Ansible... For general automation, GitLab CI is particularly appreciated in research because it can be run on independent on-premise infrastructure, which is often a requirement for publicly-funded research (see this GitLab CI gallery for many examples).

But researchers also develop tools that are specific to research problems. To give just a few examples:

Kameleon: a very flexible tool to build customized software appliances (VM images, Docker images, bare-metal images...). Similar to Packer but much more flexible.
EnOSlib: a Python library that makes it easy to provision hardware resources on research platforms, and then configure them through Ansible
scientific workflow engines such as Nextflow, Pydra, Snakemake and many others

Focus on reproducibility

A specific concept we need in research is reproducibility. Reproducibility means that somebody else should be able to take your code and your data and use them to reproduce your research results. It looks simple, but it is very hard to achieve in practice: your results also depend on the software environment (OS, libraries, compiler, interpreter...) and the hardware you run it on. In addition, this environment evolves very rapidly: if you write moderately complex Python code today, there is a good chance that you won't be able to run it again five years from now. The language itself and its interpreter will have changed, your OS will have new versions of everything, your Python dependencies will no longer exist or will have broken compatibility, etc.
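To make the dependency on the software environment concrete, here is a minimal sketch that records the interpreter, OS and installed Python packages next to an experiment's results. It uses only the Python standard library; the function name snapshot_environment and the layout of the resulting dictionary are assumptions made for this illustration, not part of the training material. Such a snapshot does not make an experiment reproducible by itself, but it at least tells a future reader what the results depended on.

```python
import json
import platform
import sys
from importlib import metadata


def snapshot_environment():
    """Return a dictionary describing the current software environment.

    Hypothetical helper: the name and the dictionary layout are
    illustrative assumptions, not an established format.
    """
    packages = {}
    for dist in metadata.distributions():
        try:
            name = dist.metadata.get("Name")
        except Exception:
            # Skip installations whose metadata cannot be read.
            continue
        if name:
            packages[name] = dist.version
    return {
        "python_version": platform.python_version(),
        "implementation": platform.python_implementation(),
        "os": platform.platform(),
        "machine": platform.machine(),
        "packages": dict(sorted(packages.items())),
    }


if __name__ == "__main__":
    # Print the snapshot as JSON so it can be stored next to the results.
    json.dump(snapshot_environment(), sys.stdout, indent=2)
```

Storing this JSON output with every set of measurements gives you something to compare against when results differ later. It only covers the Python level: system libraries, compilers and hardware are not captured here.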

DevOps tools are not specifically designed for reproducibility: sure, you can build a Docker image with your code, its dependencies, and the right version of the Python interpreter. This image will probably still run fine in five years. But what if you or somebody else wants to extend this work, for instance changing a small parameter or updating a single dependency? In a production system, how would you update dependencies to fix known security issues that were uncovered in the last five years? For both problems, you would need to rebuild the Docker image: hopefully, you still have a Dockerfile around, but it will most likely fail to build (see the examples in the training material below).
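One common reason for an old image to rebuild differently, or not at all, is that its dependencies were declared without exact versions, so the build resolves to whatever is current on that day. The sketch below flags such entries, assuming a simplified pip-style requirements file; the function name unpinned_requirements is hypothetical, and the parsing deliberately ignores many cases that real requirements files allow (URLs, environment markers in unusual positions, and so on).

```python
import re

# An exact pin looks like "name==1.2.3", optionally with extras
# ("name[extra]==1.2.3") and an environment marker after ";".
# Wildcards such as "==1.*" are not treated as exact pins.
_PINNED = re.compile(
    r"^[A-Za-z0-9._-]+(\[[^\]]+\])?\s*==\s*[^\s;*]+\s*(;.*)?$"
)


def unpinned_requirements(text):
    """Return the requirement lines that are not pinned to an exact version.

    Simplification (assumption): everything after '#' is treated as a
    comment, and option lines starting with '-' are skipped.
    """
    unpinned = []
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()
        if not line or line.startswith("-"):
            continue
        if not _PINNED.match(line):
            unpinned.append(line)
    return unpinned


example = """\
numpy==1.26.4
requests>=2.0
# a comment line
pandas
scipy == 1.11.4  # pinned
"""

print(unpinned_requirements(example))  # ['requests>=2.0', 'pandas']
```

Exact pins are necessary but not sufficient: a pinned version can still disappear from the package index, and the base image and system packages change too, which is why a rebuild may fail even with a carefully written Dockerfile.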

If you think about it, DevOps is all about deploying and iterating as fast as possible, and this does not particularly encourage reproducibility. In a modern production system where a new version is deployed every day, why would you even try to rebuild your software stack from two years ago?

However, the DevOps tooling situation is improving: tools that focus on full reproducibility such as Nix and Guix are becoming mainstream, and initiatives around the "Software Supply Chain" such as OpenSSF are targeting the provenance of software dependencies, which helps improve reproducibility.

Training material

This blog article is purposefully short: you can find all the material used for the training here. This includes slides but also example code for several of the tools that were demonstrated: Docker, Packer, Nix, Guix, Ansible, Terraform and EnOSlib. Many thanks to the colleagues who contributed to the training and the associated material.

The training does not go into much depth on each tool: the goal is to give an overview of the ecosystem and to understand what each DevOps tool can and cannot do. It is then up to you to choose the right tool for your specific research problem, and then learn to use it.