Hello, and welcome!
A while back, I signed up for a course on High-Performance Computing with Containers through NVIDIA’s Deep Learning Institute (note this course was discontinued in July and isn’t available anymore). However, I only just recently finished working through the material presented in a computational notebook running in a GPU-enabled cloud computing instance. The course introduced me to NVIDIA’s HPC Container Maker (or HPCCM). This open-source tool allows you to specify (HPC) container recipes as Python scripts, which HPCCM can then translate to Docker and Singularity container instruction/definition files. I think this is pretty cool because you only need to specify a single HPCCM recipe that can be used for both container platforms. I also appreciate the flexibility of the programmatic approach that allows you to embed other Python operations in your recipe if needed (e.g., parsing user input or reading data from files).
In this post, I’ll introduce you to HPCCM and apply it to generating container specifications for the GAlibrate package. The most recent 0.7.z releases of this package have a new option for a Julia-accelerated backend, which I touched on in my last post. Since it combines Python and Julia programming environments, I thought it would be a good scenario for containerization that could be used to demonstrate HPCCM.
Contents
- What you need to follow along
- Quick intro to HPC Container Maker
- Defining our HPCCM recipe
- Creating a Dockerfile and Building a Docker image
- Testing out our Docker container
- Closing Thoughts
- Acknowledgements
What you need to follow along
If you want to follow along, you’ll need to have Python 3 with hpccm installed in your Python environment:
pip install hpccm
You’ll also need Docker, which can be installed from the Docker website.
Computational Notebook Version
Content in this post was exported from a Jupyter IPython notebook, which is available at: https://github.com/blakeaw/blog-posts/blob/new-post/hpccm-intro/notebooks/0003_hpccm-intro/post.ipynb
Quick intro to HPC Container Maker
As noted in the introduction, HPC Container Maker (or HPCCM) is an open-source tool that allows you to specify (HPC) container recipes as Python scripts. HPCCM can then translate the recipes to Docker and Singularity container instruction/definition files, so you only need to specify a single HPCCM recipe that can be used for both container platforms. Another nice thing about HPCCM is that it automatically applies various best practices in the generated container instruction files, so you don’t have to worry about optimizing them yourself.
HPCCM has two primary components used to build recipes: primitives and building blocks. Primitives are more basic operations such as base image specification and file operations (e.g., copying files into the container). Building blocks are bundled instruction sets that take care of building/installing a specific tool, package, or library in the container, such as GNU compilers and OpenMPI.
There are two different ways you can use HPCCM. The simplest, and the way we will use it in this post, is to define a simplified recipe file (without all the boilerplate for importing primitives and building blocks and defining the stage objects) that is parsed by the hpccm command line interface (CLI) to generate a container specification. The second is to use the hpccm module (i.e., import hpccm), which gives you more programmatic flexibility but requires the user to manage input/output. You can check out the Getting Started with HPC Container Maker page for more on the two different ways to use HPCCM.
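As a rough illustration of the second, module-based approach, a script might look something like the sketch below. It assumes hpccm is installed and relies on hpccm.Stage, hpccm.config.set_container_format, and the baseimage and julia components. The function name render_recipe is made up for this example, and the import check at the bottom is only there so the script prints a hint instead of failing when hpccm is missing.

```python
import importlib.util


def render_recipe(container_format="docker"):
    """Return the container specification text for a small two-layer recipe."""
    # Imported inside the function so this file still loads without hpccm.
    import hpccm
    from hpccm.building_blocks import julia
    from hpccm.primitives import baseimage

    # With the module, we choose the output format ourselves
    # (the CLI handles this through its --format option).
    hpccm.config.set_container_format(container_format)

    # We also create the stage object that the CLI normally provides as Stage0.
    stage = hpccm.Stage()
    stage += baseimage(image="ubuntu:16.04")
    stage += julia(version="1.9.2", packages=["PyCall"])

    # Converting the stage to a string yields the generated specification.
    return str(stage)


if importlib.util.find_spec("hpccm") is not None:
    print(render_recipe("docker"))
else:
    print("hpccm is not installed; try: pip install hpccm")
```

The trade-off is visible here: the imports, format selection, stage creation, and output handling are all things the CLI does for you when it parses a simplified recipe file.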
Defining our HPCCM recipe
This section will cover making the HPCCM recipe for the GAlibrate package. We want a container with Python and Julia programming environments so that GAlibrate can operate with its Julia-accelerated backend. I’ve already written the recipe in hpccm_recipe.py, so let’s take a look and then we will go over each part:
!cat hpccm_recipe.py
"""HPCCM Recipe for GAlibrate with Julia backend acceleration.
GAlibrate source:
https://github.com/blakeaw/GAlibrate
Usage:
$ hpccm --recipe hpccm_recipe.py --format docker
$ hpccm --recipe hpccm_recipe.py --format singularity
"""
# Choose a base image - Ubuntu
Stage0 += baseimage(image="ubuntu:16.04") # primitive
# Install Conda with dependencies.
Stage0 += conda(
    eula=True, packages=["python=3.10.11", "pip", "numpy=1.23.5", "scipy"]
)  # building block
# Install GAlibrate version 0.7.2 with Julia integration
# and PyJulia (julia on PyPI)
Stage0 += shell(
    commands=[
        ". ~/.bashrc",  # Explicitly source the bashrc
        "conda activate base",  # Activate the conda environment
        "pip --no-cache-dir install https://github.com/blakeaw/GAlibrate/archive/refs/tags/v0.7.2.zip",  # Install GAlibrate
        "pip --no-cache-dir install julia==0.6.1",  # Install PyJulia
    ]
)  # primitive
# Install Julia programming environment - PyJulia needs
# the PyCall.jl package, so we install that here.
Stage0 += julia(version="1.9.2", packages=["PyCall"]) # building block
1.
The first step of any HPCCM recipe is specifying the base image we want for our container. I’ve used Ubuntu the most in recent years, so I opted for the Ubuntu 16.04 image available on Docker Hub:
Stage0 += baseimage(image="ubuntu:16.04")
Note that baseimage is an HPCCM primitive component. Stage0 represents the container stage we are adding all the components to. HPCCM allows you to write multistage recipes if needed, but in this case, it’s just boilerplate since we aren’t using multiple stages.
2.
Next, I opted to use Conda to manage the Python environment in the container since that is what I typically use on my local machine. For this, we can use the conda building block:
Stage0 += conda(
    eula=True, packages=["python=3.10.11", "pip", "numpy=1.23.5", "scipy"]
)
I used the optional packages list to specify the Python version (3.10.11) and other dependencies (pip, numpy, scipy). The eula option stands for End-User License Agreement; setting it to True means you accept it, which is required. HPCCM also has a python building block, but you can’t pin the exact Python version it installs since it just pulls from the upstream Linux distro.
3.
Then, I used the shell primitive to specify some additional shell commands to install GAlibrate and PyJulia:
Stage0 += shell(
    commands=[
        ". ~/.bashrc",  # Explicitly source the bashrc
        "conda activate base",  # Activate the conda environment
        "pip --no-cache-dir install https://github.com/blakeaw/GAlibrate/archive/refs/tags/v0.7.2.zip",  # Install GAlibrate
        "pip --no-cache-dir install julia==0.6.1",  # Install PyJulia
    ]
)
4.
Lastly, I used the julia building block to incorporate the Julia programming environment:
Stage0 += julia(version="1.9.2", packages=["PyCall"])
I included PyCall in the optional packages list so it would be installed in the Julia environment; PyJulia requires PyCall on the Julia side.
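Since PyJulia depends on PyCall being loadable from Julia, a quick check along the following lines could be run inside the finished container. This is only a sketch: the helper names julia_check_command and julia_package_available are invented for illustration, and it assumes the julia executable is on the PATH (it returns None otherwise rather than failing).

```python
import shutil
import subprocess


def julia_check_command(package="PyCall"):
    """Build a Julia command that exits with an error if `package` cannot be loaded."""
    return ["julia", "-e", f"using {package}"]


def julia_package_available(package="PyCall"):
    """Return True or False for the package, or None if julia is not on the PATH."""
    if shutil.which("julia") is None:
        return None
    result = subprocess.run(
        julia_check_command(package), capture_output=True, text=True
    )
    # Julia exits with a nonzero status when `using <package>` fails.
    return result.returncode == 0


print(julia_package_available("PyCall"))
```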
Creating a Dockerfile and Building a Docker image
In this section, we’ll start out by using the hpccm CLI to convert our recipe into a Dockerfile:
!hpccm --recipe hpccm_recipe.py --format docker > Dockerfile
Then we can view the contents of the Dockerfile:
!cat Dockerfile
FROM ubuntu:16.04
# Anaconda
RUN apt-get update -y && \
DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
ca-certificates \
wget && \
rm -rf /var/lib/apt/lists/*
RUN mkdir -p /var/tmp && wget -q -nc --no-check-certificate -P /var/tmp http://repo.anaconda.com/miniconda/Miniconda3-py310_23.1.0-1-Linux-x86_64.sh && \
bash /var/tmp/Miniconda3-py310_23.1.0-1-Linux-x86_64.sh -b -p /usr/local/anaconda && \
/usr/local/anaconda/bin/conda init && \
ln -s /usr/local/anaconda/etc/profile.d/conda.sh /etc/profile.d/conda.sh && \
. /usr/local/anaconda/etc/profile.d/conda.sh && \
conda activate base && \
conda install -y numpy=1.23.5 pip python=3.10.11 scipy && \
/usr/local/anaconda/bin/conda clean -afy && \
rm -rf /var/tmp/Miniconda3-py310_23.1.0-1-Linux-x86_64.sh
RUN . ~/.bashrc && \
conda activate base && \
pip --no-cache-dir install https://github.com/blakeaw/GAlibrate/archive/refs/tags/v0.7.2.zip && \
pip --no-cache-dir install julia==0.6.1
# Julia version 1.9.2
RUN apt-get update -y && \
DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
tar \
wget && \
rm -rf /var/lib/apt/lists/*
RUN mkdir -p /var/tmp && wget -q -nc --no-check-certificate -P /var/tmp https://julialang-s3.julialang.org/bin/linux/x64/1.9/julia-1.9.2-linux-x86_64.tar.gz && \
mkdir -p /var/tmp && tar -x -f /var/tmp/julia-1.9.2-linux-x86_64.tar.gz -C /var/tmp -z && \
cp -a /var/tmp/julia-1.9.2 /usr/local/julia && \
JULIA_DEPOT_PATH=/usr/local/julia/share/julia /usr/local/julia/bin/julia -e 'using Pkg; Pkg.add([PackageSpec(name="PyCall")])' && \
rm -rf /var/tmp/julia-1.9.2-linux-x86_64.tar.gz /var/tmp/julia-1.9.2
ENV LD_LIBRARY_PATH=/usr/local/julia/lib:$LD_LIBRARY_PATH \
PATH=/usr/local/julia/bin:$PATH
To demonstrate how easy it is to switch formats, we can generate the Singularity version too (we just won’t redirect the output to a file):
!hpccm --recipe hpccm_recipe.py --format singularity
BootStrap: docker
From: ubuntu:16.04
%post
. /.singularity.d/env/10-docker*.sh
# Anaconda
%post
apt-get update -y
DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
ca-certificates \
wget
rm -rf /var/lib/apt/lists/*
%post
cd /
mkdir -p /var/tmp && wget -q -nc --no-check-certificate -P /var/tmp http://repo.anaconda.com/miniconda/Miniconda3-py310_23.1.0-1-Linux-x86_64.sh
bash /var/tmp/Miniconda3-py310_23.1.0-1-Linux-x86_64.sh -b -p /usr/local/anaconda
/usr/local/anaconda/bin/conda init
ln -s /usr/local/anaconda/etc/profile.d/conda.sh /etc/profile.d/conda.sh
. /usr/local/anaconda/etc/profile.d/conda.sh
conda activate base
conda install -y numpy=1.23.5 pip python=3.10.11 scipy
/usr/local/anaconda/bin/conda clean -afy
rm -rf /var/tmp/Miniconda3-py310_23.1.0-1-Linux-x86_64.sh
%post
cd /
. ~/.bashrc
conda activate base
pip --no-cache-dir install https://github.com/blakeaw/GAlibrate/archive/refs/tags/v0.7.2.zip
pip --no-cache-dir install julia==0.6.1
# Julia version 1.9.2
%post
apt-get update -y
DEBIAN_FRONTEND=noninteractive apt-get install -y --no-install-recommends \
tar \
wget
rm -rf /var/lib/apt/lists/*
%post
cd /
mkdir -p /var/tmp && wget -q -nc --no-check-certificate -P /var/tmp https://julialang-s3.julialang.org/bin/linux/x64/1.9/julia-1.9.2-linux-x86_64.tar.gz
mkdir -p /var/tmp && tar -x -f /var/tmp/julia-1.9.2-linux-x86_64.tar.gz -C /var/tmp -z
cp -a /var/tmp/julia-1.9.2 /usr/local/julia
JULIA_DEPOT_PATH=/usr/local/julia/share/julia /usr/local/julia/bin/julia -e 'using Pkg; Pkg.add([PackageSpec(name="PyCall")])'
rm -rf /var/tmp/julia-1.9.2-linux-x86_64.tar.gz /var/tmp/julia-1.9.2
%environment
export LD_LIBRARY_PATH=/usr/local/julia/lib:$LD_LIBRARY_PATH
export PATH=/usr/local/julia/bin:$PATH
%post
export LD_LIBRARY_PATH=/usr/local/julia/lib:$LD_LIBRARY_PATH
export PATH=/usr/local/julia/bin:$PATH
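The two conversions above can also be scripted so that both specification files are written in one pass. The sketch below shells out to the same hpccm CLI with the --recipe and --format options used in this post. The output file name Singularity.def and the helper names hpccm_command and write_specs are assumptions made for illustration, and nothing is generated unless the hpccm CLI and hpccm_recipe.py are both found.

```python
import shutil
import subprocess
from pathlib import Path

# Output file names are assumptions; the post itself only writes a Dockerfile.
OUTPUTS = {"docker": "Dockerfile", "singularity": "Singularity.def"}


def hpccm_command(recipe, container_format):
    """Build the CLI call used in this post for a given output format."""
    return ["hpccm", "--recipe", recipe, "--format", container_format]


def write_specs(recipe="hpccm_recipe.py", out_dir="."):
    """Run the hpccm CLI once per format and save each result to a file."""
    written = []
    for container_format, filename in OUTPUTS.items():
        result = subprocess.run(
            hpccm_command(recipe, container_format),
            capture_output=True,
            text=True,
            check=True,  # Raise an error if hpccm reports a failure
        )
        path = Path(out_dir) / filename
        path.write_text(result.stdout)
        written.append(path)
    return written


if shutil.which("hpccm") and Path("hpccm_recipe.py").exists():
    print(write_specs())
else:
    print("hpccm CLI or hpccm_recipe.py not found; nothing generated")
```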
Now that we have the Dockerfile ready, we can generate our container image using the docker build command:
!docker build -t hpccm/galibrate-julia .
#0 building with "desktop-linux" instance using docker driver
#1 [internal] load build definition from Dockerfile
#1 transferring dockerfile: 1.93kB done
#1 DONE 0.0s
#2 [internal] load metadata for docker.io/library/ubuntu:16.04
#2 DONE 0.9s
#3 [internal] load .dockerignore
#3 transferring context: 2B done
#3 DONE 0.0s
#4 [1/6] FROM docker.io/library/ubuntu:16.04@sha256:1f1a2d56de1d604801a9671f301190704c25d604a416f59e03c04f5c6ffee0d6
...
...
...
#9 100.8 9 dependencies successfully precompiled in 33 seconds. 3 already precompiled.
#9 DONE 101.5s
#10 exporting to image
#10 exporting layers
#10 exporting layers 9.2s done
#10 writing image sha256:2245059940e33abf9710e39c3f1c101eadcc9827a03ca226e5906482a7c7bbca done
#10 naming to docker.io/hpccm/galibrate-julia done
#10 DONE 9.2s
View build details: docker-desktop://dashboard/build/desktop-linux/desktop-linux/c81cwrpr08tgl2qmh50ji89h9
1 warning found (use --debug to expand):
 - UndefinedVar: Usage of undefined variable '$LD_LIBRARY_PATH' (line 35)
And then check the list of images using the docker images command:
!docker images
REPOSITORY TAG IMAGE ID CREATED SIZE
hpccm/galibrate-julia latest 2245059940e3 9 seconds ago 3.05GB
Testing out our Docker container
Now that we have our Docker container image ready, we need to check that GAlibrate will run with Julia backend support. For this part, I used PowerShell.
I executed the following call to docker run to start the container:
docker run -it --name hpccm-demo hpccm/galibrate-julia
This will start the container and bring up an interactive shell in it.
Next, from the container shell, I ran python -c 'from galibrate import GAO' to test importing the optimizer object (GAO) from GAlibrate.
This produced output that included a RuntimeWarning alerting us that the optimizer will run with Julia optimization, which means it is working as expected.
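The same check could be scripted rather than read off the terminal. The sketch below imports a module by name and records any warnings raised during the import, using only the standard library. It assumes the RuntimeWarning is emitted at import time in a fresh interpreter, as the command above suggests; the helper names import_and_collect_warnings and summarize are made up for illustration.

```python
import importlib
import warnings


def import_and_collect_warnings(module_name):
    """Import a module and return it along with the warnings raised on import."""
    with warnings.catch_warnings(record=True) as caught:
        # Record every warning instead of letting the default filters hide any.
        warnings.simplefilter("always")
        module = importlib.import_module(module_name)
    return module, list(caught)


def summarize(records):
    """Format recorded warnings as 'Category: message' strings."""
    return [f"{r.category.__name__}: {r.message}" for r in records]


try:
    _, records = import_and_collect_warnings("galibrate")
    for line in summarize(records):
        print(line)
except ImportError:
    print("galibrate is not installed in this environment")
```

Inside the container, a RuntimeWarning entry in the printed list would correspond to the message described above. Note that a module that was already imported earlier in the same session will not raise its import-time warnings again.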
You can quit the container by issuing the exit command.
Closing Thoughts
With its library of HPC building blocks, HPC Container Maker (HPCCM) is a great tool for programming container recipes in Python. It provides a flexible alternative to working directly with container specifications, and it can be used to generate container files in both Docker and Singularity formats. In this post, we reviewed an example HPCCM recipe for the GAlibrate package that combines Python and Julia programming environments. Then, we used the recipe to generate a Dockerfile, which we built and tested to ensure our container was successfully configured to allow the GAlibrate genetic algorithm optimizer to run with Julia acceleration.
I hope this post was helpful for all you Python, containerization, and HPC enthusiasts out there. And, if you’re not one of those but still found your way here, I hope you also learned something new!
Well, that’s all I have to say for now. Thanks for reading, and have a nice day! Feel free to message me if you have any questions or want to discuss. I’m also open to feedback and suggestions for future posts, so if there is something you think I should cover, let me know! You can reach me by email or feel free to hit me up on LinkedIn.
Lastly, please share this post with anyone else who might be interested. It is a massive help to me, as it increases the blog’s visibility and can help more people discover my work. Also, be sure to come back for future posts.
Until next time – Blake
Acknowledgements
Grammarly was used for proofreading and editing.
Source code images were generated using carbon, and GIMP was used for image editing.