Docker
Baleen ships two Dockerfiles and a CI workflow that builds and pushes both
images on every push to main or dev. Both variants are published to a single
image repository, py-baleen, and the variant is selected by a tag suffix (-cpu or -gpu):
| Dockerfile | Tag suffix | Base image | krill wheel |
|---|---|---|---|
| Dockerfile.cpu | -cpu | python:3.11-slim | CPU |
| Dockerfile.gpu | -gpu | nvidia/cuda:12.2.2-runtime-ubuntu22.04 | cu122 (GPU) |
Tags follow the pattern <ref>-<variant>. latest-* tags are published only from
main; branch tags (e.g. dev-*) and full commit-SHA tags are published for every
build. Both images bundle the krill engine and slow5tools, set
ENTRYPOINT ["baleen"], and use /data as the working directory.
Images are published to two registries:
- Docker Hub: btrspg/py-baleen
- GHCR (public): ghcr.io/loganylchen/py-baleen
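To make the <ref>-<variant> scheme concrete, here is a small sketch of a shell helper that composes a full image reference for either registry. The function name baleen_image and its argument order are hypothetical (not part of Baleen), and it assumes commit-SHA tags use the same <ref>-<variant> pattern as the other tags.

```shell
# Hypothetical helper (not shipped with Baleen): compose an image reference
# following the <ref>-<variant> tag scheme for either published registry.
#   $1 = ref (e.g. latest, dev, or a full commit SHA)
#   $2 = variant (cpu or gpu)
#   $3 = registry: dockerhub (default) or ghcr
baleen_image() {
  [ -n "$1" ] || { echo "baleen_image: ref must not be empty" >&2; return 1; }
  case "$2" in
    cpu|gpu) ;;
    *) echo "baleen_image: variant must be cpu or gpu, got '$2'" >&2; return 1 ;;
  esac
  case "${3:-dockerhub}" in
    dockerhub) printf 'btrspg/py-baleen:%s-%s\n' "$1" "$2" ;;
    ghcr)      printf 'ghcr.io/loganylchen/py-baleen:%s-%s\n' "$1" "$2" ;;
    *) echo "baleen_image: unknown registry '$3'" >&2; return 1 ;;
  esac
}
```

For example, docker pull "$(baleen_image dev gpu ghcr)" pulls the latest dev GPU build from GHCR.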
Pull a published image
# Docker Hub
docker pull btrspg/py-baleen:latest-cpu
docker pull btrspg/py-baleen:latest-gpu # running it requires the NVIDIA Container Toolkit
docker pull btrspg/py-baleen:dev-gpu # latest dev build
# GHCR (public)
docker pull ghcr.io/loganylchen/py-baleen:latest-gpu
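Because the same tags exist on both registries, one option is to try Docker Hub first and fall back to GHCR if that pull fails (for example, when rate-limited). This is a minimal sketch; the function name pull_baleen and the latest-cpu default are illustrative choices, not part of Baleen.

```shell
# Hypothetical helper: pull a py-baleen tag from Docker Hub and fall back to
# GHCR if the Docker Hub pull fails. Returns the status of the last pull tried.
pull_baleen() {
  _pb_tag=${1:-latest-cpu}   # e.g. latest-gpu, dev-cpu
  docker pull "btrspg/py-baleen:$_pb_tag" ||
    docker pull "ghcr.io/loganylchen/py-baleen:$_pb_tag"
}
```

Note that the image is then available locally under whichever registry name succeeded, so use that full name in docker run.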
Build locally
If you prefer to build from source — or are running a fork — build the Dockerfile directly:
# CPU
docker build -f Dockerfile.cpu -t py-baleen:cpu .
# GPU
docker build -f Dockerfile.gpu -t py-baleen:gpu .
Both builds are pure Python (no C-extension compilation): they pip install
baleen, then install the appropriate krill wheel (CPU or cu122) from the
project index. The GPU image's krill wheel can use the GPU only at run time,
and only when a device is visible inside the container; see the verification step below.
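If you want both variants, a small loop builds them in turn and stops at the first failure. This sketch assumes it is run from the repository root, where Dockerfile.cpu and Dockerfile.gpu live; the function name build_all is hypothetical. Since the GPU image only uses a device at run time, building it should not itself require a GPU on the build host.

```shell
# Hypothetical helper: build both variants with the same tags used in the
# examples above (py-baleen:cpu and py-baleen:gpu), stopping at the first failure.
build_all() {
  for _ba_variant in cpu gpu; do
    docker build -f "Dockerfile.$_ba_variant" -t "py-baleen:$_ba_variant" . || return 1
  done
}
```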
Run the pipeline in a container
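The entrypoint is baleen, so pass the sub-command and its arguments directly
after the image name. Mount your data into the container's /data working
directory, so relative paths such as results/ resolve inside the mount. If you
pulled a published image, substitute its name (for example
btrspg/py-baleen:latest-cpu) for py-baleen:cpu: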
# CPU
docker run --rm \
-v "$PWD":/data \
py-baleen:cpu run \
--native-bam native.bam --native-fastq native.fq.gz --native-blow5 native.blow5 \
--ivt-bam ivt.bam --ivt-fastq ivt.fq.gz --ivt-blow5 ivt.blow5 \
--ref ref.fa -o results/
# GPU — add --gpus all
docker run --rm --gpus all \
-v "$PWD":/data \
py-baleen:gpu run \
--native-bam native.bam --native-fastq native.fq.gz --native-blow5 native.blow5 \
--ivt-bam ivt.bam --ivt-fastq ivt.fq.gz --ivt-blow5 ivt.blow5 \
--ref ref.fa -o results/
File ownership: add -u $(id -u):$(id -g) to docker run so that output files
under results/ are owned by your host user rather than root.
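The CPU and GPU invocations differ only in --gpus all and the image tag, so they can be folded into one wrapper that also applies the -u tip. This is a sketch under stated assumptions: the name baleen_docker and the DRY_RUN switch are hypothetical, and it uses the locally built py-baleen:cpu / py-baleen:gpu tags.

```shell
# Hypothetical wrapper (not shipped with Baleen) around the docker run calls above.
# Usage: baleen_docker <cpu|gpu> <baleen sub-command> [args...]
# Set DRY_RUN=1 to print the command instead of running it.
baleen_docker() {
  [ $# -ge 1 ] || { echo "usage: baleen_docker <cpu|gpu> <sub-command> [args...]" >&2; return 1; }
  _bd_variant=$1
  shift
  case "$_bd_variant" in
    gpu) set -- --gpus all "py-baleen:gpu" "$@" ;;   # expose all host GPUs
    cpu) set -- "py-baleen:cpu" "$@" ;;
    *) echo "baleen_docker: variant must be cpu or gpu" >&2; return 1 ;;
  esac
  # Run as the host user and mount the current directory at /data.
  set -- docker run --rm -u "$(id -u):$(id -g)" -v "$PWD":/data "$@"
  if [ -n "${DRY_RUN:-}" ]; then
    printf '%s\n' "$*"
  else
    "$@"
  fi
}
```

For example, baleen_docker gpu run --ref ref.fa -o results/, followed by the --native-* and --ivt-* flags from the examples above.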
Verify the GPU image sees the device
docker run --rm --gpus all --entrypoint python3 py-baleen:gpu \
-c "from baleen._dtw import backend, is_available; \
print('backend:', backend(), 'gpu:', is_available())"
# Expected: backend: gpu gpu: True
If it prints backend: cpu, the container cannot see the GPU — check the
NVIDIA Container Toolkit installation and that you passed --gpus all.
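For scripted checks (for example, on a new GPU host), the verification command can be wrapped so it exits non-zero unless the expected line is printed. This is a minimal sketch: the function names are hypothetical, and it assumes the container prints exactly the one line shown above on stdout (any extra stdout output would make the exact match fail).

```shell
# Hypothetical helper: succeed only for the expected output line shown above.
gpu_check_ok() {
  [ "$1" = "backend: gpu gpu: True" ]
}

# Hypothetical helper: run the verification command and report the result.
# Requires Docker and the NVIDIA Container Toolkit on the host.
verify_gpu_image() {
  _vg_line=$(docker run --rm --gpus all --entrypoint python3 "${1:-py-baleen:gpu}" \
    -c "from baleen._dtw import backend, is_available; print('backend:', backend(), 'gpu:', is_available())") ||
    { echo "verify_gpu_image: verification command failed" >&2; return 2; }
  if gpu_check_ok "$_vg_line"; then
    echo "ok: $_vg_line"
  else
    echo "GPU not visible inside the container: $_vg_line" >&2
    return 1
  fi
}
```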