CLI Reference
Baleen exposes two sub-commands:
| Command | Purpose |
| --- | --- |
| baleen run | Full pipeline: read-ID intersection → krill eventalign → DTW → HMM → site aggregation. |
| baleen aggregate | Re-run HMM and/or site aggregation from a saved .pkl, skipping DTW. |
baleen --help
baleen run --help
baleen aggregate --help
baleen run
| Flag | Description |
| --- | --- |
| --native-bam | Native BAM (sorted + indexed). |
| --native-fastq | Native FASTQ (.fq.gz). |
| --native-blow5 | Native BLOW5 signal. |
| --ivt-bam | IVT control BAM. |
| --ivt-fastq | IVT control FASTQ. |
| --ivt-blow5 | IVT control BLOW5 signal. |
| --ref | Reference FASTA (indexed with .fai). |
Output
| Flag | Default | Description |
| --- | --- | --- |
| -o, --output-dir | baleen_output | Output directory. |
Pipeline parameters
| Flag | Default | Description |
| --- | --- | --- |
| --padding | 1 | Flanking positions concatenated into each DTW window. |
| --min-depth | 15 | Minimum depth per contig; interpretation set by --depth-mode. |
| --depth-mode | read_count | read_count: total mapped reads on the contig ≥ --min-depth. mean_coverage: per-base coverage averaged over all positions ≥ --min-depth. |
| --min-mapq | 0 | Minimum mapping quality. |
| --threads | 8 | Parallel workers for contig processing. |
| --target | (none) | Contig name, comma-separated list, or path to a file with one contig per line. |
| --keep-intermediate | off | Keep the per-contig intermediate directory after merge. |
| --resume | off | Reuse per-contig slices under <output_dir>/per_contig/. Aborts if the saved parameter fingerprint disagrees; starts fresh if missing. |
| --no-subsample | off | Disable per-condition, per-contig read subsampling. |
| --subsample-n | 300 | Max reads per condition per contig. |
| --legacy-scoring | off | Per-position EM calibration (less sensitive at low stoichiometry). |
| --mod-threshold | 0.9 | Per-read P(mod) above which a read is counted modified. |
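The two --depth-mode settings can give different answers for the same contig. The sketch below illustrates the difference under stated assumptions: `passes_min_depth` is a hypothetical helper (not part of Baleen), alignments are modelled as half-open (start, end) intervals, and mean coverage is approximated as total aligned bases divided by contig length, ignoring gaps inside alignments.

```python
from typing import List, Tuple


def passes_min_depth(
    alignments: List[Tuple[int, int]],
    contig_length: int,
    min_depth: float = 15,
    depth_mode: str = "read_count",
) -> bool:
    """Hypothetical illustration of the two depth modes described above."""
    if depth_mode == "read_count":
        # Total mapped reads on the contig must reach the minimum.
        return len(alignments) >= min_depth
    if depth_mode == "mean_coverage":
        if contig_length <= 0:
            raise ValueError("contig_length must be positive")
        # Per-base coverage averaged over every position of the contig.
        aligned_bases = sum(end - start for start, end in alignments)
        return aligned_bases / contig_length >= min_depth
    raise ValueError(f"unknown depth mode: {depth_mode}")


# Twenty reads of 100 bases each on a 1000-base contig.
reads = [(0, 100)] * 20
print(passes_min_depth(reads, 1000, 15, "read_count"))     # 20 reads >= 15
print(passes_min_depth(reads, 1000, 15, "mean_coverage"))  # 2000 / 1000 = 2.0 < 15
```

In this toy case the contig passes under read_count but fails under mean_coverage, which is why the choice of mode matters for long contigs covered by short reads.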
DTW options
| Flag | Default | Description |
| --- | --- | --- |
| --cuda [DEVICES] | auto-detect | CUDA device(s): 0, 0,1, 0-3, or all. |
| --no-cuda | off | Force the CPU backend. |
| --gpu-memory-limit BYTES | auto-detect | GPU memory budget for concurrent DTW workers. |
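The DEVICES forms accepted by --cuda (a single index, a comma-separated list, a range, or all) can be read as follows. This is a minimal sketch of one plausible interpretation, not Baleen's actual parser: `parse_cuda_devices` and its `available` argument (the number of visible GPUs) are hypothetical, and ranges are assumed to be inclusive.

```python
from typing import List


def parse_cuda_devices(spec: str, available: int) -> List[int]:
    """Expand a device spec such as '0', '0,1', '0-3', or 'all' into indices."""
    spec = spec.strip()
    if spec == "all":
        return list(range(available))

    devices: List[int] = []
    for part in spec.split(","):
        part = part.strip()
        if "-" in part:
            # Assumed inclusive range, e.g. '0-3' means devices 0, 1, 2, 3.
            low_text, high_text = part.split("-", 1)
            low, high = int(low_text), int(high_text)
            if low > high:
                raise ValueError(f"invalid range: {part}")
            indices = range(low, high + 1)
        else:
            indices = [int(part)]
        for index in indices:
            if index >= available:
                raise ValueError(f"device {index} is not available")
            if index not in devices:  # drop duplicates, keep order
                devices.append(index)
    return devices


print(parse_cuda_devices("0-3", available=4))
print(parse_cuda_devices("all", available=2))
```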
HMM options
| Flag | Default | Description |
| --- | --- | --- |
| --hmm-params | 3-state unsupervised | Path to a trained HMM parameters JSON. See HMM Training Modes. |
| --no-hmm | off | Skip HMM smoothing; output V2 scores only. |
eventalign options
| Flag | Default | Description |
| --- | --- | --- |
| --pore | rna002 | krill pore model for eventalign. |
| --no-rna | off | Disable RNA mode for eventalign. |
| --kmer-model | (none) | Reserved; currently unused by the krill engine. |
Miscellaneous
| Flag | Default | Description |
| --- | --- | --- |
| --no-primary-only | off | Include secondary/supplementary alignments. |
| --keep-temp | off | Do not clean up temporary files. |
| --no-read-bam | off | Skip writing read_results.bam. |
| --no-read-intersection | off | Skip the BAM ∩ FASTQ ∩ BLOW5 read-ID intersection. See Inputs › Read-ID intersection. |
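The read-ID intersection that --no-read-intersection skips can be pictured as a three-way set intersection: only reads present in the BAM, the FASTQ, and the BLOW5 file are carried forward. The sketch below is an illustration only; `intersect_read_ids` is a hypothetical helper, and how Baleen actually extracts IDs from each file is not shown here.

```python
from typing import Iterable, Set


def intersect_read_ids(
    bam_ids: Iterable[str],
    fastq_ids: Iterable[str],
    blow5_ids: Iterable[str],
) -> Set[str]:
    """Return the read IDs present in all three inputs."""
    return set(bam_ids) & set(fastq_ids) & set(blow5_ids)


# Toy IDs: only 'r2' and 'r3' appear in every input.
bam = ["r1", "r2", "r3"]
fastq = ["r2", "r3", "r4"]
blow5 = ["r2", "r3", "r5"]
print(sorted(intersect_read_ids(bam, fastq, blow5)))
```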
Example
baleen run \
--native-bam native.bam --native-fastq native.fq.gz --native-blow5 native.blow5 \
--ivt-bam ivt.bam --ivt-fastq ivt.fq.gz --ivt-blow5 ivt.blow5 \
--ref ref.fa -o results/ \
--threads 16 --subsample-n 100
baleen aggregate
Re-runs the HMM and/or site aggregation from a saved pipeline pickle without
recomputing DTW. This is useful for sweeping --mod-threshold or applying trained
HMM parameters.
| Flag | Default | Description |
| --- | --- | --- |
| -i, --input | required | Saved pipeline results (.pkl). |
| -o, --output | required | Output TSV path. |
| --score-field | p_mod_hmm | Per-read score to aggregate: p_mod_hmm, p_mod_knn, or p_mod_raw. |
| --hmm-params | (none) | Trained HMM JSON; re-runs the HMM before aggregation. |
| --no-read-bam | off | Skip writing mod-BAM output. |
| --ref | (none) | Reference FASTA (required for mod-BAM output). |
| --native-bam | (none) | Native BAM (required for mod-BAM output). |
| --ivt-bam | (none) | IVT BAM (required for mod-BAM output). |
| --legacy-scoring | off | Per-position EM calibration (legacy). |
| --mod-threshold | 0.9 | Per-read P(mod) modified-call threshold. |
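To see why --mod-threshold is worth sweeping, consider how the threshold turns per-read probabilities into modified-read counts at a single site. This is a simplified sketch, not Baleen's aggregation code: `count_modified` is a hypothetical helper, and the probabilities are made-up values standing in for whichever --score-field is selected.

```python
from typing import Sequence, Tuple


def count_modified(p_mod: Sequence[float], threshold: float = 0.9) -> Tuple[int, int]:
    """Return (modified reads, total reads) for one site.

    A read counts as modified when its P(mod) is strictly above the threshold.
    """
    n_modified = sum(1 for p in p_mod if p > threshold)
    return n_modified, len(p_mod)


# Made-up per-read probabilities for one site.
site_scores = [0.95, 0.85, 0.50, 0.99]
for threshold in (0.9, 0.8):
    n_modified, n_total = count_modified(site_scores, threshold)
    print(f"threshold={threshold}: {n_modified}/{n_total} reads modified")
```

Lowering the threshold from 0.9 to 0.8 moves the read scored 0.85 into the modified group, so the same site goes from 2 of 4 to 3 of 4 modified reads.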
Example
baleen aggregate \
-i results/pipeline_results.pkl \
-o results/sites.tsv \
--mod-threshold 0.8
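A threshold sweep can be scripted by building one baleen aggregate command per value. The sketch below uses only the -i, -o, and --mod-threshold flags documented above; the `aggregate_command` helper and the output file naming scheme are assumptions made for illustration. It prints the commands rather than running them.

```python
import shlex
from typing import List


def aggregate_command(pkl_path: str, out_tsv: str, threshold: float) -> List[str]:
    """Build the argument list for one `baleen aggregate` invocation."""
    return [
        "baleen", "aggregate",
        "-i", pkl_path,
        "-o", out_tsv,
        "--mod-threshold", str(threshold),
    ]


for threshold in (0.7, 0.8, 0.9):
    cmd = aggregate_command(
        "results/pipeline_results.pkl",
        f"results/sites_t{threshold}.tsv",  # hypothetical naming scheme
        threshold,
    )
    # Print the shell-quoted command; nothing is executed here.
    print(shlex.join(cmd))
```

To execute the sweep instead of printing it, each list can be passed to `subprocess.run(cmd, check=True)`, which stops on the first failing invocation.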