Read I/O
Writing and loading per-read modification calls in mod-BAM format. For the tag layout, see Outputs › read_results.bam.
Writing
write_mod_bam
write_mod_bam(
hierarchical_results: dict[
str, "ContigModificationResult"
],
native_bam: PathLike,
ivt_bam: PathLike,
ref_fasta: PathLike,
output_path: PathLike,
) -> Path
Write a mod-BAM file with MM/ML tags from the HMM results and the original BAM reads.
Compatibility wrapper: runs `flush_contig_to_bam` for each contig, writing the slices into a temporary directory, then combines them with `merge_contig_bams`.
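The wrapper's control flow can be sketched in plain Python. This is a hypothetical illustration, not baleen's source: `write_all_contigs` is an invented name, and the `flush` and `merge` callables stand in for `flush_contig_to_bam` and `merge_contig_bams` so the sketch runs without pysam or samtools. Naming slices by index instead of by contig is an assumption made here because contig names can contain characters that are awkward in file names.

```python
import tempfile
from pathlib import Path
from typing import Callable, Mapping, TypeVar

R = TypeVar("R")  # stand-in for ContigModificationResult


def write_all_contigs(
    results: Mapping[str, R],
    flush: Callable[[R, Path], Path],
    merge: Callable[[list[Path], Path], Path],
    output_path: Path,
) -> Path:
    """Hypothetical sketch: write one slice per contig into a tempdir, then merge once."""
    with tempfile.TemporaryDirectory() as tmp:
        slices: list[Path] = []
        # Alphabetical contig order, matching what merge_contig_bams expects.
        for i, contig in enumerate(sorted(results)):
            # Index-based file names avoid problems with unusual contig names.
            slices.append(flush(results[contig], Path(tmp) / f"contig_{i:06d}.bam"))
        # Merge before the with-block exits and deletes the slices.
        return merge(slices, Path(output_path))
```

Keeping the merge inside the `with` block matters: `TemporaryDirectory` removes the slices on exit, so the merged file has to be written to `output_path` (outside the tempdir) before that happens.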
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `hierarchical_results` | `dict[str, 'ContigModificationResult']` | Per-contig HMM pipeline output (contains `p_mod_hmm` arrays). | required |
| `native_bam` | `PathLike` | Path to the original native BAM file. | required |
| `ivt_bam` | `PathLike` | Path to the original IVT control BAM file. | required |
| `ref_fasta` | `PathLike` | Reference FASTA (used for BAM header). | required |
| `output_path` | `PathLike` | Destination path for the output BAM file. | required |
Returns:
| Type | Description |
|---|---|
| `Path` | Path to the written (sorted, indexed) BAM file. |
Source code in baleen/eventalign/_read_bam.py
flush_contig_to_bam
flush_contig_to_bam(
cmr: "ContigModificationResult",
native_bam: PathLike,
ivt_bam: PathLike,
header: AlignmentHeader,
out_path: PathLike,
) -> Path
Write a single contig's read-level modification calls to one coordinate-sorted BAM slice.
Uses `fetch(contig)` to scan only this contig's reads in the input BAMs, so both input BAMs must be indexed. Native and IVT reads are written interleaved; `pysam.sort` then orders the slice by coordinate before it is atomically renamed to `out_path`.
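The "write, then atomically rename" step follows a common pattern: build the file under a temporary name in the same directory as the destination, then move it into place with `os.replace`, so a reader never sees a half-written slice. A minimal sketch under that assumption, using plain bytes rather than BAM records; `atomic_write_bytes` is a hypothetical helper, not part of baleen.

```python
import os
import tempfile
from pathlib import Path


def atomic_write_bytes(data: bytes, out_path: os.PathLike) -> Path:
    """Write data so that out_path appears all at once or not at all."""
    out = Path(out_path)
    # The temp file must be on the same filesystem as the target,
    # because a rename is only atomic within one filesystem.
    fd, tmp_name = tempfile.mkstemp(dir=out.parent, prefix=out.name, suffix=".tmp")
    try:
        with os.fdopen(fd, "wb") as fh:
            fh.write(data)
            fh.flush()
            os.fsync(fh.fileno())  # make sure the bytes hit disk before the rename
        os.replace(tmp_name, out)  # atomic rename on POSIX; overwrites any old file
    except BaseException:
        # Remove the partial temp file if anything went wrong, then re-raise.
        if os.path.exists(tmp_name):
            os.unlink(tmp_name)
        raise
    return out
```

In `flush_contig_to_bam` the content written under the temporary name would be the sorted BAM slice; the sketch only illustrates the rename discipline.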
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `cmr` | `'ContigModificationResult'` | HMM pipeline output for this contig. | required |
| `native_bam` | `PathLike` | Path to the original native BAM file. Must be indexed. | required |
| `ivt_bam` | `PathLike` | Path to the original IVT control BAM file. Must be indexed. | required |
| `header` | `AlignmentHeader` | Output BAM header (built once via …). | required |
| `out_path` | `PathLike` | Destination path for the per-contig BAM slice (coordinate-sorted). | required |
Returns:
| Type | Description |
|---|---|
| `Path` | The written per-contig BAM path. |
Source code in baleen/eventalign/_read_bam.py
merge_contig_bams
merge_contig_bams(
per_contig_bams: list[Path],
output_path: PathLike,
threads: int = 4,
batch_size: int = _MERGE_BATCH_SIZE,
) -> Path
Merge a list of per-contig BAM slices into one sorted, indexed BAM.
Uses `samtools merge`. Each per-contig slice has already been coordinate-sorted by `flush_contig_to_bam` (which calls `pysam.sort` internally), so a streaming k-way merge produces a globally sorted output without the cost of re-sorting the whole file.
For large transcriptomes (thousands of contigs), opening every input BAM in a single `samtools merge` call hits filesystem-level limits (file-descriptor limits, NFS handle caps, FUSE quirks). The inputs are therefore merged in batches of `batch_size` into intermediate BAMs in a temporary directory, and those intermediates are merged in turn, repeating until a single merge call covers the remaining inputs. The output is bit-identical to a flat one-shot merge.
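Batching can leave the output unchanged because merging sorted inputs is associative and, as long as each batch is a consecutive run of inputs, ties are broken in the same input order as in a flat merge. The sketch below illustrates this on in-memory lists with `heapq.merge`, which is stable (equal keys keep their input order); records are tuples whose first two fields play the role of (reference id, position). `batched_merge` is a hypothetical helper, not baleen's implementation, which operates on BAM files via `samtools merge`.

```python
import heapq
from typing import Callable, Optional, TypeVar

T = TypeVar("T")


def batched_merge(
    inputs: list[list[T]],
    batch_size: int,
    key: Optional[Callable[[T], object]] = None,
) -> list[T]:
    """Merge sorted lists in rounds, with at most batch_size inputs per merge."""
    if batch_size < 2:
        raise ValueError("batch_size must be at least 2 to make progress")
    if not inputs:
        return []
    level = inputs
    while len(level) > 1:
        # Consecutive batches preserve input order, so ties resolve exactly
        # as they would in one flat heapq.merge over all inputs.
        level = [
            list(heapq.merge(*level[i : i + batch_size], key=key))
            for i in range(0, len(level), batch_size)
        ]
    return list(level[0])
```

Each round divides the number of inputs by roughly `batch_size`, so the number of simultaneously open inputs stays bounded while the total number of rounds grows only logarithmically with the contig count.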
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
per_contig_bams
|
list[Path]
|
Sorted (alphabetically by contig name) list of per-contig BAMs. |
required |
output_path
|
PathLike
|
Destination path for the merged BAM. |
required |
threads
|
int
|
Merge threads ( |
4
|
batch_size
|
int
|
Maximum number of inputs per |
_MERGE_BATCH_SIZE
|
Returns:
| Type | Description |
|---|---|
| `Path` | Final BAM path (or `output_path` if input list is empty). |
Source code in baleen/eventalign/_read_bam.py
Loading
load_read_results
load_read_results(
bam_path: PathLike,
contig: str | None = None,
start: int | None = None,
end: int | None = None,
) -> "pd.DataFrame"
Load read-level results into a DataFrame.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `bam_path` | `PathLike` | Path to the mod-BAM file. | required |
| `contig` | `str \| None` | Filter to this contig (optional). | `None` |
| `start` | `int \| None` | Filter to this region within contig (0-based, optional). | `None` |
| `end` | `int \| None` | Filter to this region within contig (0-based, optional). | `None` |
Returns:
| Type | Description |
|---|---|
| `DataFrame` | Columns: `contig`, `position`, `read_name`, `is_native`, `p_mod_hmm`. |
Source code in baleen/eventalign/_read_bam.py
load_read_results_iter
load_read_results_iter(
bam_path: PathLike,
contig: str | None = None,
start: int | None = None,
end: int | None = None,
) -> Iterator[dict[str, Any]]
Iterate read-level results as dicts from a mod-BAM file.
Parses the `MM:Z` / `ML:B:C` tags to reconstruct per-position modification probabilities. Falls back to the legacy `MP:f` tag format if `MM` is absent.
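The `MM`/`ML` encoding is defined by the SAM optional-tags specification: each `MM` entry names a canonical base, a strand and a modification code, followed by skip counts (how many occurrences of that base to pass over before the next call), and `ML` holds one 0 to 255 integer per call in the same order. Below is a minimal decoder sketch under simplifying assumptions: forward-strand read, one single-character modification code per entry, and results keyed by query (read) position rather than reference position. `parse_mm_ml` is a hypothetical helper; baleen's own decoding and its probability convention may differ. Here each `ML` value N is mapped to the midpoint of the bin [N/256, (N+1)/256).

```python
from typing import Sequence


def parse_mm_ml(seq: str, mm: str, ml: Sequence[int]) -> dict[str, dict[int, float]]:
    """Decode MM/ML into {mod_code: {query_position: probability}}.

    Simplified sketch: forward-strand reads only, single-character mod codes.
    """
    calls: dict[str, dict[int, float]] = {}
    seq = seq.upper()
    ml_idx = 0
    for entry in filter(None, mm.split(";")):  # a trailing ';' leaves an empty entry
        head, *skips = entry.split(",")
        base, strand, code = head[0], head[1], head[2:]
        if code[-1:] in (".", "?"):
            code = code[:-1]  # drop the optional implicit/explicit mode flag
        if strand != "+" or len(code) != 1:
            raise ValueError(f"unsupported MM entry: {entry!r}")
        # Query positions of every occurrence of the canonical base ('N' = any base).
        candidates = [i for i, b in enumerate(seq) if base == "N" or b == base]
        per_code = calls.setdefault(code, {})
        cursor = -1
        for skip in skips:
            cursor += int(skip) + 1  # pass over `skip` occurrences, take the next one
            if cursor >= len(candidates):
                raise ValueError("MM skip counts run past the end of the read")
            if ml_idx >= len(ml):
                raise ValueError("ML has fewer values than MM calls")
            # ML value N covers [N/256, (N+1)/256); report the bin midpoint.
            per_code[candidates[cursor]] = (ml[ml_idx] + 0.5) / 256
            ml_idx += 1
    if ml_idx != len(ml):
        raise ValueError("ML length does not match the number of MM calls")
    return calls
```

For example, `parse_mm_ml("AGACTAAC", "A+a?,1,1;", [255, 0])` returns `{"a": {2: 0.998046875, 6: 0.001953125}}`: the `A` bases sit at query positions 0, 2, 5 and 6, so skipping one selects position 2 and skipping one more after it selects position 6.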
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `bam_path` | `PathLike` | Path to the mod-BAM file. | required |
| `contig` | `str \| None` | Optional contig filter. | `None` |
| `start` | `int \| None` | Optional region start within `contig` (0-based coordinates). | `None` |
| `end` | `int \| None` | Optional region end within `contig` (0-based coordinates). | `None` |
|
Yields:
| Type | Description |
|---|---|
| `dict` | Keys: `contig`, `position`, `read_name`, `is_native`, `p_mod_hmm`. |
Source code in baleen/eventalign/_read_bam.py
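Because `load_read_results_iter` yields dicts one at a time, per-position summaries can be accumulated without building a full DataFrame. A minimal sketch, assuming each yielded dict carries a single call with a scalar `p_mod_hmm` and a truthy `is_native` flag (the key names are documented above; the value types are an assumption); `mean_p_mod_by_condition` is a hypothetical helper, not part of baleen.

```python
from collections import defaultdict
from typing import Any, Iterable

Key = tuple[str, int, bool]  # (contig, position, is_native)


def mean_p_mod_by_condition(
    records: Iterable[dict[str, Any]],
) -> dict[tuple[str, int], dict[str, float]]:
    """Stream records and average p_mod_hmm per position, split native vs IVT."""
    totals: dict[Key, float] = defaultdict(float)
    counts: dict[Key, int] = defaultdict(int)
    for rec in records:  # only running sums are kept, never the records themselves
        key = (rec["contig"], int(rec["position"]), bool(rec["is_native"]))
        totals[key] += float(rec["p_mod_hmm"])
        counts[key] += 1
    means: dict[tuple[str, int], dict[str, float]] = {}
    for (contig, pos, native), total in totals.items():
        label = "native" if native else "ivt"
        means.setdefault((contig, pos), {})[label] = total / counts[(contig, pos, native)]
    return means
```

A call such as `mean_p_mod_by_condition(load_read_results_iter(bam_path, contig=name))`, with `bam_path` and `name` as placeholders, keeps memory proportional to the number of distinct positions rather than the number of reads.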