polars-bio
Community
Blazing-fast genomic interval ops on Polars
Author: JosephWoodall
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
polars-bio is a high-performance toolkit for genomic interval arithmetic and bioinformatics file I/O on Polars DataFrames. It supports fast overlap, nearest, merge, coverage, and related analyses, and its streaming and cloud-native I/O let those analyses scale to datasets larger than memory.
Core Features & Use Cases
- Genomic interval operations: overlap, nearest, merge, cluster, coverage, complement, and subtract for BED/VCF/BAM/GFF data.
- Bioinformatics file I/O: read and write BED, VCF, BAM, CRAM, GFF/GTF, FASTA, and FASTQ with cloud storage and streaming support.
- SQL data processing: register files as SQL tables and query them with DataFusion SQL to combine with Polars pipelines.
- Pileup/depth: compute per-base or block depth from BAM/CRAM files with pb.depth.
- API styles: functional pb.* operations and method-chaining via LazyFrame.pb for fluent pipelines.
- Streaming/out-of-core: scan_* functions enable out-of-core processing for datasets larger than memory.
- Cross-cloud I/O: direct cloud URIs for S3/GCS/Azure read/write.
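To make the interval vocabulary above concrete, here is a minimal, dependency-free sketch of what "merge" and "complement" mean for genomic intervals. It is not polars-bio's implementation and does not call its API. The function names, the half-open `[start, end)` convention, and the `chrom_sizes` argument are assumptions made for illustration only.

```python
from collections import defaultdict

# Assumption for this sketch: intervals are (chrom, start, end) tuples,
# half-open, i.e. the base at `end` is not part of the interval.


def merge_intervals(intervals):
    """Collapse overlapping or touching intervals on each chromosome."""
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        spans = sorted(by_chrom[chrom])
        current_start, current_end = spans[0]
        for start, end in spans[1:]:
            if start <= current_end:
                # Overlaps or touches the running interval: extend it.
                current_end = max(current_end, end)
            else:
                merged.append((chrom, current_start, current_end))
                current_start, current_end = start, end
        merged.append((chrom, current_start, current_end))
    return merged


def complement(intervals, chrom_sizes):
    """Return the gaps not covered by any interval, per chromosome."""
    merged = merge_intervals(intervals)
    gaps = []
    for chrom in sorted(chrom_sizes):
        cursor = 0
        for c, start, end in merged:
            if c != chrom:
                continue
            if start > cursor:
                gaps.append((chrom, cursor, start))
            cursor = max(cursor, end)
        if cursor < chrom_sizes[chrom]:
            gaps.append((chrom, cursor, chrom_sizes[chrom]))
    return gaps


features = [
    ("chr1", 100, 200),
    ("chr1", 150, 300),
    ("chr1", 400, 500),
    ("chr2", 10, 20),
]
merged_features = merge_intervals(features)
uncovered = complement(features, {"chr1": 600, "chr2": 50})
```

Here the two overlapping `chr1` intervals collapse into one, and `complement` reports the stretches of each chromosome that no feature covers. Whether touching intervals should merge depends on the coordinate convention in use, so check how the real library treats interval endpoints before relying on this behavior.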
Quick Start
Overlap two interval datasets with pb.overlap and collect the results.
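The quick start can be sketched roughly as follows. This is a hedged example, not an authoritative recipe: it assumes `polars` and `polars-bio` are installed, that the package imports as `polars_bio`, that `pb.overlap` accepts two DataFrames with `chrom`, `start`, and `end` columns, and that it returns a lazy frame that must be collected. The sample data, the helper `expected_overlaps`, and the wrapper `overlap_with_polars_bio` are hypothetical. Some versions may need extra arguments or coordinate-system settings, so consult the documentation for the installed version.

```python
# Hypothetical sample data: (chrom, start, end) tuples. The values are chosen
# so that no intervals merely touch, which keeps the expected result the same
# under closed and half-open coordinate conventions.
READS = [("chr1", 100, 200), ("chr1", 500, 600), ("chr2", 50, 150)]
TARGETS = [("chr1", 150, 250), ("chr2", 300, 400)]


def expected_overlaps(left, right):
    """Dependency-free cross-check: pairs on the same chromosome that share a base."""
    pairs = []
    for l_chrom, l_start, l_end in left:
        for r_chrom, r_start, r_end in right:
            if l_chrom == r_chrom and l_start < r_end and r_start < l_end:
                pairs.append(((l_chrom, l_start, l_end), (r_chrom, r_start, r_end)))
    return pairs


def overlap_with_polars_bio(left, right):
    """Run the same query through polars-bio (assumed API, see the note above)."""
    # Imports are deferred so this file still loads where the packages are absent.
    import polars as pl
    import polars_bio as pb

    schema = ["chrom", "start", "end"]
    df1 = pl.DataFrame(left, schema=schema, orient="row")
    df2 = pl.DataFrame(right, schema=schema, orient="row")
    # Assumption: the default return type is lazy, so collect() materializes it.
    return pb.overlap(df1, df2).collect()


reference_pairs = expected_overlaps(READS, TARGETS)
```

Calling `overlap_with_polars_bio(READS, TARGETS)` should then yield one row per overlapping pair, which can be compared against `reference_pairs` as a sanity check on small inputs.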
Dependency Matrix
Required Modules
None required
Components
references
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: polars-bio Download link: https://github.com/JosephWoodall/noosphere/archive/main.zip#polars-bio Please download this .zip file, extract it, and install it in the .claude/skills/ directory.