polars-bio

Community

Blazing-fast genomic interval ops on Polars

Author: JosephWoodall
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

polars-bio is a toolkit for genomic interval arithmetic and bioinformatics file I/O on Polars DataFrames. It supports overlap, nearest, merge, coverage, and related operations, and its streaming and cloud-storage support lets these analyses scale to large datasets.

Core Features & Use Cases

  • Genomic interval operations: overlap, nearest, merge, cluster, coverage, complement, and subtract for BED/VCF/BAM/GFF data.
  • Bioinformatics file I/O: read and write BED, VCF, BAM, CRAM, GFF/GTF, FASTA, and FASTQ with cloud storage and streaming support.
  • SQL data processing: register files as SQL tables, query them with DataFusion SQL, and combine the results with Polars pipelines.
  • Pileup/depth: compute per-base or block depth from BAM/CRAM files with pb.depth.
  • API styles: functional pb.* operations and method-chaining via LazyFrame.pb for fluent pipelines.
  • Streaming/out-of-core: scan_* functions read lazily, so datasets larger than memory can be processed.
  • Cross-cloud I/O: direct cloud URIs for S3/GCS/Azure read/write.
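
To make the interval operations listed above concrete, here is a minimal pure-Python sketch of what a merge computes. It is not polars-bio's implementation or API: the function name merge_intervals, the (chrom, start, end) tuple layout, and the 0-based half-open coordinate convention are assumptions made for illustration, as is the choice to merge intervals that merely touch.

```python
from collections import defaultdict


def merge_intervals(intervals):
    """Merge overlapping or touching (chrom, start, end) intervals.

    Assumes 0-based, half-open coordinates (an illustrative choice,
    not something the listing specifies).
    """
    # Group intervals by chromosome so merging never crosses chromosomes.
    by_chrom = defaultdict(list)
    for chrom, start, end in intervals:
        by_chrom[chrom].append((start, end))

    merged = []
    for chrom in sorted(by_chrom):
        current = None  # the (start, end) run currently being extended
        for start, end in sorted(by_chrom[chrom]):
            if current is None:
                current = (start, end)
            elif start <= current[1]:
                # Overlaps or touches the current run: extend it.
                current = (current[0], max(current[1], end))
            else:
                # Gap found: emit the finished run and start a new one.
                merged.append((chrom, current[0], current[1]))
                current = (start, end)
        if current is not None:
            merged.append((chrom, current[0], current[1]))
    return merged


example = [("chr1", 1, 5), ("chr1", 4, 8), ("chr1", 10, 12), ("chr2", 0, 3)]
print(merge_intervals(example))
```

Sorting by start within each chromosome means a single pass is enough, which is the usual reason merge-style operations scale well on sorted interval data.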

Quick Start

Overlap two interval datasets with pb.overlap and collect the results.
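
The listing does not show the call itself, so the following is a sketch of the semantics rather than the library's API: a plain-Python reference for which pairs an overlap join would be expected to report. The function name find_overlaps, the (chrom, start, end) tuples, and the 0-based half-open coordinates are assumptions; the actual parameters and output columns of pb.overlap should be taken from the polars-bio documentation.

```python
def find_overlaps(left, right):
    """Return all (left_interval, right_interval) pairs that overlap.

    Intervals are hypothetical (chrom, start, end) tuples in 0-based,
    half-open coordinates. Two intervals overlap when they are on the
    same chromosome and share at least one base.
    """
    pairs = []
    for l_chrom, l_start, l_end in left:
        for r_chrom, r_start, r_end in right:
            same_chrom = l_chrom == r_chrom
            # Half-open test: each interval must start before the other ends.
            if same_chrom and l_start < r_end and r_start < l_end:
                pairs.append(((l_chrom, l_start, l_end), (r_chrom, r_start, r_end)))
    return pairs


reads = [("chr1", 100, 200), ("chr1", 300, 400), ("chr2", 0, 50)]
genes = [("chr1", 150, 250), ("chr1", 200, 300), ("chr2", 40, 60)]

for read, gene in find_overlaps(reads, genes):
    print(read, "overlaps", gene)
```

This nested loop is quadratic and only meant to pin down the expected result on small inputs; a library built for large datasets would use an indexed or sorted join instead.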

Dependency Matrix

Required Modules

None required

Components

references

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: polars-bio
Download link: https://github.com/JosephWoodall/noosphere/archive/main.zip#polars-bio

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
