pyspark-structured-streaming
Community
Build reliable real-time pipelines with PySpark
Software Engineering
#kafka #pyspark #delta lake #structured streaming #watermarking #foreachbatch #stateful aggregation
Author: ivanshamaev
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill helps you design and operate PySpark Structured Streaming pipelines for reliable real-time processing, including correct handling of event time, late data, stateful aggregations, and durable restarts.
Core Features & Use Cases
- End-to-end streaming pipeline setup: configure a streaming job in PySpark for Kafka/file/rate sources and run it as a managed streaming query.
- Event-time correctness: define watermarks and windowing (tumbling/sliding/session) to handle late events and bound state.
- Production-grade reliability patterns: apply checkpointing, deduplication, foreachBatch for custom sinks, and RocksDB state store tuning. Also covers Kafka source/sink configuration, stream-stream joins with watermarks, and fault tolerance.
- Typical use cases: Kafka → Delta/Iceberg for silver-lake upserts, near-real-time dashboards with update/append modes, and debugging/monitoring with streaming query progress metrics.
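To make the event-time bullet concrete, here is a minimal pure-Python sketch of the arithmetic behind tumbling windows and watermarks. It is an illustration, not Spark's implementation: the function names are hypothetical, windows are assumed to be aligned to the Unix epoch, and the "closed" rule is a simplification of how the engine decides that a window no longer accepts late events.

```python
from datetime import datetime, timedelta, timezone

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def tumbling_window(event_time, size):
    """Return the [start, end) tumbling window that contains event_time."""
    # Distance from the start of the window, assuming epoch-aligned windows.
    offset = (event_time - EPOCH) % size
    start = event_time - offset
    return start, start + size


def watermark(max_event_time, delay):
    """Watermark = latest event time seen so far minus the allowed lateness."""
    return max_event_time - delay


def window_is_closed(window_end, current_watermark):
    """Simplified rule: a window stops accepting late events once the
    watermark has moved past its end, so its state can be dropped."""
    return current_watermark > window_end


# An event at 12:07:30 falls into the 5-minute window [12:05, 12:10).
event = datetime(2024, 1, 1, 12, 7, 30, tzinfo=timezone.utc)
start, end = tumbling_window(event, timedelta(minutes=5))

# After the stream has seen an event at 12:25 with a 10-minute delay,
# the watermark is 12:15, so the [12:05, 12:10) window is closed.
wm = watermark(datetime(2024, 1, 1, 12, 25, tzinfo=timezone.utc), timedelta(minutes=10))
closed = window_is_closed(end, wm)
```

The delay threshold is the trade-off knob: a longer delay tolerates later events but keeps window state in memory for longer.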
Quick Start
Ask the assistant to generate a PySpark Structured Streaming job that reads JSON events from Kafka, applies an event-time watermark with a tumbling window, deduplicates by a natural key, and writes results to a Delta Lake table using foreachBatch with a durable checkpoint location.
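As a rough idea of what such a request might produce, the sketch below wires those pieces together. It is not the Skill's own output: the broker address, topic, paths, schema fields, and window/watermark durations are all assumptions, and it presumes the Kafka connector and Delta Lake packages are available on the Spark session. PySpark imports are deferred into the function so the module can be loaded without a Spark installation.

```python
# All values below are hypothetical placeholders.
KAFKA_OPTIONS = {
    "kafka.bootstrap.servers": "localhost:9092",
    "subscribe": "events",
    "startingOffsets": "latest",
}
CHECKPOINT = "/tmp/checkpoints/events_agg"
TARGET_PATH = "/tmp/delta/events_agg"


def build_query(spark):
    """Start a streaming query: Kafka JSON -> dedup -> 5-minute counts -> Delta."""
    from pyspark.sql import functions as F
    from pyspark.sql.types import (
        DoubleType, StringType, StructField, StructType, TimestampType,
    )

    # Assumed event schema; event_id plays the role of the natural key.
    schema = StructType([
        StructField("event_id", StringType()),
        StructField("user_id", StringType()),
        StructField("amount", DoubleType()),
        StructField("event_time", TimestampType()),
    ])

    raw = spark.readStream.format("kafka").options(**KAFKA_OPTIONS).load()

    # Kafka delivers the payload as bytes in the `value` column.
    events = raw.select(
        F.from_json(F.col("value").cast("string"), schema).alias("e")
    ).select("e.*")

    # The watermark bounds both the dedup state and the window state.
    # Including event_time in the dedup keys lets old entries be evicted.
    deduped = (
        events.withWatermark("event_time", "10 minutes")
        .dropDuplicates(["event_id", "event_time"])
    )

    counts = deduped.groupBy(F.window("event_time", "5 minutes"), "user_id").agg(
        F.count("*").alias("events"),
        F.sum("amount").alias("total_amount"),
    )

    def write_batch(batch_df, batch_id):
        # In append mode each finalized window arrives once, so a plain
        # append is enough here; an upsert would need a Delta MERGE instead.
        flat = batch_df.select(
            F.col("window.start").alias("window_start"),
            F.col("window.end").alias("window_end"),
            "user_id", "events", "total_amount",
        )
        flat.write.format("delta").mode("append").save(TARGET_PATH)

    return (
        counts.writeStream.outputMode("append")
        .foreachBatch(write_batch)
        .option("checkpointLocation", CHECKPOINT)
        .start()
    )
```

Calling `build_query(spark).awaitTermination()` on an existing `SparkSession` would run the job. Restarting with the same checkpoint location is what lets the query resume from its last committed offsets and state.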
Dependency Matrix
Required Modules
None required
Components
Standard package
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill: Name: pyspark-structured-streaming Download link: https://github.com/ivanshamaev/de-agent-skills/archive/main.zip#pyspark-structured-streaming Please download this .zip file, extract it, and install it in the .claude/skills/ directory.