V-JEPA 2 Video & Masking
Community
Master spatiotemporal masking for video training.
Author: sovr610
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill sets up V-JEPA 2 style video pretraining end to end. It provides tubelet tokenization, multi-block 3D masking, video patch embeddings, and a collator that efficiently batches clips with different frames-per-clip (FPC) settings. It abstracts the orchestration of tokenization, masking, and sequence handling so practitioners can focus on model design and experiments rather than the plumbing.
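To make the tubelet tokenization step concrete, here is a minimal sketch of a 3D patch embedding built on a strided `torch.nn.Conv3d`. The class name mirrors the `PatchEmbed3D` component listed by this Skill, but the constructor arguments, defaults, and tensor layout are assumptions for illustration and may not match the Skill's actual code.

```python
import torch
import torch.nn as nn


class PatchEmbed3D(nn.Module):
    """Illustrative tubelet embedding (assumed signature, not the Skill's own)."""

    def __init__(self, patch_size=16, tubelet_size=2, in_chans=3, embed_dim=32):
        super().__init__()
        # Kernel == stride, so the clip is cut into non-overlapping tubelets of
        # size (tubelet_size, patch_size, patch_size), each projected to one token.
        self.proj = nn.Conv3d(
            in_chans,
            embed_dim,
            kernel_size=(tubelet_size, patch_size, patch_size),
            stride=(tubelet_size, patch_size, patch_size),
        )

    def forward(self, x):
        # x: (batch, channels, frames, height, width)
        x = self.proj(x)  # (batch, embed_dim, T', H', W')
        # Flatten the token grid and move the embedding dim last: (batch, N, embed_dim)
        return x.flatten(2).transpose(1, 2)


# A tiny clip: 4 frames of 32x32 RGB gives a 2 x 2 x 2 token grid.
clip = torch.randn(1, 3, 4, 32, 32)
tokens = PatchEmbed3D()(clip)
print(tokens.shape)
```

With these assumed defaults, the number of tokens is (frames / tubelet_size) x (height / patch_size) x (width / patch_size), so longer clips produce proportionally longer sequences. That is why batching clips with different FPC values needs dedicated collation.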
Core Features & Use Cases
- Tubelet tokenization via PatchEmbed3D for efficient spatiotemporal embedding.
- Multi-block 3D masking algorithm to create structured encoder (context) and predictor (target) masks.
- MaskGenerator and MaskCollator support for deterministic, memory-efficient training on variable-length video clips.
- MultiSequenceEncoder / MultiSequencePredictor wrappers to handle heterogeneous clip lengths in a single step.
- Reference templates and assets to reproduce experiments and benchmarks for V-JEPA 2 pretraining.
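The multi-block 3D masking feature listed above can be sketched roughly as follows. This is a simplified illustration, not the Skill's implementation: the function names, parameters, and defaults are hypothetical, and it assumes each block spans the full temporal extent, with the union of blocks forming the predictor targets and the remaining tokens forming the encoder context. It uses only the standard library so the index logic is easy to inspect.

```python
import math
import random


def sample_block(grid, spatial_scale, temporal_scale, aspect_ratio, rng):
    """Pick one block (t0, h0, w0, t, h, w) that fits inside the token grid."""
    T, H, W = grid
    t = max(1, min(T, round(T * temporal_scale)))
    area = spatial_scale * H * W
    h = max(1, min(H, round(math.sqrt(area * aspect_ratio))))
    w = max(1, min(W, round(math.sqrt(area / aspect_ratio))))
    # Random top-left-front corner such that the block stays in bounds.
    t0 = rng.randint(0, T - t)
    h0 = rng.randint(0, H - h)
    w0 = rng.randint(0, W - w)
    return t0, h0, w0, t, h, w


def multiblock_mask_3d(grid, num_blocks=4, spatial_scale=0.15,
                       temporal_scale=1.0, aspect_ratio_range=(0.75, 1.5), seed=0):
    """Return (enc_idx, pred_idx): flat token indices kept vs. predicted."""
    T, H, W = grid
    rng = random.Random(seed)  # seeded so the same call yields the same masks
    masked = set()
    for _ in range(num_blocks):
        ar = rng.uniform(*aspect_ratio_range)
        t0, h0, w0, t, h, w = sample_block(grid, spatial_scale, temporal_scale, ar, rng)
        for ti in range(t0, t0 + t):
            for hi in range(h0, h0 + h):
                for wi in range(w0, w0 + w):
                    # Row-major flat index over the (T, H, W) token grid.
                    masked.add((ti * H + hi) * W + wi)
    pred_idx = sorted(masked)
    enc_idx = [i for i in range(T * H * W) if i not in masked]
    return enc_idx, pred_idx


# Token grid for a 16-frame, 224x224 clip with 2x16x16 tubelets: 8 x 14 x 14.
enc_idx, pred_idx = multiblock_mask_3d((8, 14, 14))
print(len(enc_idx), len(pred_idx))
```

Seeding the generator is one simple way to get the deterministic behavior mentioned for MaskGenerator. In a real pipeline the index lists would typically be converted to tensors and used to gather tokens for the encoder and predictor.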
Quick Start
Configure and deploy the V-JEPA 2 video masking workflow on a sample video dataset.
Dependency Matrix
Required Modules
torch
Components
scripts, references, assets
💻 Claude Code Installation
Recommended: Let Claude install automatically. Simply copy and paste the text below to Claude Code.
Please help me install this Skill:
Name: V-JEPA 2 Video & Masking
Download link: https://github.com/sovr610/refffiy/archive/main.zip#v-jepa-2-video-masking
Please download this .zip file, extract it, and install it in the .claude/skills/ directory.