V-JEPA 2 Video & Masking

Community

Master spatiotemporal masking for video training.

Author: sovr610
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill sets up V-JEPA 2 style video pretraining end to end. It provides tubelet tokenization, multi-block 3D masking, video patch embeddings, and a collator that batches clips with different frames-per-clip (FPC) settings. It handles the orchestration of tokenization, masking, and sequence handling so practitioners can focus on model design and experiments rather than the plumbing.
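To make tubelet tokenization concrete, here is a minimal sketch of a 3D patch embedding, assuming the common approach of a single Conv3d whose kernel size equals its stride. The class body, default sizes, and tensor shapes below are illustrative assumptions and may differ from the PatchEmbed3D shipped with this Skill.

```python
import torch
import torch.nn as nn


class PatchEmbed3D(nn.Module):
    """Illustrative tubelet embedding (assumed design, not the Skill's source)."""

    def __init__(self, patch_size=16, tubelet_size=2, in_chans=3, embed_dim=32):
        super().__init__()
        # Kernel == stride, so each output position covers one
        # non-overlapping tubelet of tubelet_size x patch_size x patch_size.
        self.proj = nn.Conv3d(
            in_chans,
            embed_dim,
            kernel_size=(tubelet_size, patch_size, patch_size),
            stride=(tubelet_size, patch_size, patch_size),
        )

    def forward(self, x):
        # x: (batch, channels, frames, height, width)
        x = self.proj(x)                     # (B, D, T', H', W')
        return x.flatten(2).transpose(1, 2)  # (B, T' * H' * W', D)


# A batch of 2 clips, 16 frames each, at 64x64 resolution.
video = torch.randn(2, 3, 16, 64, 64)
tokens = PatchEmbed3D()(video)
print(tokens.shape)
```

With these assumed sizes the token grid is 8 x 4 x 4 (frames / 2, height / 16, width / 16), so each clip becomes 128 tokens. Changing the number of frames per clip changes only the first grid dimension, which is why clips with different FPC values yield sequences of different lengths.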

Core Features & Use Cases

  • Tubelet tokenization via PatchEmbed3D for efficient spatiotemporal embedding.
  • Multi-block 3D masking algorithm that produces structured encoder (context) and predictor (target) masks.
  • MaskGenerator and MaskCollator support for deterministic, memory-efficient training on variable-length video clips.
  • MultiSequenceEncoder / MultiSequencePredictor wrappers to handle heterogeneous clip lengths in a single step.
  • Reference templates and assets to reproduce experiments and benchmarks for V-JEPA 2 pretraining.
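The multi-block 3D masking idea from the list above can be sketched in plain Python. This is a simplified illustration under stated assumptions, not the Skill's MaskGenerator: the function names, the default scales, and the choice to give the encoder the complement of the union of blocks are all hypothetical. A fixed seed is used to show how mask sampling can be made deterministic.

```python
import math
import random


def sample_block(grid, spatial_scale, temporal_scale, aspect_ratio, rng):
    """Sample one 3D block (t0, t1, y0, y1, x0, x1) inside a (T, H, W) token grid."""
    T, H, W = grid
    t = max(1, min(T, round(T * temporal_scale)))
    area = H * W * spatial_scale
    h = max(1, min(H, round(math.sqrt(area * aspect_ratio))))
    w = max(1, min(W, round(math.sqrt(area / aspect_ratio))))
    t0 = rng.randint(0, T - t)
    y0 = rng.randint(0, H - h)
    x0 = rng.randint(0, W - w)
    return t0, t0 + t, y0, y0 + h, x0, x0 + w


def multiblock_mask(grid, num_blocks=4, spatial_scale=0.15,
                    temporal_scale=1.0, aspect_ratio=1.0, seed=0):
    """Return (enc_idx, pred_idx): flat token indices for encoder and predictor."""
    T, H, W = grid
    rng = random.Random(seed)  # seeded so the same masks can be reproduced
    masked = set()
    for _ in range(num_blocks):
        t0, t1, y0, y1, x0, x1 = sample_block(
            grid, spatial_scale, temporal_scale, aspect_ratio, rng
        )
        for t in range(t0, t1):
            for y in range(y0, y1):
                for x in range(x0, x1):
                    masked.add((t * H + y) * W + x)
    pred_idx = sorted(masked)  # tokens the predictor must reconstruct
    enc_idx = [i for i in range(T * H * W) if i not in masked]  # visible context
    return enc_idx, pred_idx


# Token grid for 16 frames at 224x224 with 2x16x16 tubelets: 8 x 14 x 14.
enc_idx, pred_idx = multiblock_mask((8, 14, 14))
print(len(enc_idx), len(pred_idx))
```

Because every block here spans the full temporal extent (temporal_scale=1.0), the masked regions form tubes through time, so the model cannot recover a masked patch by copying it from a neighbouring frame. A collator could call such a function once per distinct grid size to support batches with mixed frames-per-clip values.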

Quick Start

Configure and deploy the V-JEPA 2 video masking workflow on a sample video dataset.

Dependency Matrix

Required Modules

torch

Components

scripts, references, assets

💻 Claude Code Installation

Recommended: Let Claude install automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: V-JEPA 2 Video & Masking
Download link: https://github.com/sovr610/refffiy/archive/main.zip#v-jepa-2-video-masking

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.