V-JEPA 2 Vision Transformer
Community
Build and probe the V-JEPA 2 Vision Transformer.
Software Engineering #video-analysis #cross-attention #rope #vit #vision-transformer #3d-patch-embedding #attentive-pooler
Author: sovr610
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill standardizes how to implement and probe the V-JEPA 2 Vision Transformer across image and video tasks, reducing integration friction and providing a repeatable workflow.
Core Features & Use Cases
- Coverage of ViT variants (Tiny to Gigantic) with 2D and 3D patch embeddings
- RoPE-based attention, Cross-Attention, and AttentivePooler for downstream probing
- Positional embedding interpolation, token masking, and activation checkpointing for large models
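To make the RoPE item above concrete, here is a minimal pure-Python sketch of rotary position embedding on a single axis. It is not the Skill's own code: the function names `rope_rotate` and `dot`, the interleaved pairing of features, and the base of 10000 are illustrative assumptions, and a video model would typically apply such rotations separately along the time, height, and width axes. The sketch only demonstrates the property that makes RoPE useful in attention: the query-key score depends on the relative offset between positions, not on their absolute values.

```python
import math


def rope_rotate(vec, position, base=10000.0):
    """Rotate consecutive feature pairs of `vec` by position-dependent angles.

    Hypothetical helper for illustration; assumes an even feature dimension
    and interleaved (x0, y0, x1, y1, ...) pairing.
    """
    dim = len(vec)
    if dim % 2 != 0:
        raise ValueError("feature dimension must be even")
    out = []
    for i in range(0, dim, 2):
        # Pair i//2 uses frequency base^(-i/dim), the usual RoPE schedule.
        theta = position * base ** (-i / dim)
        c, s = math.cos(theta), math.sin(theta)
        x, y = vec[i], vec[i + 1]
        out.extend([x * c - y * s, x * s + y * c])
    return out


def dot(a, b):
    """Plain dot product of two equal-length lists."""
    return sum(x * y for x, y in zip(a, b))


q = [1.0, 0.5, -0.3, 2.0]
k = [0.2, -1.0, 0.7, 0.4]

# Both pairs of positions have the same offset (2), so the scores should match.
score_near = dot(rope_rotate(q, 3), rope_rotate(k, 1))
score_far = dot(rope_rotate(q, 10), rope_rotate(k, 8))
```

Because each feature pair is rotated by an angle proportional to its position, shifting both query and key by the same amount leaves their dot product unchanged up to floating-point error.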
Quick Start
Instantiate a small ViT variant from the config factory and run a forward pass on a sample image to verify shapes.
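The Skill's config factory is not shown on this page, so the sketch below does not call it. Instead it computes the token shape you might expect from a forward pass, which can then be compared against `tuple(output.shape)`. The helper name `expected_token_shape`, the patch size of 16, the tubelet size of 2, and the embedding width of 192 for a Tiny variant are all assumptions for illustration; check them against the actual configuration, and note that a model that prepends a class token would produce one extra token.

```python
def expected_token_shape(batch, height, width, embed_dim,
                         patch_size=16, frames=None, tubelet_size=2):
    """Return the (batch, tokens, embed_dim) shape of a patch-embedded input.

    Hypothetical helper: frames=None models a 2D (image) patch embedding,
    otherwise a 3D (video) patch embedding with non-overlapping tubelets.
    """
    if height % patch_size or width % patch_size:
        raise ValueError("height and width must be divisible by patch_size")
    tokens = (height // patch_size) * (width // patch_size)
    if frames is not None:
        if frames % tubelet_size:
            raise ValueError("frames must be divisible by tubelet_size")
        tokens *= frames // tubelet_size
    return (batch, tokens, embed_dim)


# A single 224x224 image: 14 * 14 = 196 tokens.
image_shape = expected_token_shape(1, 224, 224, embed_dim=192)

# Two 16-frame 224x224 clips: (16 / 2) * 196 = 1568 tokens each.
video_shape = expected_token_shape(2, 224, 224, embed_dim=192, frames=16)
```

If the shape returned by the model differs from the computed one, the usual suspects are a different patch or tubelet size, a class token, or token masking applied before the encoder.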
Dependency Matrix
Required Modules
torch, numpy
Components
scripts, references, assets
💻 Claude Code Installation
Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill: Name: V-JEPA 2 Vision Transformer Download link: https://github.com/sovr610/refffiy/archive/main.zip#v-jepa-2-vision-transformer Please download this .zip file, extract it, and install it in the .claude/skills/ directory.