V-JEPA 2 Vision Transformer

Community

Build and probe the V-JEPA 2 Vision Transformer.

Author: sovr610
Version: 1.0.0
Installs: 0

System Documentation

What problem does it solve?

This Skill standardizes how to implement and probe the V-JEPA 2 Vision Transformer across image and video tasks, reducing integration friction and providing a repeatable workflow.

Core Features & Use Cases

  • Coverage of ViT variants (Tiny to Gigantic) with 2D and 3D patch embeddings
  • RoPE-based attention, cross-attention, and AttentivePooler for downstream probing
  • Positional embedding interpolation, token masking, and activation checkpointing for large models
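As a rough illustration of how 2D and 3D patch embeddings differ in sequence length, the sketch below counts the tokens each produces. The helper name `num_patch_tokens` is hypothetical, and the example values (224×224 input, patch size 16, 16 frames, tubelet size 2) are assumptions chosen for illustration, not settings confirmed by this Skill.

```python
def num_patch_tokens(height, width, patch_size, num_frames=1, tubelet_size=1):
    """Count tokens produced by a non-overlapping patch embedding.

    Hypothetical helper for illustration only. A 2D (image) embedding is the
    special case num_frames=1, tubelet_size=1; a 3D (video) embedding also
    groups `tubelet_size` consecutive frames into each token.
    """
    if height % patch_size or width % patch_size:
        raise ValueError("height and width must be divisible by patch_size")
    if num_frames % tubelet_size:
        raise ValueError("num_frames must be divisible by tubelet_size")
    spatial = (height // patch_size) * (width // patch_size)
    temporal = num_frames // tubelet_size
    return temporal * spatial


# Assumed example values, not taken from the Skill's configs.
image_tokens = num_patch_tokens(224, 224, patch_size=16)
video_tokens = num_patch_tokens(224, 224, patch_size=16, num_frames=16, tubelet_size=2)
print(image_tokens, video_tokens)
```

The video token count grows with the number of tubelets, which is one reason token masking and activation checkpointing matter for the larger variants.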

Quick Start

Instantiate a small ViT variant from the config factory and run a forward pass on a sample image to verify shapes.
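The listing does not show the config factory's name or signature, so the sketch below uses a self-contained stand-in to illustrate the kind of shape check the Quick Start describes. `TinyViTStandIn` is hypothetical and is not the Skill's model: it uses a learned absolute positional embedding and PyTorch's stock encoder layers rather than RoPE-based attention. The sizes (embedding dimension 192, 3 heads, depth 2) are assumptions.

```python
import torch
from torch import nn


class TinyViTStandIn(nn.Module):
    """Illustrative stand-in (not the Skill's code): patch embed + plain encoder."""

    def __init__(self, img_size=224, patch_size=16, embed_dim=192, depth=2, num_heads=3):
        super().__init__()
        # 2D patch embedding: one token per non-overlapping patch.
        self.patch_embed = nn.Conv2d(3, embed_dim, kernel_size=patch_size, stride=patch_size)
        num_patches = (img_size // patch_size) ** 2
        # Learned absolute positions; the real model is described as RoPE-based.
        self.pos_embed = nn.Parameter(torch.zeros(1, num_patches, embed_dim))
        layer = nn.TransformerEncoderLayer(
            d_model=embed_dim,
            nhead=num_heads,
            dim_feedforward=4 * embed_dim,
            batch_first=True,
            norm_first=True,
        )
        self.blocks = nn.TransformerEncoder(layer, num_layers=depth, enable_nested_tensor=False)
        self.norm = nn.LayerNorm(embed_dim)

    def forward(self, x):
        x = self.patch_embed(x)           # (B, D, H/p, W/p)
        x = x.flatten(2).transpose(1, 2)  # (B, N, D)
        x = x + self.pos_embed
        return self.norm(self.blocks(x))


model = TinyViTStandIn().eval()
image = torch.randn(1, 3, 224, 224)  # one random RGB "image"
with torch.no_grad():
    tokens = model(image)
print(tuple(tokens.shape))
```

With these assumed sizes the output should have shape (1, 196, 192): a 224×224 input split into 16×16 patches gives 14 × 14 = 196 tokens, each of width 192. When using the Skill itself, replace the stand-in with whatever the config factory returns and compare against the expected token count in the same way.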

Dependency Matrix

Required Modules

torch, numpy

Components

scripts, references, assets

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.

Please help me install this Skill:
Name: V-JEPA 2 Vision Transformer
Download link: https://github.com/sovr610/refffiy/archive/main.zip#v-jepa-2-vision-transformer

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
