Ego-Centric Dataset V4.0 Now Available

Train from Humans.
Not Simulations.

We process egocentric video through AI pipelines that extract hand poses, grasp types, and behavioral annotations—converting first-person footage into training data for manipulation learning.

30 FPS
21-Point Hand Pose
Auto-Labeled
Live Processing Stream
Clip_850c95c2 (Kitchen)
REC
120 frames • SAM3 + MediaPipe
Batch_Upload_47
30s ago
55 frames with ego-hands
QA Processing
2m ago
Grasp: lateral_pinch 95%
Annotation Fidelity

Multi-Layer

Synchronized object segmentation, hand keypoints, and interaction labels.

RoboData Refinery

Cloud pipeline processing egocentric video with SAM3 segmentation and MediaPipe hand tracking.

From raw footage to
behavioral understanding.

General-purpose robots need more than pixels—they need to understand human intent. VastLabs extracts manipulation behaviors from egocentric video at scale.

120 Frames/Clip
21 Hand Keypoints

The Egocentric Advantage

First-person video captures what matters: hand-object interactions from the manipulator's perspective, not a distant observer's.

Grasp Classification

Every hand detection includes grasp type—power grip, precision pinch, lateral pinch—derived from finger geometry analysis.
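To make the idea concrete, here is a minimal sketch of how a grasp type might be derived from finger geometry. It assumes 21 (x, y, z) landmarks in MediaPipe ordering (0 = wrist, 4 = thumb tip, 8 = index tip); the thresholds and the heuristic itself are illustrative, not VastLabs' production classifier.

```python
import math

def classify_grasp(landmarks):
    """Illustrative heuristic: landmarks is 21 (x, y, z) tuples,
    MediaPipe ordering (0 = wrist, 4 = thumb tip, 8 = index tip).
    Thresholds are hypothetical, not production values."""
    wrist, thumb_tip, index_tip = landmarks[0], landmarks[4], landmarks[8]
    pinch_gap = math.dist(thumb_tip, index_tip)
    # Mean fingertip-to-wrist distance approximates overall hand closure
    closure = sum(math.dist(landmarks[i], wrist) for i in (8, 12, 16, 20)) / 4
    if pinch_gap < 0.05:
        # Thumb pad against the side of the index finger (PIP joint, index 6)
        # vs. its tip distinguishes lateral pinch from precision pinch
        side_gap = math.dist(thumb_tip, landmarks[6])
        return "lateral_pinch" if side_gap < pinch_gap else "precision_pinch"
    if closure < 0.25:
        return "power_grip"  # fingertips curled in toward the palm
    return "open_hand"
```

The point of the sketch is that grasp labels fall out of simple geometric relations between landmarks, so no extra sensors are needed beyond the hand skeleton itself.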

Behavior Extraction

Beyond detection: we annotate hesitation moments, decision points, and task phases for richer training signal.
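One way a hesitation annotation could be derived, sketched under the assumption that hesitation shows up as a dip in wrist speed mid-task; the threshold and the function itself are hypothetical, not the production pipeline.

```python
def hesitation_frames(wrist_positions, fps=30, speed_thresh=0.02):
    """Flag frames where wrist speed drops below a threshold.
    wrist_positions: (x, y) wrist coords per frame, normalized to [0, 1].
    Threshold is illustrative, not a production value."""
    flagged = []
    for i in range(1, len(wrist_positions)):
        (x0, y0), (x1, y1) = wrist_positions[i - 1], wrist_positions[i]
        # Per-frame displacement scaled by fps gives speed in units/second
        speed = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5 * fps
        if speed < speed_thresh:
            flagged.append(i)
    return flagged
```

Runs of flagged frames can then be merged into "hesitation moment" intervals in the behavioral JSON layer.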

Data Pipeline

From egocentric video to neural network.

Capture

Head-mounted cameras record first-person video as humans perform manipulation tasks naturally.

Process & Annotate

SAM3 segments objects. MediaPipe extracts 21-point hand skeletons. AI labels grasp types and interactions.
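MediaPipe reports each of the 21 hand landmarks with x/y normalized to [0, 1] relative to image width and height, so a typical post-processing step maps them to pixel coordinates before writing annotations. A minimal sketch, with plain (x, y) pairs standing in for MediaPipe's landmark objects:

```python
def to_pixel_keypoints(landmarks, image_width, image_height):
    """Map normalized (x, y) hand landmarks to pixel coordinates.
    landmarks: 21 (x, y) pairs with values in [0, 1], as produced by
    MediaPipe Hands (tuples here stand in for its landmark objects)."""
    return [(round(x * image_width), round(y * image_height))
            for x, y in landmarks]
```

For example, a landmark at (0.5, 0.5) in a 640x480 frame maps to pixel (320, 240).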

Export & Train

COCO, YOLO, and behavioral JSON formats ready for imitation learning pipelines.
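As a rough illustration of the COCO-keypoints target, here is what a single 21-point hand detection could look like as a COCO-style annotation record. Field names follow the COCO annotation format; the `category_id` and the helper itself are hypothetical, not VastLabs' exporter.

```python
def hand_to_coco(ann_id, image_id, pixel_keypoints):
    """Build a COCO-style keypoints annotation for one hand.
    pixel_keypoints: 21 (x, y) pairs; visibility flag 2 = labeled and visible.
    category_id is a hypothetical "hand" category."""
    flat = []
    for x, y in pixel_keypoints:
        flat.extend([x, y, 2])
    xs = [x for x, _ in pixel_keypoints]
    ys = [y for _, y in pixel_keypoints]
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": 1,
        "keypoints": flat,  # [x1, y1, v1, ..., x21, y21, v21]
        "num_keypoints": 21,
        # COCO bbox convention: [x, y, width, height]
        "bbox": [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)],
    }
```

The flat `[x, y, v]` triplet layout is what COCO-compatible training loaders expect, which is why hand pose exports in this format drop into existing keypoint pipelines.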

DATASET: CLIP_1B06B70B
task="dishwashing" frames="120" ego_hands="55"
VALIDATED
refinery_client.py — Python
import vastlabs as vl
 
# Load processed clip with full annotations
clip = vl.load("1b06b70b-573c-4bbc-b3a6")
 
# Access ego-hand tracking data
hands = clip.ego_hand_tracking
print(f"Left hand: {hands.left.totalVisibleFrames} frames")
print(f"Right hand: {hands.right.totalVisibleFrames} frames")
# > Left hand: 73 frames
# > Right hand: 17 frames
 
# Get per-frame grasp classifications
for frame in hands.frames:
    if frame.leftHand:
        print(f"Frame {frame.frameIndex}: {frame.leftHand.grasp.graspType}")
# > Frame 2: lateral_pinch
# > Frame 4: power_grip
 
# Export for training
clip.export("coco_keypoints", include_hands=True)

Behavioral annotations.
Any framework.

Don't settle for bounding boxes. VastLabs provides hand keypoints, grasp types, motion vectors, and interaction labels in standardized formats.

  • COCO-Keypoints for hand pose
  • Compatible with LeRobot & Robomimic
  • Full JSON with behavioral layers

Data Access

Start training with richly annotated manipulation data.

Academic
$0
  • Non-commercial license
  • 100 Hours of Data
  • Object annotations only
Popular
Startup
$499 /mo
  • Commercial License
  • 1,000 Hours / month
  • Full behavioral annotations
Robotics Lab
Custom
  • Full Dataset Access
  • Custom Task Collection
  • Dedicated annotation pipeline