Ego-Centric Dataset V4.0 Now Available

Train from Humans.
Not Simulations.

We process egocentric video through AI pipelines that extract hand poses, grasp types, and behavioral annotations—converting first-person footage into training data for manipulation learning.

30 FPS
21-Point Hand Pose
Auto-Labeled
Live Processing Stream
Clip_850c95c2 (Kitchen)
REC
120 frames • SAM3 + MediaPipe
Batch_Upload_47
30s ago
55 frames with ego-hands
QA Processing
2m ago
Grasp: lateral_pinch 95%
Annotation Fidelity

Multi-Layer

Synchronized object segmentation, hand keypoints, and interaction labels.

RoboData Refinery

Cloud pipeline processing egocentric video with SAM3 segmentation and MediaPipe hand tracking.

From raw footage to
behavioral understanding.

General-purpose robots need more than pixels—they need to understand human intent. VastLabs extracts manipulation behaviors from egocentric video at scale.

120 Frames/Clip
21 Hand Keypoints

The Egocentric Advantage

First-person video captures what matters: hand-object interactions from the manipulator's perspective, not a distant observer's.

Grasp Classification

Every hand detection includes grasp type—power grip, precision pinch, lateral pinch—derived from finger geometry analysis.
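To make the idea concrete, here is a minimal sketch of how a grasp type might be derived from finger geometry. It assumes 21 (x, y, z) landmarks in MediaPipe ordering (0 = wrist, 4 = thumb tip, 8 = index tip); the thresholds and the heuristic itself are illustrative, not VastLabs' production classifier.

```python
import math

def classify_grasp(landmarks):
    """Illustrative heuristic: landmarks is 21 (x, y, z) tuples,
    MediaPipe ordering (0 = wrist, 4 = thumb tip, 8 = index tip).
    Thresholds are hypothetical, not production values."""
    wrist, thumb_tip, index_tip = landmarks[0], landmarks[4], landmarks[8]
    pinch_gap = math.dist(thumb_tip, index_tip)
    # Mean fingertip-to-wrist distance approximates overall hand closure
    closure = sum(math.dist(landmarks[i], wrist) for i in (8, 12, 16, 20)) / 4
    if pinch_gap < 0.05:
        # Thumb pad against the side of the index finger (PIP joint, index 6)
        # vs. its tip distinguishes lateral pinch from precision pinch
        side_gap = math.dist(thumb_tip, landmarks[6])
        return "lateral_pinch" if side_gap < pinch_gap else "precision_pinch"
    if closure < 0.25:
        return "power_grip"  # fingertips curled in toward the palm
    return "open_hand"
```

The point of the sketch is that grasp labels fall out of simple geometric relations between landmarks, so no extra sensors are needed beyond the hand skeleton itself.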

Behavior Extraction

Beyond detection: we annotate hesitation moments, decision points, and task phases for richer training signal.
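One way a hesitation annotation could be derived, sketched under the assumption that hesitation shows up as a dip in wrist speed mid-task; the threshold and the function itself are hypothetical, not the production pipeline.

```python
def hesitation_frames(wrist_positions, fps=30, speed_thresh=0.02):
    """Flag frames where wrist speed drops below a threshold.
    wrist_positions: (x, y) wrist coords per frame, normalized to [0, 1].
    Threshold is illustrative, not a production value."""
    flagged = []
    for i in range(1, len(wrist_positions)):
        (x0, y0), (x1, y1) = wrist_positions[i - 1], wrist_positions[i]
        # Per-frame displacement scaled by fps gives speed in units/second
        speed = ((x1 - x0) ** 2 + (y1 - y0) ** 2) ** 0.5 * fps
        if speed < speed_thresh:
            flagged.append(i)
    return flagged
```

Runs of flagged frames can then be merged into "hesitation moment" intervals in the behavioral JSON layer.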

Data Pipeline

From egocentric video to neural network.

Capture

Head-mounted cameras record first-person video as humans perform manipulation tasks naturally.

Process & Annotate

SAM3 segments objects. MediaPipe extracts 21-point hand skeletons. AI labels grasp types and interactions.
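MediaPipe reports each of the 21 hand landmarks with x/y normalized to [0, 1] relative to image width and height, so a typical post-processing step maps them to pixel coordinates before writing annotations. A minimal sketch, with plain (x, y) pairs standing in for MediaPipe's landmark objects:

```python
def to_pixel_keypoints(landmarks, image_width, image_height):
    """Map normalized (x, y) hand landmarks to pixel coordinates.
    landmarks: 21 (x, y) pairs with values in [0, 1], as produced by
    MediaPipe Hands (tuples here stand in for its landmark objects)."""
    return [(round(x * image_width), round(y * image_height))
            for x, y in landmarks]
```

For example, a landmark at (0.5, 0.5) in a 640x480 frame maps to pixel (320, 240).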

Export & Train

COCO, YOLO, and behavioral JSON formats ready for imitation learning pipelines.
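As a rough illustration of the COCO-keypoints target, here is what a single 21-point hand detection could look like as a COCO-style annotation record. Field names follow the COCO annotation format; the `category_id` and the helper itself are hypothetical, not VastLabs' exporter.

```python
def hand_to_coco(ann_id, image_id, pixel_keypoints):
    """Build a COCO-style keypoints annotation for one hand.
    pixel_keypoints: 21 (x, y) pairs; visibility flag 2 = labeled and visible.
    category_id is a hypothetical "hand" category."""
    flat = []
    for x, y in pixel_keypoints:
        flat.extend([x, y, 2])
    xs = [x for x, _ in pixel_keypoints]
    ys = [y for _, y in pixel_keypoints]
    return {
        "id": ann_id,
        "image_id": image_id,
        "category_id": 1,
        "keypoints": flat,  # [x1, y1, v1, ..., x21, y21, v21]
        "num_keypoints": 21,
        # COCO bbox convention: [x, y, width, height]
        "bbox": [min(xs), min(ys), max(xs) - min(xs), max(ys) - min(ys)],
    }
```

The flat `[x, y, v]` triplet layout is what COCO-compatible training loaders expect, which is why hand pose exports in this format drop into existing keypoint pipelines.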

DATASET: CLIP_1B06B70B
task="dishwashing" frames="120" ego_hands="55"
VALIDATED
refinery_client.py — Python
import vastlabs as vl
 
# Load processed clip with full annotations
clip = vl.load("1b06b70b-573c-4bbc-b3a6")
 
# Access ego-hand tracking data
hands = clip.ego_hand_tracking
print(f"Left hand: {hands.left.totalVisibleFrames} frames")
print(f"Right hand: {hands.right.totalVisibleFrames} frames")
# > Left hand: 73 frames
# > Right hand: 17 frames
 
# Get per-frame grasp classifications
for frame in hands.frames:
    if frame.leftHand:
        print(f"Frame {frame.frameIndex}: {frame.leftHand.grasp.graspType}")
# > Frame 2: lateral_pinch
# > Frame 4: power_grip
 
# Export for training
clip.export("coco_keypoints", include_hands=True)

Behavioral annotations.
Any framework.

Don't settle for bounding boxes. VastLabs provides hand keypoints, grasp types, motion vectors, and interaction labels in standardized formats.

  • COCO-Keypoints for hand pose
  • Compatible with LeRobot & Robomimic
  • Full JSON with behavioral layers

Data Access

Start training with richly annotated manipulation data.

Academic
$0
  • Non-commercial license
  • 100 Hours of Data
  • Object annotations only
Popular
Startup
$499 /mo
  • Commercial License
  • 1,000 Hours / month
  • Full behavioral annotations
Robotics Lab
Custom
  • Full Dataset Access
  • Custom Task Collection
  • Dedicated annotation pipeline