/ Enterprise-grade · IP-clean · Ready to license

The Source of Ground-Truth Human Motion & Behavior Data

Multimodal datasets for Physical AI - from world models to humanoid control.

142,200 motion sequences structured and optimized for humanoid control research. Focused on locomotion, object manipulation, gestures, and everyday human behavior — the motions robots actually need to learn.


BONES-SEED ‑ Skeletal Everyday Embodied Dataset.

Available in SOMA, Unitree G1 (MuJoCo compatible), and Vicon skeleton formats. Full semantic metadata with up to 6 natural language descriptions per motion and hierarchical categorization. Open source for academic use and qualifying startups. Gated access on Hugging Face.


700+ hours of labeled and annotated human 3D animations, recorded by 170+ physically diverse performers — including professionals such as stuntmen, soldiers, and dancers.


BONES RP1 ‑ Ground Truth at Scale.

Every motion captured in 20+ styles — from emotional states to physical conditions. Optical motion capture (Vicon) at 120 fps with sub-millimeter accuracy. Unified skeleton retargeting across all files. Available in BVH and FBX. 3 multi-view videos per take. Rich metadata with up to 5 descriptions per motion.
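Since RP1 ships in BVH, the files can be inspected with nothing but the standard library. The sketch below (an illustrative two-joint skeleton, not an actual RP1 file) scans a BVH header for joint names, frame count, and frame time:

```python
def parse_bvh(text):
    """Extract joint names, frame count, and frame time from BVH text."""
    joints, frames, frame_time = [], 0, 0.0
    for line in text.splitlines():
        tokens = line.strip().split()
        if not tokens:
            continue
        if tokens[0] in ("ROOT", "JOINT"):
            joints.append(tokens[1])          # joint declarations in HIERARCHY
        elif tokens[0] == "Frames:":
            frames = int(tokens[1])           # total frame count in MOTION
        elif line.strip().startswith("Frame Time:"):
            frame_time = float(tokens[2])     # seconds per frame
    return joints, frames, frame_time

# Hypothetical minimal BVH, for illustration only.
sample = """HIERARCHY
ROOT Hips
{
  OFFSET 0 0 0
  CHANNELS 6 Xposition Yposition Zposition Zrotation Xrotation Yrotation
  JOINT Spine
  {
    OFFSET 0 10 0
    CHANNELS 3 Zrotation Xrotation Yrotation
    End Site
    {
      OFFSET 0 5 0
    }
  }
}
MOTION
Frames: 2
Frame Time: 0.008333
0 0 0 0 0 0 0 0 0
0 1 0 0 0 0 0 0 0
"""
joints, frames, frame_time = parse_bvh(sample)
# A 120 fps capture implies a frame time of 1/120 s ≈ 0.008333.
```

A frame time of ~0.008333 s is a quick sanity check that a file matches the advertised 120 fps capture rate.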

/ RP 01

Systematically designed taxonomy across 21 categories:

When you license BONES data, you get full IP clearance. Enterprise-ready from day one.

/ Locomotion

Basic, advanced, unusual, complex actions - from walking to parkour

/ Interaction

Objects, manipulation, household, environments - grasping, opening, kitchen tasks

/ Communication

Gestures, pointing, consuming - body language, everyday behaviors

/ Performance

Dancing, sports, martial arts, stunts, combat - full-body dynamics

/ Proof

BONES RP1 data powered SONIC - a foundation model for humanoid whole-body control.

/ RP 02

The only dataset where 3D motion, video, audio, face capture, and 3D scene reconstruction are synchronized frame-by-frame across the same performance. Not stitched from separate sources — captured together, in one take.

Every take captured simultaneously across all modalities ‑ frame-accurate

The depth to train. The precision to evaluate. Wherever your pipeline needs spatial awareness, physics verification, or human behavior understanding.

  • Skeletal 3D motion
  • Full hand articulation
  • Face video capture + FACS
  • Egocentric stereo vision
  • 3D body scans & models
  • Audio & voice
  • 8× 4K multi-view video
  • Temporal annotations (action-level)