Animation File Formats for 3D Avatars: FBX, VRMA, BVH, glTF, and .anim Compared

Animating a 3D avatar is not just about the model — it is about the motion. And motion comes in files. Different file formats store skeletal animation data in different ways, with different tradeoffs in compatibility, file size, tooling support, and whether the animation will play on your model without extra work.

This is a practical guide to the five major animation formats used in real-time 3D avatar systems, with a focus on VRM models — the open standard for virtual characters.

The Basics: How Skeletal Animation Works

Before comparing formats, it helps to understand what they all store.

A 3D character model has a skeleton: a hierarchy of bones (hips, spine, chest, shoulders, arms, legs) nested inside the mesh. Animation is a sequence of keyframes, each recording the rotation (and sometimes position) of bones at a point in time. During playback the engine samples those keyframes at the display frame rate, typically 30 or 60 frames per second, interpolating between them, and the skeleton moves. The mesh deforms to follow.
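Here is a minimal sketch of what "playing back" keyframes means. It assumes a single scalar track (for example, one rotation component) and linear interpolation. The names sample_track, key_times and key_values are hypothetical. Engines such as Three.js interpolate rotation tracks with quaternion slerp instead, and this sketch does not cover that case.

```python
from bisect import bisect_right


def sample_track(times: list[float], values: list[float], t: float) -> float:
    """Linearly interpolate a scalar keyframe track at time t (seconds).

    Outside the keyed range the first or last value is held.
    """
    if t <= times[0]:
        return values[0]
    if t >= times[-1]:
        return values[-1]
    i = bisect_right(times, t)  # index of the first key strictly after t
    t0, t1 = times[i - 1], times[i]
    alpha = (t - t0) / (t1 - t0)
    return values[i - 1] + alpha * (values[i] - values[i - 1])


# Two keys half a second apart, sampled by a 30 fps renderer.
key_times = [0.0, 0.5]
key_values = [0.0, 0.02]
frames = [sample_track(key_times, key_values, n / 30) for n in range(16)]
```

The renderer never sees the keyframes directly. It only asks "what is the value now?", so the same clip plays correctly at 30, 60 or 144 fps.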

The challenge is that different models have different skeletons. A Mixamo character has bones named mixamorigHips, mixamorigSpine, mixamorigLeftArm. A VRM model exported from VRoid Studio has J_Bip_C_Hips, J_Bip_C_Spine, J_Bip_L_UpperArm, which the VRM humanoid mapping labels hips, spine, and leftUpperArm. Same concept, different names, different rest poses. Making an animation from one skeleton play on another is called retargeting, and the format you choose determines how much retargeting work you need to do.

FBX (Autodesk Filmbox)

FBX is the workhorse of 3D animation. Originally created by Kaydara for its Filmbox software and now owned by Autodesk, it is usually stored as a binary file holding skeleton hierarchies, bone animations, meshes, and materials. If you have used Mixamo, Adobe's free motion capture library with thousands of animations, you have used FBX.

What is inside an FBX animation file:
- A skeleton hierarchy with named bones
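- Keyframe data: per-bone rotation curves (stored as Euler angles, which loaders such as Three.js's FBXLoader convert to quaternions), plus optional position/scale curves, often baked at every frame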
- Timing metadata (frame rate, duration)

Strengths:
- Access to the largest free animation library available (Mixamo alone has 2,500+ motions)
- Industry standard — every 3D tool exports FBX
- Compact binary format

Limitations:
- Proprietary format; the usual binary variant is not human-readable (an ASCII variant also exists)
- Almost always requires retargeting for VRM models
- Skeleton naming conventions vary between sources

Retargeting FBX to VRM:

When an FBX animation is loaded in the browser, a retargeting step maps Mixamo bones to VRM bones once, at load time:

- Strip the mixamorig prefix and match each bone to its VRM humanoid bone name
- For each keyframe, convert the rotation from the FBX rest pose into world space, then back into the VRM rest pose
- Scale the hips' position track so the hip height matches the target model's proportions
- Handle coordinate system differences between VRM versions: VRM 0.x models face the opposite direction from VRM 1.0 models, so their tracks need a 180-degree turn about the vertical axis
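The hip-scaling and VRM-version steps in the list above are small per-keyframe transforms. This is a minimal sketch. It assumes position keys are (x, y, z) tuples and rotation keys are (x, y, z, w) quaternions, and the function names are hypothetical. The VRM 0.x helpers assume the model faces the opposite way from a VRM 1.0 model. They turn each key 180 degrees about the vertical (Y) axis, which negates the X and Z components.

```python
Vec3 = tuple[float, float, float]
Quat = tuple[float, float, float, float]  # (x, y, z, w)


def scale_hips_track(positions: list[Vec3], source_hips_height: float,
                     target_hips_height: float) -> list[Vec3]:
    """Scale hips position keys so the motion fits the target's leg length.

    The ratio also absorbs any unit mismatch between the two files.
    """
    s = target_hips_height / source_hips_height
    return [(x * s, y * s, z * s) for x, y, z in positions]


def turn_position_for_vrm0(p: Vec3) -> Vec3:
    """Rotate a position 180 degrees about Y: (x, y, z) -> (-x, y, -z)."""
    x, y, z = p
    return (-x, y, -z)


def turn_rotation_for_vrm0(q: Quat) -> Quat:
    """Conjugate a rotation by a 180-degree turn about Y.

    The rotation axis (ax, ay, az) becomes (-ax, ay, -az) with the same
    angle, which negates the quaternion's x and z components.
    """
    x, y, z, w = q
    return (-x, y, -z, w)


# Hypothetical numbers: source hips at height 100, target VRM hips at 0.9.
hips = scale_hips_track([(0.0, 100.0, 0.0), (2.0, 98.0, 5.0)], 100.0, 0.9)
```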

The core mapping covers 22 bones — hips through toes, spine through head, both arms and hands. Finger bones are typically excluded from basic retargeting since they require per-model calibration.
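The 22-bone core map can be written as a plain lookup table. This sketch assumes Mixamo names appear either as mixamorig:Hips (as in the FBX) or as mixamorigHips (Three.js, for example, strips the colon from node names). The right-hand side uses VRM humanoid bone names. Assigning Spine1 to chest and Spine2 to upperChest is a common convention, not the only possible one. The names MIXAMO_TO_VRM and to_vrm_bone are hypothetical. Finger bones are deliberately absent, matching the text.

```python
import re
from typing import Optional

# Mixamo bone name (prefix removed) -> VRM humanoid bone name: 22 entries.
MIXAMO_TO_VRM = {
    "Hips": "hips", "Spine": "spine", "Spine1": "chest",
    "Spine2": "upperChest", "Neck": "neck", "Head": "head",
    "LeftShoulder": "leftShoulder", "LeftArm": "leftUpperArm",
    "LeftForeArm": "leftLowerArm", "LeftHand": "leftHand",
    "RightShoulder": "rightShoulder", "RightArm": "rightUpperArm",
    "RightForeArm": "rightLowerArm", "RightHand": "rightHand",
    "LeftUpLeg": "leftUpperLeg", "LeftLeg": "leftLowerLeg",
    "LeftFoot": "leftFoot", "LeftToeBase": "leftToes",
    "RightUpLeg": "rightUpperLeg", "RightLeg": "rightLowerLeg",
    "RightFoot": "rightFoot", "RightToeBase": "rightToes",
}

# "mixamorig", an optional digit suffix (e.g. mixamorig1), an optional colon.
_MIXAMO_PREFIX = re.compile(r"^mixamorig\d*:?")


def to_vrm_bone(mixamo_name: str) -> Optional[str]:
    """Return the VRM humanoid bone for a Mixamo bone, or None if unmapped."""
    return MIXAMO_TO_VRM.get(_MIXAMO_PREFIX.sub("", mixamo_name))
```

Bones that return None (fingers, extra twist bones) are simply left out of the retargeted clip.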

When to use FBX: When you need a large library of ready-made animations. Mixamo is free and covers nearly every common body motion: walking, waving, dancing, idle poses, combat, and gesture-style emotes (Mixamo clips carry no facial animation). The retargeting cost is paid once at load time and cached.

VRMA (VRM Animation)

VRMA is the native animation format for VRM models, specified by the VRM Consortium, the organization that maintains the VRM standard. It is built on glTF 2.0, the modern open standard for 3D, with VRM-specific extensions.

What is inside a VRMA file:
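- A glTF container (JSON metadata + binary buffer), usually saved as a binary .vrma file
- Skeletal animation keyframes stored as ordinary glTF node animations, plus a VRMC_vrm_animation extension that maps those nodes to VRM humanoid bone names
- Optional expression (blend shape) and look-at animation tracks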
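Because a .vrma file is a GLB container, its JSON chunk can be read with nothing but the standard library. This is a minimal sketch. It assumes the standard 12-byte GLB header followed by the JSON chunk, which glTF 2.0 requires to come first. It also assumes the extension stores bone-to-node links under VRMC_vrm_animation / humanoid / humanBones / <bone> / node, as we read the published VRMA 1.0 specification; check the spec before relying on it. The function names are hypothetical.

```python
import json
import struct

GLB_MAGIC = 0x46546C67   # the bytes b"glTF" read as a little-endian uint32
CHUNK_JSON = 0x4E4F534A  # the bytes b"JSON" read as a little-endian uint32


def read_glb_json(data: bytes) -> dict:
    """Return the parsed JSON chunk of a GLB (.glb, .vrm, .vrma) file."""
    magic, version, _total_length = struct.unpack_from("<III", data, 0)
    if magic != GLB_MAGIC or version != 2:
        raise ValueError("not a glTF 2.0 binary file")
    chunk_length, chunk_type = struct.unpack_from("<II", data, 12)
    if chunk_type != CHUNK_JSON:
        raise ValueError("first chunk is not JSON")
    return json.loads(data[20:20 + chunk_length].decode("utf-8"))


def vrma_human_bones(gltf: dict) -> dict[str, int]:
    """Map VRM humanoid bone names to glTF node indices."""
    ext = gltf.get("extensions", {}).get("VRMC_vrm_animation", {})
    bones = ext.get("humanoid", {}).get("humanBones", {})
    return {name: entry["node"] for name, entry in bones.items()}


# Usage (the path is hypothetical):
# with open("wave.vrma", "rb") as f:
#     print(vrma_human_bones(read_glb_json(f.read())))
```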

Strengths:
- No retargeting required — bones already use VRM naming conventions
- Open format, human-readable JSON metadata
- Can include facial expression animations alongside body motion
- Designed for the VRM ecosystem

Limitations:
- Smaller animation library compared to FBX/Mixamo
- Files can be larger than an equivalent FBX (glTF overhead)
- Tooling is still maturing

How VRMA loading works:

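The VRMAnimationLoaderPlugin (from the @pixiv/three-vrm-animation package) reads the glTF extension and produces VRMAnimation objects. createVRMAnimationClip then turns one into a standard Three.js AnimationClip bound to a specific VRM model. Because the animation and the model both describe their skeletons with VRM humanoid bone names, you write no bone map and no rest-pose math, and any remaining adjustments happen inside the library. The animation just plays.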
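A few lines show why no bone map is needed. Each glTF animation channel targets a node index, and the extension already says which humanoid bone each node is. This sketch is not the library's code. It assumes a parsed glTF JSON dict and the VRMC_vrm_animation layout described earlier, and the function name is hypothetical.

```python
from collections import defaultdict


def tracks_by_bone(gltf: dict) -> dict[str, list[str]]:
    """Group channel paths ("rotation", "translation", ...) by humanoid bone."""
    human_bones = gltf["extensions"]["VRMC_vrm_animation"]["humanoid"]["humanBones"]
    node_to_bone = {entry["node"]: name for name, entry in human_bones.items()}
    result = defaultdict(list)
    for animation in gltf.get("animations", []):
        for channel in animation["channels"]:
            target = channel["target"]
            bone = node_to_bone.get(target.get("node"))
            if bone is not None:  # skip nodes that are not humanoid bones
                result[bone].append(target["path"])
    return dict(result)


example = {
    "extensions": {"VRMC_vrm_animation": {"humanoid": {"humanBones": {
        "hips": {"node": 0}, "spine": {"node": 1}}}}},
    "animations": [{"channels": [
        {"sampler": 0, "target": {"node": 0, "path": "translation"}},
        {"sampler": 1, "target": {"node": 0, "path": "rotation"}},
        {"sampler": 2, "target": {"node": 1, "path": "rotation"}},
        {"sampler": 3, "target": {"node": 7, "path": "rotation"}},
    ]}],
}
```

The lookup is by name on both sides, so any VRM model that exposes a "hips" bone can receive the hips tracks without a per-source mapping table.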

When to use VRMA: When you are building specifically for VRM avatars and want zero retargeting overhead. VRMA is also the best choice for animations that include facial expressions, since the blend shape tracks are baked in. As the format matures, it is likely to take over more VRM workflows from FBX.

BVH (Biovision Hierarchy)

BVH is among the oldest formats on this list and is the simplest. Created by Biovision for motion capture data, it is a plain-text format that anyone can open in a text editor.

What is inside a BVH file:

HIERARCHY
ROOT Hips
{
  OFFSET 0.0 0.0 0.0
  CHANNELS 6 Xposition Yposition Zposition Zrotation Xrotation Yrotation
  JOINT Spine
  {
    OFFSET 0.0 5.21 0.0
    CHANNELS 3 Zrotation Xrotation Yrotation
    ...
  }
}
MOTION
Frames: 120
Frame Time: 0.0333333
0.0 35.2 0.0 0.5 -1.2 0.0 ...
0.0 35.3 0.0 0.6 -1.1 0.0 ...

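The first section defines the skeleton: each joint's offset from its parent and the channels it animates. The second section lists the channel values, one line per frame, in the same order as the CHANNELS declarations. Each line gives the root's position followed by every joint's Euler rotations, in degrees. That is the entire format.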
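Because the layout is that regular, a usable parser fits in a few dozen lines. This is a minimal sketch. It assumes a well-formed file in which each joint declares its CHANNELS before any child joints, and every frame line is present. parse_bvh and pose_at are hypothetical names. Offsets are skipped because only the motion is needed here.

```python
def parse_bvh(text: str):
    """Return (channels, frame_time, frames).

    channels lists (joint, channel_name) pairs in file order; each frame is
    a list of floats in that same order.
    """
    lines = (line.strip() for line in text.splitlines())
    lines = (line for line in lines if line)  # drop blank lines
    channels = []
    joint = None
    for line in lines:
        keyword, *rest = line.split()
        if keyword in ("ROOT", "JOINT"):
            joint = rest[0]  # a joint's CHANNELS line comes before its children
        elif keyword == "CHANNELS":
            count = int(rest[0])
            channels += [(joint, name) for name in rest[1:1 + count]]
        elif keyword == "MOTION":
            break
    frame_count = int(next(lines).split(":")[1])   # "Frames: 120"
    frame_time = float(next(lines).split(":")[1])  # "Frame Time: 0.0333333"
    frames = [[float(v) for v in next(lines).split()] for _ in range(frame_count)]
    return channels, frame_time, frames


def pose_at(channels, frame):
    """Group one frame's values by joint: {joint: {channel: value}}."""
    if len(frame) != len(channels):
        raise ValueError("frame length does not match channel count")
    pose = {}
    for (joint, name), value in zip(channels, frame):
        pose.setdefault(joint, {})[name] = value
    return pose


SAMPLE = """HIERARCHY
ROOT Hips
{
  OFFSET 0.0 0.0 0.0
  CHANNELS 6 Xposition Yposition Zposition Zrotation Xrotation Yrotation
  JOINT Spine
  {
    OFFSET 0.0 5.21 0.0
    CHANNELS 3 Zrotation Xrotation Yrotation
    End Site
    {
      OFFSET 0.0 5.0 0.0
    }
  }
}
MOTION
Frames: 2
Frame Time: 0.0333333
0.0 35.2 0.0 0.5 -1.2 0.0 1.0 2.0 3.0
0.0 35.3 0.0 0.6 -1.1 0.0 1.5 2.5 3.5
"""

channels, frame_time, frames = parse_bvh(SAMPLE)
```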

Strengths:
- Human-readable plain text
- Lightweight: no mesh data or metadata bloat, though plain-text numbers take more space than a binary encoding of the same values
- Direct output format for many motion capture systems
- Easy to parse and modify programmatically

Limitations:
- No mesh, material, or blend shape data — motion only
- Bone names vary wildly between sources
- Euler angles (not quaternions) can cause gimbal lock artifacts
- Fewer curated animation libraries
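The gimbal-lock limitation is easy to demonstrate. This sketch assumes BVH angles in degrees and the common convention that rotation channels compose in the order listed, so "Zrotation Xrotation Yrotation" means q = qZ * qX * qY. The helper names are hypothetical. With the middle (X) angle at 90 degrees, a Z twist and a Y twist produce the identical rotation, so one degree of freedom disappears.

```python
import math

Quat = tuple[float, float, float, float]  # (x, y, z, w)

AXES = {"Xrotation": (1.0, 0.0, 0.0), "Yrotation": (0.0, 1.0, 0.0),
        "Zrotation": (0.0, 0.0, 1.0)}


def quat_mul(a: Quat, b: Quat) -> Quat:
    """Hamilton product a * b."""
    ax, ay, az, aw = a
    bx, by, bz, bw = b
    return (aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw,
            aw * bw - ax * bx - ay * by - az * bz)


def axis_angle(axis, degrees: float) -> Quat:
    """Unit quaternion for a rotation of `degrees` about a unit axis."""
    half = math.radians(degrees) / 2
    s = math.sin(half)
    return (axis[0] * s, axis[1] * s, axis[2] * s, math.cos(half))


def bvh_rotation(order: list[str], degrees: list[float]) -> Quat:
    """Compose one joint's Euler channels into a quaternion, in listed order."""
    q = (0.0, 0.0, 0.0, 1.0)
    for channel, angle in zip(order, degrees):
        q = quat_mul(q, axis_angle(AXES[channel], angle))
    return q


order = ["Zrotation", "Xrotation", "Yrotation"]
# Gimbal lock: with X at 90 degrees, twisting Z or twisting Y gives the same result.
twist_z = bvh_rotation(order, [30.0, 90.0, 0.0])
twist_y = bvh_rotation(order, [0.0, 90.0, 30.0])
```

Converting to quaternions as soon as a BVH file is parsed, as most loaders do, avoids interpolating in this ambiguous Euler space.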

Retargeting BVH to VRM:

BVH files can come from many sources with different bone naming. The loader first checks whether the bone names already match VRM humanoid conventions. If they do, the animation plays directly. If they do not, the retargeting pipeline maps them through the same Mixamo-to-VRM conversion, since many BVH packs use Mixamo-compatible naming. Files whose names match neither convention need a custom bone map before they will play.
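The naming check described above can be as simple as testing for a handful of key bones. This minimal sketch assumes that finding hips, spine, head, and the upper arms and legs under one naming scheme is enough to identify that scheme. The bone sets and the function name are hypothetical, not a fixed rule.

```python
# Key bones under each naming scheme (a hypothetical minimal test set).
VRM_KEY_BONES = {"hips", "spine", "head", "leftUpperArm", "rightUpperArm",
                 "leftUpperLeg", "rightUpperLeg"}
MIXAMO_KEY_BONES = {"Hips", "Spine", "Head", "LeftArm", "RightArm",
                    "LeftUpLeg", "RightUpLeg"}


def classify_skeleton(joint_names: list[str]) -> str:
    """Return "vrm" (play directly), "mixamo" (retarget), or "custom" (needs a map)."""
    names = set(joint_names)
    if VRM_KEY_BONES <= names:
        return "vrm"
    # Accept "mixamorig:Hips", "mixamorigHips", and bare "Hips" alike.
    stripped = {n.removeprefix("mixamorig").removeprefix(":") for n in names}
    if MIXAMO_KEY_BONES <= stripped:
        return "mixamo"
    return "custom"
```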

When to use BVH: When you are working with raw motion capture data, building custom animation pipelines, or need a format you can inspect and edit by hand. BVH is also common in academic research and open-source mocap datasets.

glTF / GLB (GL Transmission Format)

glTF is the "JPEG of 3D" — an open standard by the Khronos Group designed for efficient transmission of 3D content. GLB is the binary container variant (all data in one file). VRM models themselves are GLB files with VRM extensions.

What is inside a GLB with animations:
- 3D mesh geometry and materials
- Skeleton hierarchy
- One or more animation clips embedded in the file
- Textures, blend shapes, and metadata

Strengths:
- Modern open standard with broad industry support
- Animations are embedded with the model — one file, everything included
- Designed for efficient web delivery, with optional compression extensions (Draco geometry compression, meshopt)
- Blend shape animations supported natively

Limitations:
- Embedded animations are tied to the model's specific rig
- Not designed for standalone animation distribution (unlike FBX or VRMA)
- Converting animations from other formats requires offline tooling (Blender scripts)

How embedded GLB animations work:

When a VRM model (which is a GLB file) is loaded, any animation clips baked into the file are registered directly. No retargeting needed — they were authored for this exact skeleton. This is ideal for character-specific animations like signature idle poses, entrance animations, or unique expressions.

Offline baking with Blender:

For production workflows, a Python script can batch-convert FBX animations into a single GLB:

- Import the VRM model into Blender
- Import each FBX animation
- Map bone names between skeletons (trying direct match, case-insensitive, and VRoid-specific prefixes)
- Retarget each animation frame by frame
- Export one GLB with all animations embedded

This front-loads the retargeting work so the browser does not have to do it at runtime.
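The name-matching step in that workflow tries progressively looser rules. Here is a sketch of the fallback chain in plain Python, without the Blender API. It assumes VRoid's J_Bip_C_, J_Bip_L_ and J_Bip_R_ prefixes stand for center, left and right. match_bone is a hypothetical name, and a real script would likely also handle other conventions such as .L/.R suffixes.

```python
from typing import Optional

# VRoid prefix -> side word prepended when comparing against other skeletons.
VROID_PREFIXES = {"J_Bip_C_": "", "J_Bip_L_": "Left", "J_Bip_R_": "Right"}


def match_bone(source: str, target_names: list[str]) -> Optional[str]:
    """Find the target bone corresponding to a source bone name, or None."""
    # 1. Direct match.
    if source in target_names:
        return source
    # 2. Case-insensitive match.
    by_lower = {name.lower(): name for name in target_names}
    if source.lower() in by_lower:
        return by_lower[source.lower()]
    # 3. VRoid prefix: J_Bip_L_UpperArm is compared as "LeftUpperArm".
    for name in target_names:
        for prefix, side in VROID_PREFIXES.items():
            if name.startswith(prefix):
                if (side + name[len(prefix):]).lower() == source.lower():
                    return name
    return None
```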

When to use GLB: When you want animations permanently attached to a specific model with zero runtime cost. Best for character-specific motions that will never be reused on other models.

.anim (Unity Animation)

If you have worked with Unity, you have seen .anim files. This is Unity's proprietary animation clip format — a YAML-serialized asset that stores keyframe curves for any animatable property, including bone transforms.

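What is inside a .anim file (a simplified excerpt; real files spread curves across fields such as m_RotationCurves and m_EditorCurves and carry more serialization detail):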

%YAML 1.1
%TAG !u! tag:unity3d.com,2011:
--- !u!74 &7400000
AnimationClip:
  m_Name: Idle
  m_Curves:
    - path: Hips
      attribute: m_LocalRotation.x
      keys:
        - time: 0
          value: 0.0
        - time: 0.5
          value: 0.02

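Each curve targets a specific bone property (rotation, position, scale) at specific keyframe times, and Unity interpolates between keyframes at runtime. Clips made for Unity's Humanoid rig store muscle values rather than per-bone rotations, which is one reason they cannot be dropped onto another runtime's skeleton as-is.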
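Unity evaluates each curve segment as a cubic Hermite spline shaped by the keys' tangents. Real .anim files store these as inSlope and outSlope, and the excerpt above omits them. This sketch assumes non-weighted tangents. Weighted and stepped (constant) tangents, which Unity also supports, are not handled. The Key and evaluate names are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class Key:
    time: float
    value: float
    in_slope: float = 0.0   # tangent arriving at this key (value per second)
    out_slope: float = 0.0  # tangent leaving this key (value per second)


def evaluate(keys: list[Key], t: float) -> float:
    """Evaluate a curve at time t with cubic Hermite interpolation."""
    if t <= keys[0].time:
        return keys[0].value
    if t >= keys[-1].time:
        return keys[-1].value
    for k0, k1 in zip(keys, keys[1:]):
        if t <= k1.time:  # t lies within [k0.time, k1.time]
            dt = k1.time - k0.time
            s = (t - k0.time) / dt
            h00 = 2 * s**3 - 3 * s**2 + 1
            h10 = s**3 - 2 * s**2 + s
            h01 = -2 * s**3 + 3 * s**2
            h11 = s**3 - s**2
            return (h00 * k0.value + h10 * dt * k0.out_slope
                    + h01 * k1.value + h11 * dt * k1.in_slope)
    return keys[-1].value  # unreachable for sorted keys


# The two keys from the excerpt with flat tangents (ease in/out) vs. linear tangents.
flat = [Key(0.0, 0.0), Key(0.5, 0.02)]
linear = [Key(0.0, 0.0, 0.04, 0.04), Key(0.5, 0.02, 0.04, 0.04)]
```

Flat tangents make the motion ease in and out of each key. Tangents equal to the segment's slope reproduce plain linear interpolation.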

Strengths:
- Human-readable YAML (unlike FBX)
- Deep integration with Unity's animation state machine (Animator Controller)
- Can animate any property, not just bones — materials, blend shapes, custom scripts
- Supports animation events (trigger code at specific frames)
- Large ecosystem of animation packs on the Unity Asset Store

Limitations:
- Unity-only format — not portable to other engines or web renderers
- Cannot be loaded directly in Three.js or any non-Unity runtime
- Tightly coupled to Unity's serialization system and object hierarchy
- Curve paths are relative transform-hierarchy paths, so they break if you rename or reparent GameObjects

Using .anim with VRM avatars:

You cannot use .anim files directly in a web-based avatar system. They must be converted first. The typical workflow:

- Import the .anim into Unity alongside the VRM model
- Play the animation and verify it looks correct on the VRM rig
- Export it as FBX (for example with Unity's FBX Exporter package) or as VRMA using the UniVRM VRM Animation exporter
- Use the exported FBX or VRMA in your web pipeline

If you are sourcing animations from the Unity Asset Store, this conversion step is unavoidable. The good news is that many Unity animation packs are also available in FBX format from their original creators, which saves the round-trip.

When to use .anim: Only within Unity itself. If your avatar system runs in Unity (VRChat, cluster, many VTuber apps), .anim is the native choice. For web-based systems using Three.js or similar, convert to FBX or VRMA before use.

Format Comparison

	FBX	VRMA	BVH	GLB	.anim
Format	Binary	glTF + extensions	Plain text	Binary (glTF)	YAML (text)
Human-readable	No	Partially	Yes	Partially	Yes
VRM retargeting	Required	Not needed	Sometimes	Not needed	N/A (convert first)
Animation libraries	Huge (Mixamo)	Growing	Moderate	N/A	Large (Unity Store)
Facial expressions	Possible (none in Mixamo)	Yes	No	Yes	Yes
File size	Small	Medium	Small	Large	Medium
Web-compatible	Yes	Yes	Yes	Yes	No (convert)
Best for	Library animations	VRM-native	Mocap data	Embedded	Unity projects
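The "VRM retargeting" and "Web-compatible" rows amount to a per-file dispatch decision in a web loader. This sketch assumes the file extension is a reliable signal. .vrm is included because VRM models are GLB files. The step labels and plan_for are hypothetical, not an existing API.

```python
from pathlib import Path

# File extension -> what a web avatar pipeline has to do before playback.
PIPELINE_STEP = {
    ".fbx": "retarget",        # Mixamo-style skeleton -> VRM humanoid
    ".vrma": "play",           # already uses VRM humanoid bone names
    ".bvh": "check-names",     # play directly or retarget, depending on names
    ".glb": "play-embedded",   # clips authored for this model's own rig
    ".vrm": "play-embedded",
    ".anim": "convert-first",  # Unity-only; export to FBX or VRMA in Unity
}


def plan_for(path: str) -> str:
    """Return the pipeline step for an animation file, by extension."""
    ext = Path(path).suffix.lower()
    if ext not in PIPELINE_STEP:
        raise ValueError(f"unsupported animation file: {path}")
    return PIPELINE_STEP[ext]
```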

Loop Modes

Regardless of format, every animation needs a loop mode — what happens when it finishes playing:

- Loop: repeats indefinitely. Used for idle poses, walking cycles, and breathing.
- Once: plays once and holds the final frame. Used for one-shot gestures like waving.
- Once then idle: plays once, then automatically transitions back to the idle animation. Used for reactions, celebrations, and emotes. This is the most common mode for interactive avatars, since the character should always return to a natural resting state.
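Each frame, the three modes reduce to one decision: where in the clip to sample, and whether to hand control back to idle. This sketch uses hypothetical names. In Three.js, the first two modes roughly correspond to THREE.LoopRepeat and to THREE.LoopOnce with clampWhenFinished set. The third corresponds to LoopOnce plus a listener for the mixer's "finished" event.

```python
from enum import Enum


class LoopMode(Enum):
    LOOP = "loop"
    ONCE = "once"
    ONCE_THEN_IDLE = "once_then_idle"


def resolve(mode: LoopMode, elapsed: float, duration: float) -> tuple[float, bool]:
    """Return (clip_time, return_to_idle) for time elapsed since the clip started."""
    if mode is LoopMode.LOOP:
        return elapsed % duration, False  # wrap around forever
    if elapsed < duration:
        return elapsed, False  # still playing
    # Finished: hold the final frame; ONCE_THEN_IDLE also hands back to idle.
    return duration, mode is LoopMode.ONCE_THEN_IDLE
```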

The Retargeting Problem

The fundamental challenge in avatar animation is that animations and models are authored independently. A Mixamo dance was recorded on a Mixamo skeleton. Your VRM character has a different skeleton. The bones have different names, different rest rotations, and different proportions.

Retargeting solves this by finding the semantic correspondence — "this bone is the left upper arm in both skeletons" — and converting the rotation data between rest poses. The math involves quaternion multiplication: take the bone's rotation relative to its rest pose in the source skeleton, and apply it relative to the rest pose in the target skeleton.
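Written out, that conversion is two quaternion products. This minimal sketch assumes unit quaternions in (x, y, z, w) order and the source's animated local rotation q_s. It also assumes that, for both skeletons, each bone's rest orientation is known in world space: W for the bone and P for its parent. The world-space offset from rest is D = P_s * q_s * inv(W_s), and the target's local rotation is q_t = inv(P_t) * D * W_t. When the target's rest rotations are all identity, as in a normalized VRM rig, the result is simply D. The function names are hypothetical.

```python
import math

Quat = tuple[float, float, float, float]  # unit quaternion (x, y, z, w)


def q_mul(a: Quat, b: Quat) -> Quat:
    """Hamilton product a * b."""
    ax, ay, az, aw = a
    bx, by, bz, bw = b
    return (aw * bx + ax * bw + ay * bz - az * by,
            aw * by - ax * bz + ay * bw + az * bx,
            aw * bz + ax * by - ay * bx + az * bw,
            aw * bw - ax * bx - ay * by - az * bz)


def q_inv(q: Quat) -> Quat:
    """Inverse of a unit quaternion (its conjugate)."""
    x, y, z, w = q
    return (-x, -y, -z, w)


def axis_angle(axis, degrees: float) -> Quat:
    """Unit quaternion for a rotation of `degrees` about a unit axis."""
    half = math.radians(degrees) / 2
    s = math.sin(half)
    return (axis[0] * s, axis[1] * s, axis[2] * s, math.cos(half))


def retarget_rotation(q_src: Quat, src_parent_rest: Quat, src_rest: Quat,
                      tgt_parent_rest: Quat, tgt_rest: Quat) -> Quat:
    """Move a local bone rotation from a source rig to a target rig.

    All *_rest arguments are world-space rest orientations.
    """
    # World-space rotation taking the source bone from rest to its animated pose.
    offset = q_mul(q_mul(src_parent_rest, q_src), q_inv(src_rest))
    # Apply that offset on top of the target's rest pose, in its parent's frame.
    return q_mul(q_mul(q_inv(tgt_parent_rest), offset), tgt_rest)


# Two rigs with different (hypothetical) rest orientations for the same bone.
src_parent = axis_angle((0.0, 0.0, 1.0), 30.0)
src_rest = q_mul(src_parent, axis_angle((1.0, 0.0, 0.0), 45.0))
tgt_parent = axis_angle((0.0, 1.0, 0.0), 10.0)
tgt_rest = q_mul(tgt_parent, axis_angle((0.0, 0.0, 1.0), -20.0))
# A source key equal to the source rest pose should land on the target rest pose.
out = retarget_rotation(q_mul(q_inv(src_parent), src_rest),
                        src_parent, src_rest, tgt_parent, tgt_rest)
```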

The result is not always perfect. Proportional differences mean a tall character's walk cycle might look slightly different on a short character. Extreme poses can clip through the mesh. But for the vast majority of everyday animations (idle, talking, gestures, reactions), retargeting produces natural-looking results with little or no manual adjustment.

Choosing the Right Format

For most avatar systems, the practical answer is: use all of them.

- Start with FBX from Mixamo for your animation library: thousands of free, high-quality animations covering most common motions.
- Use VRMA when available, especially for animations authored specifically for VRM models.
- Use BVH when working with custom motion capture data or research datasets.
- Embed critical animations (idle, entrance) directly in the GLB model file for zero-latency playback on load.
- Convert .anim files from the Unity Asset Store to FBX or VRMA when bringing Unity content into web-based systems.

The format matters less than you might think. What matters is the retargeting pipeline — the ability to take motion from any source and make it play naturally on any character. Get that right, and every animation library in the world becomes available to every avatar you build.