Graphics-based · ~30 min

MetaHuman Pipeline

UE 5.7 rigged avatars with source-backed control paths


What is MetaHuman?

MetaHuman is Epic Games' framework for creating photorealistic digital humans in Unreal Engine. Think of it as a sophisticated puppet system: a skeleton of virtual bones controls a high-detail 3D mesh, while blendshapes handle subtle facial deformations. Live face tracking drives the expressions: ARKit uses the iPhone's TrueDepth camera to estimate 52 blendshape coefficients at 60 FPS and streams them to the MetaHuman. Combined with Audio2Face for lip sync, you get real-time digital humans with precise, frame-by-frame control.

The Core Idea in 30 Seconds

Skeletal Rig

700+ bones control mesh deformation hierarchically

52 Blendshapes

ARKit standard for facial expressions at 60 FPS

Live Link

iPhone face tracking streams directly to UE5

The Marionette Metaphor

Bones

Wooden crossbars

Joints

Strings connecting bars

Mesh

Puppet's cloth/skin

Blendshapes

Facial expressions overlay

MetaHuman uses classical graphics techniques rather than neural rendering. This means predictable performance (a known per-frame cost, so a 60 FPS target holds on hardware sized for the scene), full artistic control (every parameter is adjustable), and no training required. The tradeoff is the need for expensive motion capture or manual animation.

Skeletal Animation

Bone Hierarchy

Child bones inherit parent transforms


Skinning Weights

Vertices follow bones based on weight


FK vs IK

Forward vs Inverse Kinematics

Rotate joints → end position

Bone Transforms

Rotation propagation down chain


Joint Limits

Angle constraints prevent bad poses

45° (range: -30° to 120°)

Quaternion Rotation

Smooth interpolation without gimbal lock

Facial Animation

Blendshape Interpolation

Linear blend between stored poses

Example: Neutral to Smile

FACS Action Units

Anatomical facial muscle controls

Examples: AU1, AU2, AU12, AU25

Morph Targets

Combine multiple deformations

Example: A at 50%, B at 30%

Wrinkle Maps

Dynamic skin detail based on expression


Phoneme to Viseme

Audio phonemes map to mouth shapes

/a/ → jaw: 80%, lips: 30%

Lip Sync Weights

Blendshape weights per viseme

Channels: jaw, upperLip, lowerLip, mouthWidth

Audio to Animation

Mel Spectrogram

Frequency-time representation

[Figure: mel spectrogram. Axes: time, frequency (low to high)]

Audio Envelope

Amplitude over time for lip sync

Realistic Materials

PBR Materials

Physically-based roughness/metallic

Normal Mapping

Surface detail without geometry

Ambient Occlusion

Soft contact shadows

Performance Optimization

LOD Switching

Detail level based on camera distance

Full: 50K, Medium: 15K, Low: 3K (near to far)

Interpolation Types

Linear, ease, step transitions

ARKit 52 Blendshape Playground

Apple ARFaceAnchor.BlendShapeLocation — 52 coefficients mapped to FACS Action Units.

Each wᵢ ∈ [0,1] linearly blends a basis deformation Bᵢ onto the neutral mesh x₀.

eyeBlinkLeft: AU45L
eyeBlinkRight: AU45R
eyeLookDownLeft: AU61L
eyeLookDownRight: AU61R
eyeLookInLeft: AU62L
eyeLookInRight: AU62R
eyeLookOutLeft: AU63L
eyeLookOutRight: AU63R
eyeLookUpLeft: AU64L
eyeLookUpRight: AU64R
eyeSquintLeft: AU6L
eyeSquintRight: AU6R
eyeWideLeft: AU5L
eyeWideRight: AU5R
mouthClose: AU24
mouthDimpleLeft: AU14L
mouthDimpleRight: AU14R
mouthFrownLeft: AU15L
mouthFrownRight: AU15R
mouthFunnel: AU22
mouthLeft: AU30L
mouthLowerDownLeft: AU16L
mouthLowerDownRight: AU16R
mouthPressLeft: AU23L
mouthPressRight: AU23R
mouthPucker: AU18
mouthRight: AU30R
mouthRollLower: AU28L
mouthRollUpper: AU28U
mouthShrugLower: AU17L
mouthShrugUpper: AU17U
mouthSmileLeft: AU12L
mouthSmileRight: AU12R
mouthStretchLeft: AU20L
mouthStretchRight: AU20R
mouthUpperUpLeft: AU10L
mouthUpperUpRight: AU10R

f(x) = x₀ + Σ wᵢ · Bᵢ

Linear blend: neutral mesh x₀ + weighted sum of 52 basis deformations. Mapped to FACS Action Units (Ekman & Friesen, 1978). 3D model: Three.js facecap.glb with ARKit morph targets.
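
The linear blend above can be sketched in a few lines of plain Python. This is a minimal illustration, not the engine's implementation: the two-vertex mesh, the delta values, and the helper name blend_mesh are all made up for the example, and each basis deformation is stored as a per-vertex offset from the neutral mesh.

```python
from typing import Dict, List, Tuple

Vec3 = Tuple[float, float, float]

def blend_mesh(neutral: List[Vec3],
               deltas: Dict[str, List[Vec3]],
               weights: Dict[str, float]) -> List[Vec3]:
    """Return x0 + sum_i(w_i * B_i), clamping each weight to [0, 1]."""
    out = [list(v) for v in neutral]
    for name, w in weights.items():
        w = min(1.0, max(0.0, w))
        for vi, (dx, dy, dz) in enumerate(deltas[name]):
            out[vi][0] += w * dx
            out[vi][1] += w * dy
            out[vi][2] += w * dz
    return [tuple(v) for v in out]

# Hypothetical two-vertex "mesh": vertex 0 is the chin, vertex 1 a mouth corner.
neutral = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0)]
deltas = {
    "jawOpen":        [(0.0, -1.0, 0.0), (0.0, 0.0, 0.0)],
    "mouthSmileLeft": [(0.0, 0.0, 0.0), (0.5, 0.5, 0.0)],
}
posed = blend_mesh(neutral, deltas, {"jawOpen": 0.5, "mouthSmileLeft": 1.0})
print(posed)
```

Because the blend is a plain weighted sum, the order in which blendshapes are applied does not matter.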

Skinning Weight Visualizer

See how skinning weights determine mesh deformation. Click a bone to select it, then rotate to see how weights blend the deformation.

Weight Color Scale

0% (Blue), 50%, 100% (Red)

Vertices with higher weight for the selected bone move more when that bone rotates. Weights are normalized so they sum to 1 per vertex.
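
One common way to turn normalized weights into a deformed vertex is linear blend skinning: transform the vertex by each bone, then average the results by weight. The 2D sketch below assumes each bone is just a pivot plus a rotation angle; the function names and the arm layout are illustrative, not taken from MetaHuman.

```python
import math

def rotate_about(point, pivot, angle_deg):
    """Rotate a 2D point around a pivot by angle_deg (counter-clockwise)."""
    a = math.radians(angle_deg)
    px, py = point[0] - pivot[0], point[1] - pivot[1]
    return (pivot[0] + px * math.cos(a) - py * math.sin(a),
            pivot[1] + px * math.sin(a) + py * math.cos(a))

def skin_vertex(vertex, bones, weights):
    """Blend the per-bone transformed positions using normalized weights."""
    total = sum(weights)
    if total <= 0.0:
        return vertex  # no bone influences this vertex
    x = y = 0.0
    for (pivot, angle_deg), w in zip(bones, weights):
        tx, ty = rotate_about(vertex, pivot, angle_deg)
        x += (w / total) * tx
        y += (w / total) * ty
    return (x, y)

# Hypothetical arm: upper arm stays put, forearm rotates 90 degrees at the elbow.
bones = [((0.0, 0.0), 0.0), ((1.0, 0.0), 90.0)]
wrist = (2.0, 0.0)
full = skin_vertex(wrist, bones, [0.0, 1.0])   # follows the forearm completely
half = skin_vertex(wrist, bones, [1.0, 1.0])   # weights normalize to 0.5 / 0.5
```

A vertex weighted entirely to the forearm swings with it, while an evenly weighted vertex lands halfway between the two bone-space positions.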

Face Tracking Simulator

Simulate how ARKit extracts 52 blendshapes from face tracking. Move your mouse over the canvas to control head pose, and use the sliders for expressions.


How Real Face Tracking Works

  • iPhone TrueDepth projects 30,000 infrared dots onto your face
  • ARKit processes the depth map to extract 52 blendshape coefficients
  • Each coefficient ranges from 0.0 to 1.0, representing muscle activation
  • Data streams at 60 FPS via Live Link to Unreal Engine

Audio-to-Expression Demo

Example input: Hello

Smoothing: 0.30 (higher = smoother transitions)

Blendshapes driven: jawOpen, mouthSmile, mouthPucker, mouthFunnel

This demonstrates how phonemes (speech sounds) map to visemes (mouth shapes). Neural audio-to-expression models learn these mappings from video data automatically.
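
A hand-written version of that mapping can be sketched as a lookup table plus exponential smoothing. The viseme table, the phoneme labels, and the smoothing rule below are assumptions made for illustration; the demo's actual values are not given in the text.

```python
CHANNELS = ("jawOpen", "mouthSmile", "mouthPucker", "mouthFunnel")

# Hypothetical phoneme-to-viseme targets (illustrative values only).
VISEMES = {
    "AA": {"jawOpen": 0.8},
    "EE": {"jawOpen": 0.2, "mouthSmile": 0.6},
    "OO": {"jawOpen": 0.2, "mouthPucker": 0.8, "mouthFunnel": 0.5},
    "REST": {},
}

def smooth_step(current, target, smoothing):
    """Exponential smoothing: 0 snaps to the target, values near 1 move slowly."""
    return {c: smoothing * current[c] + (1.0 - smoothing) * target.get(c, 0.0)
            for c in CHANNELS}

def animate(phonemes, smoothing=0.3):
    """Return one dict of blendshape weights per input phoneme frame."""
    state = {c: 0.0 for c in CHANNELS}
    frames = []
    for p in phonemes:
        state = smooth_step(state, VISEMES.get(p, {}), smoothing)
        frames.append(state)
    return frames

frames = animate(["AA", "AA", "OO", "REST"])
for f in frames:
    print({k: round(v, 3) for k, v in f.items()})
```

Raising the smoothing value makes each channel lag further behind its target, which hides abrupt viseme changes at the cost of softer articulation.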

Bone Hierarchy Explorer

Explore how skeletal animation works. Click bones to select them, then adjust rotation. Notice how child bones inherit parent transformations.

Bone Hierarchy
Root
└─ Spine
   └─ Chest
      ├─ Neck → Head
      ├─ L.Shoulder → Arm → Forearm → Hand
      └─ R.Shoulder → Arm → Forearm → Hand

Key Insight

When you rotate a parent bone, all children follow. This is forward kinematics. MetaHuman uses 700+ bones with this hierarchy to create lifelike movement.
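
Forward kinematics on a single chain can be sketched as an accumulation of rotations from parent to child. The planar three-bone arm below is a simplified illustration with made-up bone lengths; a real rig uses 3D transforms and a branching hierarchy.

```python
import math

def forward_kinematics(lengths, angles_deg):
    """Return joint positions of a planar chain rooted at the origin.

    Each angle is relative to the parent bone, so rotating a parent
    moves every bone after it.
    """
    x = y = 0.0
    heading = 0.0
    points = [(x, y)]
    for length, angle in zip(lengths, angles_deg):
        heading += math.radians(angle)  # child inherits the accumulated rotation
        x += length * math.cos(heading)
        y += length * math.sin(heading)
        points.append((x, y))
    return points

# Hypothetical arm: upper arm, forearm, hand.
lengths = [1.0, 1.0, 0.5]
straight = forward_kinematics(lengths, [0, 0, 0])
raised = forward_kinematics(lengths, [90, 0, 0])   # rotate only the shoulder
print(straight[-1], raised[-1])
```

Only the shoulder angle changes between the two poses, yet the elbow, wrist, and hand tip all move.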

Expression Blending Mixer

Mix multiple expressions together. Real faces blend emotions - you can be happy-surprised or sad-angry. Blendshapes add linearly then clamp to [0,1].

Expression Mix

Happy, Sad, Angry, Surprised, Disgusted

How Expression Blending Works

final_blendshape[i] = clamp(Σ_e (expression_weight[e] × blendshape_value[e][i]), 0, 1)

Each expression defines target values for relevant blendshapes. When you mix expressions, the blendshape values add together (then clamp). This is how MetaHuman and ARKit create nuanced expressions from simple building blocks.
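
The add-then-clamp rule can be sketched as follows. The expression definitions are hypothetical values chosen so that two expressions overlap on jawOpen; they are not the mixer's actual presets.

```python
# Hypothetical expression presets: blendshape name -> target value.
EXPRESSIONS = {
    "happy":     {"mouthSmileLeft": 0.8, "mouthSmileRight": 0.8, "jawOpen": 0.5},
    "surprised": {"eyeWideLeft": 0.9, "eyeWideRight": 0.9, "jawOpen": 0.7},
}

def mix_expressions(mix):
    """Sum expression_weight * blendshape_value per blendshape, then clamp to [0, 1]."""
    totals = {}
    for expression, expression_weight in mix.items():
        for shape, value in EXPRESSIONS[expression].items():
            totals[shape] = totals.get(shape, 0.0) + expression_weight * value
    return {shape: min(1.0, max(0.0, v)) for shape, v in totals.items()}

happy_surprised = mix_expressions({"happy": 1.0, "surprised": 1.0})
half_happy = mix_expressions({"happy": 0.5})
print(happy_surprised)
```

Here jawOpen sums to 1.2 and is clamped to 1.0, while the non-overlapping blendshapes pass through unchanged.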

Inverse vs Forward Kinematics

FK: Set joint angles → calculate end position. IK: Set target position → solve for angles. MetaHuman uses IK for realistic hand/foot placement.

Legend: Target, End effector, Joint

IK Mode

Drag anywhere to move the target. The arm automatically solves for joint angles using the FABRIK algorithm.

Solver Iterations: 10

More iterations = more accurate but slower
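
A minimal 2D FABRIK solver looks roughly like this. It is a sketch under simplifying assumptions (planar chain, no joint limits, fixed root), and the function names and the sample arm are illustrative rather than taken from the demo.

```python
import math

def _place(anchor, toward, length):
    """Return the point `length` away from `anchor`, in the direction of `toward`."""
    d = math.dist(anchor, toward)
    if d == 0.0:
        return (anchor[0] + length, anchor[1])  # degenerate case: pick +x
    t = length / d
    return (anchor[0] + (toward[0] - anchor[0]) * t,
            anchor[1] + (toward[1] - anchor[1]) * t)

def fabrik(joints, target, iterations=10, tolerance=1e-3):
    """Move the last joint toward `target` while preserving bone lengths."""
    joints = [tuple(j) for j in joints]
    lengths = [math.dist(a, b) for a, b in zip(joints, joints[1:])]
    root = joints[0]
    for _ in range(iterations):
        # Backward pass: pin the end effector to the target, walk toward the root.
        joints[-1] = tuple(target)
        for i in range(len(joints) - 2, -1, -1):
            joints[i] = _place(joints[i + 1], joints[i], lengths[i])
        # Forward pass: pin the root back in place, walk toward the end effector.
        joints[0] = root
        for i in range(len(joints) - 1):
            joints[i + 1] = _place(joints[i], joints[i + 1], lengths[i])
        if math.dist(joints[-1], target) < tolerance:
            break
    return joints

arm = [(0.0, 0.0), (1.0, 0.0), (2.0, 0.0)]
solved = fabrik(arm, target=(1.2, 0.8))
print(solved)
```

Each iteration only repositions points along straight lines, which is why FABRIK is cheap per step; the iteration count trades accuracy against cost, as the slider above suggests.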

Comparison

FK

  • Simple to compute
  • Direct control
  • Animation curves

IK

  • Goal-oriented
  • Foot placement
  • Hand targets

Level of Detail (LOD) System

MetaHuman uses LOD to maintain performance. Close to the camera, the character uses more triangles and higher-resolution textures; far away, it uses a simplified mesh. Transitions are intended to be visually seamless.

Camera Distance: 3.0 m

Current LOD Stats

Triangles: 15,000
Vertices: 8,000
Texture: 2K
Range: 2-5 m

All LOD Levels

LOD 0: 50,000 tris @ 0-2 m
LOD 1: 15,000 tris @ 2-5 m
LOD 2: 5,000 tris @ 5-15 m
LOD 3: 1,500 tris @ 15-30 m
LOD 4: 500 tris @ 30 m+

Performance Impact

LOD 0 to LOD 4 is a 100x reduction in triangles. In a scene with multiple MetaHumans, this is essential for maintaining 60 FPS.
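
Distance-based LOD selection reduces to a threshold lookup. The sketch below uses the five levels listed in this section; treating each boundary distance as belonging to the lower-detail level is an assumption, since the ranges above do not say which side is inclusive.

```python
# (level, upper distance bound in meters, triangle count), from the table above.
LOD_TABLE = [
    (0, 2.0, 50_000),
    (1, 5.0, 15_000),
    (2, 15.0, 5_000),
    (3, 30.0, 1_500),
    (4, float("inf"), 500),
]

def select_lod(distance_m):
    """Return (lod_level, triangle_count) for a camera distance in meters."""
    if distance_m < 0:
        raise ValueError("distance must be non-negative")
    for level, max_distance, triangles in LOD_TABLE:
        if distance_m < max_distance:
            return level, triangles
    return LOD_TABLE[-1][0], LOD_TABLE[-1][2]

print(select_lod(3.0))
reduction = LOD_TABLE[0][2] // LOD_TABLE[-1][2]
```

Production systems often switch on projected screen size rather than raw distance and add hysteresis so a character near a boundary does not flicker between levels.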

Wrinkle Map System

MetaHuman uses wrinkle maps for facial detail. Each expression drives specific wrinkle regions that blend on top of the base skin texture.

Global Wrinkle Strength: 1.0x

Expression Drivers

brow raise, brow furrow, squint, smile, pucker

Texture Layers

MetaHuman blends multiple texture layers: base albedo, normal map, roughness, and wrinkle maps. Wrinkle maps are driven by blendshape values, creating realistic skin deformation during expressions.

Corrective Blendshapes

When multiple blendshapes activate together, their deformations can combine incorrectly. Corrective blendshapes fix these artifacts automatically.

Toggle correctives and combine both sliders to see the artifact

Blendshape A: Blink
Blendshape B: Look Down

How It Works

Problem: Blink moves eyelids down. Look down also moves eye region down. Combined = double deformation artifact.

Solution: Corrective shape (blink_lookDown) activates when both are > 0, counteracting the excess deformation.

Corrective Formula

corrective_weight = blendA × blendB

The corrective is sculpted to exactly cancel the artifact when both blendshapes are at 100%, and scales proportionally for partial activations.
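
The product rule can be illustrated with a single scalar standing in for eyelid displacement. The three delta constants are hypothetical numbers chosen so the corrective exactly removes the excess at full activation; real correctives are per-vertex shapes.

```python
# Hypothetical per-shape eyelid displacements (negative = downward).
BLINK_DELTA = -1.0       # a full blink closes the lid
LOOK_DOWN_DELTA = -0.4   # looking down also drags the lid down
CORRECTIVE_DELTA = 0.4   # sculpted to cancel the excess when both are at 100%

def eyelid_offset(blink, look_down, use_corrective=True):
    """Combine two blendshapes, optionally adding the blink_lookDown corrective."""
    offset = blink * BLINK_DELTA + look_down * LOOK_DOWN_DELTA
    if use_corrective:
        corrective_weight = blink * look_down
        offset += corrective_weight * CORRECTIVE_DELTA
    return offset

print(eyelid_offset(1.0, 1.0, use_corrective=False))  # overshoots past a closed lid
print(eyelid_offset(1.0, 1.0))                        # lid stops at the closed position
print(eyelid_offset(0.5, 0.5))                        # partial activation, partial fix
```

Because the weight is a product, the corrective stays at zero whenever either driving blendshape is zero, so it never disturbs a pure blink or a pure look-down.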

MetaHuman Usage

MetaHuman has hundreds of corrective shapes for common expression combinations: smile+blink, frown+jawOpen, etc. These are pre-computed and activate automatically.

Joint Constraints

Real joints have physical limits. Elbows can't bend backward. MetaHuman enforces these constraints to prevent unnatural poses.


Joint Limits

Shoulder: -45° to 180°
Elbow: 0° to 145°
Wrist: -70° to 70°

Constraint Types

Hinge: Single axis rotation (elbow, knee)

Ball: Multi-axis with cone limits (shoulder, hip)

Saddle: Two-axis with asymmetric limits (thumb)

In Animation

Constraints prevent impossible poses during IK solving and motion retargeting. They also help with collision avoidance (elbow not going through torso).
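
For single-axis (hinge-style) limits, enforcement is a clamp applied after each solver or retargeting step. The sketch below uses the three ranges listed above and treats every joint as one angle, which is a simplification for ball and saddle joints.

```python
# Limits in degrees, taken from the Joint Limits list above.
JOINT_LIMITS = {
    "shoulder": (-45.0, 180.0),
    "elbow": (0.0, 145.0),
    "wrist": (-70.0, 70.0),
}

def clamp_joint(joint, angle_deg):
    """Clamp a requested angle to the joint's allowed range."""
    low, high = JOINT_LIMITS[joint]
    return min(high, max(low, angle_deg))

def clamp_pose(pose):
    """Clamp every joint in a {joint_name: angle_deg} pose."""
    return {joint: clamp_joint(joint, angle) for joint, angle in pose.items()}

requested = {"shoulder": 200.0, "elbow": -20.0, "wrist": 10.0}
print(clamp_pose(requested))
```

The elbow request of -20° is pulled back to 0°, which is the "elbows can't bend backward" rule expressed as data.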

Eye Gaze & Tracking

Eyes are crucial for believable avatars. Control gaze direction, blink, and pupil dilation.

Gaze Direction

Controlled by eye bone rotation. ARKit provides eyeLookIn/Out/Up/Down blendshapes for detailed control.

Pupil Response

Dilates with emotion and lighting. Small detail that adds significant realism to digital humans.

Hair Simulation

Real-time hair dynamics using simplified physics. Each strand responds to wind, gravity, and stiffness.

In MetaHuman

Real hair simulation uses thousands of guide strands with interpolation, collision detection, and GPU-accelerated physics. Groom assets define the hair's look and behavior.

Secondary Motion

Secondary motion adds physics-based follow-through to primary animation. Watch earrings and hair react to head movement.

Animation Principles

Secondary motion follows Disney's principles: drag, overlap, and follow-through. Spring physics creates natural-looking motion that reacts to the primary animation.
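
Spring-based follow-through can be sketched with a damped spring that chases the primary animation. The stiffness, damping, and 60 Hz time step below are illustrative assumptions, and the integrator is semi-implicit Euler in one dimension.

```python
def simulate_follow(targets, stiffness=40.0, damping=6.0, dt=1.0 / 60.0):
    """Return the position of a spring-driven follower for each target sample."""
    position = targets[0]
    velocity = 0.0
    track = []
    for target in targets:
        # The spring pulls toward the target; damping resists velocity.
        acceleration = stiffness * (target - position) - damping * velocity
        velocity += acceleration * dt   # update velocity first (semi-implicit Euler)
        position += velocity * dt
        track.append(position)
    return track

# Hypothetical head motion: rest for 10 frames, then snap to 1.0 and hold for 2 s.
targets = [0.0] * 10 + [1.0] * 120
track = simulate_follow(targets)
print(round(max(track), 3), round(track[-1], 3))
```

With these values the follower is underdamped, so it lags, overshoots the target, and settles, which is the overlap and follow-through an earring or hair strand shows after a head turn.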

Facial Muscle System

Anatomically-based facial animation. Click muscles to activate them and create expressions.

Muscle-Based Animation

FACS (Facial Action Coding System) maps muscle activations to Action Units. MetaHuman uses this for physically plausible facial animation driven by blendshapes.

The Animation Pipeline

Refined with metahuman-evolver output from Unreal Engine 5.7 source scans.


Identity Authoring

MetaHumanCharacter + MetaHumanIdentity modules assemble DNA-backed character assets


Dependency Hot Paths

  • MetaHumanAnimator -> MetaHumanCoreTechLib (35)
  • MetaHumanAnimator -> RigLogic (8)
  • MetaHumanCharacter -> MetaHumanSDK (10)
  • MetaHumanLiveLink -> MetaHumanCoreTechLib (9)
  • Top module hubs: MetaHumanCoreTech (20), MetaHumanCore (19), RigLogicModule (15)

Evolver Signals

  • Cycle 11 scan: 12 plugins, 70 modules, 2898 source files, 248 internal module edges.
  • Hub plugins: MetaHumanAnimator (28 modules), MetaHumanCoreTechLib (5), MetaHumanLiveLink (7).
  • Official watch: 5/5 Epic MetaHuman docs endpoints reachable in latest cycle.
  • One-line docs include tracker model tags: hyprface-0.1.4 and wav2face-0.0.10.

Key Concepts

Master these five building blocks of the MetaHuman system.

1
Intermediate

Blendshapes (Morph Targets)

Deformed versions of a mesh representing different expressions. Blend between neutral and target shapes using 0-1 weights to create smooth animations.

Imagine a collection of rubber masks showing different expressions. To animate, you 'blend' the neutral mask toward any expression mask by a percentage - 50% smile gives you a half-smile.
2
Intermediate

Skeletal Rig

A hierarchy of virtual bones that deform the mesh. Moving the shoulder bone cascades to the arm, hand, and fingers. Weight painting determines how strongly each bone affects nearby vertices.

Like a marionette puppet with wooden crossbars (bones) connected by strings (joints). Moving the shoulder bar pulls the arm and hand bars along. The puppet's cloth stretches based on where bones move.
3
Intermediate

Live Link / ARKit

Real-time streaming from iPhone TrueDepth camera to Unreal Engine. 30,000 infrared dots map to 52 blendshape weights at 60 FPS, giving frame-accurate expression control.

Your face is reflected in a magic mirror tracking 52 muscle movements. Each muscle has a dial (0-100%). Those readings instantly control a puppet in another room.
4
Intermediate

UE5 Rendering

Lumen for global illumination, ray-traced hair strands, and subsurface scattering for realistic skin. 8 LOD levels balance quality and performance.

An assembly line factory: raw 3D data enters, gets measured (vertex), stamped flat (rasterization), painted (fragment shading), and assembled into the final image.
5
Intermediate

Audio2Face

NVIDIA's AI that generates facial animation from audio. A neural network maps speech to 72 blendshapes in real-time, enabling automatic lip sync without motion capture.

An AI ventriloquist who learned by watching thousands of hours of people speaking. It knows 'oo' sounds need pursed lips and 'ah' needs an open jaw - and puppeteers accordingly.

Build It Yourself

Set up MetaHuman with Live Link face tracking in Unreal Engine 5.

Download from Epic Games Launcher and create a new project

# Download from launcher.unrealengine.com
# Create new Third Person or Blank project
# Enable MetaHuman plugin in Edit > Plugins

Step 1 of 5

Resources

Complete UE 5.7 Architecture Dossier

Full module-by-module implementation map spanning this repo, the UE source tree, and official Epic docs.

Open architecture documentation

  • MetaHuman Creator (docs)
  • Live Link Face App (docs)
  • MetaHuman Documentation (docs)
  • NVIDIA Audio2Face (github)
  • Control Rig Tutorial (docs)
  • Convai MetaHuman Plugin (docs)

When to Use MetaHuman

Use When

  • You need frame-accurate animation control
  • You're building for desktop/console with good GPUs
  • You want to use existing Unreal Engine workflows
  • You need deterministic, repeatable output
  • You're integrating with game or simulation systems

Avoid When

  • You need photorealism of a real person (use Gaussian Splatting)
  • You want web/mobile deployment without powerful hardware
  • You need one-shot avatar from any photo (use Generative Video)
  • You're building a lightweight voice AI app (use Streaming)
  • You have limited GPU resources

Best Use Case

Production environments requiring precise control, deterministic animation, and integration with game engine workflows

Common Misconceptions

MetaHuman has completely crossed the uncanny valley

Actually: While impressive in stills, animation often reveals the illusion. Micro-expressions, subtle skin deformations, and natural asymmetry are still challenging.

MetaHumans are easy to run on any hardware

Actually: They're computationally expensive: RTX 3070 drops to 20 FPS with 10 MetaHumans. 8K textures, ray-traced hair, and 700-bone rigs require significant GPU power.

Blendshapes and bones are interchangeable

Actually: They serve different purposes: blendshapes for soft tissue (face muscles), bones for rigid structures (limbs). MetaHuman uses both together.

ARKit captures all facial expressions accurately

Actually: 52 blendshapes are a simplification. They do not capture micro-expressions, tongue tracking is binary, and the 60 FPS cap misses fast movements.

Audio2Face replaces traditional animation

Actually: It excels at lip sync but can't generate head movement or body gestures. Best for NPCs and first drafts, not final film quality.

See MetaHuman in Action

Try a real-time MetaHuman avatar powered by Unreal Engine pixel streaming. Cloud-rendered on GPU and delivered to your browser via WebRTC.

Launch Rapport Demo →
