Neural 3D Rendering + Conversation (~50 min)

Gaussian Splatting

3D Gaussian primitives for real-time photorealistic rendering and conversation

What is Gaussian Splatting?

Imagine representing a 3D scene not with triangles or pixels, but with millions of tiny, fuzzy, colored blobs floating in space. That's Gaussian Splatting. Each blob, called a Gaussian, has a position, a shape, a color that can change based on viewing angle, and transparency. When you look at the scene, these blobs are 'splatted' onto your screen like spray paint, blending together to create photorealistic images at 60+ frames per second. For avatars, this means we can capture a real person as a cloud of Gaussians, then animate and render them in real-time with quality that rivals photographs. Recent one-shot models like LAM (SIGGRAPH 2025) can create animatable Gaussian avatars from a single photo in seconds, and when paired with Audio2Expression pipelines, these avatars can hold real-time voice conversations in the browser.

The Core Idea in 30 Seconds

Millions of Fuzzy Blobs

Each is a 3D Gaussian with position, shape, color, opacity

Splat to Screen

Project each blob to 2D, sort by depth, blend together

60+ FPS

Rasterization (not ray tracing) enables real-time speed

Unlike NeRF's implicit neural field that requires expensive ray marching, 3D Gaussians are explicit primitives that can be directly rasterized. This makes the representation both editable and fast to render. The tradeoff? More memory usage (~1-2GB for typical scenes) compared to NeRF's compact MLP weights.

Core Mechanisms

Gaussian Function

The bell curve that defines each splat's falloff

G(x) = exp(-x² / (2σ²))
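To make the falloff concrete, here is a minimal Python sketch that evaluates the bell curve above at a few distances from the center. The function name `gaussian_falloff` and the default σ = 1.0 are illustrative choices, not part of any 3DGS codebase.

```python
import math

def gaussian_falloff(x: float, sigma: float = 1.0) -> float:
    """Unnormalized Gaussian G(x) = exp(-x^2 / (2 * sigma^2)); equals 1 at x = 0."""
    return math.exp(-(x * x) / (2.0 * sigma * sigma))

# Weight of a splat at 0, 1, 2, and 3 standard deviations from its center.
for x in (0.0, 1.0, 2.0, 3.0):
    print(f"distance {x:.0f} sigma -> weight {gaussian_falloff(x):.3f}")
```

By three standard deviations the weight is close to zero, which is why a renderer can safely ignore a splat beyond a few σ of its center.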

Alpha Blending Math

How overlapping Gaussians combine colors

C_out = C_front × α + C_back × (1-α)
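The blending formula can be tried directly. The sketch below applies it per color channel; the helper name `over` and the example colors are assumptions made for illustration.

```python
def over(front, back, alpha):
    """Blend per channel: C_out = C_front * alpha + C_back * (1 - alpha)."""
    return tuple(f * alpha + b * (1.0 - alpha) for f, b in zip(front, back))

red, blue = (1.0, 0.0, 0.0), (0.0, 0.0, 1.0)

print(over(red, blue, 0.5))  # -> (0.5, 0.0, 0.5)
# With a more opaque front layer, swapping the order changes the result.
print(over(red, blue, 0.8))
print(over(blue, red, 0.8))
```

The last two calls differ only in which color is in front, which is why splats have to be depth-sorted before blending.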

Covariance Matrix

2D ellipse shape from σx, σy, rotation

Matrix Transform

How rotation + scale create ellipsoid shapes

Point Cloud → Splats

Initialize Gaussians from SfM points

View-Dependent Color

Spherical harmonics encode reflections

Rendering Pipeline

Depth Sorting

Back-to-front order for correct transparency

Tile Rasterization

GPU-parallel screen-space rendering

Screen Projection

3D to 2D via perspective transform
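As a rough illustration of the perspective transform, the sketch below projects a Gaussian's center from camera space to pixel coordinates with a pinhole model. The 60° horizontal field of view, the 640×480 image, and the function names are assumptions; a full splatting renderer also projects each Gaussian's covariance to a 2D ellipse, which this sketch leaves out.

```python
import math

def focal_from_fov(fov_deg: float, image_size_px: int) -> float:
    """Pinhole focal length in pixels for a field of view spanning image_size_px."""
    return 0.5 * image_size_px / math.tan(math.radians(fov_deg) / 2.0)

def project(point_cam, width, height, fov_deg=60.0):
    """Project a camera-space point (z > 0 is in front) to pixel coordinates."""
    x, y, z = point_cam
    f = focal_from_fov(fov_deg, width)  # horizontal FOV is assumed
    return (f * x / z + width / 2.0, f * y / z + height / 2.0)

# A point on the optical axis lands at the image center.
print(project((0.0, 0.0, 5.0), 640, 480))  # -> (320.0, 240.0)
```

Dividing by z is what makes distant Gaussians appear smaller and closer to the image center.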

Frustum Culling

Skip Gaussians outside camera view
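A minimal culling test might look like the following. It checks only the Gaussian's center against a symmetric frustum. The field of view, aspect ratio, near and far planes, and the five sample centers are all made up for this sketch. A production renderer would also account for each splat's extent, so that large Gaussians just outside the view are not dropped.

```python
import math

def in_frustum(point_cam, fov_deg=60.0, aspect=4 / 3, near=0.1, far=100.0):
    """True if a camera-space point lies inside a symmetric view frustum."""
    x, y, z = point_cam
    if not (near <= z <= far):
        return False  # behind the camera or beyond the far plane
    half_width = z * math.tan(math.radians(fov_deg) / 2.0)
    half_height = half_width / aspect
    return abs(x) <= half_width and abs(y) <= half_height

centers = [(0, 0, 5), (10, 0, 5), (0, 0, -2), (0, 4, 5), (0, 0, 500)]
visible = [c for c in centers if in_frustum(c)]
print(f"{len(visible)} of {len(centers)} visible")  # -> 1 of 5 visible
```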

Z-Buffer

Depth testing for occlusion

Opacity Accumulation

Multiple Gaussians add up to opaque

Accumulated opacity = 1 - (1-α)^n (for example, α = 0.3 and n = 3 give about 66%)
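The accumulation formula above can be checked numerically. The values α = 0.3 and n = 3 are assumed here because they reproduce the 66% figure; the function name is illustrative.

```python
def accumulated_opacity(alpha: float, n: int) -> float:
    """Opacity after stacking n layers that each have opacity alpha."""
    transmittance = (1.0 - alpha) ** n  # fraction of light passing all layers
    return 1.0 - transmittance

for n in range(1, 6):
    print(n, round(accumulated_opacity(0.3, n), 3))
```

Each layer blocks a fixed fraction of the light that is still getting through, so opacity approaches 1 without ever exceeding it.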

View-Dependent Effects

Spherical Harmonics

View-dependent color encoding

Degree 0: 1 coefficient, constant (diffuse) color

Noise in Training

Perturb positions for robustness

Training Dynamics

Adaptive Density

Clone, split, or prune Gaussians

High gradient → duplicate for detail
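The clone, split, and prune decisions can be sketched as a simple per-Gaussian rule. The threshold values below are illustrative assumptions, not settings confirmed by this page, and `density_action` is a hypothetical helper name.

```python
def density_action(grad_norm, max_scale, opacity,
                   grad_threshold=0.0002, scale_threshold=0.01, min_opacity=0.005):
    """Decide what adaptive density control does with one Gaussian.

    All thresholds are assumed example values for this sketch.
    """
    if opacity < min_opacity:
        return "prune"  # nearly invisible: remove it
    if grad_norm > grad_threshold:
        # High positional gradient means the region is poorly reconstructed.
        # Large Gaussians are split; small ones are duplicated.
        return "split" if max_scale > scale_threshold else "clone"
    return "keep"

print(density_action(grad_norm=0.001, max_scale=0.005, opacity=0.5))  # -> clone
print(density_action(grad_norm=0.001, max_scale=0.05, opacity=0.5))   # -> split
```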

Gradient Flow

How backprop optimizes Gaussian params

Loss Landscape

Optimization finds the best parameters

Backpropagation

Compute gradients through render pipeline

The Processing Pipeline

From multi-view capture to real-time rendering, here's how the data flows through the system.

Multi-View Capture

Record the subject from multiple camera angles to get complete 3D coverage

Key Concepts

Master these five concepts and you'll understand how Gaussian Splatting works.

Intermediate

The 3D Gaussian Primitive

Each Gaussian is a fuzzy ellipsoid in 3D space with position, shape, view-dependent color, and opacity. Millions of these overlapping blobs create the final image.

Think of impressionist brushstrokes - each is a soft, translucent ellipse. Viewed together, they form a complete scene.

Intermediate

Covariance = Shape

The covariance matrix defines whether a Gaussian is spherical, pancake-shaped, or needle-like. It's computed from scale and rotation matrices.

Start with a round balloon. Squeeze it in one direction - it elongates. Rotate it - it points a new way. The covariance matrix describes exactly how the balloon is squished.
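One common way to build the covariance from scale and rotation is Σ = R S Sᵀ Rᵀ, where S is a diagonal scale matrix and R is a rotation matrix derived from a unit quaternion. The pure-Python sketch below assumes that construction. The function names and the (w, x, y, z) quaternion order are choices made for illustration.

```python
import math

def quat_to_rot(w, x, y, z):
    """Rotation matrix from a quaternion (normalized first)."""
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def covariance(scale, quat):
    """Sigma = (R S)(R S)^T with S = diag(scale)."""
    R = quat_to_rot(*quat)
    # M = R @ S: column j of R scaled by scale[j].
    M = [[R[i][j] * scale[j] for j in range(3)] for i in range(3)]
    return [[sum(M[i][k] * M[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]

# A "pancake": thin along z, then rotated 90 degrees about the x axis.
half = math.sqrt(0.5)
sigma = covariance((1.0, 1.0, 0.1), (half, half, 0.0, 0.0))
for row in sigma:
    print([round(v, 4) for v in row])
```

After the rotation the thin direction shows up in the y entry of the diagonal instead of z. Building Σ this way also guarantees it stays symmetric and positive semi-definite, whatever values the scale and quaternion take.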

Intermediate

View-Dependent Color

Instead of storing a single RGB color, each Gaussian stores spherical harmonic coefficients that encode how color changes with viewing direction.

Imagine describing how a disco ball reflects light. Instead of listing brightness for every angle, you describe the pattern of changes - that's what spherical harmonics do.
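To show how a handful of coefficients can encode view-dependent color, the sketch below evaluates spherical harmonics up to degree 1. The basis constants are the standard real SH constants. The sign and ordering convention, the +0.5 offset, and the clamp to [0, 1] are assumptions modeled on common 3DGS code rather than details taken from this page.

```python
SH_C0 = 0.28209479177387814  # degree-0 constant, 1 / (2 * sqrt(pi))
SH_C1 = 0.4886025119029199   # degree-1 constant, sqrt(3 / (4 * pi))

def sh_to_rgb(coeffs, view_dir):
    """Evaluate RGB from SH coefficients for a unit viewing direction.

    coeffs: up to 4 RGB triples (1 for degree 0, 3 more for degree 1).
    """
    x, y, z = view_dir
    basis = (SH_C0, -SH_C1 * y, SH_C1 * z, -SH_C1 * x)
    rgb = []
    for ch in range(3):
        value = sum(b * c[ch] for b, c in zip(basis, coeffs)) + 0.5
        rgb.append(min(max(value, 0.0), 1.0))  # clamp for display
    return rgb

zero = (0.0, 0.0, 0.0)
diffuse = [zero, zero, zero, zero]            # same gray from every angle
glossy = [zero, zero, (0.5, 0.0, 0.0), zero]  # redder when viewed along +z

print(sh_to_rgb(diffuse, (0.0, 0.0, 1.0)))
print(sh_to_rgb(glossy, (0.0, 0.0, 1.0)))
print(sh_to_rgb(glossy, (0.0, 0.0, -1.0)))
```

Passing only the first triple gives a constant color, which corresponds to the degree-0 (diffuse) case.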

Intermediate

The Splatting Pipeline

Rendering happens in three steps: project each 3D Gaussian to a 2D ellipse, sort all ellipses by depth, then blend them front-to-back using alpha compositing.

Think of layered transparency film on an overhead projector. Each layer is partially see-through. The final image is what you see through all layers combined.
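The sort and blend steps for a single pixel can be sketched as follows. It assumes each splat's color and opacity have already been evaluated at that pixel; the tuple layout, the function name, and the early-stop threshold are illustrative.

```python
def composite_pixel(splats, stop_below=1e-4):
    """Front-to-back alpha compositing for one pixel.

    splats: list of (depth, rgb, alpha) tuples evaluated at this pixel.
    Returns the blended color and the remaining transmittance.
    """
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0  # fraction of light still reaching the eye
    for _depth, rgb, alpha in sorted(splats, key=lambda s: s[0]):  # nearest first
        weight = transmittance * alpha
        color = [c + weight * ch for c, ch in zip(color, rgb)]
        transmittance *= 1.0 - alpha
        if transmittance < stop_below:
            break  # pixel is effectively opaque; skip everything behind
    return color, transmittance

splats = [
    (2.0, (0.0, 0.0, 1.0), 1.0),  # far, opaque blue
    (1.0, (1.0, 0.0, 0.0), 0.5),  # near, half-transparent red
]
print(composite_pixel(splats))
```

Walking front-to-back while tracking transmittance gives the same color as blending back-to-front, and it allows the loop to stop early once a pixel is saturated.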

Intermediate

Learning the Scene

Training is differentiable: render an image, compare it to ground truth, and backpropagate gradients to adjust every Gaussian's parameters. Adaptive densification clones or splits Gaussians where detail is missing and prunes those that are nearly transparent.

You're sculpting with millions of tiny clay blobs. Take a photo, compare to reference, adjust each blob slightly. Too big? Split it. Nearly invisible? Remove it. Repeat thousands of times.
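The render, compare, adjust loop can be illustrated with a deliberately tiny example: one 1D Gaussian whose center is learned by gradient descent on a squared-error loss. The hand-derived gradient, the learning rate, and the sample grid are assumptions made for this sketch. Real 3DGS training optimizes millions of Gaussians with automatic differentiation.

```python
import math

def render(mu, xs, sigma=1.0):
    """'Render' a 1D Gaussian centered at mu onto sample positions xs."""
    return [math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) for x in xs]

def fit_mean(target_mu=2.0, mu=0.0, lr=0.05, steps=200, sigma=1.0):
    """Move mu until the rendered signal matches the target signal."""
    xs = [i * 0.25 for i in range(-20, 21)]  # "pixels" from -5 to 5
    target = render(target_mu, xs, sigma)
    for _ in range(steps):
        image = render(mu, xs, sigma)
        # d(loss)/d(mu) for loss = sum((image - target)^2),
        # using d(image)/d(mu) = image * (x - mu) / sigma^2.
        grad = sum(2 * (r - t) * r * (x - mu) / sigma ** 2
                   for r, t, x in zip(image, target, xs))
        mu -= lr * grad  # step along the negative gradient
    return mu

print(round(fit_mean(), 3))
```

The same pattern scales up: every parameter that influences the rendered image receives a gradient and moves a little on each iteration.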

Interactive Demos

Learn by doing. Manipulate parameters and see immediate visual feedback.

Demo 1: Manipulate a Single Gaussian (beginner)

Adjust position, scale, rotation, and opacity to see how each parameter affects the Gaussian's shape.

Tips for this demo

Try making a flat "pancake" shape by reducing one scale axis
Notice how rotation affects the ellipsoid orientation

Demo 2: Alpha Compositing (intermediate)

Drag to reorder layers and see how depth ordering affects the final blended color.

Tips for this demo

Drag layers to reorder and watch the output change
Notice how semi-transparent layers reveal colors beneath

Demo 3: Spherical Harmonics (advanced)

Adjust SH coefficients to see how view-dependent color is encoded. This is how 3DGS captures specular highlights.

Tips for this demo

Start with degree 0, then add higher degrees to see the difference
Move the viewpoint to see color changes with viewing angle

Demo 4: Matrix Transformations

See how scale and rotation matrices transform a unit circle into an ellipse—the foundation of Gaussian covariance.

Demo 5: 3D Covariance Shapes

Manipulate scale along each axis to create spheres, pancakes, or needles—the building blocks of 3D Gaussian Splatting.

Demo 6: Training Progress

Watch how 3DGS training evolves over 30K iterations: Gaussian count, PSNR quality, and key milestones.
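PSNR, the quality metric tracked in this demo, is computed from the mean squared error between the rendered and reference images. A minimal version for flat lists of pixel values in [0, 1] is sketched below; the function name and the sample values are illustrative.

```python
import math

def psnr(rendered, reference, max_value=1.0):
    """Peak signal-to-noise ratio in dB between two equal-length pixel lists."""
    mse = sum((a - b) ** 2 for a, b in zip(rendered, reference)) / len(reference)
    if mse == 0:
        return math.inf  # identical images
    return 10.0 * math.log10(max_value ** 2 / mse)

# Every pixel off by 0.1 gives an MSE of 0.01, which is 20 dB.
print(round(psnr([0.6, 0.6, 0.6], [0.5, 0.5, 0.5]), 2))
```

Because the scale is logarithmic, halving the per-pixel error raises PSNR by about 6 dB.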

Demo 7: Differentiable Rendering

See how gradients flow backward through rendering. Click to set a target - the Gaussian learns to cover it.

Demo 8: Point Cloud to Gaussians

3DGS starts from a structure-from-motion (SfM) point cloud and initializes one Gaussian at each point. Drag to rotate, toggle to see how points become splats.
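One plausible way to choose each Gaussian's starting size is to look at how far away its nearest neighbors are, so that dense regions get small splats and sparse regions get large ones. The sketch below uses the mean distance to the three nearest points. Both that exact formula and the brute-force neighbor search are assumptions for illustration, not the official implementation.

```python
import math

def initial_scales(points, k=3):
    """Isotropic starting scale per point: mean distance to its k nearest neighbors."""
    scales = []
    for i, p in enumerate(points):
        dists = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        nearest = dists[:k]
        scales.append(sum(nearest) / len(nearest))
    return scales

# Four SfM-like points on the corners of a unit square.
square = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (1.0, 1.0, 0.0)]
print([round(s, 3) for s in initial_scales(square)])
```

The brute-force search is quadratic in the number of points, so a real pipeline would use a spatial index or a GPU nearest-neighbor routine instead.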

Demo 9: Tile-Based Rasterization

See how 3DGS divides the screen into tiles for parallel GPU processing. Click tiles to see which Gaussians they contain.
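Tile binning can be sketched as follows: each projected splat is assigned to every tile its bounding square overlaps, and each tile then only processes its own list. The 16×16 tile size, the (center, radius) representation, and the function names are assumptions for this example.

```python
from collections import defaultdict

TILE = 16  # tile edge in pixels; 16x16 is assumed here

def tiles_touched(center_px, radius_px, width, height, tile=TILE):
    """Return (tx, ty) indices of tiles overlapped by the splat's bounding square."""
    cx, cy = center_px
    last_tx = (width + tile - 1) // tile - 1
    last_ty = (height + tile - 1) // tile - 1
    x0 = max(0, int((cx - radius_px) // tile))
    x1 = min(last_tx, int((cx + radius_px) // tile))
    y0 = max(0, int((cy - radius_px) // tile))
    y1 = min(last_ty, int((cy + radius_px) // tile))
    return [(tx, ty) for ty in range(y0, y1 + 1) for tx in range(x0, x1 + 1)]

def bin_splats(splats, width, height):
    """Map each tile to the indices of the splats that overlap it."""
    bins = defaultdict(list)
    for idx, (center, radius) in enumerate(splats):
        for tile_index in tiles_touched(center, radius, width, height):
            bins[tile_index].append(idx)
    return bins

# (center, radius) in pixels; the last splat is off-screen and lands in no tile.
splats = [((8, 8), 4), ((20, 20), 6), ((-50, 8), 4)]
bins = bin_splats(splats, 64, 64)
for tile_index in sorted(bins):
    print(tile_index, "->", len(bins[tile_index]), "Gaussians")
```

Since tiles are independent, each one can be handled by its own group of GPU threads.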

Demo 10: Adaptive Density Control

Watch how 3DGS dynamically adjusts Gaussian count during training through densification and pruning.

Demo 11: Depth Sorting for Alpha Blending

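Alpha blending is order-dependent, so transparent splats must be composited in a consistent depth order. This demo sorts back-to-front; traversing front-to-back while tracking transmittance gives the same result. Watch the sorting algorithm in action.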

Build It Yourself

Get started with the official implementation. Here's a step-by-step walkthrough.

Traditional 3DGS (Multi-View Capture)

Clone the official repository and set up the CUDA environment for GPU acceleration

bash

git clone https://github.com/graphdeco-inria/gaussian-splatting --recursive
cd gaussian-splatting
conda env create --file environment.yml
conda activate gaussian_splatting

Conversational Avatar Quickstart (Docker)

Deploy a talking Gaussian avatar from a single photo using OpenAvatarChat + LAM in Docker. Server needs ~4-6 GB VRAM; the avatar renders client-side in the browser via WebGL.

1. Clone and setup

cd gaussian-avatar && bash scripts/setup.sh

Downloads models (~2 GB): wav2vec2, LAM Audio2Expression, SenseVoice ASR. Generates SSL certs for WebRTC.

2. Add your API key

cp .env.example .env && nano .env

Set OPENAI_API_KEY for the LLM. Or use the Ollama config for fully local operation (no keys needed).

3. Build and run

docker compose up --build

Builds the OpenAvatarChat image with CUDA 12.2 + Python 3.11, starts the avatar server + TURN relay.

4. Open browser

https://localhost:8282

Accept the self-signed cert, allow microphone access. The Gaussian avatar renders at 60+ FPS via WebGL.

5. Custom avatar (optional)

Generate a Gaussian avatar from any photo using LAM, export as .zip, and set asset_path in the config. Supports up to 5 concurrent sessions.

Resources

Original 3DGS Repository (INRIA) [github]
3D Gaussian Splatting Paper (SIGGRAPH 2023) [paper]
SuperSplat Editor (Web-based) [docs]
Three.js Gaussian Splat Viewer [github]
Luma AI WebGL Renderer [github]
D3GA: Drivable 3D Gaussian Avatars [github]
LAM: Large Avatar Model (One-Shot) [github]
OpenAvatarChat (Conversational Pipeline) [github]
GaussianTalker (Audio-Driven, 120 FPS) [github]
TaoAvatar (Full-Body 3DGS + SMPL-X) [github]
LAM Audio2Expression (Real-Time Blendshapes) [github]

When to Use Gaussian Splatting

Use When

+ You need photorealistic rendering of a specific person
+ Real-time performance (60+ FPS) is critical
+ You're building for VR/AR where multi-view consistency matters
+ You have access to multi-view capture equipment
+ You can afford per-person training time (hours)

Avoid When

− You need extreme variety of identities without any input images (consider Generative)
− You need to change lighting dynamically (lighting is baked in)
− You're constrained to web-only without GPU (consider Streaming)
− You need production-proven tools with mature ecosystems (consider MetaHuman)
− You need to handle complex clothing or loose hair motion

Best Use Case

VR/AR telepresence, real-time voice conversation, and any use case requiring photorealistic rendering at 60+ FPS. One-shot models now enable instant avatar creation from a single photo.

Common Misconceptions

Gaussians are like voxels

Actually: Voxels are discrete grid cells. Gaussians are continuous, overlapping, anisotropic (non-cubic), and don't exist on a grid.

3DGS uses ray tracing

Actually: 3DGS is a rasterization technique. It projects primitives to the screen, not rays through the scene. This is why it's so fast.

More Gaussians = better quality

Actually: Quality depends on proper placement and parameters. Poorly placed Gaussians create 'needle' artifacts and blurriness.

Spherical harmonics are just for lighting

Actually: In 3DGS, SH encodes view-dependent color, not lighting calculations. The lighting is 'baked in' during training.

Gaussian avatars always require multi-view capture

Actually: Feed-forward models like LAM create animatable Gaussian avatars from a single photo in seconds. Multi-view capture gives higher fidelity, but one-shot models are now viable for real-time conversation.

Real-Time Conversation with Gaussian Avatars

One-shot models and conversational pipelines have transformed Gaussian splatting from a static capture technique into a viable real-time conversation platform.

Traditional Path

1. Multi-view video capture (50-200 images)
2. Per-subject optimization (2-8 hours)
3. Rig with FLAME/blendshape driver
4. Deploy with custom rendering server

One-Shot Path (2025+)

1. Single face photo as input
2. LAM generates animatable avatar in seconds
3. Audio2Expression maps speech to blendshapes
4. WebGL/WebGPU renders in browser, no server GPU for rendering

OpenAvatarChat combines SileroVAD, SenseVoice ASR, an OpenAI-compatible LLM, EdgeTTS (or CosyVoice), and LAM Audio2Expression into a single pipeline driving a Gaussian avatar. The server needs only ~4-6 GB VRAM while the browser renders the 3D avatar at 60+ FPS via WebGL. End-to-end latency: ~2.2s on RTX 4090. Supports 5+ concurrent sessions. Deploy with Docker. All Apache 2.0 licensed.

Architecture: Browser + Server Split

Browser (Client)

LAM WebRender (WebGL) renders the 3D Gaussian avatar locally at 60-563 FPS. 52 ARKit blendshape coefficients arrive via WebSocket. No dedicated GPU is needed on the client; rendering runs through the browser's WebGL.

Server (Docker + GPU)

SileroVAD detects speech, SenseVoice transcribes, LLM generates response, TTS synthesizes audio, Audio2Expression maps to ARKit blendshapes. Deploy via docker compose up.
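The server-side flow can be pictured as a chain of stages that turns microphone audio into synthesized speech plus one blendshape vector. In the sketch below every function is a hypothetical stand-in with a canned return value. None of them is the actual API of SileroVAD, SenseVoice, an LLM client, a TTS engine, or LAM Audio2Expression. Only the stage order and the count of 52 blendshapes come from the description above.

```python
from dataclasses import dataclass
from typing import List

NUM_BLENDSHAPES = 52  # ARKit blendshape count mentioned in the text

@dataclass
class AvatarFrame:
    audio: bytes              # synthesized speech to play in the browser
    blendshapes: List[float]  # one weight per ARKit blendshape

# Hypothetical stand-ins for the real models; each returns a canned value.
def detect_speech(mic_audio: bytes) -> bool:
    return len(mic_audio) > 0

def transcribe(mic_audio: bytes) -> str:
    return "hello"

def generate_reply(text: str) -> str:
    return f"You said: {text}"

def synthesize(reply: str) -> bytes:
    return reply.encode("utf-8")

def audio_to_expression(speech: bytes) -> List[float]:
    return [0.0] * NUM_BLENDSHAPES

def handle_turn(mic_audio: bytes) -> List[AvatarFrame]:
    """Run one conversational turn: VAD, ASR, LLM, TTS, then expression mapping."""
    if not detect_speech(mic_audio):
        return []  # silence: nothing to send to the browser
    speech = synthesize(generate_reply(transcribe(mic_audio)))
    return [AvatarFrame(audio=speech, blendshapes=audio_to_expression(speech))]

frames = handle_turn(b"\x00\x01")
print(len(frames), len(frames[0].blendshapes))
```

Only the small blendshape vector and the audio cross the network, which is why the browser can do all of the rendering locally.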

