Gaussian Splatting
3D Gaussian primitives for real-time photorealistic rendering and conversation
What is Gaussian Splatting?
Imagine representing a 3D scene not with triangles or pixels, but with millions of tiny, fuzzy, colored blobs floating in space. That's Gaussian Splatting. Each blob, called a Gaussian, has a position, a shape, a color that can change based on viewing angle, and transparency. When you look at the scene, these blobs are 'splatted' onto your screen like spray paint, blending together to create photorealistic images at 60+ frames per second. For avatars, this means we can capture a real person as a cloud of Gaussians, then animate and render them in real-time with quality that rivals photographs. Recent one-shot models like LAM (SIGGRAPH 2025) can create animatable Gaussian avatars from a single photo in seconds, and when paired with Audio2Expression pipelines, these avatars can hold real-time voice conversations in the browser.
The Core Idea in 30 Seconds
Millions of Fuzzy Blobs
Each is a 3D Gaussian with position, shape, color, opacity
Splat to Screen
Project each blob to 2D, sort by depth, blend together
60+ FPS
Rasterization (not ray tracing) enables real-time speed
Core Mechanisms
The bell curve that defines each splat's falloff
G(x) = exp(-x² / (2σ²))
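The falloff formula can be evaluated directly. Here is a minimal sketch in Python; the function name `gaussian_falloff` is illustrative and not taken from any library.

```python
import math

def gaussian_falloff(x: float, sigma: float) -> float:
    """Unnormalized 1D Gaussian: 1.0 at the center, decaying with distance x."""
    return math.exp(-(x * x) / (2.0 * sigma * sigma))

# Weight at the center, one sigma out, and three sigma out.
weights = [gaussian_falloff(d, sigma=1.0) for d in (0.0, 1.0, 3.0)]
```

At three standard deviations the weight has dropped to roughly 1%, so a splat contributes very little beyond that distance.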
How overlapping Gaussians combine colors
C_out = C_front × α + C_back × (1-α)
2D ellipse shape from σx, σy, rotation
How rotation + scale create ellipsoid shapes
Initialize Gaussians from SfM points
Spherical harmonics encode reflections
Rendering Pipeline
Back-to-front order for correct transparency and blending
GPU-parallel screen-space rendering
3D to 2D via perspective transform
Skip Gaussians outside camera view
Depth testing for occlusion
Multiple Gaussians add up to opaque
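Accumulated opacity = 1 - (1-α)^n, where (1-α)^n is the remaining transmittance T; for example, α = 0.3 and n = 3 give about 66%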
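To see how stacked Gaussians approach full opacity, the formula can be computed in a few lines of Python. This is only a sketch: the function name is made up, and α = 0.3 with n = 3 are illustrative values chosen because they land near the 66% figure.

```python
def accumulated_opacity(alpha: float, n: int) -> float:
    """Opacity after stacking n layers that each have opacity alpha."""
    transmittance = (1.0 - alpha) ** n  # fraction of light that still passes through
    return 1.0 - transmittance

coverage = accumulated_opacity(alpha=0.3, n=3)  # about 0.657
```

Each extra layer removes the same fraction of the light that is left, so coverage climbs quickly at first and then flattens as it approaches 100%.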
View-Dependent Effects
View-dependent color encoding
Degree 0: 1 coefficient, constant (diffuse) color
Perturb positions for robustness
Training Dynamics
Clone, split, or prune Gaussians
High gradient → duplicate for detail
How backprop optimizes Gaussian params
Optimization finds the best parameters
Compute gradients through render pipeline
The Processing Pipeline
From multi-view capture to real-time rendering, here's how the data flows through the system.
Multi-View Capture
Record the subject from multiple camera angles to get complete 3D coverage
Key Concepts
Master these five concepts and you'll understand how Gaussian Splatting works. Click "Go deeper" on any card to drill into the math.
The 3D Gaussian Primitive
Each Gaussian is a fuzzy ellipsoid in 3D space with position, shape, view-dependent color, and opacity. Millions of these overlapping blobs create the final image.
Covariance = Shape
The covariance matrix defines whether a Gaussian is spherical, pancake-shaped, or needle-like. It's computed from scale and rotation matrices.
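One common way to write this is Σ = R S Sᵀ Rᵀ, with S a diagonal scale matrix and R a rotation. The sketch below assumes the rotation is stored as a (w, x, y, z) quaternion and uses plain Python lists; the helper names are illustrative.

```python
import math

def quat_to_rotation(w, x, y, z):
    """Convert a quaternion (w, x, y, z) to a 3x3 rotation matrix."""
    n = math.sqrt(w * w + x * x + y * y + z * z)
    w, x, y, z = w / n, x / n, y / n, z / n  # normalize so R is a pure rotation
    return [
        [1 - 2 * (y * y + z * z), 2 * (x * y - w * z), 2 * (x * z + w * y)],
        [2 * (x * y + w * z), 1 - 2 * (x * x + z * z), 2 * (y * z - w * x)],
        [2 * (x * z - w * y), 2 * (y * z + w * x), 1 - 2 * (x * x + y * y)],
    ]

def covariance(scale, quat):
    """Sigma = R S S^T R^T, with S = diag(scale)."""
    R = quat_to_rotation(*quat)
    # M = R @ S: column j of R is multiplied by scale[j].
    M = [[R[i][j] * scale[j] for j in range(3)] for i in range(3)]
    # Sigma = M @ M^T, which is symmetric and positive semi-definite.
    return [[sum(M[i][k] * M[j][k] for k in range(3)) for j in range(3)]
            for i in range(3)]

# A "pancake": thin along its local z axis, then rotated 90 degrees about x.
quarter_turn_x = (math.cos(math.pi / 4), math.sin(math.pi / 4), 0.0, 0.0)
pancake = covariance(scale=(1.0, 1.0, 0.1), quat=quarter_turn_x)
```

After the rotation the thin direction lies along the world y axis, so `pancake[1][1]` is the small variance (0.1² = 0.01). Building Σ from scale and rotation, rather than storing it directly, keeps it a valid covariance matrix however the parameters change.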
View-Dependent Color
Instead of storing a single RGB color, each Gaussian stores spherical harmonic coefficients that encode how color changes with viewing direction.
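As a rough illustration, here is how degree-0 and degree-1 coefficients could be turned into a color for a given view direction. The basis constants are the standard real spherical harmonic values; the sign pattern, the +0.5 offset, and the clamp at zero are assumptions about the storage convention, and `sh_to_rgb` is a made-up name.

```python
SH_C0 = 0.28209479177387814  # degree-0 basis constant
SH_C1 = 0.4886025119029199   # magnitude of the degree-1 basis constants

def sh_to_rgb(coeffs, direction):
    """Evaluate SH color for a unit view direction (x, y, z).

    coeffs holds 1 RGB triple (degree 0) or 4 RGB triples (degrees 0 and 1).
    """
    x, y, z = direction
    basis = [SH_C0]
    if len(coeffs) >= 4:
        basis += [-SH_C1 * y, SH_C1 * z, -SH_C1 * x]
    rgb = [sum(b * c[ch] for b, c in zip(basis, coeffs)) for ch in range(3)]
    # Assumed convention: shift by 0.5 and clamp negatives to zero.
    return [max(v + 0.5, 0.0) for v in rgb]

diffuse = [(1.0, 0.0, -1.0)]  # degree 0 only: same color from everywhere
shiny = diffuse + [(0.0, 0.0, 0.0), (0.5, 0.5, 0.5), (0.0, 0.0, 0.0)]

front = sh_to_rgb(shiny, (0.0, 0.0, 1.0))
back = sh_to_rgb(shiny, (0.0, 0.0, -1.0))
```

With only the degree-0 term the color is identical from every direction. Adding the degree-1 terms makes `front` brighter than `back`, which is the simplest form of view dependence.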
The Splatting Pipeline
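Rendering happens in three steps: project each 3D Gaussian to a 2D ellipse, sort all ellipses by depth, then blend them in depth order using alpha compositing. Walking the sorted list front-to-back while tracking the remaining transmittance gives the same color as painting back-to-front, and it lets the renderer stop once a pixel is nearly opaque.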
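Both blending orders mentioned on this page produce the same pixel color. The sketch below composites one pixel's splats each way, assuming every splat is a `(depth, (r, g, b), alpha)` tuple; the function names and the early-exit threshold are illustrative choices.

```python
def composite_front_to_back(splats):
    """Blend nearest-first, tracking how much light is still unblocked."""
    color = [0.0, 0.0, 0.0]
    transmittance = 1.0
    for _, rgb, alpha in sorted(splats, key=lambda s: s[0]):  # nearest first
        weight = alpha * transmittance
        color = [c + weight * v for c, v in zip(color, rgb)]
        transmittance *= 1.0 - alpha
        if transmittance < 1e-4:  # effectively opaque: skip what is behind
            break
    return color

def composite_back_to_front(splats):
    """Blend farthest-first with C_out = C_front * a + C_back * (1 - a)."""
    color = [0.0, 0.0, 0.0]
    for _, rgb, alpha in sorted(splats, key=lambda s: s[0], reverse=True):
        color = [v * alpha + c * (1.0 - alpha) for c, v in zip(color, rgb)]
    return color

pixel_splats = [
    (2.0, (0.0, 0.0, 1.0), 0.5),  # blue, middle
    (1.0, (1.0, 0.0, 0.0), 0.5),  # red, nearest
    (3.0, (0.0, 1.0, 0.0), 1.0),  # green, farthest and opaque
]
near_first = composite_front_to_back(pixel_splats)
far_first = composite_back_to_front(pixel_splats)
```

Both calls return `[0.5, 0.25, 0.25]` here. The front-to-back version can stop early because nothing behind an opaque stack can change the result.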
Learning the Scene
Training is differentiable: render an image, compare it to ground truth, and backpropagate gradients to adjust every Gaussian's parameters. Adaptive density control clones, splits, or prunes Gaussians as needed.
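The render, compare, and update loop can be shown with a deliberately tiny example: one 1D Gaussian whose center is learned by gradient descent. This is a toy sketch with a hand-derived gradient and made-up names, not how a real 3DGS trainer is written; a real trainer optimizes every parameter of every Gaussian.

```python
import math

def render(mu, sigma, xs):
    """'Render' a 1D Gaussian by sampling its falloff at pixel centers xs."""
    return [math.exp(-((x - mu) ** 2) / (2 * sigma ** 2)) for x in xs]

def fit_mean(target, xs, mu=0.0, sigma=1.0, lr=0.1, steps=200):
    """Move mu downhill on the squared error between render and target."""
    for _ in range(steps):
        image = render(mu, sigma, xs)
        # Chain rule: d(loss)/d(mu) = sum of 2 * (image - target) * d(image)/d(mu),
        # where d(image)/d(mu) = image * (x - mu) / sigma^2.
        grad = sum(2 * (i - t) * i * (x - mu) / sigma ** 2
                   for i, t, x in zip(image, target, xs))
        mu -= lr * grad  # step against the gradient
    return mu

pixels = [i * 0.5 for i in range(-10, 11)]  # 21 samples from -5 to 5
ground_truth = render(1.5, 1.0, pixels)     # the "photo" to match
learned_mu = fit_mean(ground_truth, pixels)
```

Starting from `mu = 0.0`, the loop moves the center toward 1.5, where the rendered samples match the target.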
Interactive Demos
Learn by doing. Manipulate parameters and see immediate visual feedback.
Demo 1: Manipulate a Single Gaussian (beginner)
Adjust position, scale, rotation, and opacity to see how each parameter affects the Gaussian's shape.
Tips for this demo
- Try making a flat "pancake" shape by reducing one scale axis
- Notice how rotation affects the ellipsoid orientation
Demo 2: Alpha Compositing (intermediate)
Drag to reorder layers and see how depth ordering affects the final blended color.
Tips for this demo
- Drag layers to reorder and watch the output change
- Notice how semi-transparent layers reveal colors beneath
Demo 3: Spherical Harmonics (advanced)
Adjust SH coefficients to see how view-dependent color is encoded. This is how 3DGS captures specular highlights.
Tips for this demo
- Start with degree 0, then add higher degrees to see the difference
- Move the viewpoint to see color changes with viewing angle
Demo 4: Matrix Transformations
See how scale and rotation matrices transform a unit circle into an ellipse—the foundation of Gaussian covariance.
Demo 5: 3D Covariance Shapes
Manipulate scale along each axis to create spheres, pancakes, or needles—the building blocks of 3D Gaussian Splatting.
Demo 6: Training Progress
Watch how 3DGS training evolves over 30K iterations: Gaussian count, PSNR quality, and key milestones.
Demo 7: Differentiable Rendering
See how gradients flow backward through rendering. Click to set a target - the Gaussian learns to cover it.
Demo 8: Point Cloud to Gaussians
3DGS starts from a structure-from-motion (SfM) point cloud and initializes a Gaussian at each point. Drag to rotate, toggle to see how points become splats.
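A sketch of what that initialization could look like is below. It assumes each Gaussian starts as a sphere sized from the distance to its three nearest neighbors, with an identity rotation and a low starting opacity; the specific values and the dictionary layout are illustrative, and the brute-force neighbor search is only suitable for small examples.

```python
import math

def init_gaussians(points):
    """points: list of (x, y, z). Returns one isotropic Gaussian per point."""
    gaussians = []
    for i, p in enumerate(points):
        # Squared distance from p to every other point, smallest first.
        d2 = sorted(sum((a - b) ** 2 for a, b in zip(p, q))
                    for j, q in enumerate(points) if j != i)
        nearest = d2[:3]
        # Root of the mean squared neighbor distance; fall back to 1.0 if alone.
        radius = math.sqrt(sum(nearest) / len(nearest)) if nearest else 1.0
        gaussians.append({
            "position": p,
            "scale": (radius, radius, radius),  # start as a sphere
            "rotation": (1.0, 0.0, 0.0, 0.0),   # identity quaternion (w, x, y, z)
            "opacity": 0.1,                     # assumed starting opacity
        })
    return gaussians

cloud = [(0.0, 0.0, 0.0), (1.0, 0.0, 0.0), (0.0, 1.0, 0.0), (0.0, 0.0, 1.0)]
splats = init_gaussians(cloud)
```

Sizing from neighbor distance means sparse regions start with large blobs and dense regions with small ones, so the initial set covers the scene without big gaps.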
Demo 9: Tile-Based Rasterization
See how 3DGS divides the screen into tiles for parallel GPU processing. Click tiles to see which Gaussians they contain.
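To make the tile idea concrete, this sketch lists the tiles that a projected splat overlaps, using the splat's bounding square. The 16-pixel tile size and the function name are assumptions for illustration; the demo's tile size may differ.

```python
import math

TILE = 16  # assumed tile edge length in pixels

def tiles_touched(cx, cy, radius, width, height, tile=TILE):
    """Return (tx, ty) indices of tiles overlapped by a splat's bounding square."""
    tiles_x = math.ceil(width / tile)
    tiles_y = math.ceil(height / tile)
    # Clamp the covered tile range to the screen.
    x0 = max(0, int((cx - radius) // tile))
    y0 = max(0, int((cy - radius) // tile))
    x1 = min(tiles_x - 1, int((cx + radius) // tile))
    y1 = min(tiles_y - 1, int((cy + radius) // tile))
    return [(tx, ty) for ty in range(y0, y1 + 1) for tx in range(x0, x1 + 1)]

straddling = tiles_touched(cx=20, cy=20, radius=5, width=64, height=64)
contained = tiles_touched(cx=8, cy=8, radius=2, width=64, height=64)
offscreen = tiles_touched(cx=-100, cy=8, radius=2, width=64, height=64)
```

A splat near a tile corner lands in four tiles, a small one stays in a single tile, and an off-screen one yields an empty list. Each tile can then sort and blend only its own short list, which is what makes the work easy to parallelize.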
Demo 10: Adaptive Density Control
Watch how 3DGS dynamically adjusts Gaussian count during training through densification and pruning.
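The per-Gaussian decision behind densification and pruning can be sketched as a small rule. The threshold values below are illustrative defaults chosen for this example, and `density_action` is a hypothetical helper rather than a function from the official code.

```python
def density_action(grad_norm, max_scale, opacity,
                   grad_threshold=0.0002, scale_threshold=0.01,
                   min_opacity=0.005):
    """Return 'prune', 'clone', 'split', or 'keep' for one Gaussian."""
    if opacity < min_opacity:
        return "prune"  # nearly invisible: not worth keeping
    if grad_norm >= grad_threshold:
        # High gradient means this region is poorly reconstructed.
        # Small Gaussians are duplicated; large ones are broken up.
        return "clone" if max_scale <= scale_threshold else "split"
    return "keep"

decisions = [
    density_action(grad_norm=0.001, max_scale=0.005, opacity=0.8),   # clone
    density_action(grad_norm=0.001, max_scale=0.5, opacity=0.8),     # split
    density_action(grad_norm=0.001, max_scale=0.5, opacity=0.001),   # prune
    density_action(grad_norm=0.00001, max_scale=0.5, opacity=0.8),   # keep
]
```

Cloning adds detail where the scene is under-covered, splitting replaces one oversized blob with smaller ones, and pruning keeps the total count from growing without bound.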
Demo 11: Depth Sorting for Alpha Blending
Transparent objects must be rendered back-to-front. Watch the sorting algorithm in action.
Build It Yourself
Get started with the official implementation. Here's a step-by-step walkthrough.
Traditional 3DGS (Multi-View Capture)
Clone the official repository and set up the CUDA environment for GPU acceleration
git clone https://github.com/graphdeco-inria/gaussian-splatting --recursive
cd gaussian-splatting
conda env create --file environment.yml
conda activate gaussian_splatting
Conversational Avatar Quickstart (Docker)
Deploy a talking Gaussian avatar from a single photo using OpenAvatarChat + LAM in Docker. Server needs ~4-6 GB VRAM; the avatar renders client-side in the browser via WebGL.
1. Clone and setup
Downloads models (~2 GB): wav2vec2, LAM Audio2Expression, SenseVoice ASR. Generates SSL certs for WebRTC.
2. Add your API key
Set OPENAI_API_KEY for the LLM. Or use the Ollama config for fully local operation (no keys needed).
3. Build and run
Builds the OpenAvatarChat image with CUDA 12.2 + Python 3.11, starts the avatar server + TURN relay.
4. Open browser
Accept the self-signed cert, allow microphone access. The Gaussian avatar renders at 60+ FPS via WebGL.
5. Custom avatar (optional)
Generate a Gaussian avatar from any photo using LAM, export as .zip, and set asset_path in the config. Supports up to 5 concurrent sessions.
Resources
- Original 3DGS Repository (INRIA), GitHub
- 3D Gaussian Splatting Paper (SIGGRAPH 2023), paper
- SuperSplat Editor (Web-based), docs
- Three.js Gaussian Splat Viewer, GitHub
- Luma AI WebGL Renderer, GitHub
- D3GA: Drivable 3D Gaussian Avatars, GitHub
- LAM: Large Avatar Model (One-Shot), GitHub
- OpenAvatarChat (Conversational Pipeline), GitHub
- GaussianTalker (Audio-Driven, 120 FPS), GitHub
- TaoAvatar (Full-Body 3DGS + SMPL-X), GitHub
- LAM Audio2Expression (Real-Time Blendshapes), GitHub
When to Use Gaussian Splatting
Use When
- You need photorealistic rendering of a specific person
- Real-time performance (60+ FPS) is critical
- You're building for VR/AR where multi-view consistency matters
- You have access to multi-view capture equipment
- You can afford per-person training time (hours)
Avoid When
- You need extreme variety of identities without any input images (consider Generative)
- You need to change lighting dynamically (lighting is baked in)
- You're constrained to web-only without GPU (consider Streaming)
- You need production-proven tools with mature ecosystems (consider MetaHuman)
- You need to handle complex clothing or loose hair motion
Best Use Case
VR/AR telepresence, real-time voice conversation, and any use case requiring photorealistic rendering at 60+ FPS. One-shot models now enable instant avatar creation from a single photo.
Common Misconceptions
Gaussians are like voxels
Actually: Voxels are discrete grid cells. Gaussians are continuous, overlapping, and anisotropic (each can stretch by a different amount along each axis), and they don't sit on a grid.
3DGS uses ray tracing
Actually: 3DGS is a rasterization technique. It projects primitives to the screen, not rays through the scene. This is why it's so fast.
More Gaussians = better quality
Actually: Quality depends on proper placement and parameters. Poorly placed Gaussians create 'needle' artifacts and blurriness.
Spherical harmonics are just for lighting
Actually: In 3DGS, SH encodes view-dependent color, not lighting calculations. The lighting is 'baked in' during training.
Gaussian avatars always require multi-view capture
Actually: Feed-forward models like LAM create animatable Gaussian avatars from a single photo in seconds. Multi-view capture gives higher fidelity, but one-shot models are now viable for real-time conversation.
Real-Time Conversation with Gaussian Avatars
One-shot models and conversational pipelines have transformed Gaussian splatting from a static capture technique into a viable real-time conversation platform.
Traditional Path
1. Multi-view video capture (50-200 images)
2. Per-subject optimization (2-8 hours)
3. Rig with FLAME/blendshape driver
4. Deploy with custom rendering server
One-Shot Path (2025+)
1. Single face photo as input
2. LAM generates animatable avatar in seconds
3. Audio2Expression maps speech to blendshapes
4. WebGL/WebGPU renders in browser, no server GPU for rendering
Architecture: Browser + Server Split
Browser (Client)
LAM WebRender (WebGL) renders the 3D Gaussian avatar locally at 60-563 FPS. 52 ARKit blendshape coefficients arrive via WebSocket. Rendering runs on the client's own graphics hardware, so no server GPU is needed for rendering.
Server (Docker + GPU)
SileroVAD detects speech, SenseVoice transcribes, LLM generates response, TTS synthesizes audio, Audio2Expression maps to ARKit blendshapes. Deploy via docker compose up.
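The only thing that crosses from server to browser for animation is a small vector of blendshape weights, which is why the split works. The sketch below shows one way a frame could be serialized. The JSON layout and the name `encode_frame` are hypothetical and do not describe OpenAvatarChat's actual wire format; the count of 52 comes from the description above.

```python
import json

NUM_BLENDSHAPES = 52  # ARKit blendshape count stated above

def encode_frame(timestamp_ms, coefficients):
    """Serialize one animation frame (hypothetical message layout)."""
    if len(coefficients) != NUM_BLENDSHAPES:
        raise ValueError(
            f"expected {NUM_BLENDSHAPES} coefficients, got {len(coefficients)}")
    # ARKit blendshape weights range from 0.0 to 1.0, so clamp stray values.
    clamped = [min(max(float(c), 0.0), 1.0) for c in coefficients]
    return json.dumps({"t": timestamp_ms, "blendshapes": clamped})

neutral_face = [0.0] * NUM_BLENDSHAPES
message = encode_frame(timestamp_ms=40, coefficients=neutral_face)
```

A frame like this is well under a kilobyte, so streaming it at video frame rates costs far less bandwidth than streaming rendered video.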
Ready to Go Deeper?
Explore the math behind each concept, build a conversational avatar, or compare approaches.