NIA-GPU-001: Niagara GPU Budget

Warning | Niagara

What This Rule Detects

This rule flags Niagara Systems with settings that may exceed runtime VFX budgets:

GPU emitter count: GPU simulation has per-emitter overhead
Collision module count: CPU↔GPU sync for collision is expensive
Missing fixed bounds: Dynamic bounds cause CPU recalculation each frame
Total emitter complexity: Combined risk of all emitters

The default flagging threshold is a risk score of 5.0.

Why This Matters

Three Budgets to Consider

Niagara systems affect three distinct budgets:

Budget	What It Means	When You Feel It
GPU simulation	Particle update on GPU	Frame time, heat
CPU ↔ GPU sync	Data transfers for collision	Stalls, hitches
Culling/bounds	Visibility determination	Pop-in, CPU overhead

GPU vs CPU Emitters: A Trade-off

GPU simulation is excellent for scale (thousands of particles) but has per-emitter overhead:

Each GPU emitter requires compute dispatches
Buffer management and sync points add fixed cost
Many small GPU emitters can be worse than one large CPU emitter

Rule of thumb (from Niagara documentation):

Under 1,000 particles → CPU often more efficient
Over 1,000 particles → GPU usually wins
Collision needed → CPU recommended (avoids sync)
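
The rule of thumb above can be written as a small decision helper. This is only a sketch: the name `suggest_sim_target` and its return strings are hypothetical, and the 1,000-particle cutoff is the approximate figure quoted above rather than a hard limit, so profile before committing to either target.

```python
def suggest_sim_target(particle_count: int, needs_collision: bool) -> str:
    """Suggest "CPU" or "GPU" for one emitter (hypothetical helper).

    Encodes only the rule of thumb from this page; it is not a Niagara API.
    """
    # Collision takes priority: CPU simulation avoids the CPU<->GPU sync.
    if needs_collision:
        return "CPU"
    # Below roughly 1,000 particles, per-emitter GPU overhead tends to dominate.
    # Where exactly the boundary sits is a judgment call.
    if particle_count < 1000:
        return "CPU"
    return "GPU"


if __name__ == "__main__":
    print(suggest_sim_target(200, needs_collision=False))
    print(suggest_sim_target(50_000, needs_collision=False))
    print(suggest_sim_target(50_000, needs_collision=True))
```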

Fixed Bounds: Culling Correctness and Performance

Niagara systems without fixed bounds must calculate their bounds dynamically by reading every particle position each frame.

From Niagara’s API documentation: Fixed Bounds is an explicit, editable property that enables skipping per-frame bounds calculation.

Without fixed bounds:

System recalculates bounds every frame
Reads all particle positions from GPU
Causes CPU overhead and potential sync stalls
Effects may pop/disappear unexpectedly during culling

With fixed bounds:

Bounds are known in advance
Culling is instant and correct
No per-frame position reads

Tick Scheduling Caveat

From Niagara documentation: Using a fixed tick delta can cause substepping (multiple ticks per frame) and can force the system to tick on the game thread instead of an async task.

If you “stabilize” simulation with fixed delta without understanding the scheduling consequence, you may unexpectedly increase CPU cost.

Scalability: Particle systems that run fine in isolation can destroy frame rate when 10+ instances spawn simultaneously.

Real Example: NS_MuzzleFlash with 4 GPU emitters and collision for shell casings. Each weapon fires it. With 8 AI firing simultaneously, that’s 32 GPU emitters and 8 collision syncs per frame.
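
The multiplication in the example above can be sketched as follows. `concurrent_load` is a hypothetical helper; it assumes every instance is active in the same frame and that the system has a single collision module, which is what the figure of 8 collision syncs implies.

```python
from typing import Tuple


def concurrent_load(gpu_emitters: int, collision_modules: int,
                    instances: int) -> Tuple[int, int]:
    """Return (total GPU emitters, total collision syncs) for one frame.

    Assumes all instances are active at once and that costs scale linearly.
    """
    return gpu_emitters * instances, collision_modules * instances


if __name__ == "__main__":
    # NS_MuzzleFlash: 4 GPU emitters, 1 collision module (assumed), 8 AI firing.
    total_gpu, total_syncs = concurrent_load(4, 1, 8)
    print(total_gpu, total_syncs)  # prints "32 8"
```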

When This Is Acceptable

Hero VFX: Important moments that justify the cost (ultimates, boss attacks)
Limited instances: Only 1-2 ever active simultaneously
Cinematic sequences: Pre-planned, non-interactive visuals
Already profiled: Measured on target hardware and within budget

The Problem

GPU emitters have fixed overhead per-emitter
Collision modules cause GPU→CPU→GPU sync
Missing fixed bounds forces per-frame calculation

Per-emitter overhead multiplies when multiple systems spawn.

Risk Score Calculation

RiskScore =
    GpuEmitterCount × 2.0 +
    CollisionModuleCount × 1.5 +
    TotalEmitterCount × 0.25 +
    (MissingFixedBounds ? 1.0 : 0.0)

EstimatedImpactMs =
    GpuEmitterCount × 0.40 +
    CollisionModuleCount × 0.18

Threshold: 5.0 (default)
Critical multiplier: 2.5×
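
As a minimal sketch, the two formulas above translate directly into code. The weights are the ones listed on this page; the class and function names are hypothetical and are not the analyzer's actual implementation.

```python
from dataclasses import dataclass


@dataclass
class NiagaraSystemStats:
    """Counts gathered from one Niagara System (hypothetical container)."""
    gpu_emitter_count: int
    collision_module_count: int
    total_emitter_count: int
    missing_fixed_bounds: bool


def risk_score(stats: NiagaraSystemStats) -> float:
    """Weighted sum using the weights documented above."""
    return (stats.gpu_emitter_count * 2.0
            + stats.collision_module_count * 1.5
            + stats.total_emitter_count * 0.25
            + (1.0 if stats.missing_fixed_bounds else 0.0))


def estimated_impact_ms(stats: NiagaraSystemStats) -> float:
    """Rough per-instance cost estimate in milliseconds."""
    return (stats.gpu_emitter_count * 0.40
            + stats.collision_module_count * 0.18)


if __name__ == "__main__":
    # Made-up system: 2 GPU emitters + 1 CPU emitter, 1 collision module,
    # no fixed bounds.
    stats = NiagaraSystemStats(2, 1, 3, True)
    print(risk_score(stats))                    # prints 7.25
    print(f"{estimated_impact_ms(stats):.2f}")  # prints 0.98
```

Note that TotalEmitterCount includes the GPU emitters, so each GPU emitter contributes 2.0 + 0.25 to the score.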

Example Analysis

NS_Explosion (complex VFX)
├── Emitter 1: Core Flash (GPU)         → +2.0 + 0.25 = +2.25 risk
├── Emitter 2: Debris (GPU + Collision) → +2.0 + 1.5 + 0.25 = +3.75 risk
├── Emitter 3: Smoke Trail (GPU)        → +2.0 + 0.25 = +2.25 risk
├── Emitter 4: Sparks (GPU)             → +2.0 + 0.25 = +2.25 risk
├── Emitter 5: Shockwave (CPU)          → +0.25 risk
└── No Fixed Bounds                      → +1.0 risk

Total Risk Score: 11.75 (threshold: 5.0) = CRITICAL
Estimated Impact: 4 GPU × 0.4 + 1 Collision × 0.18 = 1.78ms/instance

The Fix

Option 1: Convert GPU to CPU Emitters

For emitters with low particle counts, CPU can be more efficient:

When to use CPU:

<1,000 particles
Simple update logic
No GPU-specific features (GPU events, mesh sampling)
Collision needed

How to convert:

Open Niagara System
Select emitter
Properties → Sim Target → CPUSim

CPU emitters avoid GPU dispatch overhead for small particle counts.

Option 2: Consolidate Emitters

Merge similar emitters to reduce overhead:

Before (4 GPU emitters):

Emitter 1: Red sparks
Emitter 2: Orange sparks
Emitter 3: Yellow sparks
Emitter 4: White sparks

After (1 GPU emitter):

Emitter 1: Sparks
  → Random color from palette
  → Single dispatch instead of 4

Option 3: Remove or Simplify Collision

Collision is expensive. Consider alternatives:

Option A: Kill instead of collide

Particle Update → Kill Particles When Below Z
(Floor height is known, no collision needed)
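
To illustrate why a kill plane is cheaper than collision, here is a conceptual sketch in plain Python (not a Niagara module script). The function name and the tuple layout are assumptions made for the illustration: each particle needs only a single comparison against a known floor height, with no scene query.

```python
from typing import List, Tuple

Position = Tuple[float, float, float]  # (x, y, z), hypothetical layout


def kill_below_z(positions: List[Position], floor_z: float) -> List[Position]:
    """Keep only particles at or above the floor plane."""
    # One comparison per particle; no collision query against scene geometry.
    return [p for p in positions if p[2] >= floor_z]


if __name__ == "__main__":
    particles = [(0.0, 0.0, 50.0), (10.0, 5.0, -3.0), (2.0, 2.0, 0.0)]
    print(kill_below_z(particles, floor_z=0.0))
```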

Option B: CPU emitter for collision

GPU emitter for visuals (thousands of particles)
CPU emitter for physics (tens of particles)

Option C: Collision on spawn only

Check spawn position, adjust if inside geometry
No per-frame collision queries

Option 4: Enable Fixed Bounds

Set explicit bounds instead of calculating:

Open Niagara System
System Properties → Fixed Bounds
Check Fixed Bounds
Set Min/Max to contain all possible particles

Estimating bounds:

Max extent = SpawnPosition + (MaxVelocity × Lifetime) + (½ × Gravity × Lifetime²)

Example: Particles spawn at origin, max velocity 500, lifetime 2s, gravity -980:

X/Y: ±1000 units (500 × 2)
Z: +1000 to -2960 (up: 500 × 2; down: 500 × 2 plus a gravity drop of ½ × 980 × 2² = 1960)
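
A conservative bounds estimate can be sketched with the constant-acceleration displacement v·t + ½·g·t². The helper below is hypothetical and assumes gravity along Z is the only force (no drag, wind, or curl noise) and that a particle may be launched in any direction at up to the maximum speed. Pad the result for sprite or mesh size before entering it as Fixed Bounds.

```python
from typing import Tuple

Vec3 = Tuple[float, float, float]


def estimate_fixed_bounds(spawn: Vec3, max_speed: float, lifetime: float,
                          gravity_z: float = -980.0) -> Tuple[Vec3, Vec3]:
    """Return conservative (min, max) corners of an axis-aligned box."""
    reach = max_speed * lifetime             # farthest travel from velocity alone
    drop = 0.5 * gravity_z * lifetime ** 2   # gravity displacement (negative = down)
    sx, sy, sz = spawn
    # Gravity only extends the box in the direction it pulls.
    mins = (sx - reach, sy - reach, sz - reach + min(drop, 0.0))
    maxs = (sx + reach, sy + reach, sz + reach + max(drop, 0.0))
    return mins, maxs


if __name__ == "__main__":
    # Spawn at origin, max velocity 500, lifetime 2 s, gravity -980.
    lo, hi = estimate_fixed_bounds((0.0, 0.0, 0.0), 500.0, 2.0)
    print("Min:", lo)
    print("Max:", hi)
```

The upper Z bound ignores gravity on purpose: it overestimates the peak height but guarantees that no particle leaves the box.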

Option 5: Use Scalability Settings

Configure LOD and budget scaling:

Open Niagara System
System Properties → Effect Type
Create/assign Effect Type with budget limits

Effect Type settings:

Max System Instances: 10
Max Per-System Budget: 0.5ms
Significance Handler: Distance-based

Option 6: Split into Scalability Tiers

Create quality levels for different platforms:

NS_Explosion_High
├── 5 GPU emitters, collision, full detail
└── Used on: High-end PC

NS_Explosion_Med
├── 3 GPU emitters, no collision
└── Used on: Console, mid-range PC

NS_Explosion_Low
├── 1 CPU emitter, simple sprites
└── Used on: Mobile, low-end

Use Scalability settings to auto-select based on platform.

GPU vs CPU Emitter Guide

Factor	Prefer GPU	Prefer CPU
Particle count	>1,000	<1,000
Update complexity	Parallel-friendly	Sequential/conditional
Collision needed	No	Yes
Mesh sampling	Mesh Distance Fields	CPU mesh sampling
Instance count	Few (1-5)	Many (10+)

Bounds Optimization

Why fixed bounds matter:

Dynamic bounds read all particle positions every frame
With 10,000 particles, that’s 10,000 position reads per frame
Fixed bounds skip all of this
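
The cost difference can be seen in a conceptual sketch of a dynamic bounds pass. This is plain Python for illustration, not Niagara's implementation, and `dynamic_bounds` is a hypothetical name: every particle position has to be visited once per frame, so the work grows linearly with particle count.

```python
from typing import List, Tuple

Vec3 = Tuple[float, float, float]


def dynamic_bounds(positions: List[Vec3]) -> Tuple[Vec3, Vec3]:
    """Compute an axis-aligned bounding box by scanning every position."""
    if not positions:
        raise ValueError("no particles to bound")
    # Each min/max call touches every particle: O(n) work per frame.
    xs, ys, zs = zip(*positions)
    return (min(xs), min(ys), min(zs)), (max(xs), max(ys), max(zs))


if __name__ == "__main__":
    frame = [(0.0, 1.0, 2.0), (-5.0, 3.0, 0.5), (4.0, -2.0, 9.0)]
    print(dynamic_bounds(frame))
```

With fixed bounds, this pass is skipped and the box is simply read from the asset.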

When dynamic bounds are OK:

Debug/development only
<100 particles total
Bounds change is intentional (growing effects)

Profiling Niagara

Use these console commands:

stat Niagara           // Overview of Niagara cost
stat NiagaraGPU        // GPU-specific timing
stat NiagaraSystem     // Per-system breakdown

In Niagara Editor:

Performance Mode shows per-emitter cost
Statistics panel shows particle counts
GPU Visualizer shows dispatch timings

Configuration

Threshold: Risk score before flagging (default: 5.0)

To adjust in Project Settings:

Blueprint Health Analyzer → Rule Thresholds → NIA-GPU-001 → 8.0

Use a higher threshold for VFX-heavy games with validated budgets, and a lower threshold (for example, 3.0) for mobile or VR, where particle budgets are tight.

Related Rules

BP-TICK-001 - Niagara spawned per-frame compounds cost
RND-SET-001 - Rendering settings affect Niagara performance
AST-MESH-001 - Mesh particles add geometry cost