Shader Optimization Techniques
Fragment shaders run once per pixel (strictly, per fragment), every frame, so saving a few instructions pays off millions of times per frame.
The Golden Rule
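Profile first, optimize second. Modern shader compilers already fold constants, remove duplicate expressions, and schedule instructions, so many hand optimizations leave the generated code unchanged. Don’t spend time on code that profiling hasn’t shown to be slow.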
Common Bottlenecks
Texture Fetches
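Often the most expensive operation in a shader, because a fetch can miss the texture cache and wait on memory:
// BAD - Two fetches of the same texel
vec3 color = texture(albedoMap, uv).rgb;
float alpha = texture(albedoMap, uv).a;
// GOOD - Fetch once, reuse the result
vec4 albedo = texture(albedoMap, uv);
vec3 color = albedo.rgb;
float alpha = albedo.a;
// Fetches at different UVs (e.g. uv and uv + offset) are different samples and cannot be shared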
Dependent Texture Reads
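A dependent read is a fetch whose UV comes from an earlier fetch. The second fetch cannot start until the first returns, so their latencies add up. Some older mobile GPUs also treated any UV computed in the fragment shader as dependent, because they could only prefetch textures using UVs passed through unmodified from the vertex shader:
// SLOW - Dependent read: the second fetch waits for the first
vec2 offset = texture(noiseMap, uv).rg;
vec3 color = texture(colorMap, uv + offset).rgb;
// FASTER - When the offset doesn’t come from a texture, compute the UV
// in the vertex shader and pass it through unmodified
in vec2 offsetUV; // hypothetical varying written by the vertex shader
vec3 color = texture(colorMap, offsetUV).rgb;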
Math Operations
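Some operations cost more than others. As a rough guide (exact costs vary by GPU architecture):
- Fast: add, multiply, mad (multiply-add)
- Medium: reciprocal, rsqrt
- Slow: division, sqrt, pow, sin, cos, tan
// BAD - Dividing several values by the same (non-constant) divisor
float a = x / divisor;
float b = y / divisor;
// GOOD - Compute the reciprocal once, then multiply
float invDivisor = 1.0 / divisor;
float a = x * invDivisor;
float b = y * invDivisor;
// Division by a literal constant is usually folded by the compiler already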
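The same reciprocal trick, written as a small CPU-side C sketch so the arithmetic can be checked outside a shader. The function name scale_by_inverse is hypothetical. It performs one division up front and then only multiplies inside the loop:

```c
#include <stddef.h>

/* Hypothetical helper: divide every element by the same divisor using
   one division up front and one multiply per element. */
void scale_by_inverse(float *values, size_t count, float divisor)
{
    const float inv = 1.0f / divisor; /* the only division */
    for (size_t i = 0; i < count; ++i) {
        values[i] *= inv; /* may differ from values[i] / divisor in the last bit */
    }
}
```

Multiplying by a rounded reciprocal can differ from true division by one unit in the last place. For that reason, C compilers only apply this rewrite on their own when it is exact (for example, a power-of-two divisor) or when told to relax the rules (GCC and Clang: -freciprocal-math, which -ffast-math enables). GLSL’s looser precision rules for division give shader compilers more freedom here.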
Optimization Patterns
Move to Vertex Shader
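Calculations that vary linearly across a triangle, or don’t need per-pixel precision, can run once per vertex and let the rasterizer interpolate the result:
// Vertex shader
uniform mat4 model;
uniform vec3 cameraPos;
in vec3 position;
out vec3 viewDir;
void main() {
    vec3 worldPos = (model * vec4(position, 1.0)).xyz;
    viewDir = cameraPos - worldPos; // unnormalized: interpolates exactly
    // ...
}
// Fragment shader - viewDir arrives interpolated
in vec3 viewDir;
void main() {
    vec3 V = normalize(viewDir); // interpolated result still needs normalizing
    // ...
}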
Use Lower Precision
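Mobile GPUs often evaluate mediump at 16-bit precision, which can be faster and use fewer registers. Precision qualifiers only take effect in GLSL ES (OpenGL ES, WebGL). Desktop OpenGL GLSL accepts them but ignores them:
// mediump: unit vectors and [0, 1] material values fit easily
mediump vec3 normal;
mediump float roughness;
// highp: large values such as world-space positions need the extra precision
highp vec3 worldPosition;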
Vectorize Operations
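Writing math as vector operations is clearer, and it can help on GPUs with vector ALUs, such as some older mobile designs. Most current GPUs execute scalar instructions per thread, so the compiler splits vectors anyway and the gain is mostly readability. The BAD version also fetches the same texel three times:
// BAD - Scalar operations on three identical fetches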
float r = texture(tex, uv).r * factor;
float g = texture(tex, uv).g * factor;
float b = texture(tex, uv).b * factor;
// GOOD - Vector operation
vec3 color = texture(tex, uv).rgb * factor;
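Divergent Branching
GPUs don’t predict branches the way CPUs do. Instead, they shade pixels in groups (warps or wavefronts) that execute in lockstep. A branch is cheap when the whole group takes the same side, for example when the condition is a uniform. It becomes expensive when the condition varies per pixel, because the group runs both sides and discards the results it doesn’t need. For short, divergent branches, a branch-free select avoids this:
// Divergent: mask changes from pixel to pixel
if (mask >= 0.5) {
    color = colorA;
} else {
    color = colorB;
}
// Branch-free: step(0.5, mask) is 1.0 when mask >= 0.5, else 0.0
color = mix(colorB, colorA, step(0.5, mask));
// Keep the branch when the condition is uniform: mix(baseColor,
// texture(tex, uv).rgb, float(useTexture)) would fetch the texture
// on every pixel, even when useTexture is false.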
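The following CPU-side C model of the branch-free pattern follows GLSL’s definitions of mix (x * (1 - a) + y * a) and step (0.0 if x < edge, otherwise 1.0). It shows that a select with a 0.0/1.0 weight picks the same value an if/else would. To keep it short, it handles one color channel with a threshold of 0.5. The names step_f, mix_f and select_color are hypothetical stand-ins for the GLSL built-ins:

```c
/* GLSL step(edge, x): 0.0 if x < edge, otherwise 1.0. In C the ternary may
   compile to a branch; GLSL compilers typically emit a compare-and-select. */
float step_f(float edge, float x)
{
    return x < edge ? 0.0f : 1.0f;
}

/* GLSL mix(x, y, a) = x * (1 - a) + y * a */
float mix_f(float x, float y, float a)
{
    return x * (1.0f - a) + y * a;
}

/* Branch-free equivalent of:
   if (mask >= 0.5) result = colorA; else result = colorB; */
float select_color(float colorA, float colorB, float mask)
{
    return mix_f(colorB, colorA, step_f(0.5f, mask));
}
```

Because the weight is exactly 0.0 or 1.0, mix returns one input unchanged. For example, mix(b, a, 1.0) = b * 0 + a * 1 = a.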
Advanced Techniques
Approximations
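Sometimes exact isn’t necessary, and sometimes a cheaper exact form exists:
// pow(x, 5.0) is typically evaluated as exp2(5.0 * log2(x)) and is undefined
// for x < 0; three multiplies give the same value (up to rounding) for any x
float pow5(float x) {
    float x2 = x * x;
    return x2 * x2 * x;
}
// normalize() written out: one inversesqrt instead of sqrt plus divide.
// Many compilers already emit this for normalize(), so measure first.
vec3 fastNormalize(vec3 v) {
    return v * inversesqrt(dot(v, v));
}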
Lookup Tables (LUTs)
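Pre-compute expensive functions into a texture and replace the math with a single fetch. This only pays off when the function costs more than a texture read; a handful of ALU instructions is usually cheaper:
// Pre-computed 2D table, indexed by (NdotV, roughness)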
float fresnel = texture(fresnelLUT, vec2(NdotV, roughness)).r;
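A LUT is built on the CPU or offline. The minimal C sketch below assumes a 1D table of Schlick’s Fresnel weight (1 - NdotV)^5. That choice is illustrative; the 2D fresnelLUT above would also depend on roughness. The lookup mimics linear filtering between neighboring entries. LUT_SIZE, fresnel_weight, build_lut and sample_lut are hypothetical names:

```c
#define LUT_SIZE 64 /* hypothetical table size */

/* Function being tabulated: Schlick's Fresnel weight (1 - NdotV)^5. */
static float fresnel_weight(float n_dot_v)
{
    float m = 1.0f - n_dot_v;
    float m2 = m * m;
    return m2 * m2 * m;
}

/* Entry i holds f(i / (LUT_SIZE - 1)), so the table spans [0, 1]. */
void build_lut(float lut[LUT_SIZE])
{
    for (int i = 0; i < LUT_SIZE; ++i)
        lut[i] = fresnel_weight((float)i / (LUT_SIZE - 1));
}

/* Linear interpolation between the two nearest entries, like linear filtering. */
float sample_lut(const float lut[LUT_SIZE], float u)
{
    if (u <= 0.0f) return lut[0];
    if (u >= 1.0f) return lut[LUT_SIZE - 1];
    float x = u * (LUT_SIZE - 1);            /* position in entry units */
    int i = (int)x;                          /* lower entry */
    if (i > LUT_SIZE - 2) i = LUT_SIZE - 2;  /* guard against rounding at the top end */
    float t = x - (float)i;                  /* weight of the upper entry */
    return lut[i] + (lut[i + 1] - lut[i]) * t;
}
```

On the GPU, texel i of an N-texel texture sits at coordinate (i + 0.5) / N. To reproduce this mapping with a filtered texture, sample at (u * (N - 1) + 0.5) / N. With 64 entries, the interpolation error for this curve stays below 0.001.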
Pack Data
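Use all four channels of a texture. One fetch then returns four values, which would otherwise need four single-channel textures. Store such data maps in a linear (non-sRGB) format, since they don’t hold colors: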
// Pack multiple values in one texture
// R: Metallic
// G: Roughness
// B: Ambient Occlusion
// A: Height
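vec4 surfaceData = texture(packedMap, uv);
float metallic = surfaceData.r;
float roughness = surfaceData.g;
float ao = surfaceData.b;
float height = surfaceData.a;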
Measuring Performance
Use GPU profilers:
- RenderDoc (Windows, Linux, Android; frame debugger with basic GPU timings)
- Xcode GPU Debugger (iOS, macOS)
- Android GPU Inspector (Android)
- NVIDIA Nsight Graphics (NVIDIA GPUs)
Look for:
- Shader execution time
- Texture bandwidth
- Occupancy
- Register usage
The Reality
Most shaders are fine. Focus optimization on:
- Shaders that run on many pixels (fullscreen effects)
- Shaders on low-end hardware
- Shaders causing measurable frame drops
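To put "many pixels" in numbers: a fullscreen pass runs the fragment shader once per pixel, so the invocation count per second is width × height × frame rate. The small C helper below is hypothetical and treats overdraw as a plain multiplier:

```c
#include <stdint.h>

/* Fragment shader invocations per second for a pass covering
   width x height pixels, drawn `layers` times per frame (overdraw). */
uint64_t invocations_per_second(uint32_t width, uint32_t height,
                                uint32_t fps, uint32_t layers)
{
    return (uint64_t)width * height * fps * layers;
}
```

At 1920 × 1080 and 60 fps, a single fullscreen pass comes to 124,416,000 invocations per second. One extra instruction in that shader therefore runs over 124 million times a second.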
Readability > premature optimization.
[!note] A shader that’s 10% slower but maintainable is usually the right choice.