perf: apply Burst compiler to primary hot-path candidates #34

Merged
lavarius merged 3 commits from feat/burst-compiler-optimization into master 2026-03-13 04:31:43 +00:00

Summary

Applies the Unity Burst compiler to the three primary per-frame hot paths identified in the analysis. SimpleGravity and MovementManager use the [BurstCompile] static-method pattern (same as the existing BurstBlendshapeRemapper), and FaceTracker runs a [BurstCompile] IJob synchronously via job.Run(), so none of the three incurs job-scheduler overhead. Burst compiles the methods to SIMD-capable native code (JIT in the Editor, ahead-of-time in player builds), and they are called directly from managed code.


Changes

SimpleGravity.cs

  • Added GravityMath static [BurstCompile] class
  • ComputeVelocity() handles the grounded snap (-2f) and gravity accumulation (+= gravity * dt) in native code, including the branch
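
A minimal sketch of what such a method could look like. The signature, parameter names, and the "snap, then accumulate" ordering are assumptions based on the common CharacterController pattern; they are not taken from SimpleGravity.cs. The bool is marked with MarshalAs on the assumption that Burst's interop rules for bool apply to directly called static methods.

```csharp
using System.Runtime.InteropServices;
using Unity.Burst;

// Hypothetical sketch: only the class and method names come from the PR text.
[BurstCompile]
public static class GravityMath
{
    // Returns the new vertical velocity for this frame.
    // groundedSnap is the small negative value (-2f in the PR) that keeps
    // a grounded controller pressed against the floor.
    [BurstCompile]
    public static float ComputeVelocity(
        float velocityY,
        [MarshalAs(UnmanagedType.U1)] bool isGrounded,
        float gravity,
        float dt,
        float groundedSnap)
    {
        // Grounded snap: discard accumulated downward velocity while grounded.
        if (isGrounded && velocityY < 0f)
        {
            velocityY = groundedSnap;
        }

        // Gravity accumulation (assumed to run every frame, grounded or not).
        return velocityY + gravity * dt;
    }
}
```

A caller would use it as `velocity.y = GravityMath.ComputeVelocity(velocity.y, controller.isGrounded, gravity, Time.deltaTime, -2f);`, where `velocity`, `controller`, and `gravity` are the caller's own fields.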

MovementManager.cs

  • Added BurstMovementMath static [BurstCompile] class with 3 methods:
    • ComputeCameraRelativeDirection: flattens camera forward/right to the XZ plane, normalizes, and produces a camera-relative float3 move direction
    • ComputeSpeed: dot-product same-direction check that selects acceleration or deceleration
    • ComputeRotation: quaternion.LookRotationSafe + math.slerp, replacing the managed Quaternion.LookRotation + Quaternion.Slerp
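
The three methods could be sketched roughly as follows. Only the class and method names come from this PR; every parameter, the clamping in ComputeSpeed, and the exact acceleration rule are assumptions made for illustration. Vector and quaternion arguments are passed with `in`, and struct results are returned through `out`, because directly called [BurstCompile] methods cannot take or return these types by value (the follow-up commit in this PR cites BC1064/BC1067 for that).

```csharp
using Unity.Burst;
using Unity.Mathematics;

// Hypothetical sketch, not the code from MovementManager.cs.
[BurstCompile]
public static class BurstMovementMath
{
    // Flattens the camera basis onto the XZ plane and builds a
    // camera-relative, normalized move direction from 2D input.
    [BurstCompile]
    public static void ComputeCameraRelativeDirection(
        in float3 camForward, in float3 camRight, in float2 input, out float3 moveDir)
    {
        float3 forward = math.normalizesafe(new float3(camForward.x, 0f, camForward.z));
        float3 right = math.normalizesafe(new float3(camRight.x, 0f, camRight.z));
        // normalizesafe returns zero instead of NaN when there is no input.
        moveDir = math.normalizesafe(forward * input.y + right * input.x);
    }

    // Accelerates while the requested direction agrees with the previous one
    // (positive dot product), otherwise decelerates. Result is clamped.
    [BurstCompile]
    public static float ComputeSpeed(
        in float3 moveDir, in float3 lastDir,
        float speed, float maxSpeed, float accel, float decel, float dt)
    {
        bool sameDirection = math.dot(moveDir, lastDir) > 0f;
        float rate = sameDirection ? accel : -decel;
        return math.clamp(speed + rate * dt, 0f, maxSpeed);
    }

    // Turns toward moveDir. LookRotationSafe yields the identity rotation
    // for a zero-length direction instead of an invalid quaternion.
    [BurstCompile]
    public static void ComputeRotation(
        in quaternion current, in float3 moveDir, float t, out quaternion result)
    {
        quaternion target = quaternion.LookRotationSafe(moveDir, math.up());
        result = math.slerp(current, target, math.saturate(t));
    }
}
```

Since LookRotationSafe falls back to the identity rotation, a caller would probably still skip ComputeRotation when there is no input, so that the character does not turn toward world forward.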

FaceTracker.cs

Four changes stacked:

  1. Blendshape index cache: all 52 mesh indices and smooth speeds are computed once in Start() via InitBlendshapeCache(), eliminating 52 GetBlendShapeIndex string lookups every frame
  2. BlendshapeSmoothingJob, a [BurstCompile] IJob: a math.lerp pass over 52 floats that Burst can auto-vectorize with SIMD
  3. ApplyBlendshapes() rewritten as a 3-phase batch: read all 52 weights → job.Run() (synchronous Burst, no scheduler) → write all 52 smoothed+remapped weights back
  4. OnDestroy() added to safely dispose the four persistent NativeArray<float> buffers
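
The smoothing pass described above could look something like the sketch below. The job name is from this PR, but the field names, the three-buffer layout, and the `speed * deltaTime` interpolation factor are assumptions. The PR mentions four buffers, and their actual roles are not listed in the description.

```csharp
using Unity.Burst;
using Unity.Collections;
using Unity.Jobs;
using Unity.Mathematics;

// Hypothetical sketch of a per-frame smoothing job over blendshape weights.
[BurstCompile]
public struct BlendshapeSmoothingJob : IJob
{
    [ReadOnly] public NativeArray<float> Target;       // raw tracked weights
    [ReadOnly] public NativeArray<float> SmoothSpeed;  // per-shape speed
    public NativeArray<float> Current;                 // smoothed weights, updated in place
    public float DeltaTime;

    public void Execute()
    {
        // A simple counted loop over contiguous floats, which is the shape
        // of loop Burst is typically able to auto-vectorize.
        for (int i = 0; i < Current.Length; i++)
        {
            float t = math.saturate(SmoothSpeed[i] * DeltaTime);
            Current[i] = math.lerp(Current[i], Target[i], t);
        }
    }
}
```

Calling `job.Run()` executes the job immediately on the calling thread, so the smoothed values in `Current` are ready to be written back to the mesh in the same frame.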

Expected Gains

| File | Hotspot | Est. gain |
|------|---------|-----------|
| MovementManager | Cam-relative math, dot-product accel, LookRotation+Slerp | +15–30% |
| SimpleGravity | Gravity velocity branch+accumulation | +10–15% |
| FaceTracker | 52× string index lookups eliminated + SIMD smoothing | +8–15% |

Notes

  • No intended behaviour changes: the logic is unchanged apart from the LookRotationSafe substitution noted below, and only the execution path changes (managed → Burst native)
  • job.Run() is used (not job.Schedule()) so there is no 1-frame latency and no dependency management needed
  • NativeArray buffers use Allocator.Persistent to avoid per-frame allocation cost
  • quaternion.LookRotationSafe is used instead of LookRotation to handle zero-length vectors gracefully
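
The index cache and buffer lifetime described in these notes can be sketched as a small component. The component name, its fields, and the AllocateBuffers/DisposeBuffers helpers are hypothetical and are not taken from FaceTracker.cs. Only one buffer is shown, and the others would follow the same pattern.

```csharp
using Unity.Collections;
using UnityEngine;

// Hypothetical component illustrating one-time index caching and
// persistent NativeArray ownership.
public class BlendshapeBufferOwner : MonoBehaviour
{
    [SerializeField] private SkinnedMeshRenderer face;
    [SerializeField] private string[] blendshapeNames = new string[0];

    private int[] meshIndices;            // resolved once, reused every frame
    private NativeArray<float> smoothed;  // persistent working buffer

    public bool BuffersAllocated => smoothed.IsCreated;

    private void Start()
    {
        // Resolve names to mesh indices once; GetBlendShapeIndex returns -1
        // when the mesh has no blendshape with that name.
        meshIndices = new int[blendshapeNames.Length];
        for (int i = 0; i < blendshapeNames.Length; i++)
        {
            meshIndices[i] = face.sharedMesh.GetBlendShapeIndex(blendshapeNames[i]);
        }

        AllocateBuffers(blendshapeNames.Length);
    }

    public void AllocateBuffers(int count)
    {
        DisposeBuffers(); // release any earlier allocation first
        // Persistent memory lives until Dispose is called explicitly.
        smoothed = new NativeArray<float>(count, Allocator.Persistent);
    }

    public void DisposeBuffers()
    {
        // IsCreated guards against disposing a buffer that was never allocated.
        if (smoothed.IsCreated)
        {
            smoothed.Dispose();
        }
    }

    private void OnDestroy() => DisposeBuffers();
}
```

Persistent allocations are not released automatically, which is why the dispose call sits in OnDestroy() and checks IsCreated first.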
- Analysis of MovementManager.cs (15-30% speedup potential)
- Analysis of SimpleGravity.cs (10-25% speedup potential)
- Analysis of FaceTracker.cs (5-15% additional speedup potential)
- Detailed implementation roadmaps for each file
- Job implementations with code examples
- Performance metrics and optimization strategies
SimpleGravity.cs:
- Add GravityMath [BurstCompile] static class
- ComputeVelocity() handles grounded reset (-2f) and gravity
  accumulation in native code; same pattern as BurstBlendshapeRemapper

MovementManager.cs:
- Add BurstMovementMath [BurstCompile] static class with 3 methods:
  - ComputeCameraRelativeDirection: flatten+normalize cam vectors,
    produce camera-relative float3 moveDir
  - ComputeSpeed: dot-product direction check + accel/decel logic
  - ComputeRotation: quaternion.LookRotationSafe + math.slerp
- All called synchronously (no job-scheduling overhead); Burst still
  JIT-compiles to SIMD native code

FaceTracker.cs:
- Cache all 52 blendshape mesh indices + smooth speeds at Start() to
  eliminate per-frame GetBlendShapeIndex string/dict lookups
- Add persistent NativeArray<float> buffers (Allocator.Persistent)
- Add BlendshapeSmoothingJob [BurstCompile] IJob: 52-value vectorized
  lerp pass executed via job.Run() (main-thread Burst, no scheduler)
- Rewrite ApplyBlendshapes() as 3-phase batch: read → Burst smooth → write
- Add OnDestroy() to dispose NativeArrays safely

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
- BurstMovementMath (MovementManager.cs): add 'in' modifier to float2/float3/quaternion
  by-value parameters in ComputeCameraRelativeDirection, ComputeSpeed, and ComputeRotation.
  Burst's external function ABI does not support structs/vectors passed by value (BC1064/BC1067).

- FaceTracker.cs: qualify [ReadOnly] attributes on BlendshapeSmoothingJob fields as
  [Unity.Collections.ReadOnly] to resolve ambiguity with Sirenix.OdinInspector.ReadOnlyAttribute (CS0104).

Co-authored-by: Copilot <223556219+Copilot@users.noreply.github.com>
lavarius merged commit 1d3c1b9840 into master 2026-03-13 04:31:43 +00:00
lavarius deleted branch feat/burst-compiler-optimization 2026-03-13 04:31:44 +00:00
Reference
lavarius/ProjectOverlay!34