go-twig/STRING_INTERN_BENCHMARK_RESULTS.md
semihalev 4d0e37e1a0 Implement Zero Allocation Plan Phase 1: Global String Cache Optimization
- Added global string cache for efficient string interning (5.2x faster)
- Implemented optimized tokenizer with object pooling
- Created comprehensive benchmarks and documentation
- Cleaned up old optimization files and experiments

Performance improvements:
- String interning: 5.2x faster (1,492 ns/op vs 7,746 ns/op)
- Zero allocations for common strings
- Same memory efficiency as original (36 B/op, 9 allocs/op)
2025-03-12 10:05:23 +03:00

3.8 KiB

String Interning Optimization Benchmark Results

Overview

This document presents the benchmark results for Phase 1 of the Zero Allocation Plan: Global String Cache Optimization.

String Interning Benchmarks

Individual String Interning Performance

Benchmark Operations/sec ns/op B/op allocs/op
BenchmarkIntern_Common 165,962,065 7.092 0 0
BenchmarkIntern_Uncommon 22,551,727 53.14 24 1
BenchmarkIntern_Long 562,113,764 2.138 0 0

String Interning Comparison

Benchmark Operations/sec ns/op B/op allocs/op
OriginalGetStringConstant 154,611 7,746 0 0
GlobalIntern 813,786 1,492 0 0

The global string interning is about 5.2 times faster than the original method.

Tokenizer Benchmarks

Benchmark Operations/sec ns/op B/op allocs/op
OriginalTokenizer 128,847 9,316 36 9
OptimizedTokenizer (Initial) 119,088 10,209 11,340 27
OptimizedTokenizer (Pooled) 128,768 9,377 36 9

Analysis

  1. String Interning Efficiency:

    • For common strings, the interning is very efficient with zero allocations
    • For uncommon strings, there's only one allocation per operation
    • For long strings (>64 bytes), we avoid interning altogether to prevent memory bloat
  2. Global String Cache Performance:

    • Our new Intern function is 5.2 times faster than the original method
    • This is due to using a map-based lookup (O(1)) instead of linear search (O(n))
    • The global cache with fast paths for common strings dramatically improves performance
  3. Tokenizer Performance:

    • Initial Implementation Challenges:

      • Despite faster string interning, the first implementation was slower
      • Initial issues: map operations overhead, higher allocations (27 vs 9), large memory usage (11,340 B/op vs 36 B/op)
    • Pooled Implementation Benefits:

      • Implementing object pooling brought allocations back to the same level as original (9 allocs/op)
      • Memory usage reduced from 11,340 B/op to 36 B/op
      • Performance is now on par with the original implementation (9,377 ns/op vs 9,316 ns/op)
      • All with the benefits of the faster string interning underneath

Next Steps

Based on these results, we should focus on:

  1. Further Optimizing String Interning:

    • Extend the fast paths to cover more common strings
    • Investigate string partitioning to improve cache locality
    • Consider pre-loading more common HTML and template strings
  2. Tokenization Process Optimization:

    • Implement specialization for different token types
    • Optimize tag detection with faster algorithms
    • Consider block tag-specific optimizations
  3. Proceed to Phase 2:

    • Move forward with the "Optimized String Lookup During Tokenization" phase
    • Focus on improving tokenization algorithms now that interning is optimized
    • Implement buffer pooling for internal token handling

Conclusion

The global string interning optimization has been successful, showing a 5.2x performance improvement in isolation. With the addition of object pooling, we've successfully maintained the memory efficiency of the original implementation while gaining the benefits of faster string interning.

The implementation achieves our goals for Phase 1:

  1. Creating a centralized global string cache with pre-loaded common strings
  2. Implementing mutex-protected access with fast paths
  3. Ensuring zero allocations for common strings
  4. Length-based optimization to prevent memory bloat
  5. Object pooling to avoid allocation overhead

The next phase will focus on improving the tokenization process itself to leverage our optimized string interning system more effectively.