mirror of https://github.com/semihalev/twig.git synced 2026-03-14 13:55:46 +01:00

semihalev 4d0e37e1a0 Implement Zero Allocation Plan Phase 1: Global String Cache Optimization

- Added global string cache for efficient string interning (5.2x faster)
- Implemented optimized tokenizer with object pooling
- Created comprehensive benchmarks and documentation
- Cleaned up old optimization files and experiments

Performance improvements:
- String interning: 5.2x faster (1,492 ns/op vs 7,746 ns/op)
- Zero allocations for common strings
- Same memory efficiency as original (36 B/op, 9 allocs/op)

2025-03-12 10:05:23 +03:00


String Interning Optimization Benchmark Results

Overview

This document presents the benchmark results for Phase 1 of the Zero Allocation Plan: Global String Cache Optimization.

String Interning Benchmarks

Individual String Interning Performance

Benchmark	Iterations	ns/op	B/op	allocs/op
BenchmarkIntern_Common	165,962,065	7.092	0	0
BenchmarkIntern_Uncommon	22,551,727	53.14	24	1
BenchmarkIntern_Long	562,113,764	2.138	0	0
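
For readers who want to reproduce figures of this kind, the sketch below shows how Go's `testing` package produces an iteration count together with ns/op, B/op and allocs/op. It is a minimal illustration, not the project's benchmark code: `internStub`, `commonTable` and `runBench` are hypothetical names, and the stub is only a stand-in for the function being measured.

```go
package main

import (
	"fmt"
	"testing"
)

// commonTable and internStub are hypothetical stand-ins for the code under
// test; they are not taken from the twig repository.
var commonTable = map[string]string{"div": "div", "span": "span"}

func internStub(s string) string {
	if v, ok := commonTable[s]; ok {
		return v
	}
	return s
}

// sink keeps the compiler from optimizing the benchmarked call away.
var sink string

// benchIntern has the same body a BenchmarkXxx function in a _test.go file
// would have.
func benchIntern(b *testing.B) {
	b.ReportAllocs() // enables the B/op and allocs/op measurements
	for i := 0; i < b.N; i++ {
		sink = internStub("div")
	}
}

// runBench runs the benchmark outside of "go test".
func runBench() testing.BenchmarkResult {
	testing.Init() // registers testing flags when not running under "go test"
	return testing.Benchmark(benchIntern)
}

func main() {
	r := runBench()
	// r.N is the number of iterations the framework chose to run.
	fmt.Printf("iterations=%d ns/op=%d B/op=%d allocs/op=%d\n",
		r.N, r.NsPerOp(), r.AllocedBytesPerOp(), r.AllocsPerOp())
}
```

In a real package the same loop would normally live in a `_test.go` file and be run with `go test -bench=. -benchmem`.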

String Interning Comparison

Benchmark	Iterations	ns/op	B/op	allocs/op
OriginalGetStringConstant	154,611	7,746	0	0
GlobalIntern	813,786	1,492	0	0

Global string interning is about 5.2 times faster than the original method (7,746 ns/op ÷ 1,492 ns/op ≈ 5.19).
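
To make the comparison concrete, here is a minimal sketch of a map-backed global interner next to a linear scan. It assumes the design this document describes elsewhere (a pre-loaded global cache, mutex-protected access, and a 64-byte cutoff); every identifier in it (`stringCache`, `globalCache`, `maxInternLen`, `linearLookup`) and the pre-loaded strings are illustrative assumptions, and the repository's actual `Intern` implementation may differ.

```go
package main

import (
	"fmt"
	"strings"
	"sync"
)

// maxInternLen is an assumed cutoff: longer strings are returned as-is so the
// cache does not grow with large, rarely repeated values.
const maxInternLen = 64

// stringCache is a hypothetical mutex-protected intern table.
type stringCache struct {
	mu      sync.RWMutex
	strings map[string]string
}

func newStringCache(common ...string) *stringCache {
	c := &stringCache{strings: make(map[string]string, len(common))}
	for _, s := range common {
		c.strings[s] = s // pre-load common strings
	}
	return c
}

// globalCache is pre-loaded with a few example strings (chosen for
// illustration only).
var globalCache = newStringCache("div", "span", "if", "endif", "for", "endfor")

// Intern returns a canonical copy of s, using an average-case O(1) map lookup.
func Intern(s string) string {
	if len(s) == 0 {
		return "" // fast path: empty string
	}
	if len(s) > maxInternLen {
		return s // fast path: do not intern long strings
	}
	c := globalCache
	c.mu.RLock()
	v, ok := c.strings[s]
	c.mu.RUnlock()
	if ok {
		return v // hit: no allocation
	}
	c.mu.Lock()
	defer c.mu.Unlock()
	if v, ok := c.strings[s]; ok {
		return v // another goroutine stored it in the meantime
	}
	s = strings.Clone(s) // one allocation; detaches s from any larger buffer
	c.strings[s] = s
	return s
}

// linearLookup shows the O(n) alternative: scan a slice of known strings.
func linearLookup(table []string, s string) string {
	for _, t := range table {
		if t == s {
			return t
		}
	}
	return s
}

func main() {
	fmt.Println(Intern("div"))
	fmt.Println(linearLookup([]string{"div", "span"}, "span"))
}
```

The read lock covers the common hit path, so the write lock is taken only the first time an uncommon string is seen.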

Tokenizer Benchmarks

Benchmark	Iterations	ns/op	B/op	allocs/op
OriginalTokenizer	128,847	9,316	36	9
OptimizedTokenizer (Initial)	119,088	10,209	11,340	27
OptimizedTokenizer (Pooled)	128,768	9,377	36	9

Analysis

String Interning Efficiency:
- For common strings, the interning is very efficient with zero allocations
- For uncommon strings, there's only one allocation per operation
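- For long strings (>64 bytes), we avoid interning altogether to prevent memory bloat, which likely explains why BenchmarkIntern_Long is the fastest case (2.138 ns/op, zero allocations)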
Global String Cache Performance:
- Our new Intern function is 5.2 times faster than the original method
- This is due to using a map-based lookup (average-case O(1)) instead of a linear search (O(n))
- The global cache with fast paths for common strings reduces lookup time from 7,746 ns/op to 1,492 ns/op
Tokenizer Performance:
- Initial Implementation Challenges:
  - Despite faster string interning, the first implementation was slower than the original (10,209 ns/op vs 9,316 ns/op)
  - Initial issues: map operation overhead, more allocations (27 vs 9 allocs/op), and much higher memory usage (11,340 B/op vs 36 B/op)
- Pooled Implementation Benefits:
  - Implementing object pooling brought allocations back to the same level as original (9 allocs/op)
  - Memory usage reduced from 11,340 B/op to 36 B/op
  - Performance is now on par with the original implementation (9,377 ns/op vs 9,316 ns/op, a difference of under 1%)
  - All with the benefits of the faster string interning underneath
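
The pooling idea can be sketched with `sync.Pool` from the Go standard library. This is an illustration under assumptions, not the project's tokenizer: `Token`, `tokenBuffer`, `tokenBufferPool`, `splitWords` and `countTokens` are hypothetical names, and the whitespace splitter only stands in for real template tokenization.

```go
package main

import (
	"fmt"
	"sync"
)

// Token is a hypothetical token type used only for this sketch.
type Token struct {
	Value string
}

// tokenBuffer wraps a reusable slice of tokens.
type tokenBuffer struct {
	tokens []Token
}

// tokenBufferPool hands out buffers so repeated tokenization can reuse the
// same backing array instead of allocating a new slice each time.
var tokenBufferPool = sync.Pool{
	New: func() any {
		return &tokenBuffer{tokens: make([]Token, 0, 64)}
	},
}

func getTokenBuffer() *tokenBuffer {
	buf := tokenBufferPool.Get().(*tokenBuffer)
	buf.tokens = buf.tokens[:0] // keep capacity, drop old contents
	return buf
}

func putTokenBuffer(buf *tokenBuffer) {
	tokenBufferPool.Put(buf)
}

// splitWords appends space-separated words of src to buf without allocating
// new strings: each token value is a substring of src.
func splitWords(src string, buf *tokenBuffer) {
	start := -1
	for i := 0; i <= len(src); i++ {
		if i == len(src) || src[i] == ' ' {
			if start >= 0 {
				buf.tokens = append(buf.tokens, Token{Value: src[start:i]})
				start = -1
			}
		} else if start < 0 {
			start = i
		}
	}
}

// countTokens borrows a buffer, tokenizes into it, and returns it to the pool.
func countTokens(src string) int {
	buf := getTokenBuffer()
	defer putTokenBuffer(buf)
	splitWords(src, buf)
	return len(buf.tokens)
}

func main() {
	fmt.Println(countTokens("{{ name }}"))
}
```

A pooled buffer must not be used after it has been returned with `Put`, so callers that need to keep the tokens have to copy them out first.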

Next Steps

Based on these results, we should focus on:

Further Optimizing String Interning:
- Extend the fast paths to cover more common strings
- Investigate string partitioning to improve cache locality
- Consider pre-loading more common HTML and template strings
Tokenization Process Optimization:
- Implement specialization for different token types
- Optimize tag detection with faster algorithms
- Consider block tag-specific optimizations
Proceed to Phase 2:
- Move forward with the "Optimized String Lookup During Tokenization" phase
- Focus on improving tokenization algorithms now that interning is optimized
- Implement buffer pooling for internal token handling

Conclusion

The global string interning optimization has been successful, showing a 5.2x performance improvement in isolation. With the addition of object pooling, the optimized tokenizer matches the memory profile of the original implementation (36 B/op, 9 allocs/op) and runs at essentially the same speed, while using the faster string interning underneath.

The implementation achieves our goals for Phase 1:

✅ Creating a centralized global string cache with pre-loaded common strings
✅ Implementing mutex-protected access with fast paths
✅ Ensuring zero allocations for common strings
✅ Length-based optimization to prevent memory bloat
✅ Object pooling to avoid allocation overhead

The next phase will focus on improving the tokenization process itself to leverage our optimized string interning system more effectively.
