- Added global string cache for efficient string interning (5.2x faster) - Implemented optimized tokenizer with object pooling - Created comprehensive benchmarks and documentation - Cleaned up old optimization files and experiments Performance improvements: - String interning: 5.2x faster (1,492 ns/op vs 7,746 ns/op) - Zero allocations for common strings - Same memory efficiency as original (36 B/op, 9 allocs/op)
3.8 KiB
String Interning Optimization Benchmark Results
Overview
This document presents the benchmark results for Phase 1 of the Zero Allocation Plan: Global String Cache Optimization.
String Interning Benchmarks
Individual String Interning Performance
| Benchmark | Operations/sec | ns/op | B/op | allocs/op |
|---|---|---|---|---|
| BenchmarkIntern_Common | 165,962,065 | 7.092 | 0 | 0 |
| BenchmarkIntern_Uncommon | 22,551,727 | 53.14 | 24 | 1 |
| BenchmarkIntern_Long | 562,113,764 | 2.138 | 0 | 0 |
String Interning Comparison
| Benchmark | Operations/sec | ns/op | B/op | allocs/op |
|---|---|---|---|---|
| OriginalGetStringConstant | 154,611 | 7,746 | 0 | 0 |
| GlobalIntern | 813,786 | 1,492 | 0 | 0 |
The global string interning is about 5.2 times faster than the original method.
Tokenizer Benchmarks
| Benchmark | Operations/sec | ns/op | B/op | allocs/op |
|---|---|---|---|---|
| OriginalTokenizer | 128,847 | 9,316 | 36 | 9 |
| OptimizedTokenizer (Initial) | 119,088 | 10,209 | 11,340 | 27 |
| OptimizedTokenizer (Pooled) | 128,768 | 9,377 | 36 | 9 |
Analysis
-
String Interning Efficiency:
- For common strings, the interning is very efficient with zero allocations
- For uncommon strings, there's only one allocation per operation
- For long strings (>64 bytes), we avoid interning altogether to prevent memory bloat
-
Global String Cache Performance:
- Our new
Internfunction is 5.2 times faster than the original method - This is due to using a map-based lookup (O(1)) instead of linear search (O(n))
- The global cache with fast paths for common strings dramatically improves performance
- Our new
-
Tokenizer Performance:
-
Initial Implementation Challenges:
- Despite faster string interning, the first implementation was slower
- Initial issues: map operations overhead, higher allocations (27 vs 9), large memory usage (11,340 B/op vs 36 B/op)
-
Pooled Implementation Benefits:
- Implementing object pooling brought allocations back to the same level as original (9 allocs/op)
- Memory usage reduced from 11,340 B/op to 36 B/op
- Performance is now on par with the original implementation (9,377 ns/op vs 9,316 ns/op)
- All with the benefits of the faster string interning underneath
-
Next Steps
Based on these results, we should focus on:
-
Further Optimizing String Interning:
- Extend the fast paths to cover more common strings
- Investigate string partitioning to improve cache locality
- Consider pre-loading more common HTML and template strings
-
Tokenization Process Optimization:
- Implement specialization for different token types
- Optimize tag detection with faster algorithms
- Consider block tag-specific optimizations
-
Proceed to Phase 2:
- Move forward with the "Optimized String Lookup During Tokenization" phase
- Focus on improving tokenization algorithms now that interning is optimized
- Implement buffer pooling for internal token handling
Conclusion
The global string interning optimization has been successful, showing a 5.2x performance improvement in isolation. With the addition of object pooling, we've successfully maintained the memory efficiency of the original implementation while gaining the benefits of faster string interning.
The implementation achieves our goals for Phase 1:
- ✅ Creating a centralized global string cache with pre-loaded common strings
- ✅ Implementing mutex-protected access with fast paths
- ✅ Ensuring zero allocations for common strings
- ✅ Length-based optimization to prevent memory bloat
- ✅ Object pooling to avoid allocation overhead
The next phase will focus on improving the tokenization process itself to leverage our optimized string interning system more effectively.