This changes registers to be temporarily stored in wasm locals for the
duration of each wasm module. Registers are moved from memory to locals
upon entering the wasm module and moved from locals to memory upon
leaving. Additionally, calls to functions that modify registers are
wrapped between moving registers to memory before and moving back to
locals after. This affects:
1. All non-custom instructions
2. safe_{read,write}_slow, since it may page fault (the slow path of all memory accesses)
3. task_switch_test* and trigger_ud
4. All block boundaries
5. The fallback functions of gen_safe_read_write (read-modify-write memory accesses)
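The scheme above can be sketched with a toy model (all names here are illustrative, not the actual v86 codegen API): registers canonically live in linear memory, are cached in locals while inside a module, and any call to a register-modifying function is wrapped in a spill/refill pair.

```rust
// Toy model of the spill/fill scheme. `Cpu` stands in for the generated
// module's state; the names are made up for illustration.
const REG_COUNT: usize = 8;

struct Cpu {
    memory: [i32; REG_COUNT], // canonical register storage (linear memory)
    locals: [i32; REG_COUNT], // stands in for the wasm locals
}

impl Cpu {
    // On module entry: memory -> locals
    fn enter_module(&mut self) {
        self.locals = self.memory;
    }

    // On module exit: locals -> memory
    fn leave_module(&mut self) {
        self.memory = self.locals;
    }

    // Wrapping a call to a function that may modify registers
    // (e.g. a non-custom instruction or the slow memory path):
    fn call_register_modifying(&mut self, f: impl Fn(&mut [i32; REG_COUNT])) {
        self.leave_module(); // registers to memory before the call
        f(&mut self.memory); // callee sees registers in memory
        self.enter_module(); // and back to locals afterwards
    }
}

fn main() {
    let mut cpu = Cpu { memory: [0; REG_COUNT], locals: [0; REG_COUNT] };
    cpu.enter_module();
    cpu.locals[0] = 5; // generated code works on the locals
    cpu.call_register_modifying(|regs| regs[0] += 1);
    assert_eq!(cpu.locals[0], 6); // the cached local sees the callee's change
    cpu.leave_module();
    assert_eq!(cpu.memory[0], 6);
}
```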
The performance benefits are currently mostly eaten up by 1. and 4. (if
one calculates the total number of read/writes to registers in memory,
they are higher after this patch, as each instruction of type 1. or 4.
requires moving all 8 registers twice). This can be improved later by the
relatively mechanical work of making instructions custom (not
necessarily full code generation, only the part of the instruction where
registers are accessed). Multi-page wasm module generation will
significantly reduce the number of type 4. instructions.
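To make that cost concrete (a rough model, not measured data): every type 1. or 4. boundary moves all 8 registers twice, regardless of how many registers the instruction actually touches.

```rust
// Back-of-the-envelope cost of a type 1. or 4. boundary under the
// assumed model: one store (to memory) plus one load (back to a local)
// per register.
fn register_memory_ops(register_count: u32) -> u32 {
    2 * register_count
}

fn main() {
    // 8 registers, each moved twice -> 16 memory operations per boundary
    assert_eq!(register_memory_ops(8), 16);
}
```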
Due to 2., the overall code size has significantly increased. This case
(the slow path of memory access) is often generated but rarely executed.
These moves can be removed in a later patch by a different scheme for
safe_{read,write}_slow, which has been left out of this patch for
simplicity of reviewing.
This also simplifies our code generation for storing registers, as
instruction_body.const_i32(register_offset);
// some computations ...
instruction_body.store_i32();
turns into:
// some computations ...
write_register(register_index);
I.e., a prefix is not necessary anymore as locals are indexed directly.
Further patches will allow getting rid of some temporary locals, as
registers now can be used directly.
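The before/after shapes above can be sketched by modeling the instruction body as a list of wasm text instructions (the helper names here are hypothetical): the old scheme must emit the address as a prefix before the value computation, while the new scheme only appends a `local.set` suffix.

```rust
// Old scheme: the register's memory offset must be pushed before the
// value is computed, so the store needs a prefix.
fn gen_store_old(body: &mut Vec<String>, register_offset: u32) {
    body.push(format!("i32.const {}", register_offset)); // prefix: address first
    // ... value computation is emitted here ...
    body.push("i32.store".to_string());
}

// New scheme: the local caching the register is indexed directly, so
// only a suffix after the value computation is needed.
fn gen_store_new(body: &mut Vec<String>, register_index: u32) {
    // ... value computation is emitted here ...
    body.push(format!("local.set {}", register_index)); // no prefix needed
}

fn main() {
    let mut old = Vec::new();
    gen_store_old(&mut old, 64);
    assert_eq!(old.join(" "), "i32.const 64 i32.store");

    let mut new = Vec::new();
    gen_store_new(&mut new, 3);
    assert_eq!(new.join(" "), "local.set 3");
}
```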
Now that WASM_TABLE_SIZE may be capped, we set it slightly below the
limit under which chromium crashes: https://bugs.chromium.org/p/v8/issues/detail?id=8427
JIT_THRESHOLD is also reduced for two reasons:
- With the lower WASM_TABLE_SIZE, we want to avoid compiling too many
modules
- It has occasionally been observed that under node, the engine's wasm
  compiler can't keep up with the number of modules we produce,
  resulting in hundreds of modules pending compilation. This most
  likely happens only under node, as we don't render the screen and
  the main loop (based on setImmediate) runs faster.
The new value doesn't seem to exhibit this problem, but we may want to
increase the threshold further if the problem appears again.
Also fix jit_empty_cache when callbacks are pending (fixes #53)
This is also a preparation for setting WASM_TABLE_SIZE to a low value to
work around memory limitations in browsers.
This commit prevents creation of entry points for jumps within the same
page. In interpreted mode, execution is continued on these kinds of
jumps.
Since this prevents the old hotness detection from working efficiently,
hotness detection has also been changed to work based on instruction
counters, and is thus more precise (longer basic blocks are compiled
earlier).
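The counter-based detection can be sketched as follows (the threshold value and names are made up for illustration; the real constants differ): each interpreted basic block adds its instruction count to a per-page counter, so a long block reaches the compilation threshold in fewer runs than a short one.

```rust
// Sketch of instruction-counter-based hotness detection.
// JIT_THRESHOLD's value here is illustrative only.
const JIT_THRESHOLD: u32 = 1000;

struct PageState {
    instruction_counter: u32,
}

impl PageState {
    // Called after interpreting a basic block of `len` instructions;
    // returns true once the page is considered hot enough to compile.
    fn count_block(&mut self, len: u32) -> bool {
        self.instruction_counter += len;
        self.instruction_counter >= JIT_THRESHOLD
    }
}

fn main() {
    let mut page = PageState { instruction_counter: 0 };
    // A 100-instruction block becomes hot after 10 runs; an
    // entry-count-based scheme would treat it like any short block.
    let mut cold_runs = 0;
    while !page.count_block(100) {
        cold_runs += 1;
    }
    assert_eq!(cold_runs, 9); // hot on the 10th run
}
```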
This also breaks the old detection loop safety mechanism and causes
Linux to sometimes loop forever on "calibrating delay loop", so
JIT_ALWAYS_USE_LOOP_SAFETY has been set to 1.
Makes each of the following a block boundary:
- push
- Any non-custom instruction that uses modrm encoding
- Any sse/fpu instruction
This commit affects performance negatively. In order to fix this, the
above instructions need to be implemented using custom code generators
for the memory access.