This commit prevents creation of entry points for jumps within the same
page. In interpreted mode, execution is continued on these kinds of
jumps.
Since this prevents the old hotness detection from working efficiently,
hotness detection has also been changed to work based on instruction
counters, and is such more precise (longer basic blocks are compiled
earlier).
This also breaks the old detection loop safety mechanism and causes
Linux to sometimes loop forever on "calibrating delay loop", so
JIT_ALWAYS_USE_LOOP_SAFETY has been set to 1.
Although hardlinks across filesystems worked quite well in the current
implementation, we will run into problems when we try to implement
different backends for each sub-filesystem.
Note: EPERM is used instead of EXDEV since the mv command will silently
try to use copy-and-unlink when rename(2) fails with EXDEV.
The rules for linking has been reverted back:
- Before commit: Any inode, including forwarders, could be linked as
long as the parent is not a forwarder and is a directory.
- After commit: Only non-forwarders and root-forwarders are allowed to
be linked, and must be linked under a directory and not a forwarder.
- Test both file and directory.
- Test combinations of forwarder/non-forwarder source/destinations
- Test different destination filenames
- Test empty file
Can't test early failure behaviour when destination can't be unlinked,
since right now only non-empty directories and root directories can't be
unlinked and busybox's mv program does not support mv
--no-target-directory.
If mountpoint already exists, then we're silently making its children
inaccessible which may not be what we expected/intended.
Create a new forwarder inode upon mounting.
The testing "framework" code is slowly turning into spaghetti due to the
asynchronous nature of the triggers. Using async functions will help
clarify the program flow if we think we should address this issue.
- adds alloc_local and free_local
- slightly better wasmgen tests
- caller of gen_safe_write32 is responsible for allocating
and freeing locals for address and value
- updates expect-tests
The following files and functions were ported:
- jit.c
- codegen.c
- _jit functions in instructions*.c and misc_instr.c
- generate_{analyzer,jit}.js (produces Rust code)
- jit_* from cpu.c
And the following data structures:
- hot_code_addresses
- wasm_table_index_free_list
- entry_points
- jit_cache_array
- page_first_jit_cache_entry
Other miscellaneous changes:
- Page is an abstract type
- Addresses, locals and bitflags are unsigned
- Make the number of entry points a growable type
- Avoid use of global state wherever possible
- Delete string packing
- Make CachedStateFlags abstract
- Make AnalysisType product type
- Make BasicBlockType product type
- Restore opcode assertion
- Set opt-level=2 in debug mode (for test performance)
- Delete JIT_ALWAYS instrumentation (now possible via api)
- Refactor generate_analyzer.js
- Refactor generate_jit.js
I wrongly assumed that libv86 would perform faster than libv86-debug.
Using libv86 caused ttyS0 to timeout before initialising, so there was
not serial terminal to use by the time debian booted.
These tests do not work yet: it fails to capture the serial output
after root login.
In addition, these tests are currently too slow to run
since it takes a while to boot the debian image (> 16 minutes on my
machine while running the test in the terminal.)
Would need to switch to a newer minimal linux image when available.
Stack based passing was needless given that we wanted the values in
local variables anyway and could simply have the caller do it.
We could have a minor optimization and have the caller gen_tee_local for
the address, but I chose not to do that because it seems like a minor
difference that complicates the calling convention
significantly (i.e. both set local variable and push to stack).
When they're stack-allocated, their addresses may change very easily,
even simply through updates in a header file that cpu.c includes.
This will make the expect-tests fail unnecessarily, simply because these
offsets changed.
In the future, we may change how memory handling works, so this may be a
non-issue then, but for now, it's easier to use a fixed offset for these
arrays.
WebAssembly has no "dup" operator (to duplicate the value on the top of
the stack), which means that if a value needs to be used twice, we need
to use a variable. We add 2 general-purpose local scratch variables to
this end.
Note: Users of the variable should be careful about interference with
other users - they may contain junk values, and between functions, the
state of a variable may well change.
- introduce multiple entry points per compiled wasm module, by passing
the initial state to the generated function.
- continue analysing and compiling after instructions that change eip, but
will eventually return to the next instruction, in particular CALLs
(and generate an entry point for the following instruction)
This commit is incomplete in the sense that the container will crash
after some time of execution, as wasm table indices are never freed
The old method understandably triggered the 0x67 address-size override
prefix, which caused the segfault.
(Changed "mov [mem], ax" because that might use op 0xA3.)
This regression test will _NOT_ currently catch regressions in "make
nasmtests-force-jit" due to a pending issue where JIT execution doesn't
follow through the popf instruction, which this and most tests use
through header.inc.
To see the test actually fail, you need:
- To remove the popf instruction
- s/load_u16/load_i32/ in gen_set_reg16_r
These tests confirm that the generated WAST:
1. Optimizes "mov reg, reg" with direct memory access stores, requiring
no function calls
2. Memory loads/stores use the correct data-width (i32.load16_u for
reg16, etc.)
This is a (partial) fix for #8. It manually forces compilation of the
first set of basic blocks in the nasm tests, waits for the blocks to be
compiled and then runs the test. It doesn't make use of the fragile
compile-time machinery to force compilation.
Analysis currently doesn't follow execution after the popf instructions,
which is included in most tests. It has been manually verified that
tests pass when the popf instruction is compiled, and this limitation
will be lifted soon.
Analysis also doesn't follow through call and return instructions, so
nasm tests can't currently test these in their compiled form.
The stdout stream is not line-buffered, which means that occasionally the
chunk/Buffer of data that comes in immediately after we run the commands may be:
~% chmod +x /mnt/test-i386
~% /mnt/test-i386 > /mnt/result
~% echo test fini''shed
test finished
~%
Note how the data ends with the prompt, not in "test finished\n" - this will
cause the test to run indefinitely.
Recently distributions started disabling the CONFIG_MODIFY_LDT_SYSCALL
kernel config, breaking parts of the qemu test. qemu is very suitable to
run these tests in the same configuration as v86 (booting from
linux3.iso, exchanging files using 9p fs and controlling via serial
console).
With ENABLE_JIT_ALWAYS set, it effectively tests all variants (is_32 +
imm/reg/rm) of the instruction.
Stack size variation should also be tested in the future.
See WASM dump for optimized versions here:
https://gist.github.com/AmaanC/c5c5f9d76bef589a7fbb0f4af3b15e64
Since we can only know an instruction's length (and other characteristics) after
its been decoded, this change allows us to first decode the instruction and
generate the instruction function calls to a scratch buffer, which can then be
copied over to the main (cs) buffer as appropriate.
Tests for the scratch buffer were added too.
1. SEGFAULTs may be caused by using MAP_FIXED - it may override an existing
mapping, which may exist for internal implementation helpers.
2. The Linux3 profile seems to allocate physical frames in the order in which
writes to memory mapped virtual addresses happen.
For this commit, for eg:
page0; virtual=0xB7766000 physical=0x12C9000
throwaway; virtual=0xB7765000 physical=0x12C8000
page1; virtual=0xB7767000 physical=0x12CC000
This way we can write from page0 -> page1's virtual addresses and make sure our
paging support works as it should, and that the tests aren't simply passing
because even the underlying physical frames are contiguous.
- Moved all helper functions to coverage.js
- Refactor individual cov_*[func_id] objects to coverage[func_id].*
- Write coverage data to its own directory (./build/coverage/coverage_data*)
- Enable/disable coverage logging in do_many_cycles to account for exceptions
- Better naming
- Minor stylistic refactoring
It looks for files in ./build/cov_data_fn_name_XX where XX=number of conditional
blocks in fn_name, and parses the binary data contained within to find which
blocks are missing.
Reports are generated in ./tests/coverage/build/report_XX where XX is
incremented to allow us to see how the coverage changes over time.
Note:
- The cov_data_* generated needs to be manually deleted if we want to compare 2
individual coverage runs.
- We currently ignore the 0th block (LLVM marks it as visited when a function is
entered).
restore memcpy comment
delete all the things!
fix jshint issues
restore memcpy comment
remove duplicate fxsave assignment
Count cache drops
Use already available physical address instead of calling read_imm8
Remove useless assertion
Just move around to reduce later diff
Run jit paging test with assertions enabled
Run jit-paging test on CI
Extend jit-paging test
Fix deleting invalidated code across memory pages
Add jit-paging test to gitlab ci
Remove jit_in_progress
Clean up old comments, use bool for jit_jump
Fix state image not begin garbage collected
Add ENABLE_PROFILER_TIMES to configure slow profiling times
Move to jit_generate and jit_run_interpreted to separate function
Add missing struct field
Fix: Don't write jit cache entry until no more faults can happen
Download image for jit paging test
Add missing initialiser
Mark jit_{generate,run_interpreted} as static
Specify full path to profiler.h
Clean up duplicate/missing declaration after rebase
mmap error handling, line length and fix some warnings
remove further unused code
move js imports to single header file
The gen_fixtures.js currently generates fixtures for all
host_executables (*.bin) files, regardless of whether they need to be
regenerated or not.
The script speeds the process up by:
- Listing all files that need fixtures
- Finding nr_of_cpus (to parallelize execution)
- Dividing them up to roughly even "chunks" based on nr_of_cpus
- Spawning N gdb processes, which iteratively call a user-defined command named
"extract-state" (see gdb-extract-def); the command automatically loads the new
file and extracts its state into a fixture file
Rough benchmarks for 3725 tests:
Old method (after `rm build/*.fixture`):
make -j4 325.23s user 40.28s system 206% cpu 2:57.38 total
make -j4 342.65s user 43.73s system 289% cpu 2:13.23 total
make -j4 435.69s user 55.61s system 289% cpu 2:49.63 total
New method:
node gen_fixtures.js 11.85s user 4.86s system 140% cpu 11.890 total
node gen_fixtures.js 12.48s user 5.48s system 114% cpu 15.736 total
node gen_fixtures.js 14.54s user 6.33s system 141% cpu 14.769 total
2e469796 Minor
fab422ef Improve generation of 0f instructions
08ad7fe9 Improved if-else generation
3f81014d Minor: Align test output
4a3a84ef Generate modrm tests
61aa1875 Simplify
a6e47954 Generate decoding of immediate operands
435b2c10 Fix warnings
e4933042 Add missing immediate operand
3f3810c7 Generate immediate operands for instructions with modrm byte
a0aa7b1f Make memory layout in nasm tests clearer
6b8ef212 Remove 'g' property from instruction table (implied by 'e')
bf15c58c Remove unused declarations
1e543035 Remove useless `| 0` and `>>> 0` javascriptisms
1ccc5d53 Fix headers
8b40c532 Update qemu tests with changes from qemu
ec9b0fb5 Port xchg instructions to C
c73613e7 Port virt_boundary_* to C
d61d1241 Add headers
fd19f22c Make written value in write8 and write16 int32_t
497dcaec Generate read_imm for instructions with a modrm byte
8b7003d6 Generate read_imm8s
0cc75498 Remove read_op
9d716086 Trigger unimplemented_sse for partial sse instructions with prefix
8d5edd03 Remove unimplmented sse c-to-js hack
585d3565 Remove | 0
308124b2 Use int32_t as return value
f193f8e1 Use JS version of cvttsd2si for now
12747b97 Generate trigger_ud for missing modrm branches
770f674e Split 0f00 and 0f01 into multiple instructions depending on modrm bits
1cb372a3 Generate decoder for some 0f-prefix instructions
cec7bc63 Disable unused parameter warnings in instruction functions
807665b1 Generate read_imm for 0f/jmpcc
cdf6eccc Generate modrm decoding for shld
04528429 Create temporary files in /tmp/, not cwd
d8f3fbd8 Generate modrm/imm decoding for shld
00ef0942 Generate modrm decoding for bts
f531984b Generate modrm decoding for shrd and imul
07569c53 Generate modrm decoding cmpxchg
535ff190 Generate modrm decoding for lfs/lgs/lss
2f8ced8d Generate modrm decoding for btr and btc
95de6c66 Generate modrm decoding for movzx
c4d07e7e Generate modrm decoding for bsf and bsr
f0985d26 Generate modrm decoding for movsx
4b30937a Generate modrm decoding for xadd
a422eb27 Generate modrm decoding for movnti
e5501d3c Generate modrm decoding for mov to/from control/debug registers
bce11ec5 Generate modrm decoding for lar/lsl
5729a23c Fix access to DR4 and DR5 when cr4.DE is clear
44269a81 Specify immediate size explicitly instead of inferring it
82b2867a Fix STR instruction
98a9cc89 Log failing assertion
6d2f9964 Fix rdtsc
00260694 Log GP exceptions
7916883d Port trigger_ud and trigger_nm to C
36fedae9 Remove unused code
e08fabd0 Generate modrm decoding for 0f00 and 0f01
8ae8174d Generate modrm decoding for 0fae and 0fc7 (fxsave, cmpxchg8, etc.)
26168164 Generate modrm+immediate decoding for 0fba (bit test with immediate operand)
6adf7fa7 Simplify create_tests.js (unused prefix call)
c77cbdd8 Add comments about the implementation of pop [addr]
4640b4fe Simplify prefix call
a81a5497 Don't use var
3ca5d13d Separate call name and arguments in code generator
3191a543 Simplify other prefix call (8D/lea)
5185080e Update generated code (stylistic changes and #ud generation)
93b51d41 Remove unused wasm externals
e4af0a7f Avoid hardcoding special cases in code generator (lea, pop r/m)
654a2b49 Avoid hardcoding special cases in code generator (enter/jumpf/callf)
fd1a1e86 Commit generated code (only stylistic changes)
7310fd1a Simplify code generator by merging code for with and without 0f prefix
e7eae4af Simplify code generator by merging code for immediate operands
00fafd8a Improve assertions
db084e49 Simplify code generator (modrm if-else)
0a0e4c9e Improve code generation of switch-case
ce292795 Clarify some comments
37cf33fa Generate code in if/else blocks
cbcc33fc Document naming scheme
e30b97eb Generate modrm decoding for 0f12 (sse) instruction
24b72c2f movlpd cannot be used for register-to-register moves
72d72995 Generate modrm decoding for 0f13 (sse) instruction and disable register-to-register moving
75d76fbb Generate modrm decoding for 0f14 (sse) instruction
ac8965a7 Generate modrm decoding for 0f28-0f2b (movap, movntp)
e919d33e Generate modrm decoding for cvttsd2si
5f2ca2b4 Generate modrm decoding for andp and xorp
c8d1c6de Generate modrm decoding for 0f60-0f70 (sse instructions)
ae4ed46d Add multi-byte nop and prefetch to nasm test, generate modrm decoding
718a1acf Print qemu test error message more useful
d1ecc37e Generate modrm decoding for 0f70-0f80 (sse instructions)
6a7219a5 Generate modrm decoding for popcnt
25278217 Generate modrm decoding for 0f71-0f73 (sse shift with immediate byte)
ed1ec81b Generate modrm decoding for the remaining sse instructions (0fc0-0fff)
42bc5a6f Use 64-bit multiplication for native code
dda3fb39 Remove old modrm-decoding functions
717975ef Move register access functions to cpu.c
aee8138f Remove read_op, read_sib, read_op0F, read_disp
f31317f2 Rename xmm/mmx register access functions
a525e70b Remove 32-bit access to reg_xmm and reg_mmx
c803eabc Rename s8/s16/s32 to i8/i16/i32
9fbd2ddf Don't use uninitialised structs
942eb4f7 Use 64-bit load for mmx registers and assert reg64 and reg128 size
f94ec612 Use 64-bit writes for write_xmm64
08022de9 Use more efficient method for some 128-bit stores
9d5b084c Make timestamp counter unsigned
2ef388b3 Pass 64-bit value to safe_write64
4cb2b1be Optimise safe_write64 and safe_write128
b0ab09fb Implement psllq (660ff3)
9935e5d4 Optimise safe_read64s and safe_read128s
af9ea1cc Log cl in cpuid only if relevant
be5fe23e Add multi-op cache (disabled by default through ENABLE_JIT macro) and JIT paging test (similar to QEMU test).
aa2f286e Don't initialise group_dirtiness with 1 as it increases the binary size significantly
b8e14ed9 Remove unused reg_xmm32s
bc726e03 Implement dbg_log for native code with format characters 'd' and 'x'
454039d6 Fix store task register
63a4b349 Remove unnecessary parens and clean up some log statements
4cc96814 Add logop and dbg_trace imports
7940655d Only inhibit interrupts if the interrupt flag was 0 in STI
876c68a7 Split create_tests into create_tests and generate_interpreter
aa82499f Move detection of string instructions to x86_table
f3840ec2 Move C ast to separate file
90400703 Skip tests for lfence/mfence/sfence, clarify their encoding
4a9d8204 elf: Hide log messages when log level is zero
a601c526 Allow setting log level via settings
8a624453 Add cpu_exception_hook to debug builds
f9e335bf Nasm: Test exceptions
599ad088 logop: Format instruction pointer as unsigned
f95cf22b Don't skip zero dividing tests
2a655a0e Remove get_seg_prefix_ds from read_moffs (preparation for calling read_moffs from the code generator)
bc580b71 Remove obsolete comment
e556cee0 Fix nasmtest dependencies in makefile and clean
dcb1e72b Use all cores on travis
86efa737 Replace all instances of u32 & 0xFFFF with the respective u16 accesses
98b9f439 Use u8 instead of bit-shifts and masks from u32
b43f6569 Replace all instances of u32 >> 16 with the respective u16 accesses
9bfa72c7 Remove unnecessary parens
9cf93734 Clean up remaining instance of u32 with a mask instead of u16
22d4117f Correct order of writes in virt_boundary_write32
6734c7c1 Fix keyboard on ios, fixes#105
858a4506 Add missing file, c_ast.js
1d62e39e Move instruction naming scheme into function
f4816852 Reorder some code
69d49788 Minor improvements
0493e05f Add util.js
af9000c1 Improve full test
e5feba31 Add missing export
c7c42065 Replace prefix_call with custom_resolve_modrm
3186e6ad Add support for "%%" format string to dbg_log_wasm for printf import
efe54fad Add barebones instrumentation profiler (disabled by default).
c9f0d462 Implement movlps m64, xmm and enable its test
42869a12 Add tests for cross-page reads/writes confirmed with byte reads/writes
d68976ea Mask word values in port byte reads
9758d51e Add PS2_LOG_VERBOSE
5f52f037 Update NASM Makefile to include all dependencies to prevent unnecessary recompilation
2c71f927 Have NASM test generator use a seedable PRNG to allow for faster incremental tests
e4aa45bb Add chunk{16,32}_rw paging tests; instructions that read and write to memory
bdf538a2 add codegen to cpu constructor
aa76ce8e add resolve_modrm16
14d7ecf1 refactor codegen
b710319f [rebased] Merge branch codegen
0565ea42 minor refactoring
071dff3f temporary fix for automatic cast warnings
57c504f2 fix modrm16 issue
c2db5d9e jit modrm32
85c04245 reinstate modrm_fn0 and modrm_fn1
be65dafd add ip and previous ip manipulating functions
ae00ef89 update codegen js interface
530a74fa squashed commit for refactor
2c692199 add codegen-test to build
c15afe68 prefix gen to codegen api
c9611533 codegen tests fixes