docs: how it works, networking with state images and profiling
parent
d1cf93e2ed
commit
17a6b3b4e9
10
Readme.md
|
@ -49,6 +49,16 @@ list of emulated hardware:
|
|||
[KolibriOS](https://copy.sh/v86/?profile=kolibrios) —
|
||||
[QNX](https://copy.sh/v86/?profile=qnx)
|
||||
|
||||
## Docs
|
||||
|
||||
[How it works](docs/how-it-works.md) —
|
||||
[Networking](docs/networking.md) —
|
||||
[Archlinux guest setup](docs/archlinux.md) —
|
||||
[Windows 2000/XP guest setup](docs/windows-xp.md) —
|
||||
[9p filesystem](docs/filesystem.md) —
|
||||
[Linux rootfs on 9p](docs/linux-9p-image.md) —
|
||||
[Profiling](docs/profiling.md)
|
||||
|
||||
## Compatibility
|
||||
|
||||
Here's an overview of the operating systems supported in v86:
|
||||
|
|
|
@ -1,9 +1,7 @@
|
|||
A 9p filesystem is supported by the emulator, using a virtio transport. Using
|
||||
-it, files can be exchanged with the guest OS, see
-[`create_file`](/src/browser/starter.js#L1179-L1199)
-and
-[`read_file`](/src/browser/starter.js#L1209-L1228). It can
-be enabled by passing the following options to `V86Starter`:
+it, files can be exchanged with the guest OS, see `create_file` and `read_file`
+in [`starter.js`](https://github.com/copy/v86/blob/master/src/browser/starter.js).
+It can be enabled by passing the following options to `V86`:
|
||||
|
||||
```javascript
|
||||
filesystem: {
|
||||
|
|
81
docs/how-it-works.md
Normal file
|
@ -0,0 +1,81 @@
|
|||
Here's an overview of v86's workings. For details, check the
|
||||
[source](https://github.com/copy/v86/tree/master/src).
|
||||
|
||||
The major limitations of WebAssembly are (for the purpose of making emulators with jit):
|
||||
|
||||
- structured control flow (no arbitrary jumps)
|
||||
- no control over registers (you can't keep hardware registers in wasm locals across functions)
|
||||
- no mmap (paging needs to be fully emulated)
|
||||
- no patching
|
||||
- module generation is fairly slow, but at least it's asynchronous, so other things can keep running
|
||||
- there is some memory overhead per module, so you can't generate more than a few thousand
|
||||
|
||||
v86 has an interpreted mode, which collects entry points (targets of function
|
||||
calls and indirect jumps). It also measures the hotness per page, so that
|
||||
compilation is focused on code that is often executed. Once a page is
|
||||
considered hot, code is generated for the entire page and up to `MAX_PAGES`
|
||||
that are directly reachable from it.
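The bookkeeping described above can be sketched roughly as follows. This is a minimal illustration, not v86's actual code: the `HotnessTracker` class, its method names and the `HOT_THRESHOLD` value are all hypothetical, and the real heuristics live in the Rust source.

```javascript
// Hypothetical threshold; the real value and heuristic are defined in v86's source.
const HOT_THRESHOLD = 3;

class HotnessTracker {
    constructor() {
        this.hits = new Map();        // physical page number -> hit count
        this.entryPoints = new Map(); // physical page number -> Set of page offsets
    }

    // Called by the interpreter when it reaches the target of a call or an
    // indirect jump. Returns true exactly once per page: when the page
    // crosses the threshold and should be handed to the code generator.
    recordEntry(physAddr) {
        const page = physAddr >>> 12;
        const offset = physAddr & 0xFFF;
        if (!this.entryPoints.has(page)) {
            this.entryPoints.set(page, new Set());
        }
        this.entryPoints.get(page).add(offset);
        const count = (this.hits.get(page) || 0) + 1;
        this.hits.set(page, count);
        return count === HOT_THRESHOLD;
    }
}
```

In this sketch, the set of offsets collected for a page is what a compiler would later turn into the entry points of the generated function.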
|
||||
|
||||
v86 generates a single function with a big switch statement (brtable), to
|
||||
ensure that all functions and targets of indirect jumps are reachable from
|
||||
other modules. The remaining control flow is handled using the "stackifier"
|
||||
algorithm (well-explained in
|
||||
[this blog post](https://medium.com/leaningtech/solving-the-structured-control-flow-problem-once-and-for-all-5123117b1ee2)).
|
||||
At the moment, there is no linking of wasm modules. The current module is
|
||||
exited, and the main loop detects if a new module can be entered.
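As a loose analogy in JavaScript (not generated wasm), the shape of such a function might look like the sketch below. The function name, the entry indices and the toy register object are made up for illustration; the `switch` plays the role of the brtable, and the code behind each entry is ordinary structured control flow.

```javascript
// Hypothetical "compiled page" with two entry points.
function compiledPage(entry, regs) {
    switch (entry) {   // stands in for br_table
        case 0:        // entry point A: initialisation
            regs.ecx = 3;
            regs.eax = 0;
            // intentionally falls through into the loop below
        case 1:        // entry point B: loop head
            while (regs.ecx !== 0) {
                regs.eax += regs.ecx;
                regs.ecx--;
            }
            return "exit"; // leave the module; the main loop decides what runs next
        default:
            return "unknown-entry";
    }
}
```

Calling `compiledPage(0, regs)` runs from the first entry point, while `compiledPage(1, regs)` jumps straight to the loop head with whatever register state is passed in.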
|
||||
|
||||
In practice, I found that browsers don't handle this structure (deep brtables,
|
||||
with locals being used across the entire function) very well, and `MAX_PAGES`
|
||||
has to be set fairly low, otherwise memory usage blows up. It's likely that
|
||||
improvements are possible (generating fewer entry points, splitting code across
|
||||
multiple functions).
|
||||
|
||||
Code-generation happens in two passes. The first pass finds all basic block
|
||||
boundaries, the second generates code for each basic block. Instruction
|
||||
decoding is generated by a [set of
|
||||
scripts](https://github.com/copy/v86/tree/master/gen) from a [table of
|
||||
instructions](https://github.com/copy/v86/blob/master/gen/x86_table.js). It's
|
||||
also used to [generate
|
||||
tests](https://github.com/copy/v86/blob/master/tests/nasm/create_tests.js).
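The first pass can be illustrated with the textbook "leader" approach to finding basic blocks. This is a simplified sketch on a toy instruction format (`addr`, `length`, optional `target`), not v86's decoder or its actual analysis; all names here are assumptions.

```javascript
// instructions: array sorted by address; an instruction with a `target`
// field is treated as a jump. entry: address where execution starts.
function findBasicBlocks(instructions, entry) {
    // A "leader" starts a new basic block: the entry point, every jump
    // target, and every instruction that follows a jump.
    const leaders = new Set([entry]);
    for (const ins of instructions) {
        if (ins.target !== undefined) {
            leaders.add(ins.target);
            leaders.add(ins.addr + ins.length);
        }
    }

    const blocks = [];
    let current = null;
    for (const ins of instructions) {
        if (current === null || leaders.has(ins.addr)) {
            current = { start: ins.addr, instructions: [] };
            blocks.push(current);
        }
        current.instructions.push(ins);
    }
    return blocks;
}
```

A second pass would then walk `blocks` and emit code for each one in turn.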
|
||||
|
||||
To handle paging, v86 generates code similar to this (see `gen_safe_read`):
|
||||
|
||||
```
|
||||
entry <- tlb[addr >> 12 << 2]
|
||||
if entry & MASK == TLB_VALID && (addr & 0xFFF) <= 0xFFC - bytes: goto fast
|
||||
entry <- safe_read_jit_slow(addr, instruction_pointer)
|
||||
if page_fault: goto exit-with-pagefault
|
||||
fast: mem[(entry & ~0xFFF) ^ addr]
|
||||
```
|
||||
|
||||
There is a 4 MB cache that acts like a tlb. It contains the physical address,
|
||||
read-only bit, whether the page contains code (in order to invalidate it on
|
||||
write) and whether the page points to mmio. Any of those cases are handled in
|
||||
the slow path (`safe_read_jit_slow`), as well as walking the page tables and
|
||||
triggering page faults. The fast path is taken in the vast majority of cases.
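To make the XOR trick in the pseudocode concrete, here is a small JavaScript model of the fast and slow paths. It is a sketch under assumptions: the flag layout, the `pageTable` map, the helper names and the toy memory size are invented for illustration, and the slow path here reads byte by byte instead of returning a TLB entry as in the pseudocode above.

```javascript
const TLB_VALID = 1;     // assumed flag layout: bit 0 = valid
const FLAG_MASK = 0xFFF; // low 12 bits hold flags; any other flag forces the slow path

const tlb = new Int32Array(1 << 20);   // 2^20 entries * 4 bytes = 4 MB
const mem = new Uint8Array(64 * 1024); // toy physical memory
const view = new DataView(mem.buffer);
const pageTable = new Map();           // virtual page -> physical page (stand-in for real page tables)
let slowReads = 0;

// Walks the toy page table, fills the TLB and returns the physical address.
function translate(addr) {
    const virtPage = addr >>> 12;
    if (!pageTable.has(virtPage)) {
        throw new Error("page fault at 0x" + addr.toString(16));
    }
    const physPage = pageTable.get(virtPage);
    // Storing (virtPage ^ physPage) lets the fast path recover the physical
    // address with a single XOR against the virtual address.
    tlb[virtPage] = ((virtPage ^ physPage) << 12) | TLB_VALID;
    return ((physPage << 12) | (addr & 0xFFF)) >>> 0;
}

function read32Slow(addr) {
    slowReads++;
    let value = 0;
    for (let i = 0; i < 4; i++) {
        value |= mem[translate((addr + i) >>> 0)] << (8 * i);
    }
    return value >>> 0;
}

function read32(addr) {
    const entry = tlb[addr >>> 12];
    // Fast path: entry is valid with no special flags, and the 4 bytes
    // do not cross a page boundary.
    if ((entry & FLAG_MASK) === TLB_VALID && (addr & 0xFFF) <= 0x1000 - 4) {
        return view.getUint32(((entry & ~0xFFF) ^ addr) >>> 0, true);
    }
    return read32Slow(addr);
}
```

The first read of a page goes through `read32Slow`, which populates the TLB; later reads of the same page stay on the fast path.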
|
||||
|
||||
The remaining code generation is mostly a straightforward, 1-to-1 translation
|
||||
of x86 to wasm. The only analysis done is to optimise generation of conditional
|
||||
jumps immediately after arithmetic instructions, e.g.:
|
||||
|
||||
```
|
||||
cmp eax, 52
|
||||
setb eax
|
||||
```
|
||||
|
||||
becomes:
|
||||
|
||||
```
|
||||
... // code for cmp
|
||||
eax <- eax < 52
|
||||
```
|
||||
|
||||
A lazy flag mechanism is used to speed up arithmetic (applies to both jit and
|
||||
interpreted mode, see
|
||||
[`arith.rs`](https://github.com/copy/v86/blob/master/src/rust/cpu/arith.rs) and
|
||||
[`misc_instr.rs`](https://github.com/copy/v86/blob/master/src/rust/cpu/misc_instr.rs)).
|
||||
There's a wip that tries to elide most lazy-flags updates:
|
||||
https://github.com/copy/v86/pull/466
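The general idea of lazy flags can be sketched as follows: arithmetic instructions only record their operands and result, and individual flags are derived on demand. This is a generic illustration restricted to 32-bit `cmp`/`sub`; the field and function names are hypothetical and do not mirror `arith.rs`.

```javascript
// State recorded by the last arithmetic instruction (names are illustrative).
const lazy = { op1: 0, op2: 0, result: 0 };

// cmp a, b: performs a - b, keeps only the bookkeeping, computes no flags yet.
function cmp32(a, b) {
    lazy.op1 = a >>> 0;
    lazy.op2 = b >>> 0;
    lazy.result = (a - b) >>> 0;
}

// Flags are computed only when an instruction actually asks for them.
function getZero()  { return lazy.result === 0; }
function getCarry() { return lazy.op1 < lazy.op2; }      // borrow out of an unsigned subtraction
function getSign()  { return (lazy.result >>> 31) === 1; }
```

For the `cmp eax, 52` followed by `setb` example above, only `getCarry()` would ever be evaluated, which is why a sequence like that can be reduced to a single unsigned comparison.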
|
||||
|
||||
FPU instructions are emulated using softfloat (very slow, but unfortunately
|
||||
some code relies on 80 bit floats).
|
|
@ -1,7 +1,11 @@
|
|||
# v86 networking
|
||||
|
||||
Emulating a network card is supported. It can be used by passing the
|
||||
-`network_relay_url` option to `V86Starter`. The url must point to a running
+`network_relay_url` option to `V86`. The url must point to a running
|
||||
WebSockets Proxy. The source code for WebSockets Proxy can be found at
|
||||
-https://github.com/benjamincburns/websockproxy.
+[benjamincburns/websockproxy](https://github.com/benjamincburns/websockproxy).
|
||||
An alternative, Node-based implementation is
|
||||
[krishenriksen/node-relay](https://github.com/krishenriksen/node-relay).
|
||||
|
||||
The network card could also be controlled programmatically, but this is
|
||||
currently not exposed.
|
||||
|
@ -13,3 +17,31 @@ browser-compatible `WebSocket` constructor being present in the global scope.
|
|||
throttling built-in by default which will degrade the networking.
|
||||
`bellenottelling/websockproxy` docker image has this throttling removed via
|
||||
[websockproxy/issues/4#issuecomment-317255890](https://github.com/benjamincburns/websockproxy/issues/4#issuecomment-317255890).
|
||||
|
||||
### Interaction with state images
|
||||
|
||||
When using state images, v86 randomises the MAC address after the state has
|
||||
been loaded, so that multiple VMs don't receive the same address. However, the
|
||||
guest OS is not aware that the MAC address has changed, which prevents it from
|
||||
sending and receiving packets correctly. There are several workarounds:
|
||||
|
||||
- Unload the network driver before saving the state. On Linux, unloading can be
|
||||
done using `rmmod ne2k-pci` or `echo 0000:00:05.0 >
|
||||
/sys/bus/pci/drivers/ne2k-pci/unbind` and loading (after the state has been
|
||||
loaded) using `modprobe ne2k-pci` or `echo 0000:00:05.0 >
|
||||
/sys/bus/pci/drivers/ne2k-pci/bind`
|
||||
- Pass `preserve_mac_from_state_image: true` to the V86 constructor. This
|
||||
causes MAC addresses to be shared between all VMs with the same state image.
|
||||
- Pass `mac_address_translation: true` to the V86 constructor. This causes v86
|
||||
to present the old MAC address to the guest OS, but translate it to a
|
||||
randomised MAC address in outgoing packets (and vice-versa for incoming
|
||||
packets). This mechanism currently only supports the ethernet, ipv4, dhcp and
|
||||
arp protocols. See `translate_mac_address` in
|
||||
[`src/ne2k.js`](https://github.com/copy/v86/blob/master/src/ne2k.js). This is
|
||||
currently used in Windows, ReactOS and SerenityOS profiles.
|
||||
- Some OSes don't cache the MAC address when the driver loads and therefore
|
||||
don't need any of the above workarounds. This seems to be the case for Haiku,
|
||||
OpenBSD and FreeBSD.
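To illustrate the idea behind MAC address translation, the sketch below rewrites only the Ethernet header of a frame. It is a simplified, hypothetical stand-in and not the `translate_mac_address` function from `src/ne2k.js`: the function names and signatures are assumptions, and MAC addresses embedded in ARP or DHCP payloads are not touched here.

```javascript
// Ethernet header layout: bytes 0-5 destination MAC, bytes 6-11 source MAC.
function sameMac(frame, offset, mac) {
    for (let i = 0; i < 6; i++) {
        if (frame[offset + i] !== mac[i]) return false;
    }
    return true;
}

// Outgoing: the guest still uses the MAC from the state image (guestMac);
// replace it with the randomised MAC (wireMac) before the frame leaves.
function translateOutgoing(frame, guestMac, wireMac) {
    const out = Uint8Array.from(frame); // copy, leave the caller's buffer untouched
    if (sameMac(out, 6, guestMac)) out.set(wireMac, 6);
    return out;
}

// Incoming: frames addressed to the randomised MAC are presented to the
// guest under the MAC it already knows.
function translateIncoming(frame, guestMac, wireMac) {
    const out = Uint8Array.from(frame);
    if (sameMac(out, 0, wireMac)) out.set(guestMac, 0);
    return out;
}
```

Broadcast and multicast destinations pass through unchanged in this sketch, since they match neither address.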
|
||||
|
||||
Note that the same applies to IP addresses, so a dhcp client should only be run
|
||||
after the state has been loaded.
|
||||
|
|
7
docs/profiling.md
Normal file
|
@ -0,0 +1,7 @@
|
|||
v86 has a built-in profiler, which instruments generated code to count certain
|
||||
events and types of instructions. It can be used by building with `make
|
||||
debug-with-profiler` and opening debug.html.
|
||||
|
||||
For debugging networking, packet logging is available in the UI in both debug
|
||||
and release builds. The resulting `traffic.hex` file can be loaded in Wireshark
|
||||
using File -> Import from Hex Dump -> tick direction indication, timestamp format %s.%f.
|