docs: how it works, networking with state images and profiling

Fabian 2023-01-06 16:13:51 +01:00
parent d1cf93e2ed
commit 17a6b3b4e9
5 changed files with 135 additions and 7 deletions

@ -49,6 +49,16 @@ list of emulated hardware:
[KolibriOS](https://copy.sh/v86/?profile=kolibrios) —
[QNX](https://copy.sh/v86/?profile=qnx)
## Docs
[How it works](docs/how-it-works.md) —
[Networking](docs/networking.md) —
[Archlinux guest setup](docs/archlinux.md) —
[Windows 2000/XP guest setup](docs/windows-xp.md) —
[9p filesystem](docs/filesystem.md) —
[Linux rootfs on 9p](docs/linux-9p-image.md) —
[Profiling](docs/profiling.md)
## Compatibility
Here's an overview of the operating systems supported in v86:

@ -1,9 +1,7 @@
A 9p filesystem is supported by the emulator, using a virtio transport. Using
it, files can be exchanged with the guest OS, see
[`create_file`](/src/browser/starter.js#L1179-L1199)
and
[`read_file`](/src/browser/starter.js#L1209-L1228). It can
be enabled by passing the following options to `V86Starter`:
it, files can be exchanged with the guest OS, see `create_file` and `read_file`
in [`starter.js`](https://github.com/copy/v86/blob/master/src/browser/starter.js).
It can be enabled by passing the following options to `V86`:
```javascript
filesystem: {

81
docs/how-it-works.md Normal file

@ -0,0 +1,81 @@
Here's an overview of v86's workings. For details, check the
[source](https://github.com/copy/v86/tree/master/src).
For the purpose of writing emulators with a jit, the major limitations of WebAssembly are:
- structured control flow (no arbitrary jumps)
- no control over registers (you can't keep hardware registers in wasm locals across functions)
- no mmap (paging needs to be fully emulated)
- no patching
- module generation is fairly slow, but at least it's asynchronous, so other things can keep running
- there is some memory overhead per module, so you can't generate more than a few thousand modules
v86 has an interpreted mode, which collects entry points (targets of function
calls and indirect jumps). It also measures the hotness per page, so that
compilation is focused on code that is often executed. Once a page is
considered hot, code is generated for the entire page and up to `MAX_PAGES`
that are directly reachable from it.
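The bookkeeping described above could be sketched roughly as follows. The class name, the threshold value and the use of `Map` are assumptions made for illustration only; v86 keeps this state differently, and its real threshold is not given here.

```javascript
// Hypothetical threshold: number of recorded entries before a page counts as hot.
const HOT_THRESHOLD = 3;

class HotnessTracker {
    constructor() {
        this.hits = new Map();        // physical page number -> number of recorded entries
        this.entryPoints = new Map(); // physical page number -> Set of offsets within the page
    }

    // Called by the (hypothetical) interpreter when it follows a call or an
    // indirect jump. Returns true exactly once, when the page becomes hot.
    recordEntry(physAddr) {
        const page = physAddr >>> 12;
        const offset = physAddr & 0xFFF;
        if (!this.entryPoints.has(page)) {
            this.entryPoints.set(page, new Set());
        }
        this.entryPoints.get(page).add(offset);
        const count = (this.hits.get(page) || 0) + 1;
        this.hits.set(page, count);
        return count === HOT_THRESHOLD;
    }
}

const tracker = new HotnessTracker();
const becameHot = [0x1000, 0x1004, 0x1008].map(addr => tracker.recordEntry(addr));
```

When `recordEntry` returns true, a compiler would take the collected entry points of that page as the starting set for code generation.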
v86 generates a single function with a big switch statement (brtable), to
ensure that all functions and targets of indirect jumps are reachable from
other modules. The remaining control flow is handled using the "stackifier"
algorithm (well-explained in
[this blog post](https://medium.com/leaningtech/solving-the-structured-control-flow-problem-once-and-for-all-5123117b1ee2)).
At the moment, there is no linking of wasm modules. The current module is
exited, and the main loop detects if a new module can be entered.
In practice, I found that browsers don't handle this structure (deep brtables,
with locals being used across the entire function) very well, and `MAX_PAGES`
has to be set fairly low, otherwise memory usage blows up. Improvements are
likely possible (generating fewer entry points, splitting code across multiple
functions).
Code-generation happens in two passes. The first pass finds all basic block
boundaries, the second generates code for each basic block. Instruction
decoding is generated by a [set of
scripts](https://github.com/copy/v86/tree/master/gen) from a [table of
instructions](https://github.com/copy/v86/blob/master/gen/x86_table.js). It's
also used to [generate
tests](https://github.com/copy/v86/blob/master/tests/nasm/create_tests.js).
To handle paging, v86 generates code similar to this (see `gen_safe_read`):
```
entry <- tlb[addr >> 12 << 2]
if entry & MASK == TLB_VALID && (addr & 0xFFF) <= 0xFFC - bytes: goto fast
entry <- safe_read_jit_slow(addr, instruction_pointer)
if page_fault: goto exit-with-pagefault
fast: mem[(entry & ~0xFFF) ^ addr]
```
There is a 4 MB cache that acts like a tlb. For each page it stores the
physical address, a read-only bit, whether the page contains code (so that
compiled code can be invalidated on write) and whether the page points to mmio.
These special cases are handled in the slow path (`safe_read_jit_slow`), which
also walks the page tables and triggers page faults. The fast path is taken in
the vast majority of cases.
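The fast path can be modelled in plain JavaScript as below. This is a sketch under assumptions: the flag bit positions, `READ_MASK`, `mapPage` and `read32` are made up for illustration, and the page-crossing bound used here is the exact one for a 4-byte read, which may differ from the check in the generated code. What it does show is why one table lookup and one XOR are enough: each entry stores the physical page XORed with the virtual page, so XORing it with the address yields the physical address.

```javascript
// Assumed flag layout in the low 12 bits of an entry (illustrative only).
const TLB_VALID = 1 << 0;
const TLB_READONLY = 1 << 1;
const TLB_HAS_CODE = 1 << 2;
const TLB_MMIO = 1 << 3;
// Bits that matter for a read: the entry must be valid and must not be mmio.
const READ_MASK = TLB_VALID | TLB_MMIO;

// One 32-bit entry per 4 KiB page of a 32-bit address space: 2^20 * 4 bytes = 4 MB.
const tlb = new Int32Array(1 << 20);
const mem = new Uint8Array(1 << 16); // tiny stand-in for guest memory

function mapPage(virtPage, physPage, flags) {
    tlb[virtPage] = ((physPage ^ virtPage) << 12) | flags;
}

function read32(addr, slowPath) {
    const entry = tlb[addr >>> 12];
    if ((entry & READ_MASK) === TLB_VALID && (addr & 0xFFF) <= 0x1000 - 4) {
        // Fast path: the XOR swaps the virtual page bits for the physical ones.
        const phys = ((entry & ~0xFFF) ^ addr) >>> 0;
        return mem[phys] | mem[phys + 1] << 8 | mem[phys + 2] << 16 | mem[phys + 3] << 24;
    }
    // Unmapped, mmio or page-crossing access: defer to the slow path.
    return slowPath(addr);
}

mapPage(5, 2, TLB_VALID);            // virtual page 5 -> physical page 2
mapPage(7, 3, TLB_VALID | TLB_MMIO); // mmio page, always takes the slow path
mem.set([0x78, 0x56, 0x34, 0x12], 0x2010);
const fastValue = read32(0x5010, () => -1);
```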
The remaining code generation is mostly a straightforward, 1-to-1 translation
of x86 to wasm. The only analysis done is to optimise generation of conditional
jumps immediately after arithmetic instructions, e.g.:
```
cmp eax, 52
setb eax
```
becomes:
```
... // code for cmp
eax <- eax < 52
```
A lazy flag mechanism is used to speed up arithmetic (applies to both jit and
interpreted mode, see
[`arith.rs`](https://github.com/copy/v86/blob/master/src/rust/cpu/arith.rs) and
[`misc_instr.rs`](https://github.com/copy/v86/blob/master/src/rust/cpu/misc_instr.rs)).
There's a wip that tries to elide most lazy-flags updates:
https://github.com/copy/v86/pull/466
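To illustrate the general idea of lazy flags (not v86's actual data layout, which lives in the Rust files linked above), here is a small JavaScript sketch. All names are hypothetical. Arithmetic only records its operands and result; a flag is computed when an instruction such as `setb` asks for it.

```javascript
// Hypothetical lazy-flags state: the last arithmetic operation's inputs and output.
const lazy = { op1: 0, op2: 0, result: 0, isSub: false };

function add32(a, b) {
    lazy.op1 = a >>> 0;
    lazy.op2 = b >>> 0;
    lazy.result = (a + b) >>> 0;
    lazy.isSub = false;
    return lazy.result;
}

// cmp is a subtraction that only updates flags.
function cmp32(a, b) {
    lazy.op1 = a >>> 0;
    lazy.op2 = b >>> 0;
    lazy.result = (a - b) >>> 0;
    lazy.isSub = true;
}

// Flags are derived on demand from the recorded values.
function getCarry() {
    // Subtraction borrows when op1 < op2; addition carries when the result wrapped.
    return lazy.isSub ? lazy.op1 < lazy.op2 : lazy.result < lazy.op1;
}
function getZero() { return lazy.result === 0; }
function getSign() { return (lazy.result >>> 31) === 1; }

// setb stores the carry flag, i.e. an unsigned "below" after cmp.
function setb() { return getCarry() ? 1 : 0; }

cmp32(10, 52);
const below = setb();
```

After `cmp32(a, b)`, `setb()` is equivalent to the unsigned comparison `a < b`, which is the shortcut the jit takes in the `cmp`/`setb` example.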
FPU instructions are emulated using softfloat (very slow, but unfortunately
some code relies on 80 bit floats).

@ -1,7 +1,11 @@
# v86 networking
Emulating a network card is supported. It can be used by passing the
`network_relay_url` option to `V86Starter`. The url must point to a running
`network_relay_url` option to `V86`. The url must point to a running
WebSockets Proxy. The source code for WebSockets Proxy can be found at
https://github.com/benjamincburns/websockproxy.
[benjamincburns/websockproxy](https://github.com/benjamincburns/websockproxy).
An alternative, Node-based implementation is
[krishenriksen/node-relay](https://github.com/krishenriksen/node-relay).
The network card could also be controlled programmatically, but this is
currently not exposed.
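A minimal sketch of passing the option. The helper `withNetworking` and the relay URL are hypothetical; only `network_relay_url` comes from the text above, and `memory_size` stands in for whatever other options the VM already uses.

```javascript
// Hypothetical helper: returns a copy of the options with networking enabled.
function withNetworking(baseOptions, relayUrl) {
    // A WebSocket endpoint uses the ws:// or wss:// scheme.
    if (!/^wss?:\/\//.test(relayUrl)) {
        throw new Error("relay url must start with ws:// or wss://");
    }
    return Object.assign({}, baseOptions, { network_relay_url: relayUrl });
}

const options = withNetworking(
    { memory_size: 64 * 1024 * 1024 },
    "wss://relay.example.org/" // placeholder, not a real relay
);

// In a page that has loaded libv86.js:
// const emulator = new V86(options);
```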
@ -13,3 +17,31 @@ browser-compatible `WebSocket` constructor being present in the global scope.
throttling built-in by default which will degrade the networking.
`bellenottelling/websockproxy` docker image has this throttling removed via
[websockproxy/issues/4#issuecomment-317255890](https://github.com/benjamincburns/websockproxy/issues/4#issuecomment-317255890).
### Interaction with state images
When using state images, v86 randomises the MAC address after the state has
been loaded, so that multiple VMs don't receive the same address. However, the
guest OS is not aware that the MAC address has changed, which prevents it from
sending and receiving packets correctly. There are several workarounds:
- Unload the network driver before saving the state. On Linux, unloading can be
done using `rmmod ne2k-pci` or `echo 0000:00:05.0 >
/sys/bus/pci/drivers/ne2k-pci/unbind` and loading (after the state has been
loaded) using `modprobe ne2k-pci` or `echo 0000:00:05.0 >
/sys/bus/pci/drivers/ne2k-pci/bind`
- Pass `preserve_mac_from_state_image: true` to the V86 constructor. This
causes MAC addresses to be shared between all VMs with the same state image.
- Pass `mac_address_translation: true` to the V86 constructor. This causes v86
to present the old MAC address to the guest OS, but translate it to a
randomised MAC address in outgoing packets (and vice-versa for incoming
packets). This mechanism currently only supports the Ethernet, IPv4, DHCP and
ARP protocols. See `translate_mac_address` in
[`src/ne2k.js`](https://github.com/copy/v86/blob/master/src/ne2k.js). This is
currently used in the Windows, ReactOS and SerenityOS profiles.
- Some OSes don't cache the MAC address when the driver loads and therefore
don't need any of the above workarounds. This seems to be the case for Haiku,
OpenBSD and FreeBSD.
Note that the same applies to IP addresses, so a DHCP client should only be run
after the state has been loaded.
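The idea behind MAC address translation can be sketched for the Ethernet header alone. This is not the code in `src/ne2k.js`: the function names are hypothetical, and the ARP and DHCP payloads, which also carry MAC addresses, are ignored here. In an Ethernet frame, bytes 0 to 5 hold the destination MAC and bytes 6 to 11 the source MAC.

```javascript
const DST_OFFSET = 0;
const SRC_OFFSET = 6;
const MAC_LENGTH = 6;

function hasMacAt(frame, offset, mac) {
    for (let i = 0; i < MAC_LENGTH; i++) {
        if (frame[offset + i] !== mac[i]) return false;
    }
    return true;
}

// Guest -> network: replace the MAC the guest still believes it has
// with the randomised one.
function translateOutgoing(frame, guestMac, wireMac) {
    const out = Uint8Array.from(frame);
    if (hasMacAt(out, SRC_OFFSET, guestMac)) out.set(wireMac, SRC_OFFSET);
    return out;
}

// Network -> guest: the reverse mapping on the destination field.
function translateIncoming(frame, guestMac, wireMac) {
    const out = Uint8Array.from(frame);
    if (hasMacAt(out, DST_OFFSET, wireMac)) out.set(guestMac, DST_OFFSET);
    return out;
}

const guestMac = [0x00, 0x22, 0x15, 0x01, 0x02, 0x03]; // from the state image
const wireMac = [0x00, 0x22, 0x15, 0x09, 0x09, 0x09];  // randomised after loading
// Broadcast destination, guest source, EtherType 0x0800 (IPv4).
const sent = new Uint8Array([0xFF, 0xFF, 0xFF, 0xFF, 0xFF, 0xFF, ...guestMac, 0x08, 0x00]);
const onWire = translateOutgoing(sent, guestMac, wireMac);
```

Broadcast frames pass through `translateIncoming` unchanged, since their destination does not match the randomised address.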

7
docs/profiling.md Normal file

@ -0,0 +1,7 @@
v86 has a built-in profiler, which instruments generated code to count certain
events and types of instructions. It can be used by building with `make
debug-with-profiler` and opening debug.html.
For debugging networking, packet logging is available in the UI in both debug
and release builds. The resulting `traffic.hex` file can be loaded in Wireshark
using File -> Import from Hex Dump, with direction indication ticked and the
timestamp format set to `%s.%f`.