docs: how it works, networking with state images and profiling

2023-01-06 16:13:51 +01:00 · 17a6b3b4e9
parent d1cf93e2ed
commit 17a6b3b4e9
5 changed files with 135 additions and 7 deletions
--- a/Readme.md
+++ b/Readme.md
@@ -49,6 +49,16 @@ list of emulated hardware:
 [KolibriOS](https://copy.sh/v86/?profile=kolibrios) —
 [QNX](https://copy.sh/v86/?profile=qnx)
 ## Docs
 [How it works](docs/how-it-works.md) —
 [Networking](docs/networking.md) —
 [Archlinux guest setup](docs/archlinux.md) —
 [Windows 2000/XP guest setup](docs/windows-xp.md) —
 [9p filesystem](docs/filesystem.md) —
 [Linux rootfs on 9p](docs/linux-9p-image.md) —
 [Profiling](docs/profiling.md)
 ## Compatibility
 Here's an overview of the operating systems supported in v86:
--- a/docs/filesystem.md
+++ b/docs/filesystem.md
@@ -1,9 +1,7 @@
 A 9p filesystem is supported by the emulator, using a virtio transport. Using
-it, files can be exchanged with the guest OS, see
-[`create_file`](/src/browser/starter.js#L1179-L1199)
-and
-[`read_file`](/src/browser/starter.js#L1209-L1228). It can
-be enabled by passing the following options to `V86Starter`:
+it, files can be exchanged with the guest OS, see `create_file` and `read_file`
+in [`starter.js`](https://github.com/copy/v86/blob/master/src/browser/starter.js).
+It can be enabled by passing the following options to `V86`:
 ```javascript
 filesystem: {
--- a/docs/how-it-works.md
+++ b/docs/how-it-works.md
@@ -0,0 +1,81 @@
 Here's an overview of v86's workings. For details, check the
 [source](https://github.com/copy/v86/tree/master/src).
 The major limitations of WebAssembly (for the purpose of writing emulators with a jit) are:
 - structured control flow (no arbitrary jumps)
 - no control over registers (you can't keep hardware registers in wasm locals across functions)
 - no mmap (paging needs to be fully emulated)
 - no patching
 - module generation is fairly slow, but at least it's asynchronous, so other things can keep running
 - there is some memory overhead per module, so you can't generate more than a few thousand
 v86 has an interpreted mode, which collects entry points (targets of function
 calls and indirect jumps). It also measures the hotness per page, so that
 compilation is focused on code that is often executed. Once a page is
 considered hot, code is generated for the entire page and up to `MAX_PAGES`
 that are directly reachable from it.
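The bookkeeping described above can be pictured roughly as follows. This is a minimal sketch, not v86's implementation: `HOT_THRESHOLD`, `record_entry_point` and both maps are hypothetical names, and the threshold value is arbitrary.

```javascript
// Hypothetical threshold; the real value and bookkeeping in v86 differ.
const HOT_THRESHOLD = 3;

const page_hotness = new Map();      // page number -> number of recorded entries
const page_entry_points = new Map(); // page number -> Set of offsets within the page

// Called by an imaginary interpreter whenever it follows a function call or
// an indirect jump to `addr`. Returns true once the page counts as hot.
function record_entry_point(addr) {
    const page = addr >>> 12;
    const offset = addr & 0xFFF;

    if (!page_entry_points.has(page)) {
        page_entry_points.set(page, new Set());
    }
    page_entry_points.get(page).add(offset);

    const count = (page_hotness.get(page) || 0) + 1;
    page_hotness.set(page, count);
    return count >= HOT_THRESHOLD;
}
```

When the function returns true, a compiler would take the collected offsets of that page as the entry points of the generated module.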
 v86 generates a single function with a big switch statement (brtable), to
 ensure that all functions and targets of indirect jumps are reachable from
 other modules. The remaining control flow is handled using the "stackifier"
 algorithm (well-explained in
 [this blog post](https://medium.com/leaningtech/solving-the-structured-control-flow-problem-once-and-for-all-5123117b1ee2)).
 At the moment, there is no linking of wasm modules. The current module is
 exited, and the main loop detects if a new module can be entered.
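The shape of such a function can be illustrated in JavaScript, with a `switch` inside a loop standing in for wasm's `br_table`. This is only a sketch of the dispatch idea under made-up block contents; `run_module` and the block numbering are hypothetical and do not correspond to code that v86 generates.

```javascript
// Every basic block gets a case label. A jump is expressed by assigning the
// next block index to `state` and going around the loop again, so arbitrary
// jump targets can be reached despite structured control flow.
function run_module(entry, regs) {
    let state = entry;
    for (;;) {
        switch (state) {
            case 0: // block 0: initialise a counter
                regs.eax = 0;
                state = 1;
                break;
            case 1: // block 1: loop body with a conditional backwards jump
                regs.eax += 1;
                state = regs.eax < 5 ? 1 : 2;
                break;
            case 2: // block 2: leave the module, the main loop takes over
                return regs;
            default:
                throw new Error("unknown entry point " + state);
        }
    }
}
```

Because the dispatch is at the top of the function, any block can serve as an entry point, e.g. `run_module(1, { eax: 10 })` starts in the loop body.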
 In practice, I found that browsers don't handle this structure (deep brtables,
 with locals being used across the entire function) very well, and `MAX_PAGES`
 has to be set fairly low, otherwise memory usage blows up. It's likely that
 improvements are possible (generating fewer entry points, splitting code across
 multiple functions).
 Code-generation happens in two passes. The first pass finds all basic block
 boundaries, the second generates code for each basic block. Instruction
 decoding is generated by a [set of
 scripts](https://github.com/copy/v86/tree/master/gen) from a [table of
 instructions](https://github.com/copy/v86/blob/master/gen/x86_table.js). It's
 also used to [generate
 tests](https://github.com/copy/v86/blob/master/tests/nasm/create_tests.js).
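The two passes can be sketched on a toy instruction list. The instruction format (`addr`, `length`, optional `jump_target`) and both function names are assumptions made for this example; real decoding comes from the generated tables mentioned above.

```javascript
// Pass 1: collect every address at which a basic block begins: the known
// entry points, every jump target, and the instruction following a jump.
function find_block_starts(instructions, entry_points) {
    const starts = new Set(entry_points);
    for (const ins of instructions) {
        if (ins.jump_target !== undefined) {
            starts.add(ins.jump_target);
            starts.add(ins.addr + ins.length);
        }
    }
    return starts;
}

// Pass 2: walk the instructions again and open a new block at each start.
// A real code generator would emit code for each block here.
function build_blocks(instructions, starts) {
    const blocks = [];
    let current = null;
    for (const ins of instructions) {
        if (current === null || starts.has(ins.addr)) {
            current = { start: ins.addr, instructions: [] };
            blocks.push(current);
        }
        current.instructions.push(ins);
    }
    return blocks;
}
```

Splitting the work this way means that backwards jumps into the middle of already-seen code are known before any code is emitted.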
 To handle paging, v86 generates code similar to this (see `gen_safe_read`):
 ```
 entry <- tlb[addr >> 12 << 2]
 if entry & MASK == TLB_VALID && (addr & 0xFFF) <= 0xFFC - bytes: goto fast
 entry <- safe_read_jit_slow(addr, instruction_pointer)
 if page_fault: goto exit-with-pagefault
 fast: mem[(entry & ~0xFFF) ^ addr]
 ```
 There is a 4 MB cache that acts like a tlb. It contains the physical address,
 read-only bit, whether the page contains code (in order to invalidate it on
 write) and whether the page points to mmio. All of those cases are handled in
 the slow path (`safe_read_jit_slow`), as well as walking the page tables and
 triggering page faults. The fast path is taken in the vast majority of cases.
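A runnable sketch of the same idea in JavaScript follows. It rests on several assumptions: the flag bits, `page_table` (a `Map` standing in for real page tables), `translate_slow` and `slow_path_count` are hypothetical, and the page-crossing check uses the exact bound `0x1000 - 4` for a 4-byte read. The table size follows from 2^20 pages with one 4-byte entry each, i.e. 4 MiB.

```javascript
// Flag bits and names below are hypothetical; v86's real layout differs.
const TLB_VALID = 1 << 0;
const TLB_MMIO = 1 << 1;
const READ_MASK = TLB_VALID | TLB_MMIO; // bits that must be checked for a read

const tlb = new Int32Array(1 << 20);  // one 4-byte entry per 4 KiB page: 4 MiB
const mem = new Uint8Array(1 << 16);  // toy physical memory
const mem_view = new DataView(mem.buffer);
const page_table = new Map();         // virtual page -> physical page
let slow_path_count = 0;

// Walks the toy page table, fills the TLB and returns the physical address.
function translate_slow(addr) {
    const virt_page = addr >>> 12;
    if (!page_table.has(virt_page)) {
        throw new Error("page fault at 0x" + addr.toString(16));
    }
    const phys_page = page_table.get(virt_page);
    // Store (physical XOR virtual) in the upper bits, so that the fast path
    // gets the physical address with a single XOR against the virtual one.
    tlb[virt_page] = ((phys_page ^ virt_page) << 12) | TLB_VALID;
    return ((phys_page << 12) | (addr & 0xFFF)) >>> 0;
}

function safe_read32(addr) {
    const entry = tlb[addr >>> 12];
    const fits_in_page = (addr & 0xFFF) <= 0x1000 - 4;
    if ((entry & READ_MASK) === TLB_VALID && fits_in_page) {
        // Fast path: no page walk, no function call.
        const phys = ((entry & ~0xFFF) ^ addr) >>> 0;
        return mem_view.getInt32(phys, true);
    }
    // Slow path: translate byte by byte, which also covers page-crossing reads.
    slow_path_count++;
    let value = 0;
    for (let i = 0; i < 4; i++) {
        value |= mem[translate_slow((addr + i) >>> 0)] << (8 * i);
    }
    return value;
}
```

After the first access to a page has gone through the slow path, later aligned reads from that page only need the table lookup, the mask comparison and the XOR.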
 The remaining code generation is mostly a straightforward, 1-to-1 translation
 of x86 to wasm. The only analysis done is to optimise generation of conditional
 jumps immediately after arithmetic instructions, e.g.:
 ```
 cmp eax, 52
 setb eax
 ```
 becomes:
 ```
 ... // code for cmp
 eax <- eax < 52
 ```
 A lazy flag mechanism is used to speed up arithmetic (applies to both jit and
 interpreted mode, see
 [`arith.rs`](https://github.com/copy/v86/blob/master/src/rust/cpu/arith.rs) and
 [`misc_instr.rs`](https://github.com/copy/v86/blob/master/src/rust/cpu/misc_instr.rs)).
 There's a work-in-progress PR that tries to elide most lazy-flags updates:
 https://github.com/copy/v86/pull/466
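The general idea behind lazy flags can be sketched as follows: arithmetic only records its operand and result, and a flag is computed when something actually reads it. This is a simplified illustration for 32-bit add and sub; the variable and function names are assumptions and the code is not taken from `arith.rs`.

```javascript
// State recorded by the most recent arithmetic instruction.
let last_op = "none";
let last_op1 = 0;    // first operand, as an unsigned 32-bit value
let last_result = 0; // result, as an unsigned 32-bit value

function add32(a, b) {
    last_op = "add";
    last_op1 = a >>> 0;
    last_result = (a + b) >>> 0;
    return last_result;
}

function sub32(a, b) {
    last_op = "sub";
    last_op1 = a >>> 0;
    last_result = (a - b) >>> 0;
    return last_result;
}

// Flags are derived on demand from the recorded state.
function get_zero_flag() {
    return last_result === 0;
}

function get_sign_flag() {
    return (last_result >>> 31) === 1;
}

function get_carry_flag() {
    if (last_op === "add") return last_result < last_op1; // result wrapped around
    if (last_op === "sub") return last_result > last_op1; // a borrow occurred
    return false;
}
```

Most arithmetic results are overwritten before any flag is read, so the per-instruction cost is a few stores instead of computing every flag eagerly.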
 FPU instructions are emulated using softfloat (very slow, but unfortunately
 some code relies on 80 bit floats).
--- a/docs/networking.md
+++ b/docs/networking.md
@@ -1,7 +1,11 @@
 # v86 networking
 Emulating a network card is supported. It can be used by passing the
-`network_relay_url` option to `V86Starter`. The url must point to a running
+`network_relay_url` option to `V86`. The url must point to a running
 WebSockets Proxy. The source code for WebSockets Proxy can be found at
-https://github.com/benjamincburns/websockproxy.
+[benjamincburns/websockproxy](https://github.com/benjamincburns/websockproxy).
 An alternative, Node-based implementation is
 [krishenriksen/node-relay](https://github.com/krishenriksen/node-relay).
 The network card could also be controlled programmatically, but this is
 currently not exposed.
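A minimal options object might look like the sketch below. The relay URL is a placeholder, and `autostart` is included on the assumption that a typical setup uses it; a working configuration also needs the usual BIOS and disk image options, which are left out here.

```javascript
// Placeholder URL: it must point to a running WebSockets Proxy.
const options = {
    network_relay_url: "wss://relay.example.com/",
    autostart: true, // assumption: start the emulator immediately
};

// With v86 loaded on the page, the object would be passed to the constructor:
// const emulator = new V86(options);
```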
@@ -13,3 +17,31 @@ browser-compatible `WebSocket` constructor being present in the global scope.
 throttling built-in by default which will degrade the networking.
 `bellenottelling/websockproxy` docker image has this throttling removed via
 [websockproxy/issues/4#issuecomment-317255890](https://github.com/benjamincburns/websockproxy/issues/4#issuecomment-317255890).
 ### Interaction with state images
 When using state images, v86 randomises the MAC address after the state has
 been loaded, so that multiple VMs don't receive the same address. However, the
 guest OS is not aware that the MAC address has changed, which prevents it from
 sending and receiving packets correctly. There are several workarounds:
 - Unload the network driver before saving the state. On Linux, unloading can be
  done using `rmmod ne2k-pci` or `echo 0000:00:05.0 >
  /sys/bus/pci/drivers/ne2k-pci/unbind` and loading (after the state has been
  loaded) using `modprobe ne2k-pci` or `echo 0000:00:05.0 >
  /sys/bus/pci/drivers/ne2k-pci/bind`
 - Pass `preserve_mac_from_state_image: true` to the V86 constructor. This
  causes MAC addresses to be shared between all VMs with the same state image.
 - Pass `mac_address_translation: true` to the V86 constructor. This causes v86
  to present the old MAC address to the guest OS, but translate it to a
  randomised MAC address in outgoing packets (and vice-versa for incoming
  packets). This mechanism currently only supports the ethernet, ipv4, dhcp and
  arp protocols. See `translate_mac_address` in
  [`src/ne2k.js`](https://github.com/copy/v86/blob/master/src/ne2k.js). This is
  currently used in Windows, ReactOS and SerenityOS profiles.
 - Some OSes don't cache the MAC address when the driver loads and therefore
  don't need any of the above workarounds. This seems to be the case for Haiku,
  OpenBSD and FreeBSD.
 Note that the same applies to IP addresses, so a dhcp client should only be run
 after the state has been loaded.
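To give an idea of what MAC address translation involves, here is a reduced sketch for outgoing frames. It only rewrites the source address in the ethernet header and the sender hardware address in ARP packets; ipv4 and dhcp, which also need checksum handling, are left out. `translate_outgoing` and `mac_equals` are hypothetical names and this is not the code of `translate_mac_address`.

```javascript
const ETHERTYPE_ARP = 0x0806;

// Returns true if the 6 bytes of `buf` at `offset` equal `mac`.
function mac_equals(buf, offset, mac) {
    for (let i = 0; i < 6; i++) {
        if (buf[offset + i] !== mac[i]) return false;
    }
    return true;
}

// Returns a copy of `frame` in which the MAC address known to the guest is
// replaced by the randomised address that is visible on the network.
function translate_outgoing(frame, guest_mac, wire_mac) {
    const out = Uint8Array.from(frame);
    if (out.length < 14) return out; // too short for an ethernet header

    // Ethernet header: destination (bytes 0-5), source (6-11), ethertype (12-13).
    if (mac_equals(out, 6, guest_mac)) {
        out.set(wire_mac, 6);
    }

    // ARP: the sender hardware address sits 8 bytes into the payload.
    const ethertype = (out[12] << 8) | out[13];
    if (ethertype === ETHERTYPE_ARP && out.length >= 28 && mac_equals(out, 22, guest_mac)) {
        out.set(wire_mac, 22);
    }
    return out;
}
```

Incoming frames would get the inverse treatment: the destination address (and the ARP target hardware address) is mapped from the randomised address back to the one the guest knows.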
--- a/docs/profiling.md
+++ b/docs/profiling.md
@@ -0,0 +1,7 @@
 v86 has a built-in profiler, which instruments generated code to count certain
 events and types of instructions. It can be used by building with `make
 debug-with-profiler` and opening debug.html.
 For debugging networking, packet logging is available in the UI in both debug
 and release builds. The resulting `traffic.hex` file can be loaded in Wireshark
 using File -> Import from Hex Dump, with direction indication ticked and the timestamp format set to `%s.%f`.