Spaces:
Runtime error
Runtime error
Here's an overview of v86's workings. For details, check the | |
[source](https://github.com/copy/v86/tree/master/src). | |
The major limitations of WebAssembly are (for the purpose of making emulators with jit): | |
- structured control flow (no arbitrary jumps) | |
- no control over registers (you can't keep hardware registers in wasm locals across functions) | |
- no mmap (paging needs to be fully emulated) | |
- no patching | |
- module generation is fairly slow, but at least it's asynchronous, so other things can keep running | |
- there is some memory overhead per module, so you can't generate more than a few thousand | |
v86 has an interpreted mode, which collects entry points (targets of function | |
calls and indirect jumps). It also measures the hotness per page, so that | |
compilation is focused on code that is often executed. Once a page is | |
considered hot, code is generated for the entire page and up to `MAX_PAGES` | |
that are directly reachable from it. | |
v86 generates a single function with a big switch statement (brtable), to | |
ensure that all functions and targets of indirect jumps are reachable from | |
other modules. The remaining control flow is handled using the "stackifier" | |
algorithm (well-explained in | |
[this blog post](https://medium.com/leaningtech/solving-the-structured-control-flow-problem-once-and-for-all-5123117b1ee2)). | |
At the moment, there is no linking of wasm modules. The current module is | |
exited, and the main loop detects if a new module can be entered. | |
In practice, I found that browsers don't handle this structure (deep brtables, | |
with locals being used across the entire function) very well, and `MAX_PAGES` | |
has to be set to fairly low, otherwise memory usage blows up. It's likely that | |
improvements are possible (generating fewer entry points, splitting code across | |
multiple functions). | |
Code-generation happens in two passes. The first pass finds all basic block | |
boundaries, the second generates code for each basic block. Instruction | |
decoding is generated by a [set of | |
scripts](https://github.com/copy/v86/tree/master/gen) from a [table of | |
instructions](https://github.com/copy/v86/blob/master/gen/x86_table.js). It's | |
also used to [generate | |
tests](https://github.com/copy/v86/blob/master/tests/nasm/create_tests.js). | |
To handle paging, v86 generates code similar to this (see `gen_safe_read`): | |
``` | |
entry <- tlb[addr >> 12 << 2] | |
if entry & MASK == TLB_VALID && (addr & 0xFFF) <= 0xFFC - bytes: goto fast | |
entry <- safe_read_jit_slow(addr, instruction_pointer) | |
if page_fault: goto exit-with-pagefault | |
fast: mem[(entry & ~0xFFF) ^ addr] | |
``` | |
There is a 4 MB cache that acts like a tlb. It contains the physical address, | |
read-only bit, whether the page contains code (in order to invalidate it on | |
write) and whether the page points to mmio. Any of those cases are handled in | |
the slow path (`safe_read_jit_slow`), as well as walking the page tables and | |
triggering page faults. The fast path is taken in the vast majority of times. | |
The remaining code generation is mostly a straight-forward, 1-to-1 translation | |
of x86 to wasm. The only analysis done is to optimise generation of condional | |
jumps immediately after arithmetic instructions, e.g.: | |
``` | |
cmp eax, 52 | |
setb eax | |
``` | |
becomes: | |
``` | |
... // code for cmp | |
eax <- eax < 52 | |
``` | |
A lazy flag mechanism is used to speed arithmetic (applies to both jit and | |
interpreted mode, see | |
[`arith.rs`](https://github.com/copy/v86/blob/master/src/rust/cpu/arith.rs) and | |
[`misc_instr.rs`](https://github.com/copy/v86/blob/master/src/rust/cpu/misc_instr.rs)). | |
There's a wip that tries to elide most lazy-flags updates: | |
https://github.com/copy/v86/pull/466 | |
FPU instructions are emulated using softfloat (very slow, but unfortunately | |
some code relies on 80 bit floats). | |