2023-07-25 02:25:14

In this article, the author shares his experience of writing a Java Virtual Machine (JVM) in Rust. He stresses that it is a toy JVM, built mainly for learning and not a serious implementation. Nonetheless, it implements some non-trivial features, such as control flow statements, object creation, method invocation, exception handling, and garbage collection. He also walks through code organization, class file parsing, method execution, the modeling of values and objects, instruction execution, exception handling, and garbage collection.

link: https://andreabergia.com/blog/2023/07/i-have-written-a-jvm-in-rust/

Reprinting without permission is prohibited!

Author | Andrea Bergia   Editor | Mingming Ruyue   Editor | Xia Meng   Produced by | CSDN (ID: CSDNnews)

Recently, I have been learning Rust. Like any sane person, after writing a few programs of a hundred lines or so, I decided to take on something more challenging: I wrote a Java Virtual Machine (JVM) in Rust. With great originality, I named it rjvm. You can find the source code on GitHub.

I want to emphasize that this is just a toy JVM built for learning, not a serious implementation.

It does not support:

generics
threads
reflection
annotations
I/O
just-in-time compilation
string interning

However, some non-trivial things have been implemented:

control flow statements (if, for, ...)
creation of primitive values and objects
virtual and static method calls
exception handling
garbage collection
resolving classes from a jar file

For example, the following is part of the test suite:

    class StackTracePrinting {
        public static void main(String[] args) {
            Throwable ex = new Exception();
            StackTraceElement[] stackTrace = ex.getStackTrace();
            for (StackTraceElement element : stackTrace) {
                tempPrint(
                        element.getClassName() + "::" + element.getMethodName() + " - "
                                + element.getFileName() + ":" + element.getLineNumber());
            }
        }

        // We use this in place of System.out.println because we don't have real I/O
        private static native void tempPrint(String value);
    }

It uses the real rt.jar containing the OpenJDK 7 classes, so in the example above, the java.lang.StackTraceElement class comes from a real JDK!

I'm really happy with what I've learned, both about Rust and about how to implement a virtual machine. I'm especially happy to have implemented a real, working garbage collector. It is quite mediocre, but I wrote it, and I love it. Having achieved my original goal, I've decided to stop here. I know there are some issues, but I have no plans to fix them.

Overview

In this article, I will show you how my JVM works. In upcoming articles, I'll discuss some of the aspects involved here in more detail.

Code organization

This is a standard Rust project. I divided it into three packages (aka crates):

reader, which reads .class files and contains the types that model their content;
vm, which contains a virtual machine that can execute code, as a library;
a command-line launcher that runs the VM, in the spirit of the java executable.

I'm considering extracting the reader crate into a separate repository and publishing it on crates.io, since it might actually be useful to others.

Parsing .class files

As you know, Java is a compiled language: the javac compiler takes your .java source files and produces .class files, which are usually distributed in a .jar file (which is just a zip file). Therefore, the first thing to do to execute some Java code is to load a .class file, which contains the bytecode generated by the compiler. A class file contains various things:

metadata of the class, such as its name or source file name
superclass name
implemented interfaces
fields, along with their types and annotations
methods, along with:

  their descriptors, strings that encode the type of each parameter and the return type of the method
  metadata, such as throws clauses, annotations, and generics information
  the bytecode, plus some additional metadata such as the exception handler table and the line number table.
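To make the descriptor format concrete, here is a minimal sketch of a parser for method descriptors such as (ILjava/lang/String;)V. The names FieldType and parse_method_descriptor are invented for this illustration and are not taken from the reader crate; the encoding itself (B, C, D, F, I, J, S, Z for primitives, L...; for classes, [ for arrays, V for void) is the standard JVM one.

```rust
/// A simplified model of a JVM field type as it appears in descriptors.
/// (Hypothetical type, for illustration only.)
#[derive(Debug, Clone, PartialEq)]
pub enum FieldType {
    Byte,
    Char,
    Double,
    Float,
    Int,
    Long,
    Short,
    Boolean,
    Object(String),
    Array(Box<FieldType>),
}

/// Parses a single field type from the front of `chars`.
fn parse_field_type(
    chars: &mut std::iter::Peekable<std::str::Chars<'_>>,
) -> Option<FieldType> {
    match chars.next()? {
        'B' => Some(FieldType::Byte),
        'C' => Some(FieldType::Char),
        'D' => Some(FieldType::Double),
        'F' => Some(FieldType::Float),
        'I' => Some(FieldType::Int),
        'J' => Some(FieldType::Long),
        'S' => Some(FieldType::Short),
        'Z' => Some(FieldType::Boolean),
        'L' => {
            // A class name runs up to the terminating ';'
            let mut name = String::new();
            loop {
                match chars.next()? {
                    ';' => break,
                    c => name.push(c),
                }
            }
            Some(FieldType::Object(name))
        }
        // Each '[' adds one array dimension around the element type
        '[' => Some(FieldType::Array(Box::new(parse_field_type(chars)?))),
        _ => None,
    }
}

/// Parses a method descriptor into its parameter types and its return type.
/// A return type of `None` stands for `void`.
pub fn parse_method_descriptor(
    descriptor: &str,
) -> Option<(Vec<FieldType>, Option<FieldType>)> {
    let mut chars = descriptor.chars().peekable();
    if chars.next()? != '(' {
        return None;
    }
    let mut params = Vec::new();
    while *chars.peek()? != ')' {
        params.push(parse_field_type(&mut chars)?);
    }
    chars.next(); // consume ')'
    let return_type = if chars.peek() == Some(&'V') {
        chars.next();
        None
    } else {
        Some(parse_field_type(&mut chars)?)
    };
    // Reject trailing characters after the return type
    if chars.next().is_some() {
        return None;
    }
    Some((params, return_type))
}
```

For example, parse_method_descriptor("(ILjava/lang/String;[J)V") yields three parameters (an int, a java/lang/String, and an array of long) and a void return type.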

As mentioned above, for rjvm I created a separate crate named reader, which parses a class file and returns a Rust struct that models a class and all of its contents.

Executing methods

The main API of the vm crate is Vm::invoke, which is used to execute a method. It takes a CallStack, which will contain multiple CallFrames, one for each method being executed. When main is executed, the call stack starts out empty and a new frame is created to run it. Each method invocation then adds a new frame to the call stack. When a method finishes executing, its frame is discarded and removed from the call stack.
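The push/pop discipline described above can be sketched with a plain Vec. This is only an illustration under simplifying assumptions: the CallStack and CallFrame types below share their names with the rjvm types but their fields and methods (push_frame, pop_frame, trace) are invented here, and values are reduced to i32.

```rust
/// One frame per method being executed. Field names are illustrative.
#[derive(Debug)]
pub struct CallFrame {
    pub method_name: String,
    pub pc: u16,
    pub locals: Vec<i32>,
    pub operand_stack: Vec<i32>,
}

/// The call stack is simply a growable list of frames.
#[derive(Debug, Default)]
pub struct CallStack {
    frames: Vec<CallFrame>,
}

impl CallStack {
    /// Called when a method is invoked: a fresh frame goes on top.
    pub fn push_frame(&mut self, method_name: &str, max_locals: usize) -> &mut CallFrame {
        self.frames.push(CallFrame {
            method_name: method_name.to_string(),
            pc: 0,
            locals: vec![0; max_locals],
            operand_stack: Vec::new(),
        });
        self.frames.last_mut().expect("a frame was just pushed")
    }

    /// Called when a method returns or throws: its frame is discarded.
    pub fn pop_frame(&mut self) -> Option<CallFrame> {
        self.frames.pop()
    }

    pub fn depth(&self) -> usize {
        self.frames.len()
    }

    /// Method names from the innermost to the outermost frame,
    /// in the same order a stack trace would list them.
    pub fn trace(&self) -> Vec<&str> {
        self.frames
            .iter()
            .rev()
            .map(|frame| frame.method_name.as_str())
            .collect()
    }
}
```

Keeping the frames in one list is also what makes stack traces cheap to build: walking the list from the top gives the chain of callers.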

Most methods are implemented in Java, so their bytecode gets executed. However, rjvm also supports native methods, that is, methods implemented directly by the JVM rather than in Java bytecode. There are quite a few of them in the "lower layers" of the Java API, the parts that need to interact with the operating system (for example to do I/O) or that need support from the runtime. Some examples of the latter that you may have seen include System::currentTimeMillis, System::arraycopy, or Throwable::fillInStackTrace. In rjvm, these are implemented by Rust functions.
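One plausible way to wire native methods to Rust functions is a lookup table keyed by class name, method name, and descriptor. The sketch below assumes exactly that; NativeRegistry, NativeFn, and the simplified i64-only calling convention are hypothetical and do not describe how rjvm actually does it.

```rust
use std::collections::HashMap;

/// Signature shared by all native implementations in this sketch:
/// integer arguments in, optional integer result out.
type NativeFn = fn(&[i64]) -> Option<i64>;

/// Hypothetical registry of native methods.
#[derive(Default)]
pub struct NativeRegistry {
    // Keyed by (class name, method name, descriptor)
    methods: HashMap<(String, String, String), NativeFn>,
}

impl NativeRegistry {
    pub fn register(&mut self, class: &str, name: &str, descriptor: &str, f: NativeFn) {
        let key = (class.to_string(), name.to_string(), descriptor.to_string());
        self.methods.insert(key, f);
    }

    /// Looks up the Rust function for a native method and calls it.
    pub fn invoke(
        &self,
        class: &str,
        name: &str,
        descriptor: &str,
        args: &[i64],
    ) -> Result<Option<i64>, String> {
        let key = (class.to_string(), name.to_string(), descriptor.to_string());
        match self.methods.get(&key) {
            Some(f) => Ok((*f)(args)),
            None => Err(format!(
                "no native implementation for {}.{}{}",
                class, name, descriptor
            )),
        }
    }
}

/// Rust stand-in for `System.currentTimeMillis()`.
fn current_time_millis(_args: &[i64]) -> Option<i64> {
    use std::time::{SystemTime, UNIX_EPOCH};
    let elapsed = SystemTime::now().duration_since(UNIX_EPOCH).ok()?;
    Some(elapsed.as_millis() as i64)
}
```

A VM built this way would register current_time_millis under ("java/lang/System", "currentTimeMillis", "()J") at startup, and report an error when bytecode calls a native method that has no entry.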

The JVM is a stack-based virtual machine, which means that bytecode instructions mostly operate on a stack of values. There is also a set of local variables, identified by index, that can be used to store values and to pass arguments to methods. In rjvm, these are associated with each call frame.

Modeling values and objects

The Value type models a possible value of a local variable, stack element, or object field, and is implemented as follows:

    /// Models a generic value that can be stored in a local variable or on the operand stack
    #[derive(Debug, Default, Clone, PartialEq)]
    pub enum Value<'a> {
        /// An uninitialized element. It should never appear on the operand stack,
        /// but it is the default state of local variables.
        #[default]
        Uninitialized,

        /// Models all the data types of 32 bits or fewer in the Java virtual machine:
        /// `boolean`, `byte`, `char`, `short`, and `int`.
        Int(i32),

        /// Models a `long` value.
        Long(i64),

        /// Models a `float` value.
        Float(f32),

        /// Models a `double` value.
        Double(f64),

        /// Models an object value
        Object(AbstractObject<'a>),

        /// Models a null object
        Null,
    }

Incidentally, this is a nice use case for Rust's enum types (sum types), which are great for expressing the fact that a value may be one of several different types.

To store objects and their values, I initially used a simple struct Object containing a reference to the class (to model the object's type) and a Vec for storing the field values. However, when I implemented the garbage collector, I changed this to a lower-level implementation with lots of pointers and casts, quite C-style! In the current implementation, an AbstractObject (which models a "real" object or an array) is simply a pointer to an array of bytes, which contains a few header bytes followed by the field values.
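The "header bytes followed by field values" layout can be illustrated without any unsafe code. The sketch below is an assumption-laden stand-in, not the rjvm layout: the header contents (a class id and a field count), the 8-byte field slots, and the RawObject name are all invented, and a Vec<u8> takes the place of the raw pointer.

```rust
const HEADER_SIZE: usize = 8; // 4 bytes of class id + 4 bytes of field count
const FIELD_SIZE: usize = 8; // every field gets one 8-byte slot in this sketch

/// Hypothetical object stored as a flat byte array: header, then fields.
pub struct RawObject {
    bytes: Vec<u8>,
}

impl RawObject {
    pub fn new(class_id: u32, num_fields: u32) -> Self {
        let mut bytes = vec![0u8; HEADER_SIZE + FIELD_SIZE * num_fields as usize];
        bytes[0..4].copy_from_slice(&class_id.to_le_bytes());
        bytes[4..8].copy_from_slice(&num_fields.to_le_bytes());
        RawObject { bytes }
    }

    /// Reads a little-endian u32 out of the header.
    fn read_u32(&self, offset: usize) -> u32 {
        let mut buffer = [0u8; 4];
        buffer.copy_from_slice(&self.bytes[offset..offset + 4]);
        u32::from_le_bytes(buffer)
    }

    pub fn class_id(&self) -> u32 {
        self.read_u32(0)
    }

    pub fn num_fields(&self) -> u32 {
        self.read_u32(4)
    }

    /// Byte range of a field slot, or None if the index is out of bounds.
    fn field_range(&self, index: u32) -> Option<std::ops::Range<usize>> {
        if index >= self.num_fields() {
            return None;
        }
        let start = HEADER_SIZE + FIELD_SIZE * index as usize;
        Some(start..start + FIELD_SIZE)
    }

    /// Writes a field; returns false if the index is out of bounds.
    pub fn set_field(&mut self, index: u32, value: i64) -> bool {
        match self.field_range(index) {
            Some(range) => {
                self.bytes[range].copy_from_slice(&value.to_le_bytes());
                true
            }
            None => false,
        }
    }

    pub fn get_field(&self, index: u32) -> Option<i64> {
        let range = self.field_range(index)?;
        let mut buffer = [0u8; 8];
        buffer.copy_from_slice(&self.bytes[range]);
        Some(i64::from_le_bytes(buffer))
    }
}
```

The appeal of a flat layout for a copying garbage collector is that moving an object is a single byte copy, and the header tells the collector how large the object is.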

Executing instructions

Executing a method means executing its bytecode instructions one at a time. The JVM has a long list of instructions (over two hundred!), each encoded as one byte in the bytecode. Many instructions are followed by arguments, and some have variable length. In the code, this is modeled by the type Instruction:

    /// Represents a Java bytecode instruction.
    #[derive(Clone, Copy, Debug, Eq, PartialEq)]
    pub enum Instruction {
        Aaload,
        Aastore,
        Aconst_null,
        Aload(u8),
        // ...
    }

As mentioned above, the execution of a method keeps a stack and an array of local variables, which instructions refer to by index. It also initializes the program counter, that is, the address of the next instruction to execute, to zero. Each instruction is processed and the program counter is updated: usually it advances to the next instruction, but various jump instructions can move it elsewhere. These are used to implement all control flow statements, such as if, for, or while.
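The interplay of operand stack, local variables, and program counter can be seen in a tiny interpreter. This is a sketch under simplifying assumptions, not rjvm code: MiniInstruction is a made-up subset loosely named after real opcodes, values are i32 only, and jump targets are indices into the instruction slice rather than byte offsets as in real bytecode.

```rust
/// A hypothetical, tiny subset of JVM-like instructions.
#[derive(Clone, Copy, Debug)]
pub enum MiniInstruction {
    Iconst(i32),     // push a constant
    Iload(usize),    // push the value of a local variable
    Istore(usize),   // pop a value into a local variable
    Iadd,            // pop two values, push their sum
    IfIcmpge(usize), // pop b, then a; jump to the target if a >= b
    Goto(usize),     // unconditional jump
    Ireturn,         // pop a value and return it
}

/// Runs the code and returns the value passed to `Ireturn`.
/// Returns None on malformed code (stack underflow, bad index, ...).
pub fn run(code: &[MiniInstruction], num_locals: usize) -> Option<i32> {
    let mut stack: Vec<i32> = Vec::new();
    let mut locals = vec![0; num_locals];
    let mut pc = 0;
    loop {
        let instruction = *code.get(pc)?;
        // Advance first, so that jump instructions can override the pc
        pc += 1;
        match instruction {
            MiniInstruction::Iconst(value) => stack.push(value),
            MiniInstruction::Iload(index) => stack.push(*locals.get(index)?),
            MiniInstruction::Istore(index) => {
                let value = stack.pop()?;
                *locals.get_mut(index)? = value;
            }
            MiniInstruction::Iadd => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                stack.push(a.wrapping_add(b));
            }
            MiniInstruction::IfIcmpge(target) => {
                let b = stack.pop()?;
                let a = stack.pop()?;
                if a >= b {
                    pc = target;
                }
            }
            MiniInstruction::Goto(target) => pc = target,
            MiniInstruction::Ireturn => return stack.pop(),
        }
    }
}

/// Roughly what `int sum = 0; for (int i = 0; i < limit; i++) sum += i; return sum;`
/// looks like in this instruction set (local 0 is `sum`, local 1 is `i`).
pub fn sum_loop(limit: i32) -> Vec<MiniInstruction> {
    use MiniInstruction as I;
    vec![
        I::Iconst(0), I::Istore(0),                       // 0-1:   sum = 0
        I::Iconst(0), I::Istore(1),                       // 2-3:   i = 0
        I::Iload(1), I::Iconst(limit), I::IfIcmpge(16),   // 4-6:   exit loop if i >= limit
        I::Iload(0), I::Iload(1), I::Iadd, I::Istore(0),  // 7-10:  sum += i
        I::Iload(1), I::Iconst(1), I::Iadd, I::Istore(1), // 11-14: i += 1
        I::Goto(4),                                       // 15:    back to the loop test
        I::Iload(0), I::Ireturn,                          // 16-17: return sum
    ]
}
```

Calling run(&sum_loop(5), 2) walks the loop five times and returns the sum 0 + 1 + 2 + 3 + 4. The for loop has disappeared entirely: only a conditional jump forward and an unconditional jump backward remain.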

Another special family of instructions consists of those that invoke another method. There are various ways of resolving which method should be invoked: virtual and static lookup are the main ones, but there are others. After resolving the correct method, rjvm adds a new frame to the call stack and starts executing it. When the method returns, its return value (unless it is void) is pushed onto the caller's stack and execution resumes.

The Java bytecode format is quite interesting, and I plan to dedicate a separate article to the various kinds of instructions.
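As a rough illustration of the virtual lookup mentioned above, the sketch below starts at the runtime class of the receiver and walks up the superclass chain until it finds a class that declares the method. ClassInfo and resolve_virtual are hypothetical names; interfaces, access checks, and cyclic hierarchies are deliberately ignored.

```rust
use std::collections::HashMap;

/// Hypothetical, minimal class metadata.
pub struct ClassInfo {
    pub superclass: Option<String>,
    /// Declared methods, identified by name followed by descriptor
    pub methods: Vec<String>,
}

/// Returns the name of the class whose implementation of `method`
/// would be invoked on a receiver of class `receiver_class`.
pub fn resolve_virtual<'c>(
    classes: &'c HashMap<String, ClassInfo>,
    receiver_class: &str,
    method: &str,
) -> Option<&'c str> {
    let mut current = classes.get_key_value(receiver_class)?;
    loop {
        let (name, info) = current;
        if info.methods.iter().any(|declared| declared == method) {
            return Some(name.as_str());
        }
        // Not declared here: try the superclass, or give up at the root
        let parent = info.superclass.as_deref()?;
        current = classes.get_key_value(parent)?;
    }
}
```

With a hierarchy Dog extends Animal extends java/lang/Object, where both Dog and Animal declare speak()V, resolving speak()V on a Dog picks the Dog override, while resolving hashCode()I walks all the way up to java/lang/Object.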

Exception handling

Exception handling is quite a complex task, because it breaks the normal control flow and may return early from a method (and propagate up the call stack!). Nonetheless, I'm pretty happy with how I've implemented it, and I'll show some of the relevant code next.

The first thing to know is that each catch block corresponds to an entry in the method's exception table. Each entry contains the range of program counters it covers, the address of the first instruction of the catch block, and the name of the exception class that the block catches.
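A lookup over such a table might look like the following sketch. The field names follow the JVM class file format (start_pc inclusive, end_pc exclusive, and a missing catch type meaning "catch anything", which compilers use for finally blocks), but ExceptionTableEntry and find_handler as written here are illustrative, and the subclass check is passed in as a closure instead of consulting real class metadata.

```rust
/// One entry of a method's exception table (illustrative model).
pub struct ExceptionTableEntry {
    pub start_pc: u16,               // first covered pc, inclusive
    pub end_pc: u16,                 // end of the covered range, exclusive
    pub handler_pc: u16,             // first instruction of the catch block
    pub catch_class: Option<String>, // None = catches any exception
}

/// Returns the handler address of the first entry that covers `pc`
/// and can catch an exception of class `thrown_class`.
pub fn find_handler(
    table: &[ExceptionTableEntry],
    pc: u16,
    thrown_class: &str,
    is_subclass_of: impl Fn(&str, &str) -> bool,
) -> Option<u16> {
    table
        .iter()
        .find(|entry| {
            let in_range = entry.start_pc <= pc && pc < entry.end_pc;
            let type_matches = match &entry.catch_class {
                None => true,
                Some(catch_class) => {
                    thrown_class == catch_class.as_str()
                        || is_subclass_of(thrown_class, catch_class.as_str())
                }
            };
            in_range && type_matches
        })
        .map(|entry| entry.handler_pc)
}
```

Entries are checked in table order, so the first matching catch block wins. When no entry matches, the exception has to propagate to the caller's frame.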

Next, the signature of CallFrame::execute_instruction is as follows:

    fn execute_instruction(
        &mut self,
        vm: &mut Vm<'a>,
        call_stack: &mut CallStack<'a>,
        instruction: Instruction,
    ) -> Result<InstructionCompleted<'a>, MethodCallFailed<'a>>

where the types are defined as:

    /// The possible results of executing an instruction
    enum InstructionCompleted<'a> {
        /// Indicates that the executed instruction was one of the return family.
        /// The caller should stop the method execution and return the value.
        ReturnFromMethod(Option<Value<'a>>),

        /// Indicates that the instruction was not a return, so execution should
        /// resume from the instruction at the program counter.
        ContinueMethodExecution,
    }

    /// Indicates that the method execution failed
    pub enum MethodCallFailed<'a> {
        InternalError(VmError),
        ExceptionThrown(JavaException<'a>),
    }

and the standard Rust Result type is:

    enum Result<T, E> {
        Ok(T),
        Err(E),
    }

Therefore, executing an instruction can result in four possible states:

the instruction executed successfully, and execution of the current method can continue (the standard case);
the instruction executed successfully and was a return instruction, so the current method should return with an (optional) return value;
the instruction could not be executed because some internal VM error occurred;
the instruction could not be executed because a standard Java exception was thrown.

Therefore, the code to execute the method is as follows:

    /// Executes the whole method
    impl<'a> CallFrame<'a> {
        pub fn execute(
            &mut self,
            vm: &mut Vm<'a>,
            call_stack: &mut CallStack<'a>,
        ) -> MethodCallResult<'a> {
            self.debug_start_execution();

            loop {
                let executed_instruction_pc = self.pc;
                let (instruction, new_address) =
                    Instruction::parse(
                        self.code,
                        executed_instruction_pc.0.into_usize_safe()
                    ).map_err(|_| MethodCallFailed::InternalError(
                        VmError::ValidationException)
                    )?;
                self.debug_print_status(&instruction);

                // Move the pc to the next instruction before executing this one,
                // because we want a "goto" to override this step
                self.pc = ProgramCounter(new_address as u16);

                let instruction_result =
                    self.execute_instruction(vm, call_stack, instruction);
                match instruction_result {
                    Ok(ReturnFromMethod(return_value)) => return Ok(return_value),
                    Ok(ContinueMethodExecution) => { /* continue the loop */ }

                    Err(MethodCallFailed::InternalError(err)) => {
                        return Err(MethodCallFailed::InternalError(err))
                    }

                    Err(MethodCallFailed::ExceptionThrown(exception)) => {
                        let exception_handler = self.find_exception_handler(
                            vm,
                            call_stack,
                            executed_instruction_pc,
                            &exception,
                        );
                        match exception_handler {
                            Err(err) => return Err(err),
                            Ok(None) => {
                                // Bubble the exception up to the caller
                                return Err(MethodCallFailed::ExceptionThrown(exception));
                            }
                            Ok(Some(catch_handler_pc)) => {
                                // Push the exception back onto the stack and continue
                                // executing this method from the catch handler
                                self.stack.push(Value::Object(exception.0))?;
                                self.pc = catch_handler_pc;
                            }
                        }
                    }
                }
            }
        }
    }

I know there are a lot of implementation details in this code, but I hope it demonstrates how Rust's Result and pattern matching map nicely to the behavior described above. I must say I'm pretty proud of this code.

Garbage collection

The last milestone in rjvm was implementing the garbage collector. The algorithm I chose is a stop-the-world (trivially so, since there are no threads!) semispace copying collector. I implemented a poor variant of Cheney's algorithm, though I really should implement the real thing.

The idea of this algorithm is to divide the available memory into two parts, called semispaces: one is active and used for allocating objects, while the other is unused. When the active semispace is full, a garbage collection is triggered and all surviving objects are copied to the other semispace. All references to those objects are then updated so that they point to the new copies. Finally, the roles of the two semispaces are swapped, similar to how blue-green deployments work.
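The copying step can be sketched in safe Rust by modeling each semispace as a Vec of objects and references as indices. This is a simplified illustration of Cheney-style copying under stated assumptions, not the rjvm collector: Obj, Heap, and copy_object are invented names, objects are cloned instead of byte-copied, and the target semispace is created fresh on every collection instead of being preallocated.

```rust
/// A heap object: a payload tag plus references to other objects,
/// stored as indices into the active semispace.
#[derive(Clone, Debug, PartialEq)]
pub struct Obj {
    pub tag: u32,
    pub refs: Vec<Option<usize>>,
}

pub struct Heap {
    pub active: Vec<Obj>,
    capacity: usize,
}

/// Copies one object into the target space unless it was already moved,
/// and returns its new index. `forwarding` maps old indices to new ones.
fn copy_object(
    old: usize,
    from: &[Obj],
    to: &mut Vec<Obj>,
    forwarding: &mut [Option<usize>],
) -> usize {
    if let Some(new) = forwarding[old] {
        return new;
    }
    to.push(from[old].clone());
    let new = to.len() - 1;
    forwarding[old] = Some(new);
    new
}

impl Heap {
    pub fn new(capacity: usize) -> Self {
        Heap { active: Vec::with_capacity(capacity), capacity }
    }

    /// Bump allocation: returns None when the active semispace is full.
    pub fn alloc(&mut self, tag: u32, refs: Vec<Option<usize>>) -> Option<usize> {
        if self.active.len() >= self.capacity {
            return None;
        }
        self.active.push(Obj { tag, refs });
        Some(self.active.len() - 1)
    }

    /// Copies everything reachable from `roots` into a new semispace,
    /// rewrites the roots and all references, and swaps the spaces.
    pub fn collect(&mut self, roots: &mut [usize]) {
        let from = std::mem::take(&mut self.active);
        let mut to: Vec<Obj> = Vec::with_capacity(self.capacity);
        let mut forwarding: Vec<Option<usize>> = vec![None; from.len()];

        for root in roots.iter_mut() {
            *root = copy_object(*root, &from, &mut to, &mut forwarding);
        }

        // Cheney scan: objects before `scan` are fully processed, while the
        // ones between `scan` and the end of `to` still hold old references.
        let mut scan = 0;
        while scan < to.len() {
            for i in 0..to[scan].refs.len() {
                if let Some(old) = to[scan].refs[i] {
                    let new = copy_object(old, &from, &mut to, &mut forwarding);
                    to[scan].refs[i] = Some(new);
                }
            }
            scan += 1;
        }

        self.active = to; // the old semispace (`from`) is dropped here
    }
}
```

The target space doubles as the work queue, so no recursion or separate mark stack is needed. Unreachable objects are never visited at all: they simply stay behind in the old semispace.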

This algorithm has the following characteristics:

obviously, it wastes a lot of memory (half of the maximum possible memory!);
it compacts objects, which can improve performance thanks to better use of cache lines.

Real Java virtual machines use far more sophisticated algorithms, usually generational garbage collectors such as G1 or the Parallel GC, which use evolved versions of the copying strategy.

I learned a lot and had fun writing rjvm. Getting so much out of a side project is very satisfying. Maybe next time I learn a new programming language I'll pick a slightly less difficult project!

By the way, writing code in Rust has been a very pleasant programming experience. As I've written before, I think it's a great language, and I've certainly had a lot of fun implementing my JVM with it!

Have you ever written some difficult or interesting software while learning a new programming language? You are welcome to share and discuss in the comments.
