Tuesday, July 16, 2013

Understanding the Java bytecode even before the JVM takes it!

Yeah... passion and endurance are what I'll need to go through the wreckage of my so-beautifully-written piece of Java code, known to nerds as bytecode.
I'll post what I learned, and how, once I've kicked it in the teeth. And by that I mean ASAP.
3 days later...
javap -c
Legend has it that the "p" stands for printer, but we never met 'em... The -c option tells javap to print the disassembled code, i.e. the bytecode instructions, for each method in the class.
As described in the Java SE documentation, javap is a utility that disassembles the class files given as its arguments. With -c, its output includes a human-readable listing of the bytecode.
I stumbled upon this utility just for fun, and on my first encounter I instantly typed "exit" when I saw the weird output on my console. But now I find it the first step to understanding the behavior of simple programs (I'll never do that for a bigger program :D)

StringReferenceTest.java:
class StringReferenceTest {
    void stringRefTest() {
        String str1 = "hangover10";
        String str2 = "hang" + "over10";
        String str3 = "hangover" + str1.length();
    }
}

D:\Java_Bin>javac StringReferenceTest.java
D:\Java_Bin>javap -c StringReferenceTest
Compiled from "StringReferenceTest.java"
class StringReferenceTest extends java.lang.Object{
StringReferenceTest();
  Code:
   0:   aload_0
   1:   invokespecial   #1; //Method java/lang/Object."<init>":()V
   4:   return

void stringRefTest();
  Code:
   0:   ldc     #2; //String hangover10
   2:   astore_1
   3:   ldc     #2; //String hangover10
   5:   astore_2
   6:   new     #3; //class java/lang/StringBuilder
   9:   dup
   10:  invokespecial   #4; //Method java/lang/StringBuilder."<init>":()V
   13:  ldc     #5; //String hangover
   15:  invokevirtual   #6; //Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
   18:  aload_1
   19:  invokevirtual   #7; //Method java/lang/String.length:()I
   22:  invokevirtual   #8; //Method java/lang/StringBuilder.append:(I)Ljava/lang/StringBuilder;
   25:  invokevirtual   #9; //Method java/lang/StringBuilder.toString:()Ljava/lang/String;
   28:  astore_3
   29:  return

}
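The //Method comments in the listing end with JVM method descriptors: argument types in parentheses, then the return type (V for void, I for int, Lclass/name; for objects). As a small sketch, assuming Java 7 or later, the real java.lang.invoke.MethodType class can print the same descriptors for the methods used above; the class name DescriptorDemo and the helper descriptor are made up for illustration.

```java
import java.lang.invoke.MethodType;

// Prints the JVM method descriptors that javap shows in its "//Method ..." comments.
public class DescriptorDemo {
    // Builds a descriptor string from a return type and parameter types.
    static String descriptor(Class<?> returnType, Class<?>... params) {
        return MethodType.methodType(returnType, params).toMethodDescriptorString();
    }

    public static void main(String[] args) {
        // Object.<init> and StringBuilder.<init>: no arguments, void return
        System.out.println(descriptor(void.class));                        // ()V
        // StringBuilder.append(String)
        System.out.println(descriptor(StringBuilder.class, String.class)); // (Ljava/lang/String;)Ljava/lang/StringBuilder;
        // String.length()
        System.out.println(descriptor(int.class));                         // ()I
        // StringBuilder.append(int)
        System.out.println(descriptor(StringBuilder.class, int.class));    // (I)Ljava/lang/StringBuilder;
        // StringBuilder.toString()
        System.out.println(descriptor(String.class));                      // ()Ljava/lang/String;
    }
}
```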
I am only concerned with how the Strings are loaded or built, so I will only write about the related operation codes (opcodes). In instructions 0 and 3, ldc loads a constant from the class's constant pool onto the operand stack; #2 is the index of that constant in the pool. We are not running the program here; these opcodes only tell the JVM what to do when the method is eventually executed.
We have two lines:
String str1 = "hangover10";
String str2 = "hang" + "over10";

The compiler has been optimized over the years to do some of this work even before the JVM gets the code. Because "hang" and "over10" are both literals, "hang" + "over10" is a compile-time constant expression, and javac folds it into the single literal "hangover10"; the separate "hang" and "over10" literals never show up in the bytecode. That folding is only possible because we have not used anything like new or a method call such as toString() to build the string. The folded literal is the same one already in the constant pool for str1, so the compiler reuses that entry for str2. That's why instructions 0 and 3 both load the same constant, ldc #2. At run time, the JVM interns string literals, so str1 and str2 end up referring to the very same String object.
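A quick way to check this is to compare references directly with ==. The sketch below (class and method names are hypothetical) shows the folded concatenation, plus one extra case: a final local variable initialized with a literal is also a compile-time constant, so concatenating it still gets folded, while the same code without final is concatenated at run time and produces a new object.

```java
// Demonstrates compile-time constant folding of string concatenation.
public class ConstantFoldingDemo {
    static boolean literalVsFolded() {
        String str1 = "hangover10";
        String str2 = "hang" + "over10";   // constant expression: folded by javac
        return str1 == str2;               // same interned object
    }

    static boolean finalVariable() {
        final String hang = "hang";        // constant variable: final + constant initializer
        String str = hang + "over10";      // still a constant expression, so folded
        return str == "hangover10";
    }

    static boolean nonFinalVariable() {
        String hang = "hang";              // not declared final, so not a constant variable
        String str = hang + "over10";      // concatenated at run time into a new String
        return str == "hangover10";
    }

    public static void main(String[] args) {
        System.out.println(literalVsFolded());  // true
        System.out.println(finalVariable());    // true
        System.out.println(nonFinalVariable()); // false
    }
}
```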
Let us see what happens for the following line:
String str3 = "hangover" + str1.length();
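Before walking through the instructions, here is a rough source-level equivalent of what javac generated for that line, written by hand; DesugaredConcat and build are hypothetical names. It matches the listing above, which came from an older compiler: from JDK 9 onward (JEP 280), javac compiles string concatenation with an invokedynamic call instead, so a modern javap shows different instructions.

```java
// Hand-written equivalent of the StringBuilder chain in the javap listing for
// String str3 = "hangover" + str1.length();
public class DesugaredConcat {
    static String build(String str1) {
        return new StringBuilder()          // new + dup + invokespecial <init>
                .append("hangover")         // ldc #5 + invokevirtual append(String)
                .append(str1.length())      // aload_1 + length() + append(int)
                .toString();                // invokevirtual toString()
    }

    public static void main(String[] args) {
        String str1 = "hangover10";
        String str3 = build(str1);
        System.out.println(str3);                                    // hangover10
        System.out.println(str3.equals("hangover" + str1.length())); // true
    }
}
```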

Instruction 6 uses the new opcode. There we are: our code just forced the compiler to emit new even though we never wrote it ourselves. When the JVM executes this instruction, it allocates a new java.lang.StringBuilder object, not yet initialized, and pushes a reference to it onto the stack.
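Instruction 9: dup duplicates the value on top of the operand stack, here the reference to the new StringBuilder. One copy is consumed by the constructor call in instruction 10, and the other stays on the stack for the append calls that follow.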
The following is an interesting set of instructions:
Instruction 10: the invokespecial opcode invokes the no-argument constructor of the StringBuilder class (named "<init>" in the bytecode).
Instruction 13: ldc #5 tells the JVM to push the constant "hangover" onto the stack.
Instruction 15: invokevirtual tells the JVM to call the append(String) method of the StringBuilder class, passing the previously loaded constant (from ldc #5) as the argument. append returns the same StringBuilder, so a reference to it is left on the stack.
Instruction 18: aload_1 pushes the reference stored in local variable 1, i.e. str1, onto the stack.
Instruction 19: invokevirtual tells the JVM to call the length() method of the String class on str1. In the descriptor ()I, the empty parentheses mean the method takes no arguments, and I is the return type, int.
Instruction 22: invokevirtual now tells the JVM to append the int returned by the previous instruction, i.e. the length of str1 (10), to the existing StringBuilder object using append(int).
Now the StringBuilder object holds the content |h|a|n|g|o|v|e|r|1|0|, stored internally, you guessed it, as a char array.
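Instruction 25: invokevirtual tells the JVM to call StringBuilder's own toString() method, which creates a new String object with the builder's contents, since the result has to be assigned to the String variable str3.
Instruction 28: astore_3 stores the reference to that new String in local variable 3, i.e. str3.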
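To see why dup is needed, here is a toy trace of instructions 6 to 28 that models the operand stack with an ArrayDeque and records the stack depth after each step. It is only an illustration under simplifying assumptions: Java source cannot separate allocation from construction, so the StringBuilder is fully built at step 6, and a real JVM does not work through a Deque. OperandStackTrace and run are hypothetical names.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.List;

// Toy model of the operand stack for instructions 6-28 of stringRefTest().
public class OperandStackTrace {
    // Runs the simulated instructions and records the stack depth after each one.
    static String run(String str1, List<Integer> depths) {
        Object[] locals = {null, str1, null, null}; // slot 1 = str1, slot 3 = str3
        Deque<Object> stack = new ArrayDeque<>();

        // 6: new -> push a reference to a StringBuilder
        //    (constructed right away here; the real JVM leaves it uninitialized)
        stack.push(new StringBuilder());
        depths.add(stack.size());
        // 9: dup -> duplicate the reference on top of the stack
        stack.push(stack.peek());
        depths.add(stack.size());
        // 10: invokespecial <init>()V -> consumes one copy, returns nothing
        stack.pop();
        depths.add(stack.size());
        // 13: ldc #5 -> push the constant "hangover"
        stack.push("hangover");
        depths.add(stack.size());
        // 15: append(String) -> pop argument and receiver, push the returned builder
        String text = (String) stack.pop();
        stack.push(((StringBuilder) stack.pop()).append(text));
        depths.add(stack.size());
        // 18: aload_1 -> push str1 from local variable 1
        stack.push(locals[1]);
        depths.add(stack.size());
        // 19: length() -> pop str1, push its length as an int
        stack.push(((String) stack.pop()).length());
        depths.add(stack.size());
        // 22: append(int) -> pop the int and the receiver, push the returned builder
        int length = (Integer) stack.pop();
        stack.push(((StringBuilder) stack.pop()).append(length));
        depths.add(stack.size());
        // 25: toString() -> pop the builder, push a new String
        stack.push(stack.pop().toString());
        depths.add(stack.size());
        // 28: astore_3 -> pop the String into local variable 3 (str3)
        locals[3] = stack.pop();
        depths.add(stack.size());
        return (String) locals[3];
    }

    public static void main(String[] args) {
        List<Integer> depths = new ArrayList<>();
        System.out.println(run("hangover10", depths)); // hangover10
        System.out.println(depths);                    // [1, 2, 1, 2, 1, 2, 2, 1, 1, 0]
    }
}
```

Without the dup at step 9, the constructor call would consume the only reference, and the append at step 15 would have no receiver left on the stack.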

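Now we can easily tell why
str1 == str2 //is true, while
str1 == str3 //is false, and
str2 == str3 //is false
str1 and str2 refer to the same interned literal, whereas str3 refers to a new String object built at run time. Since == compares references, not contents, the last two comparisons are false even though all three strings contain "hangover10".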
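The three comparisons can be checked directly, along with equals(), which compares contents, and intern(), which returns the pooled instance for str3's contents. A minimal sketch; ReferenceComparison and comparisons are hypothetical names.

```java
import java.util.Arrays;

// Checks the three reference comparisons, plus equals() and intern().
public class ReferenceComparison {
    static boolean[] comparisons() {
        String str1 = "hangover10";
        String str2 = "hang" + "over10";
        String str3 = "hangover" + str1.length();
        return new boolean[] {
            str1 == str2,          // true: same interned literal
            str1 == str3,          // false: str3 is a new object built at run time
            str2 == str3,          // false: same reason
            str1.equals(str3),     // true: same characters
            str1 == str3.intern()  // true: intern() returns the pooled instance
        };
    }

    public static void main(String[] args) {
        System.out.println(Arrays.toString(comparisons())); // [true, false, false, true, true]
    }
}
```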
