Most experienced Java developers probably know that when you concatenate strings in your Java code, the Java compiler will use a StringBuilder for you. (This describes javac up to Java 8, which is what the listings below reflect; since JDK 9, javac emits an invokedynamic call for string concatenation instead.) However, there are quite a few misconceptions about how smart the compiler actually is. In this post I would like to explain what it does, what it doesn’t do, and how to use the tools at your disposal to figure this out yourself.
Basic example
So let’s start with a little example:
public static String appendA(String a, String b) {
    return a + b;
}
public static String appendB(String a, String b) {
    return new StringBuilder().append(a).append(b).toString();
}
The first one looks 'cleaner', but surely it is not less performant, right? If we use the javap tool installed with our JDK, we can disassemble a compiled class file to look at what the compiler actually did for us.
Note: You can disassemble a previously compiled .class file using: javap -c <name>.class
This is the output:
public static java.lang.String appendA(java.lang.String, java.lang.String);
Code:
0: new #3 // class StringBuilder
3: dup
4: invokespecial #4 // Method StringBuilder."<init>"
7: aload_0
8: invokevirtual #5 // Method StringBuilder.append
11: aload_1
12: invokevirtual #5 // Method StringBuilder.append
15: invokevirtual #7 // Method StringBuilder.toString
18: areturn
public static java.lang.String appendB(java.lang.String, java.lang.String);
Code:
0: new #3 // class StringBuilder
3: dup
4: invokespecial #4 // Method StringBuilder."<init>"
7: aload_0
8: invokevirtual #5 // Method StringBuilder.append
11: aload_1
12: invokevirtual #5 // Method StringBuilder.append
15: invokevirtual #7 // Method StringBuilder.toString
18: areturn
Note: I removed some of the javap output that shows the parameters and return types of the functions being called. It doesn’t add anything to this example and messed up the formatting.
No surprise here. They result in exactly the same bytecode. You see a new StringBuilder being created and its constructor being called (that is the invokespecial call). Then the first parameter is loaded (aload_0), append() is called (invokevirtual), the same happens for the second parameter (aload_1), and finally we call toString() and return from the function.
We can already see a small optimization where we can help the compiler by feeding the first parameter into the constructor:
public static String appendC(String a, String b) {
    return new StringBuilder(a).append(b).toString();
}
javap output:
public static java.lang.String appendC(java.lang.String, java.lang.String);
Code:
0: new #3 // class StringBuilder
3: dup
4: aload_0
5: invokespecial #10 // Method StringBuilder."<init>"
8: aload_1
9: invokevirtual #5 // Method StringBuilder.append
12: invokevirtual #7 // Method StringBuilder.toString
15: areturn
As you can see, we are already 'smarter' than the compiler: we figured out that we can pass the first parameter to the constructor, saving one invokevirtual call. The compiler will not do this for you!
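One thing worth checking before applying this rewrite everywhere is how it behaves when the first argument is null. The sketch below reuses appendA and appendC from above; the class name ConcatNullDemo is made up for illustration. Plain concatenation turns a null into the text "null", while the StringBuilder(String) constructor throws, so the two forms are only interchangeable when the first argument is known to be non-null.

```java
final class ConcatNullDemo {
    // Same as appendA above: plain concatenation.
    static String appendA(String a, String b) {
        return a + b;
    }

    // Same as appendC above: the first argument goes into the constructor.
    static String appendC(String a, String b) {
        return new StringBuilder(a).append(b).toString();
    }

    public static void main(String[] args) {
        System.out.println(appendA("foo", "bar")); // prints "foobar"
        System.out.println(appendC("foo", "bar")); // prints "foobar"
        System.out.println(appendA(null, "bar"));  // prints "nullbar"
        try {
            appendC(null, "bar");
        } catch (NullPointerException e) {
            // The constructor reads the length of its argument, so null fails here.
            System.out.println("appendC(null, \"bar\") threw NullPointerException");
        }
    }
}
```

This difference in behavior is also a plausible reason why the compiler cannot simply apply the constructor trick on its own.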
This example isn’t all that interesting by itself. Where it does become interesting is inside a loop, which is where many developers have the wrong idea about what the compiler does and doesn’t do for you.
For loops
Let’s create a function with a small for loop:
public static String numbersA() {
    String line = "";
    for (int i = 0; i < 10; i++) {
        line += i;
    }
    return line;
}
This simple function, which appends the numbers 0-9 together, should use a StringBuilder, right? It does, but it might not do so in the way you expect it to:
public static java.lang.String numbersA();
Code:
0: ldc #12 // String
2: astore_0
3: iconst_0
4: istore_1
5: iload_1
6: bipush 10
8: if_icmpge 36
11: new #3 // class StringBuilder
14: dup
15: invokespecial #4 // Method StringBuilder."<init>"
18: aload_0
19: invokevirtual #5 // Method StringBuilder.append
22: iload_1
23: invokevirtual #6 // Method StringBuilder.append
26: invokevirtual #7 // Method StringBuilder.toString
29: astore_0
30: iinc 1, 1
33: goto 5
36: aload_0
37: areturn
If you’re not used to reading assembly-like listings you might wonder where the for loop went. Well, bytecode, much like a CPU’s instruction set, has no for or while loops, only comparisons and jumps. If you check out position 33 you see a "goto": this is the end of our loop. Where does it jump to? To position 5. So it’s easy to spot the start (position 5) and end (position 33) of our for loop.
We also see our familiar StringBuilder being constructed here. But that happens at position 11: inside our for loop!
So as you can see, even an extremely simple example where you append to a string inside a loop does not use a StringBuilder optimally: each iteration in the loop creates a new StringBuilder, appends the previous value, then the new value, and then stores the toString() of that StringBuilder in the line var.
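Written back as Java source, the loop in the listing corresponds roughly to the second method below. This is a hand-written sketch of what the bytecode does, not decompiler output, and the names DesugaredLoopDemo and numbersADesugared are made up for illustration.

```java
final class DesugaredLoopDemo {
    // The original loop from numbersA.
    static String numbersA() {
        String line = "";
        for (int i = 0; i < 10; i++) {
            line += i;
        }
        return line;
    }

    // Hand-desugared equivalent: a fresh StringBuilder in every iteration,
    // which copies the old value, appends i, and is thrown away again.
    static String numbersADesugared() {
        String line = "";
        for (int i = 0; i < 10; i++) {
            line = new StringBuilder().append(line).append(i).toString();
        }
        return line;
    }

    public static void main(String[] args) {
        System.out.println(numbersA());          // prints "0123456789"
        System.out.println(numbersADesugared()); // prints "0123456789"
    }
}
```

Seen this way, the cost is easier to spot: the whole string built so far is copied into a new builder, and then into a new String, on every pass through the loop.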
Fortunately, now that we know this, we can help the compiler by defining the StringBuilder ourselves:
public static String numbersB() {
    StringBuilder builder = new StringBuilder();
    for (int i = 0; i < 10; i++) {
        builder.append(i);
    }
    return builder.toString();
}
When compiled, this disassembles to:
public static java.lang.String numbersB();
Code:
0: new #3 // class StringBuilder
3: dup
4: invokespecial #4 // Method StringBuilder."<init>"
7: astore_0
8: iconst_0
9: istore_1
10: iload_1
11: bipush 10
13: if_icmpge 28
16: aload_0
17: iload_1
18: invokevirtual #6 // Method StringBuilder.append
21: pop
22: iinc 1, 1
25: goto 10
28: aload_0
29: invokevirtual #7 // Method StringBuilder.toString
32: areturn
Here we can see that the for loop runs from position 10 to 25 and that the only method invoked inside it is append(). The difference in length might not seem like much, but the span of the loop went from 28 bytes (positions 5 to 33) to 15 bytes (positions 10 to 25)! More importantly, no StringBuilder is created inside the loop anymore.
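Bytecode size is only a proxy for the real cost, so it can help to time both variants. The following is a rough sketch rather than a proper benchmark: the class name, the method names, and the iteration count are arbitrary choices for illustration, and a single System.nanoTime() measurement is sensitive to JIT warm-up and garbage collection.

```java
final class LoopConcatTiming {
    // Concatenation inside the loop, as in numbersA.
    static String withConcat(int n) {
        String line = "";
        for (int i = 0; i < n; i++) {
            line += i;
        }
        return line;
    }

    // One explicit StringBuilder reused across iterations, as in numbersB.
    static String withBuilder(int n) {
        StringBuilder builder = new StringBuilder();
        for (int i = 0; i < n; i++) {
            builder.append(i);
        }
        return builder.toString();
    }

    public static void main(String[] args) {
        int n = 20_000; // arbitrary size, large enough to make the copying visible

        long start = System.nanoTime();
        String concatResult = withConcat(n);
        long afterConcat = System.nanoTime();
        String builderResult = withBuilder(n);
        long afterBuilder = System.nanoTime();

        System.out.println("same result: " + concatResult.equals(builderResult));
        System.out.println("concat in loop (ms): " + (afterConcat - start) / 1_000_000);
        System.out.println("single builder (ms): " + (afterBuilder - afterConcat) / 1_000_000);
    }
}
```

The exact numbers depend on the machine and JVM, so none are quoted here. For numbers you want to rely on, a benchmarking harness such as JMH is the better tool.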
Conclusion
I hope that this gives a bit more insight into what the compiler does and does not do for you. Especially when concatenating strings in a loop (for, while and do-while all work the same way), you should strongly consider using a builder instead of relying on the compiler to come up with an optimal solution.