Tuesday, August 7, 2012

StringBuilder Optimizations Demystified

Introduction

There are lots of myths around concatenating Strings in Java. Let's find out exactly which of them are true.

The results are based on the Oracle JDK 1.7 (Update 3). Feel free to leave a comment if the compiler you are using behaves differently.

Myth 1: You should always use StringBuilder when concatenating Strings

When choosing between StringBuffer and StringBuilder, it is usually fair to say that StringBuilder is better, because all StringBuffer methods are synchronized, which can hurt performance. However, you do not always have to use StringBuilder explicitly:

  1. The most obvious case is concatenating final values. Joining the standard Java constants:

    public static final String X = "123";
    public static final String Y = X + "456";
    public static final String Z = Y + 789;
    

    ...will be optimized by the compiler to:

    public static final String X = "123";
    public static final String Y = "123456";
    public static final String Z = "123456789";
    

    Perhaps surprisingly, final on its own is sufficient, so this snippet:

    public class SomeClass {
      public final String a = "123";
      public final String b = a + "456";
      public final String c = b + 789;
    }
    

    ...will be optimized to:

    public class SomeClass {
      public final String a;
      public final String b;
      public final String c;
    
      public SomeClass() {
        a = "123";
        b = "123456";
        c = "123456789";
      }
    }
    

    The bytecode behind it looks like this:

    public class SomeClass {
      public final java.lang.String a;
      public final java.lang.String b;
      public final java.lang.String c;
    
      public SomeClass();
        Code:
           0: aload_0       
           1: invokespecial #18                 // Method java/lang/Object."<init>":()V
           4: aload_0       
           5: ldc           #7                  // String 123
           7: putfield      #20                 // Field a:Ljava/lang/String;
          10: aload_0       
          11: ldc           #10                 // String 123456
          13: putfield      #22                 // Field b:Ljava/lang/String;
          16: aload_0       
          17: ldc           #13                 // String 123456789
          19: putfield      #24                 // Field c:Ljava/lang/String;
          22: return        
    }
    
  2. What about non-final fields? You should be a little concerned about why you would want this, but leaving the rest to the compiler is also OK. Non-final fields are not folded straight into String constants; instead, a StringBuilder is used. The following code, for example:

    public class SomeClass {
      public String x = "123";
      public String y = x + "456";
      public String z = y + 789;
    }
    

    ...will be transformed by the compiler into this:

    public class SomeClass {
      public String x;
      public String y;
      public String z;
    
      public SomeClass() {
        x = "123";
        y = (new StringBuilder(String.valueOf(x))).append("456").toString();
        z = (new StringBuilder(String.valueOf(y))).append(789).toString();
      }
    }
    

    The bytecode for the intrigued:

    public class SomeClass {
      public java.lang.String x;
      public java.lang.String y;
      public java.lang.String z;
    
      public SomeClass();
        Code:
           0: aload_0       
           1: invokespecial #12                 // Method java/lang/Object."<init>":()V
           4: aload_0       
           5: ldc           #14                 // String 123
           7: putfield      #16                 // Field x:Ljava/lang/String;
          10: aload_0       
          11: new           #18                 // class java/lang/StringBuilder
          14: dup           
          15: aload_0       
          16: getfield      #16                 // Field x:Ljava/lang/String;
          19: invokestatic  #20                 // Method java/lang/String.valueOf:(Ljava/lang/Object;)Ljava/lang/String;
          22: invokespecial #26                 // Method java/lang/StringBuilder."<init>":(Ljava/lang/String;)V
          25: ldc           #29                 // String 456
          27: invokevirtual #31                 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
          30: invokevirtual #35                 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
          33: putfield      #39                 // Field y:Ljava/lang/String;
          36: aload_0       
          37: new           #18                 // class java/lang/StringBuilder
          40: dup           
          41: aload_0       
          42: getfield      #39                 // Field y:Ljava/lang/String;
          45: invokestatic  #20                 // Method java/lang/String.valueOf:(Ljava/lang/Object;)Ljava/lang/String;
          48: invokespecial #26                 // Method java/lang/StringBuilder."<init>":(Ljava/lang/String;)V
          51: sipush        789
          54: invokevirtual #41                 // Method java/lang/StringBuilder.append:(I)Ljava/lang/StringBuilder;
          57: invokevirtual #35                 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
          60: putfield      #44                 // Field z:Ljava/lang/String;
          63: return        
    }
    
  3. How about concatenating arguments of different types inside a method? Leave it to the compiler. This piece of code:

    long a = 123;
    String b = "456";
    int c = 789;
    String d = "000";
    String result = a + b + c + d;
    

    ...will be turned into this:

    long a = 123L;
    String b = "456";
    int c = 789;
    String d = "000";
    String result = (new StringBuilder(String.valueOf(a))).append(b).append(c).append(d).toString();
    

    The bytecode:

    Code:
           0: ldc2_w        #51                 // long 123l
           3: lstore_1      
           4: ldc           #29                 // String 456
           6: astore_3      
           7: sipush        789
          10: istore        4
          12: ldc           #53                 // String 000
          14: astore        5
          16: new           #18                 // class java/lang/StringBuilder
          19: dup           
          20: lload_1       
          21: invokestatic  #55                 // Method java/lang/String.valueOf:(J)Ljava/lang/String;
          24: invokespecial #26                 // Method java/lang/StringBuilder."<init>":(Ljava/lang/String;)V
          27: aload_3       
          28: invokevirtual #31                 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
          31: iload         4
          33: invokevirtual #41                 // Method java/lang/StringBuilder.append:(I)Ljava/lang/StringBuilder;
          36: aload         5
          38: invokevirtual #31                 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
          41: invokevirtual #35                 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
          44: astore        6
          46: return        
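
The difference between cases 1 and 2 above can also be observed at run time, without reading bytecode. According to the Java Language Specification, constant expressions of type String are interned, while a non-constant concatenation produces a newly created String, so a reference comparison with == tells the two apart. The following is a minimal sketch: the class and method names are made up for illustration, and == is used here only as a probe (use equals to compare Strings in real code).

```java
// Hypothetical demo class; the names are illustrative and not from the article.
class ConstantFoldingDemo {
    // final fields with constant initializers are constant variables,
    // so the compiler folds the concatenation at compile time.
    final String a = "123";
    final String b = a + "456";

    // Non-final fields: the concatenation is performed at run time.
    String x = "123";
    String y = x + "456";

    // True when b refers to the same interned object as the literal "123456".
    boolean finalFieldIsFolded() {
        return b == "123456";
    }

    // True when y refers to the interned literal (not expected to happen).
    boolean nonFinalFieldIsFolded() {
        return y == "123456";
    }

    public static void main(String[] args) {
        ConstantFoldingDemo demo = new ConstantFoldingDemo();
        System.out.println("final field folded:     " + demo.finalFieldIsFolded());
        System.out.println("non-final field folded: " + demo.nonFinalFieldIsFolded());
        System.out.println("same contents:          " + demo.b.equals(demo.y));
    }
}
```

The first check should report true and the second false, even though both fields hold the same characters: only the final field was turned into a constant by the compiler.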
    

Myth 2: You can rely on the StringBuilder optimization when you are concatenating inside an if statement

After reading some Java compiler specs you might fall for this. Unfortunately, it is not true. Whenever you concatenate Strings across more than one statement, you should watch out. For example, this:

String x = "123";
x += "456";
x += "789";

...will unfortunately be "optimized" to this code:

String x = "123";
x = (new StringBuilder(String.valueOf(x))).append("456").toString();
x = (new StringBuilder(String.valueOf(x))).append("789").toString();

...when we expected this code:

String x = (new StringBuilder("123")).append("456").append("789").toString();

Again, here is the bytecode for comparison with the decompiled Java code:

    Code:
       0: ldc           #14                 // String 123
       2: astore_1      
       3: new           #18                 // class java/lang/StringBuilder
       6: dup           
       7: aload_1       
       8: invokestatic  #20                 // Method java/lang/String.valueOf:(Ljava/lang/Object;)Ljava/lang/String;
      11: invokespecial #26                 // Method java/lang/StringBuilder."<init>":(Ljava/lang/String;)V
      14: ldc           #29                 // String 456
      16: invokevirtual #31                 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      19: invokevirtual #35                 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
      22: astore_1      
      23: new           #18                 // class java/lang/StringBuilder
      26: dup           
      27: aload_1       
      28: invokestatic  #20                 // Method java/lang/String.valueOf:(Ljava/lang/Object;)Ljava/lang/String;
      31: invokespecial #26                 // Method java/lang/StringBuilder."<init>":(Ljava/lang/String;)V
      34: ldc           #66                 // String 789
      36: invokevirtual #31                 // Method java/lang/StringBuilder.append:(Ljava/lang/String;)Ljava/lang/StringBuilder;
      39: invokevirtual #35                 // Method java/lang/StringBuilder.toString:()Ljava/lang/String;
      42: astore_1      
      43: return    

Going further, the "advanced" example with an if statement will work out no better. The following:

String y = "123";
long z = 789;
boolean b = false;

y += "456";
if(!b) {
  y += "789";
  y += z;
}

...will NOT be magically turned into this:

StringBuilder sb = new StringBuilder("123");
long z = 789;
boolean b = false;

sb.append("456");
if(!b) {
  sb.append("789");
  sb.append(z);
}
String y = sb.toString();

Instead it will come down to this:

String y = "123";
long z = 789L;
boolean b = false;

y = (new StringBuilder(String.valueOf(y))).append("456").toString();
if(!b) {
  y = (new StringBuilder(String.valueOf(y))).append("789").toString();
  y = (new StringBuilder(String.valueOf(y))).append(z).toString();
}

As an exercise you can check the bytecode yourself.

To generate the bytecode, first compile your Java class:
javac YourJavaClass.java
Then disassemble the class file:
javap -cp . -c YourJavaClass > bytecode.txt
As a result the bytecode will be saved to the bytecode.txt file.
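
To try these commands, you could start from a small class like the one below. It is only a sketch: the class name matches the placeholder used in the commands above, and the two method names are hypothetical. The methods take a parameter on purpose, because concatenating only literals would be folded into a single constant at compile time, leaving nothing interesting to disassemble.

```java
// Hypothetical class to compile and disassemble; save it as YourJavaClass.java.
class YourJavaClass {
    // One expression: the compiler can build the result in a single chain.
    static String singleExpression(String start) {
        return start + "456" + "789";
    }

    // Two statements: each += is compiled as a separate concatenation.
    static String twoStatements(String start) {
        String x = start;
        x += "456";
        x += "789";
        return x;
    }

    public static void main(String[] args) {
        System.out.println(singleExpression("123"));
        System.out.println(twoStatements("123"));
    }
}
```

Both methods return the same String; the difference only shows up in the javap -c output. Keep in mind that the listings in this article come from a Java 7 era compiler. As far as I know, javac from JDK 9 onward emits an invokedynamic call for String concatenation instead of StringBuilder calls, so the disassembly on a newer JDK will look different.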

Myth 3: StringBuilder should be used only if you are concatenating inside a loop

This myth is based on the assumption that the compiler is clever enough to guess correctly, most of the time, where and how to insert a StringBuilder. After reading up to this point you should already know that this is not true. StringBuilder should be used much more often than you might expect. However, joining Strings inside a loop is a special case where you can be 100% sure you have to use StringBuilder (or StringBuffer on rare occasions). For instance, this:

String x = "123";
for(int i = 1; i < 100; i++) {
  x += i;
}

...is equivalent to this:

String x = "123";
for(int i = 1; i < 100; i++) {
  x = (new StringBuilder(String.valueOf(x))).append(i).toString();
}

...and we really wanted this:

StringBuilder sb = new StringBuilder("123");
for(int i = 1; i < 100; i++) {
  sb.append(i);
}
String x = sb.toString();
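
To get a feel for the difference, the two loop variants can be wrapped in methods and timed. This is a rough sketch, not a proper benchmark: the class and method names are hypothetical, the iteration count is arbitrary, and a single System.nanoTime() measurement is distorted by JIT warm-up and garbage collection, so treat the numbers as indicative only.

```java
// Hypothetical timing sketch; names and the iteration count are illustrative.
class LoopConcatSketch {
    // Builds the String with +=, creating a new String on every iteration.
    static String withPlusEquals(int limit) {
        String x = "123";
        for (int i = 1; i < limit; i++) {
            x += i;
        }
        return x;
    }

    // Builds the same String with one explicitly managed StringBuilder.
    static String withBuilder(int limit) {
        StringBuilder sb = new StringBuilder("123");
        for (int i = 1; i < limit; i++) {
            sb.append(i);
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        int limit = 20000; // arbitrary; large enough for the copying cost to show

        long t0 = System.nanoTime();
        String slow = withPlusEquals(limit);
        long t1 = System.nanoTime();
        String fast = withBuilder(limit);
        long t2 = System.nanoTime();

        System.out.println("+= loop:       " + (t1 - t0) / 1000000 + " ms");
        System.out.println("StringBuilder: " + (t2 - t1) / 1000000 + " ms");
        System.out.println("same result:   " + slow.equals(fast));
    }
}
```

The += variant copies the whole String built so far on every pass, so its cost grows roughly quadratically with the number of iterations, while the StringBuilder variant appends to a single buffer.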

Summary

I hope you enjoyed the article. Please leave a comment if you think there is something that should be added to the topic.

TL;DR: In most cases, use the StringBuilder class. Better safe than sorry.


UPDATE:
TL;DR (made by standardout in the comments): As a general rule, use '+' operator String concatenation when everything is defined in a single line. In some cases it will be more efficient, but even when not it will at least be easier to read. But never use += with Strings.
If you want to learn more, you can follow the Reddit discussion about this blog post (as Javin Paul suggested).

13 comments:

  1. Doesn't example 2 (where you lose the "final" keyword) run into this part of the Java Language Specification and get turned into constants at compile time anyway?

    1. Instead of answering here I will point to the Reddit discussion where your question was answered:
      http://www.reddit.com/r/programming/comments/xtan6/stringbuilder_optimizations_demystified/

    2. Thanks for the Reddit thread link. It complements your blog post well; it would be better to include it in the main post for more exposure. By the way, I have shared a few differences between String and StringBuffer, mostly on the theoretical side.

    3. Thanks for this nice tip. I updated the article.

  2. What tools did you use to get the compiler optimized code?

    1. I used DJ Java Decompiler, but you can also learn a lot about your code from the bytecode generated using the standard javap.

      For example, if you have some Java code stored in a file "MyJavaClass.java", then you can use the following two commands to save the corresponding bytecode in a file "TheBytecode.txt":

      javac MyJavaClass.java
      javap -cp . -c MyJavaClass > TheBytecode.txt

  3. As a general rule, use '+' operator String concatenation when everything is defined in a single line. In some cases it will be more efficient, but even when not it will at least be easier to read. But never use += with Strings.

    I saw a place where the developer had heard the "String concatenation is slow" myth, but still didn't want to be bothered with constantly creating StringBuffers. So he created a helper function createString which took in a list of Strings, iterated over them, and appended them to a StringBuffer, then used it all over the place (including places where he was concatenating constants). It was done as part of an optimization effort, yet ended up making the code slower.

    1. "As a general rule, use '+' operator String concatenation when everything is defined in a single line. In some cases it will be more efficient, but even when not it will at least be easier to read. But never use += with Strings."

      That is exactly the conclusion you should come to after reading this article. You did a great job of putting it in so few words. I hope you will not mind if I reuse it and update the article :] (otherwise let me know).

      "I saw a place where the developer had heard the String concatenation is slow myth. (...) It was done as a part of an optimization effort, yet ended up making the code slower."

      As a developer you are kind of obliged to know these things, but as you wrote, sometimes people do not find it important to make the effort and understand the implications of their code.

  4. Thank you for your article.

  5. nice research but keep in mind that performance issues are often in the IO (connections, db,...) and rarely in micro loop optimization ;)

    see https://www.coderanch.com/how-to/java/EnterprisePerformance
    'It's probably the database' point

    1. Yes, I agree with you, the database is usually the main source of performance troubles. The main idea behind this article is, as someone already said in the Reddit discussion, that doing String concatenation right is low-hanging fruit. With minimum effort you can avoid some nasty memory issues.
