String.Join performance issue in C#

String.Join performance issue in C#

I've been researching a question that was presented to me: How to write a function that takes a string as input and returns a string with spaces between the characters.  The function is to be written to optimize performance when it is called thousands of times per second.

I know that .net has a function called String.Join, to which I may pass in the space character as a separator along with the original string.
Barring the use of String.Join, I can use the StringBuilder class to append spaces after each character.
Another way to accomplish this task is to declare a character array with 2*n-1 characters (You have to add n-1 characters for the spaces).  The character array can be filled in a loop and then passed to the String constructor.

I've written some .net code that runs each of these algorithms one millions times each with the parameter "Hello, World" and measures how long it takes to execute.  Method (3) is much, much faster than (1) or (2).
I know that (3) should be very fast because it avoids creating any additional string references to be garbage collected, but it seems to me that a built-in .net function such as String.Join should yield good performance.  Why is using String.Join so much slower than doing the work by hand?
public static class TestClass
{
    // 491 milliseconds for 1 million iterations
    public static string Space1(string s) 
    {            
        return string.Join(" ", s.AsEnumerable());
    }

    //190 milliseconds for 1 million iterations
    public static string Space2(string s) 
    {
        if (s.Length < 2)
            return s;
        StringBuilder sb = new StringBuilder();
        sb.Append(s[0]);
        for (int i = 1; i < s.Length; i++)
        {
            sb.Append(' ');
            sb.Append(s[i]);
        }            
        return sb.ToString();
    }

    // 50 milliseconds for 1 million iterations
    public static string Space3(string s) 
    {
        if (s.Length < 2)
            return s;
        char[] array = new char[s.Length * 2 - 1];
        array[0] = s[0];
        for (int i = 1; i < s.Length; i++)
        {
            array[2*i-1] = ' ';
            array[2*i] = s[i];
        }
        return new string(array);
    }

Update:  I have changed my project to "Release" mode and updated my elapsed times in the question accordingly.

Solutions/Answers:

Answer 1:

Why is using String.Join so much slower than doing the work by hand?

The reason String.Join is slower in this case is that you can write an algorithm that has prior knowledge of the exact nature of your IEnumerable<T>.

String.Join<T>(string, IEnumerable<T>) (the overload you're using), on the other hand, is intended to work with any arbitrary enumerable type, which means it cannot pre-allocate to the proper size. In this case, it's trading flexibility for pure performance and speed.

Many of the framework methods do handle certain cases where things could be sped up by checking for conditions, but this typically is only done when that "special case" is going to be common.

In this case, you're effectively creating an edge case where a hand-written routine will be faster, but it is not a common use case of String.Join. In this case, since you know, exactly, in advance what is required, you have the ability to avoid all of the overhead required to have a flexible design by pre-allocating an array of exactly the right size, and building the results manually.

You'll find that, in general, it's often possible to write a method that will out perform some of the framework routines for specific input data. This is common, as the framework routines have to work with any dataset, which means that you can't optimize for a specific input scenario.

Answer 2:

Your String.Join example works on an IEnumerable<char>. Enumerating an IEnumerable<T> with foreach is often slower than executing a for loop (it depends on the the collection type and other circumstances, as Dave Black pointed out in a comment). Even if Join uses a StringBuilder, the internal buffer of the StringBuilder will have to be increased several times, since the number of items to append is not known in advance.

Answer 3:

Since you aren't using the Release build (which should have optimizations checked by default) and/or you're debugging through visual studio then the JITer will be prevented from making a lot of it's optimizations. Because of this you're just not getting a good picture of how long each operation really takes. Once you add in the optimizations you can get the real picture of what's going on.

It's also important that you not be debugging in visual studio. Go to the bin/release folder and double click the executable entirely outside of visual studio.

Answer 4:

In your first method, you are using the overload of String.Join that operates on an Enumerable, which requires that the method walk the characters of the string using an enumerator. Internally, this uses a StringBuilder as the exact number of characters is unknown.

Have you considered using the String.Join overload that takes a string (or string array) instead? That implementation allows a fixed length buffer to be used (similar to your third method) along with some internal unsafe string operations for speed. The call would change to - String.Join(" ", s); Without actually doing the legwork to measure, I would expect this to be on par or faster than your third approach.

Answer 5:

The bad performance is not coming from String.Join, but from the way you handle each character. In this case, since characters have to be handled individually, your first method will create much more intermediate strings and the second method suffers from two .Append method calls for each character. Your third method does not involve a lots of intermediate strings or methods calls and that's the reason why your third method is the fastest.

Answer 6:

When you have passed an IEnumerable to String.Join, it has no idea on how much memory needs to be allocated. I allocates a chunk of memory, resizes it if it is insufficient and repeats the process until it gets enough memory to accommodate all the strings.

The array version is faster because we know the amount of memory allocated well ahead.

Also please not that when you are running the 1st version, GC might have occurred.

References