17. Wasted Computation

Part of CS:2820 Object Oriented Software Development Notes, Spring 2017
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

 

The Cost of Concatenation

In the solution distributed for MP2, there is the following code:

                String sourceName = ScanSupport.nextName( sc, "Synapse ??" );
                String dstName = ScanSupport.nextName( sc,
                        "Synapse " + sourceName + " ??"
                );
                delay = ScanSupport.nextFloat( sc,
                        "Synapse " + sourceName + " " + dstName + " ??"
                );
                strength = ScanSupport.nextFloat( sc,
                        "Synapse " + sourceName + " " +
                        dstName + " " + delay + " ??"
                );
                ScanSupport.lineEnd(
                        sc,
                        "Synapse " + sourceName + " " +
                        dstName + " " + delay + " " + strength
                );

In this code, the normal case is that the methods in class ScanSupport() return without reporting an error, since the normal situation with input data files is that they contain few if any errors.

The problem is, each time one of these methods is called, there is a very expensive computation for the error message. The simple looking Java expression "a"+"b"+"c" is actually short hand for something like the following:

new StringBuilder( "a" ).append( "b" ).append( "c" ).toString()

Remember, Java strings are not mutable, and Java string concatenation returns a new string object. Java objects of class StringBuilder are identical to strings, except that they are mutable and have a variety of append() and insert() methods for modifying the string. Things are worse than the above code suggests, because the code to initialize a new StringBuilder from a string has to copy the characters into place one at a time, and the append() method does the same.

The impact of this is considerable. Consider this method call:

                String dstName = ScanSupport.nextName( sc,
                        "Synapse " + sourceName + " ??"
                );

In the above, the total cost of the "nextName()" method may be less than the cost of the computation needed to compute the second parameter to the method call. This cries out for solution.

A First Solution

The first solution that comes to mind is to change the ScanSupport methods so that they take a number of strings as parameters. Thus, we replace this call:

                String dstName = ScanSupport.nextName( sc,
                        "Synapse " + sourceName + " ??"
                );

with this call:

                String dstName = ScanSupport.nextName( sc,
                        "Synapse ", sourceName, " ??", "", "", ""
                );

In the above, we provided 6 parameters because it seemed that that would be enough for the most complex error message that might be needed. In the normal case, nextName() ignores these parameters, but if there is a need to assemble an error message, it concatenates them. As a result, we still pay the cost of parameter passing, perhaps one or two instructions per string parameter, but the cost of concatenation is deferred until it is actually needed.

Of course, the number of parameters we need to pass depends on the particular error message we are generating. That's a nuisance. Also, for some of our error messages, the values we are passing aren't strings. Consider this line of code:

                ScanSupport.lineEnd(
                        sc,
                        "Synapse " + sourceName + " " +
                        dstName + " " + delay + " " + strength
                );

Here, our new model would require something like this:

                ScanSupport.lineEnd(
                        sc,
                        "Synapse ", sourceName, " ",
                        dstName,  " ", delay.toString, " ", strength.toString
                );

We were forced to convert the floating point numbers to their textual form because the parameters were all strings. We could, of course, make a specialized form of scan-support routine that knows the format of the error message, but this forces the author of scan-support to know far more about the use of the routines. They become specific to the particular neuron network or traffic network context instead of being general purpose routines. As a result, they are harder to design and implement.

A General Solution

The most general solution involves replacing the data parameter with a parameter that conveys a computation. In Java, the way we do this is to contruct an object and pass that object. If the called routine needs the value, it will call a method of that object. That is the method that will do the work. Consider this new version of the SyntaxCheck.lineEnd() method:

public abstract class ErrorMessage {
        public abstract String toString();
}

static void lineEnd( Scanner sc, ErrorMessage message ) {
                String skip = sc.nextLine();
                if (!"".equals( skip )) {
                        Errors.warning( message + message.toString() );
                }
        }

Now, all we have to do to call our syntax-check method is first create a new subclass of ErrorMessage with the appropriate toString() method.

This sounds awful, but Java provides some shorthand to make it easy. We'll do the awful long-winded solution first before we look at the shorthand notation.