Tuesday, June 16, 2009

GCC switch statement stack allocation

One of the things I've noticed when writing the bytecode interpreter for my programming language is that GCC uses excessive stack space in the large switch statement I'm using.

I decided to use a switch statement rather than a jump table because I figured the compiler would optimize the jumps for me, which it appears to do. Apparently this results in better branch prediction than a jump table would. I have not tested this, performance-wise, but that's what the literature I've read indicates.

Anyway, since the switch statement is switching on a bytecode, it's pretty massive, and there can be a lot of local variables depending on which case is true. To illustrate:


while (running)
{
switch (opcode)
{
case 0:
{
int a;
int b;
int c;
int d;
...some routine involving a and b...
}
break;

case 1:
{
int c;
...some routine involving c...
}
break;

case 2:
{
// call interpret();
}
break;
}
}


This causes a stack allocation of at least 16 bytes for the four ints (a, b, c, and d), even if case 1 or 2 are true 99% of the time. There's no way to tell the compiler that case 0 is hardly ever true, and odds are it would still allocate space for case 0's local variables just in case.

In a switch statement with 50 cases and each case equally likely, just one case that allocates a large number of local stack space can quickly eat up memory in deeply nested function calls, since the space is allocated on function entry.

This is hardly a problem with "normal" programs, but in my case the interpreter has about 50 cases and calls itself recursively. With each function entry about 44 bytes (enough for all local variables in the whole function) are allocated, even though most of that memory is never used. This severely limits the depth of recursion when memory is constrained.

The ideal solution would be if the switch statement could allocate the stack space after it evaluates each case.

I may have to work around this using a combination of if/else and switch for the more complex routines.

4 comments:

Kyle H said...

Did you ever figure this out? I have a similar problem.

drblast said...

I've since switched away from using an interpreter in favor of JIT compiled code (specifically subroutine threading), so the switch statement doesn't exist anymore.

This is a limitation of GCC, and C in general. It's not expressive enough to let you give hints to the compiler in cases like this.

Troy Curtis said...

Both ways of approaching it have pros and cons. Here you are concerned with stack size, but GCC is providing the option with better performance. It takes executable instructions to change the stack size, and (other than memory usage) it doesn't buy you anything. By allocating all stack variables at entry to the function, one stack "allocation" is performed, and the references can be used everywhere else. However, syntactically you have scoped variables. Probably wouldn't matter much for a one-off block, but inside a loop those instructions could really add to the run-time (especially if the loop itself was short).

Now perhaps the best option would be to allow the programmer to annotate it somehow to optimize it better. I'm curious to see if compiling for size optimization (-Os) results in deferred stack allocation? Not curious enough to test it.

drblast said...

Normally this is a performance improvement, but in my pathological case it would have been a performance penalty as it would have resulted in cache misses, and the "extra instruction" penalty would have been paid only very rarely when that specific case in the switch statement was true.

GCC does have the "__builtin_expect" macro that might fix this, but I think that's designed to optimize branch prediction rather than stack allocation. I haven't tested it, in any event.