1001 Hacks

Tuesday, September 1, 2009

Yellowjacket types

The programming language I'm writing finally has a name (yellowjacket) and it's linked on the left.

One of the more tedious aspects of writing a dynamically typed run time that's not object oriented (especially in C) is coercing types to do what you want. Using C unions and structures creatively can clean things up a bit. In yellowjacket, the yjk_object type is essentially either an immediate value like a fixnum or char, or a pointer to a heap object. You can determine which it is by examining the two least significant bits, which contains a tag. If those bits are zero, the remaining bits contain an integer value. If the tag is 1, the yjk_object is a pointer (after the tag is set to zero) to an actual object.

The actual objects have a 32 bit header which shows their size in memory and type, and bits reserved for garbage collection purposes. Various types include yjk_float, yjk_string, etc.

Because the yjk_object is simply a union of all of these types, you can easily coerce the yjk_object to be anything in C without explicit casting, like this:


yjk_object
add_fixnums(yjk_object a, yjk_object b)
{
  return a.fixnum + b.fixnum;
}

Essentially it's a clean way to nearly bypass the C type system, which is exceedingly useful when implementing a dynamic language.

Wednesday, June 17, 2009

GCC inline stack allocation

Well, the overzealous stack allocation problem in my program is likely due to inline functions, because it can be fixed by disallowing them using -fno-inline. This causes the stack frame of my interpret function to shrink to 12 bytes from 44.

This is really quite horrible, because the functions I declared inline used no local variables. It's possible that gcc was inlining larger functions, so I'll have to check into that. What I'm reading indicates that GCC has difficulty re-using stack space for multiple inlined functions. If so, that sucks.

I'm guessing the long term solution will be to only allow GCC to inline functions that I specify.

I did some searching and apparently I'm not the first one with this problem:

Linux and GCC inline

From Linus: "Gcc inlining is a total and utter pile of shit."

I'll take his word for it, but my experience bears this out. I guess I'll be using macros instead.

Tuesday, June 16, 2009

GCC switch statement stack allocation

One of the things I've noticed when writing the bytecode interpreter for my programming language is that GCC uses excessive stack space in the large switch statement I'm using.

I decided to use a switch statement rather than a jump table because I figured the compiler would optimize the jumps for me, which it appears to do. Apparently this results in better branch prediction than a jump table would. I have not tested this, performance-wise, but that's what the literature I've read indicates.

Anyway, since the switch statement is switching on a bytecode, it's pretty massive, and there can be a lot of local variables depending on which case is true. To illustrate:


while (running)
{
  switch (opcode)
  {
    case 0:
      {
        int a;
        int b;
        int c;
        int d;
        ...some routine involving a and b...
      }
    break;

    case 1:
      {
        int c;
        ...some routine involving c...
      }
    break;

    case 2:
      {
         // call interpret();
      }
    break;
  } 
}

This causes a stack allocation of at least 16 bytes for the four ints (a, b, c, and d), even if case 1 or 2 are true 99% of the time. There's no way to tell the compiler that case 0 is hardly ever true, and odds are it would still allocate space for case 0's local variables just in case.

In a switch statement with 50 cases and each case equally likely, just one case that allocates a large number of local stack space can quickly eat up memory in deeply nested function calls, since the space is allocated on function entry.

This is hardly a problem with "normal" programs, but in my case the interpreter has about 50 cases and calls itself recursively. With each function entry about 44 bytes (enough for all local variables in the whole function) are allocated, even though most of that memory is never used. This severely limits the depth of recursion when memory is constrained.

The ideal solution would be if the switch statement could allocate the stack space after it evaluates each case.

I may have to work around this using a combination of if/else and switch for the more complex routines.

Wednesday, June 10, 2009

Future of computing

I've been thinking a lot about the shift in direction in CPU design lately, and I've come to the conclusion that things will pretty much stay the same, even given an explosion in the number of cores available.

First, the average desktop computer is hardly ever running the CPU at 100% unless there's something wrong. Very few applications that I use are CPU bound, and my guess it that this is also true for server apps. For server apps the problems that are CPU bound are either trivially spread among multiple servers using a load-balancing scheme, or are likely long-running scientific calculations.

There are easily 50 processes running on this WinXP computer right now, and I have Firefox and Outlook open. Even with a CPU with 50 cores, the OS can spread the load at the process boundary to get speed/latency increases without ever having to change programming styles.

I can think of one or two apps I use that actually peg the CPU for any significant time, and those are video editing and encoding programs. These would benefit from multiple cores, but the rest of the software I use happily runs with a small percentage of single-core CPU right now.

So what's the future? Will email ever develop into something so complex and slow that it needs more than an entire 3 GHz CPU to run smoothly? I'm not discounting this possibility, but it seems unlikely.

I think the future of computing will belong to an architecture that uses the increased power for reliability and ease of development rather than performance. I think it's safe to assume that the application you write five to ten years from now will have more than enough CPU power to run. The trick is to make it dead-simple to create reliable, responsive programs.

With that in mind, I'm working on a new programming language (starting with a virtual machine) that combines a number of features I've been interested in for a while:

1) Compiler-enforced cooperative threading: The big idea behind this language is that, like Erlang, you'll handle most tasks by spawning an extremely lightweight thread. The threads (called tasks) are non-native and the compiler will insert time checks at function calls and loop iterations to run the scheduler. There is no need to call a yield() function, the compiler does that for you at the right time.

All I/O will be asynchronous so one task doesn't block progress of the others. Each task has its own heap and stack; the lack of shared data means that garbage collection can be run on a per-task basis and thread synchronization and locking are non-issues. Tasks will communicate via message passing.

The advantages of doing this are numerous. Cooperatively threaded programs won't ever context-switch during inopportune times. In my case, the only time tasks may give up control is between function calls or loop iterations. Things proceed in a very deterministic way (especially with the simple round-robin scheduler that I have now.)

Also, creating a task is extremely fast, requiring only that a small stack and heap be allocated (I'm using malloc for portability reasons).

The disadvantage to this design is that it takes a little more work to ensure that one task won't hang the entire process, and there is no hardware or operating system support (portably) that will allow automatic checking of stack/heap overflow, so those limits have to be checked manually. This check is done at the beginning of every function call, since the compiler can tell in advance how much stack space a function will use.

2) Message-passing architecture: Tasks communicate via messages, with complex data being serialized and copied. The advantage is that sharing no data means that read/write race conditions are a non-issue. The other advantage is that the serialized messages can just as easily be sent over a network.

The disadvantage here is performance, which is a cost I'm willing to pay.

3) Mixed bytecode and compiled code run time: The virtual machine I'm writing is stack-based, and can call compiled functions from byte-coded functions and vice-versa. There is no overhead for doing this. The drawback is that for this to work effectively, a bytecoded function must be compiled along with all of the functions it calls.

4) Stack-based, bytecode virtual machine: This ensures portability and ease of implementation. The virtual machine is coded almost entirely in C.

5) Distributed VM's: multiple VM instances can be run on the same machine and/or on multiple machines in a network. This offers a simple way to take advantage of multiple cores (1 process per core) or multiple nodes in a network, as necessary.

6) Language Agnostic VM: The VM is designed for dynamic, functional languages, but will support most language features you can throw at it. Apparently people who want first-class functions, continuations, tail-call optimization, weak-types, etc., aren't well served by the CLR or Java virtual machine. I'd like to support these types of languages first.

7) Easy integration with native code: self-explanatory.

The VM I'm working on now has no JIT compiler, it's purely bytecode. That said, it can create tasks and schedule between them, pass messages between tasks, support tail calls and iterative loops, and is slightly faster than PERL running similar basic test programs. I consider the VM's performance to be "good enough to ignore for now."

I haven't begun on the syntax of the language itself, but there will definitely be a syntax that looks a lot like C and another that looks a lot like Lisp. These will be compiled to an intermediate bytecode assembly language, and from there to bytecode. The bytecode assembler is written and working.

Other than the compiler-enforced cooperative threads I don't know if there's anything original here (and the cooperative thread thing must exist somewhere), but I'm trying to take the best ideas from the more esoteric languages and bring them under the C-like syntax umbrella.

Wednesday, August 22, 2007

Javascript this

One of the most frustrating things about Javascript is its handling of "this," which you first encounter when attaching events to objects, and next when you try to pass member functions to other member functions within an object, and then try to call them from another member function, like in this (contrived) prototype:


...

sayHello: function(text)
{
  alert("Hello " + text);
},

handlerFunc: function(myFunc)
{
  //call a member function
  myFunc('drblast!');
},

doAlert: function()
{
  this.handlerFunc(this.sayHello);
}

...

You'd expect to see an alert window with "Hello drblast!" in it, but instead, an error occurs. The reason is that "this" is set to the global "this" object when you call the myFunc function from within handlerFunc(), since the name "myFunc" is not a member of any object, even though it's representing a member function.

To get this to work, you need to set a member variable to the member function you want to call, as follows:


...

handlerFunc: function(myFunc)
{
  //call a member function
  this.tempFunc = myFunc;
  this.tempFunc('drblast!');
},

...

Now this will be set properly, and you'll get an alert window with message as expected.

Tuesday, August 21, 2007

IE <pre> tag bug

For those of you searching for an answer to why IE doesn't handle whitespace properly in <pre> tags that have been updated with javascript by setting innerHTML, this is a bug in IE, and exists as of this date in version 7.

You can work around it by adding an additional <pre> </pre> in the innerHTML, so you have nested <pre>'s. That works. Or you can put the <pre> you want to update in a container <div> and update the innerHTML property of the container <div> instead.

The second approach seems to be the more correct way to handle this, but you'll have to re-attach any events to the new pre, because you're creating a new element instead of updating the old one.

Wednesday, August 15, 2007

What's old is new again

While looking for some info on vi, I found the following interview with Bill Joy, from around 1984:

REVIEW: You mention everything but disks.
JOY: You might want to page over satellite telephone... Page fault, and the computer makes a phone call. Direct broadcast or audio disk - that's the technology to do that. It's half a gigabyte - and you get 100 kilobyte data rate or a megabyte or something. I don't remember. You can then carry around with you all the software you need. You can get random data through some communications link. It is very like Dick Tracy. Have you seen these digital pagers? You can really communicate digital information on a portable.

I don't think you need to have a disk with you. There are so many people who believe that you need to have a disk that you'll be able to have one because they'll make it cheap. That's the way things work. It's not what's possible but what people believe is possible...

What's particularly striking about this quote is that he was extraordinarily prescient and wayyyy off simultaneously.

He was talking about how people would use computing devices in the future, and his idea was that instead of mass storage in a small format, we'd have thin clients where even the operating system was served from a remote machine, very much like the Unix consoles he was working on at the time. (at 1200 baud, apparently)

So let's look at what happened. Just as he predicted, hard drives have gotten cheaper and smaller and now everyone has a portable one (iPod). However, the network latency and throughput has not yet caught up to to hard drive speed to where it's feasible to do all of your computing over a network link. You still need an operating system. But he knew that would probably be the case, because so many people wanted it to be like that.

So it seems like he was right on, except that it's obvious that he was thinking that big servers and thin clients would continue to exist. Even though he talked about Apple's new Mac in the interview, I don't think he predicted the huge increase in the performance of cheap hardware.

It's easy to see why. A million times faster is an easy concept to understand. If I told you computers will be a million times faster in 20 years, you'd shrug and say, "sure, that sounds about right." But what neither of us can envision is what people will actually do with that horsepower.

I guarantee you that if you went back 20 years and told people that everyone will own a multi-core CPU that runs at 3GHz, and that with all that horsepower, the most popular application development environment is an interpreted scripting language that makes asynchronous network calls to grab data and display it on a tree-based document display program that runs on top of an graphical windowing system (which, coincidentally, already has all of the widgets to do all of those things the browser does), which is implemented in multiple layers that span multiple processes that all run on top of the operating system, they'd look at you like you were crazy. And yet that's exactly what we're doing with Javascript, which in effect, is a way to make today's lightning fast hardware behave just like the network computer Bill Joy described.

He was exactly right, but he never predicted there would be five or six layers in between pages of RAM and the user's data.

Here's the big question: if you were develop a system to do what web browsers and Javascript do from scratch, would it look anything like a web browser and Javascript? My guess is no. I'm not sure if that would be a good thing or not.

So. . .here's my prediction, taking into account the pace of hardware development and the history of software development. In 20 years, cheap hardware will be ridiculously fast, but it will still look very much like Intel hardware today. We'll have many, many CPU cores to work with, but nobody will use a parallel programming language designed to take advantage of multiple cores. Instead, virtualization (i.e. VMware, Xen) will be integreted into the operating system, and each process will run on its own virtual machine.

Each web browser will have been expanded into a full blown widget toolkit and have merged in something that looks like Flash, but there will still be multiple incompatible browsers. The latest craze will be a browser compatibility layer written in a programming language that compiles to "raw Javascript", and it will reduce the performance of applications but allow you to use them anywhere.

People will set about re-writing a version of Photoshop in the new compatibility layer, and everyone will wonder why they'd do that, when the current version of Photoshop runs in Internet Explorer just fine.

Tuesday, August 14, 2007

Light on Dark vs. Dark on Light

I'm not a graphic designer by trade, but elements of graphic design come up often when you produce a web site. I'm always conflicted about whether to use a dark background with light text or vice versa.

With paper, it's obvious that dark text on a light background is easiest to read.

When I'm working all day in emacs, though, black on white text is painful. I either have to darken the background to a lighter gray, or sometimes I reverse the colors to a light gray on black or dark gray.

I've always been able to work much longer without strain with a light on dark background, but I'll be the first to admit it doesn't look as warm and pleasing as dark text on light.

So here we are

I never thought I'd be doing this, but it's become apparent that blogs about computer programming are extremely useful, not to mention ultra-hip and will have the chicks at Starbucks flocking over to my non-hip, non-Mac laptop to see what I'm writing about. I'm just kidding, the chicks would have to flock to Dunkin Donuts to find me. It looks like I still need to work on the color scheme and shrink the column width down so that about three words will fit into each line, though. Then I'll feel smarter.

I started writing a vi-clone in Javascript yesterday. It's still in its infancy, but the cursor controls and insert and command mode work, so it's just a matter of adding commands to make it complete. The difficult part was realizing that no amount of trying would make a <textarea> behave like I wanted it to, so I had to use a <div> and implement a line editor from scratch.

Javascript is quickly becoming one of my favorite languages, although it still has its quirks. It's extremely flexible, and has first class functions, which are making their way up the list of mandatory features for me.

The most glaring problem that Javascript has is its annoying treatment of "this." I'm especially peeved that you can't attach a method of an object to an event handler using addEventListener. Yes, there are ways around this, but this should be easy. Right now, it makes it difficult to encapsulate everything in objects, so you always run the risk of name collisions by using global function names.

Of course, you could add a prefix to everything, like vieditor_keypressed(ev){}, but this is 2007, and the namespace problem has been solved for years.

Anyway, that's a small complaint. I think Javascript gets a bad name mostly because of all of the browser incompatibilities, but it's minimal work to write a decent layer to hide those.