The Transterpreter Project

Concurrency, everywhere.

Websites, source code, and compilers

I need to be better about posting regular updates. If we're going to have a blog, I should use it.

Website and source code

First, we've moved to MediaWiki for our main website. This was a large change, and hopefully it means more documentation will find its way more readily to the WWW. If nothing else, it means Damian will be able to edit pages related to projects he works on, since the old system was very, very scary, and very, very ugly. (I thought the old system would be a good idea! My bad!)

Second, source will soon be available via anonymous SVN. We have one or two more things to put into place first. (If you can't wait, send me an email at matt at transterpreter dot org.) Tied into this is our switch from Mantis to Trac for project management. You can already visit our Trac site, which lets you browse the source to the Transterpreter, skroc, and slinker, as well as submit bugs and whatnot. We'll make a Subversion checkout URL available shortly.

42, a compiler for ...

Third, we've begun work on a new compiler for an occam-like language. The existing compiler is too large and monolithic to consider extending, and the up-and-coming NOCC is already 50,000 lines of C. I've already said my piece on the foolishness of writing a compiler in C, so I'll move along swiftly.

Our goal is for our compiler to be extensible, first and foremost. To do interesting research in areas surrounding concurrent programming languages, we're going to need a platform upon which to experiment and test ideas. Furthermore, I want it to be a platform that undergraduates who are interested in doing research can contribute to in meaningful ways. So, we've started from scratch with a subset of occam; we're calling the compiler 42.

I'll probably show similar pictures again in the future, but I wanted to show a few syntax errors that the compiler generates so far. You see, I'm a bit picky about syntax errors, and if we do our job right, this compiler will have better syntax errors than any other compiler for any other language out there.

An undefined process instance

If you're trying to call a pre-defined process, the question is: is it defined? This is much like trying to invoke a method in Java that hasn't been defined, or calling a function in Haskell that you haven't yet written the code for. Currently, we just say 'process-name' not defined; in my book, this isn't enough. And since the compiler is written in a micro-pass style, very much like the Indiana nanopass style, I can improve this error while writing this post.

One file is responsible for this check, and it is called check-instances-exist.scm. I can open it up and improve the message very simply. I like the improved error a great deal more.

An improved, undefined process instance message

That took one minute. Perhaps less. Now, instead of just saying 'bar' is undefined, the message says

	'bar' is not defined. For it to be defined
	you must have something that looks like
		 (proc bar (...) ...)
	in your code, where the '...'s might be filled
	in with code you write yourself.

 in: (seq (:= x 4) (foo x) (bar))

Along with this message, the relevant code is highlighted, so the user can see exactly where their syntax error took place. (To do this, I'm leveraging the excellent and extensible DrScheme programming environment, which was designed to let end users add new programming languages to the environment.)
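To make the check concrete, here is a minimal sketch of what a pass like check-instances-exist might do. It is written in Python rather than the Scheme the compiler actually uses, it models s-expressions as nested lists, and every name in it (BUILTIN_FORMS, CompileError, show, walk) is an assumption made for illustration, not code from 42.

```python
# Assumed keyword set for illustration; not 42's actual list of forms.
BUILTIN_FORMS = {"seq", "par", ":=", "proc"}


class CompileError(Exception):
    """Raised when a check rejects the program."""


def show(expr):
    """Render a nested list back into s-expression text."""
    if isinstance(expr, list):
        return "(" + " ".join(show(e) for e in expr) + ")"
    return str(expr)


def check_instances_exist(program):
    """Reject the first instance of a process that has no (proc name ...) form."""
    defined = {form[1] for form in program if form[0] == "proc"}

    def walk(expr, enclosing):
        if not isinstance(expr, list) or not expr:
            return
        head = expr[0]
        if head == "proc":
            # (proc name (formals ...) body ...): only the body holds instances.
            for sub in expr[3:]:
                walk(sub, expr)
            return
        if isinstance(head, str) and head not in BUILTIN_FORMS and head not in defined:
            raise CompileError(
                f"'{head}' is not defined. For it to be defined you must have "
                f"something that looks like (proc {head} (...) ...) in your code.\n"
                f" in: {show(enclosing)}"
            )
        for sub in expr[1:]:
            walk(sub, expr)

    for form in program:
        walk(form, form)
    return program
```

The enclosing expression is passed down the walk so the message can say where the bad instance sits, which is the same information the highlighting gives the user in the editor.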

Adding a new syntax error to the compiler takes only slightly more time than modifying an existing error message, because we have broken every check or test out into its own separate unit of functionality: a separate compiler pass. This makes maintaining and extending the compiler a manageable, and even enjoyable, task. Finding code that fails unit tests also becomes a snap.
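The one-check-per-pass idea can be sketched roughly as follows. This is Python, not the compiler's own Scheme, and the two checks and the run_passes driver are hypothetical stand-ins I made up for the example. The point is only the shape: each pass takes a program and hands it on, and adding a check means adding one function to a list.

```python
class PassError(Exception):
    """Raised by a pass that rejects the program."""


def check_top_level_forms_are_procs(program):
    # Hypothetical check: every top-level form must be (proc ...).
    for form in program:
        if not (isinstance(form, list) and form and form[0] == "proc"):
            raise PassError(f"expected a (proc ...) form at top level, got: {form!r}")
    return program


def check_procs_have_bodies(program):
    # Hypothetical check: (proc name (formals ...) body ...) needs a body.
    for form in program:
        if len(form) < 4:
            name = form[1] if len(form) > 1 else "?"
            raise PassError(f"process '{name}' has no body")
    return program


# Adding a new check means writing one function and listing it here.
PASSES = [check_top_level_forms_are_procs, check_procs_have_bodies]


def run_passes(program, passes=PASSES):
    """Feed the program through each pass in order, naming the pass that fails."""
    for compiler_pass in passes:
        try:
            program = compiler_pass(program)
        except PassError as err:
            raise PassError(f"{compiler_pass.__name__}: {err}") from err
    return program
```

Because a failure is tagged with the name of the pass that raised it, a failing unit test points straight at the one small function responsible.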

Another error a programmer might make is to define a process more than once; again, this is like having two methods in Java with the same name and same arguments---and this isn't allowed. Likewise, in our little language, you can only define a named process once.

Duplicate definitions

Here, the first instance is highlighted (I can't easily highlight all of them, unfortunately---but we'll work on that), and the error message reports all the lines in the code where the process is defined. In this example, we have three definitions of foo in the source code. The compiler catches all three, tells me where they are, and highlights the first offending instance.
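For the curious, a duplicate-definition check along these lines could look something like the sketch below. It is Python rather than Scheme, it assumes the parser has already recorded a source line for each process definition, and the function names and message wording are mine, not the compiler's.

```python
from collections import defaultdict


def find_duplicate_procs(definitions):
    """Map each process name defined more than once to the lines defining it.

    `definitions` is an iterable of (name, line) pairs in source order
    (an assumed representation, not 42's internal one).
    """
    lines_by_name = defaultdict(list)
    for name, line in definitions:
        lines_by_name[name].append(line)
    return {name: lines for name, lines in lines_by_name.items() if len(lines) > 1}


def duplicate_message(name, lines):
    """Return (message, line_to_highlight) for one duplicated process."""
    where = ", ".join(str(line) for line in lines)
    message = (
        f"'{name}' is defined {len(lines)} times, on lines {where}. "
        f"A named process may only be defined once."
    )
    # Only the first definition is highlighted; the message lists all of them.
    return message, lines[0]
```

Collecting every line before reporting is what lets a single error mention all three definitions of foo instead of stopping at the second one.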

My personal goal is for every error message in the compiler to be this rich and informative. I'm considering instrumenting the compiler to catch errors it has never seen before, and report those errors directly back to us (with the user's permission, of course). This way, as new syntax errors with poor messages are found, we can improve the compiler using real user data. This would provide valuable research data and a paper or two along the way.

As we now have our source code viewable, we'll soon be making it publicly available. Please drop a note if you're interested in working with or contributing to any of the tools we're developing.


  • Posted: June 22, 2006
  • Author: Matthew Jadud