Creating a better preprocessing compiler generator language

The rate of newly created programming languages increases constantly. While a lot of these languages have compilers that make it easy to process them, many new languages don’t have compilers because of the expense and difficulty associated with writing a good compiler for a new language. The Tosca preprocessor addresses this issue. It is an open source project maintained by IBM.


Current compiler tools

There are current front-end tools that help compiler writers construct the initial passes of new compilers quickly, generating efficient code in an extensible and maintainable manner. These tools include tokenizer-generators and parser-generators — Lex and Yacc, later Flex and Bison, and more recently ANTLR.

For the compiler back end, a set of frameworks exists for generating various kinds of code. Although they seem less standardized as a whole, each one is incredibly useful to its own users.

But what about the middle of the compiler?

Transforming your intermediate representations (IR)

After you’ve tokenized and parsed the input language and formed an abstract syntax tree (AST) or some other intermediate representation (IR), but before you convert the IR into a back-end target language, you are in the “big middle”. That’s where you need to transform your IR.

Transforming your IR can mean normalizing it down to a kernel of your language, type checking it, rewriting it, optimizing the program flow, or applying other transformations. The IR is a critical part of your language and its compilation.
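To make "rewriting your IR" concrete, here is a minimal hand-written sketch in Java of one such transformation: constant folding over a tiny expression tree. The classes Expr, Num, Var, Add, and Folder are hypothetical names invented for this illustration. They are not part of Tosca, and this is not code that Tosca generates.

```java
// Hypothetical mini-IR for illustration; none of these names come from Tosca.
abstract class Expr {}

final class Num extends Expr {
    final int value;
    Num(int value) { this.value = value; }
    @Override public String toString() { return Integer.toString(value); }
}

final class Var extends Expr {
    final String name;
    Var(String name) { this.name = name; }
    @Override public String toString() { return name; }
}

final class Add extends Expr {
    final Expr left, right;
    Add(Expr left, Expr right) { this.left = left; this.right = right; }
    @Override public String toString() { return "(" + left + " + " + right + ")"; }
}

final class Folder {
    // Rewrites the tree bottom-up: Add(Num a, Num b) becomes Num(a + b).
    static Expr fold(Expr e) {
        if (e instanceof Add) {
            Add add = (Add) e;
            Expr l = fold(add.left);
            Expr r = fold(add.right);
            if (l instanceof Num && r instanceof Num) {
                return new Num(((Num) l).value + ((Num) r).value);
            }
            return new Add(l, r); // rebuild with the folded children
        }
        return e; // Num and Var are already in normal form
    }
}
```

Even for a single node type and a single rule, most of the code is casts and tree plumbing rather than the rewrite itself, and that overhead grows with every node type and rule you add.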

So, where are the tools to help you rewrite your IR? Are there any as useful as Flex and Bison? There are alternatives, but they aren’t as useful as they could be or are too complex to learn.

This is where Tosca comes into the picture. It is a lightweight preprocessor that increases a developer’s productivity when dealing with syntax-driven, source-to-source transformation.

Tosca in use

For instance, we would like to manipulate the concrete syntax of the programming language directly, like this:

rule Compile( myExpr⟦ if #condition then #expr1 else #expr2 ⟧ )
→ javaExpr⟦ ( ⟨Compile(#condition)⟩ ) ? ⟨Compile(#expr1)⟩ else ⟨Compile(#expr2)⟩ ⟧
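For contrast, here is a rough sketch of the same translation written by hand in Java. It assumes the intended target is Java's conditional expression (c ? a : b). MyExpr, Name, IfExpr, and HandCompiler are hypothetical names made up for this illustration, and the code is not what Tosca emits for the rule above.

```java
// Hypothetical source AST; names chosen for illustration, not taken from Tosca.
abstract class MyExpr {}

final class Name extends MyExpr {
    final String id;
    Name(String id) { this.id = id; }
}

final class IfExpr extends MyExpr {
    final MyExpr condition, thenBranch, elseBranch;
    IfExpr(MyExpr condition, MyExpr thenBranch, MyExpr elseBranch) {
        this.condition = condition;
        this.thenBranch = thenBranch;
        this.elseBranch = elseBranch;
    }
}

final class HandCompiler {
    // Emits Java source text for a source-language expression.
    static String compile(MyExpr e) {
        if (e instanceof IfExpr) {
            IfExpr i = (IfExpr) e;
            // if c then a else b  becomes  (c) ? a : b
            return "(" + compile(i.condition) + ") ? "
                 + compile(i.thenBranch) + " : " + compile(i.elseBranch);
        }
        if (e instanceof Name) {
            return ((Name) e).id;
        }
        throw new IllegalArgumentException("unknown node: " + e);
    }
}
```

In the hand-written version the shape of the source and target syntax is buried in constructor calls and string concatenation, whereas the rule states both sides in their own concrete syntax.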

Tosca, hosted under GitHub’s tosca-lang organization, is officially “the preprocessing compiler generator language”. Its goal is to provide a simple language, quick to learn and easy to use, for specifying the source-to-source transformations inside your compiler and for quickly emitting C++ or Java code that runs those transformations.

Tosca is just a preprocessor, and a thin one at that. It generates easy-to-read, easy-to-debug C++ or Java code that maps directly from your Tosca source, so you can understand exactly what it is doing.

Getting started

System requirements

Tosca has been tested on OS X and Ubuntu. Git and Java 8 must be installed on your system.

Get the code and run a sample

Open a terminal and type these commands:

git clone
cd tosca
./gradlew smoketest
alias tosca="java -jar $PWD/tests/build/tosca.jar"

You should see a bunch of tests running successfully (ignore the warnings). To run the hello world sample, type these commands:

cd samples/hello
tosca run rules=hello.tsc

Install the Atom Tosca package

The recommended way to write Tosca programs is to use Atom. A package integrates Tosca with Atom and provides features such as syntax highlighting and syntax checking. To install the Tosca package in Atom, go to Preferences (ctrl-, or command-, depending on your system), click on Install, search for language-tosca, and then click Install.