LR Parser Generator

A few weeks ago I decided that I wanted to write a parser generator. This is one of those projects I’ve wanted to do for a long time but never got started on. When I finally started I decided that it would be fun to learn Scala at the same time, so I wrote the parser generator in Scala.

One of the things that motivated me to get started writing a parser generator is some ideas I have about how parser generator diagnostic messages can be improved. Currently, LR parser generators provide only the most limited information about a shift-reduce or reduce-reduce conflict. For example, the Beaver parser generator can give you something like this:

Warning: Resolved Shift-Reduce conflict by selecting (PLUS: SHIFT; goto 3) over (PLUS: REDUCE E = E PLUS E) using precedence.

It can take a lot of time, even for an expert, to figure out what the problem is and how to fix the grammar to remove the conflict. I think this can be significantly improved.

Currently my parser generator only generates an LR(0) parser, though my goal is to make it generate LR(1) parsers. The Scala code became surprisingly compact. For example, here is the main parse loop of the parser after the action and goto tables have been generated:

def parse(in: LookaheadBuffer[Symbol], goal: ItemSet) {
  val state = new scala.collection.mutable.Stack[ItemSet]
  val stack = new scala.collection.mutable.Stack[ParseTree]
  state push goal
  while (true) {
    // Perform all reductions enabled in the current state.
    while (reduceRules contains state.top) {
      val rule = reduceRules(state.top)
      val lhs = rule.lhs
      var children = List[ParseTree]()
      for (i <- 1 to rule.rhs.size) {
        // Pop one state and one subtree per right-hand-side symbol.
        children = stack.pop :: children
        state.pop
      }
      stack push new ParseTree(lhs, children)
      state push goto(state.top, lhs)
    }
    val sym = if (!in.eof) in.pop else EOFToken
    if (sym != WhitespaceToken) { // skip whitespace
      val action = actions(state.top, sym)
      action match {
        case Accept => return
        case shift: Shift =>
          stack push new ParseTree(sym)
          state push shift.target
        case _ => throw new Error("syntax error")
      }
    }
  }
}

I wouldn’t be surprised if there was some way to shave ten more lines off that loop.

Premature Optimization in Java

A few months ago I implemented the Möller-Trumbore ray-triangle intersection algorithm (trivia: Tomas Möller, one of the inventors, works in the graphics group at my CS department). My initial implementation looked like this (slightly edited):

// e1 and e2 are triangle edges, o is origin
// ray = input ray
Vec pvec = new Vec(0, 0, 0);
Vec qvec = new Vec(0, 0, 0);
Vec tvec = new Vec(0, 0, 0);

pvec.cross(ray.d, e2);
double det =;
if (det > -EPSILON && det < EPSILON)
    return false;

tvec.sub(ray.o, o);

double u = / det;

if (u < 0 || u > 1)
    return false;

qvec.cross(tvec, e1);

double v = / det;

if (v < 0 || (u+v) > 1)
    return false;

double t = / det;
return t < tNear;

After making sure the above was correct I started optimizing it. First I replaced the three divisions by det by calculating the reciprocal of det and then multiplying with that. This replaces three divisions with one division and three multiplications. This should be an improvement if floating point division is at least 1.5 times slower than floating point multiplication, which is generally true.
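In isolation the transformation looks like this (a standalone sketch, not the actual ray tracer code). Note that a * recip and a / det can differ in the last bit for a general det, which is usually acceptable in a ray tracer:

```java
public class ReciprocalDemo {
    public static void main(String[] args) {
        double det = 4.0, a = 1.0, b = 2.0, c = 3.0;
        // Before: three divisions by det.
        double u1 = a / det, v1 = b / det, t1 = c / det;
        // After: one division and three multiplications.
        double recip = 1 / det;
        double u2 = a * recip, v2 = b * recip, t2 = c * recip;
        // For a power-of-two det the results are bit-identical.
        System.out.println(u1 == u2 && v1 == v2 && t1 == t2); // prints true
    }
}
```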

Next, I removed the temporary Vec objects. This is where things get interesting. The idea is to avoid allocating these temporary objects on the heap and replace them with stack-allocated local variables. The code gets a lot bigger because you have to inline all the cross product and dot product code in the intersection algorithm. The inlined version looks like this:

double px=0,py=0,pz=0;
double qx=0,qy=0,qz=0;
double tx=0,ty=0,tz=0;

px = ray.d.y * e2.z - ray.d.z * e2.y;
py = ray.d.z * e2.x - ray.d.x * e2.z;
pz = ray.d.x * e2.y - ray.d.y * e2.x;
double det = e1.x*px + e1.y*py + e1.z*pz;
if (det > -EPSILON && det < EPSILON)
    return false;
double recip = 1 / det;

tx = ray.o.x - o.x;
ty = ray.o.y - o.y;
tz = ray.o.z - o.z;

double u = (tx*px + ty*py + tz*pz) * recip;

if (u < 0 || u > 1)
    return false;

qx = ty * e1.z - tz * e1.y;
qy = tz * e1.x - tx * e1.z;
qz = tx * e1.y - ty * e1.x;

double v = (ray.d.x*qx + ray.d.y*qy + ray.d.z*qz) * recip;

if (v < 0 || (u+v) > 1)
    return false;

double t = (e2.x*qx + e2.y*qy + e2.z*qz) * recip;
return t < tNear;

The code certainly became less readable, but is it faster? The answer may depend on the JVM you use to run the code, but in most cases the above will NOT be faster. It can even be slower than the non-inlined version.

The Java language has no built-in way to explicitly allocate objects on the stack, as C++ has. However, the Java runtime can still just-in-time optimize your code to avoid unnecessary heap allocations: the JIT compiler tries to detect objects that do not escape a method and allocates those on the stack instead. An object does not escape a method if it is not returned from the method, not stored by reference in some other object, and not passed as an argument to some other method.
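A minimal sketch of the escape rule (a made-up helper, not code from the ray tracer): the Vec-like temporary below never leaves the method, so the JIT may allocate it on the stack or scalar-replace it into plain local doubles.

```java
public class EscapeDemo {
    static final class Vec {
        double x, y, z;
        Vec(double x, double y, double z) { this.x = x; this.y = y; this.z = z; }
    }

    // The Vec allocated here is not returned, not stored in a field,
    // and not passed to another method, so it does not escape.
    static double dotWithSelf(double x, double y, double z) {
        Vec v = new Vec(x, y, z);
        return v.x * v.x + v.y * v.y + v.z * v.z;
    }

    public static void main(String[] args) {
        System.out.println(dotWithSelf(1, 2, 2)); // prints 9.0
    }
}
```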

It can be difficult to figure out exactly when things are moved from heap to stack, but using the Caliper microbenchmark framework you can get a good idea. I benchmarked two versions of my intersection code against each other: one that used manual inlining and one that did not. It turned out that the manually inlined version was consistently slower, though not by much, than the non-inlined one. The small difference in performance is probably caused by the JIT compiler happening to lay out the code differently in each case, incidentally giving slightly better cache behavior to the non-manually-inlined version.

I also benchmarked the versions using the reciprocal multiplication optimization I discussed above and found that using this technique was beneficial, as originally expected.

Java LR Parsing Challenges

I recently gave a talk at the Parsing@SLE workshop in Västerås about LR parsing problems in Java 8. I thought it would be nice to have a short list of all the parsing challenges available, including those from previous Java versions, so that is what you will find in this post. I will mostly not talk about the different ways to solve the challenges listed here, but that may be the topic of follow-up posts.

Java Evolution

Obviously the Java language has evolved a lot from its original form. It is crazy to think that people once used Java without generics, yet that is how it started, and now we have lambdas, the shiny new feature that everyone will soon also take for granted.

Adding Java features does have a significant cost. Ideally this cost is paid only by the language designer. Java is pretty good at not breaking old code, which is usually one of the biggest costs when a programming language changes. The price of this backward compatibility is instead a lot of design effort invested in each new feature. Personally I think that Java 8 is well designed, given the constraints of backward compatibility that were placed on the design. With that said, there are definitely a whole bunch of new parsing problems that resulted from the evolution of Java. The Java grammar is no longer an LR(1) grammar, but with some massaging and scanner hacks you can get it to parse anyway.

LR Parsing Challenges

The Java parsing challenges that I am aware of are listed below. They are described in the context of an LR parser.

1. Right Angle Brackets Have Multiple Meanings

This is one of the most well-known parsing problems in Java. Whenever two or three right angle brackets appear directly after each other, a regular Java lexer will tokenize them as the bit shift operators >> or >>>. Consider the example below:

class X<A<B>> { }
class X<A<B> > { }

On the first line the right angle brackets are not tokenized individually, but on the second line they are. The parser needs to be able to parse both variations.

This is not too difficult to solve; it has been done many times before in different Java parsers. Usually the bracket nesting depth is encoded in the parser productions, so that, for example, when parsing inside two or more nested brackets the parser can accept the signed right shift operator >>.
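A sketch of the depth-encoding idea (the nonterminal names are invented, this is not the actual grammar): separate productions accept the combined token at the depths where it can occur.

```
TypeArgs1 -> "<" Types ">"             // depth 1 closes with a plain ">"
TypeArgs2 -> "<" Types TypeArgs1 ">"   // inner list closed earlier
           | "<" Types "<" Types ">>"  // inner and outer close arrive as ">>"
```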

2. Conflict Between Generic Type Cast and Less-Than Expressions

Example of conflict:

(T<A>) x    // casting to generic type
(T<A)       // less-than expression

As you can see, there is a common prefix between the two expressions. In LR parsing, common prefixes are not a problem if they involve the same terminals or nonterminals. However, in this case the same terminal must be reduced to two different nonterminals, RelationalExpression versus ReferenceType, and so we have a reduce-reduce conflict.
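In simplified grammar terms (invented nonterminal names, not the JLS productions), the conflict arises because after ( T with lookahead < the parser must already commit to one of two reductions:

```
CastExpression       -> "(" ReferenceType ")" UnaryExpression
ReferenceType        -> TypeName "<" TypeArguments ">" | TypeName
RelationalExpression -> RelationalExpression "<" ShiftExpression | Name
-- With "(" T on the stack and "<" as lookahead, T must be reduced
-- either towards ReferenceType or towards RelationalExpression:
-- a reduce-reduce conflict.
```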

3. Conflict Between Generic Type Cast and Relational Expressions

Example of conflict:

(T<A<B>>) x   // generic type cast
(T<A<B)        // relational expressions

It is not possible to use relational operators with boolean-valued expressions; in other words, the second line in the above example is semantically invalid. However, the JLS still allows that syntax – the expression is only rejected during type checking. Another way of putting it is that the relational operators are associative when they could really be non-associative.

The fact that relational operators are associative causes a conflict in the above example. Even if we disregard the common-prefix reduce-reduce conflict for a moment, there is still a problem after encountering the second less-than: the parser does not know whether it should reduce what came before to a relational expression or shift to build a type argument list. This is a shift-reduce conflict.

4. Ambiguous Grammar Specification for Lambda Expressions

An example of an ambiguous string:

(T) () -> a * b

This is a lambda expression being cast to some type T. The Java Language Specification (JLS) says that lambda expressions should consume as large an expression as they can, i.e., they have the lowest precedence. However, in the JLS grammar you can find the following productions (simplified):

Expression -> Lambda
Cast -> "(" Type ")" Lambda

Lambda appearing in the Expression production places it at the top of the expression hierarchy, meaning that it should have the lowest precedence. However, the Cast production turns this on its head and makes the above sample expression parseable as if it had the following logical form:

((T) () -> a) * b

This interpretation of the example is incorrect, as the JLS points out. However, the correct interpretation is also a valid parse:

(T) (() -> a * b)

If there are multiple valid parse trees of a single expression it means that the grammar is ambiguous.

5. Conflict Between the Lambda Parameter List and Parenthesized Less-Than Expressions

Example of conflicting expressions:

(T<A> x) -> x    // lambda expression
(T<A)            // less-than expression

This is essentially the same conflict as described in 2. This conflict was introduced in Java 8, as opposed to the reduce-reduce conflict discussed in 2, which was added in Java 5.

6. Reduce-Reduce Conflict for Modifiers

In Java 8 there are new nonterminals for different kinds of modifiers, for example the new nonterminals InterfaceMethodModifier and ClassModifier. Both kinds of modifier may occur inside an interface declaration:

interface I {
    static class C { } // static is ClassModifier
    public void m();   // public is InterfaceMethodModifier
}

The problem is that both static and public are valid productions of both these nonterminals, so there is a reduce-reduce conflict whenever the parser encounters either of these modifiers, and a few other modifiers, inside an interface body.

You can solve this reduce-reduce conflict by parsing all modifiers as the same nonterminal and discarding incorrect modifier uses during semantic error checking; however, then you run into a shift-reduce conflict in the switch statement, which is really interesting. That could be a topic for a whole other blog post.

7. Conflict Between Method References and Relational Expressions

Example of conflicting expressions:

f(T<A, B>::m)   // method reference as argument
f(T<A, B>  m)   // two relational expression arguments

Note that the method reference is qualified by a parameterized type. This is probably a very uncommon use of method references; nevertheless, a parser must be complete and correct, so it is important to handle even these seemingly unimportant cases.

As seen in the above example there is a common prefix between the method reference and the list of relational expressions. This case is different from the case in 2 in that the list can be arbitrarily long.

8. Conflict Between Intersection Type Cast and Bitwise-And Expressions

An intersection type cast is simply a cast expression that looks like this:

(A & B) x

There can be many types in the intersection type, and these type casts conflict with parenthesized bitwise-and expressions:

(A & B & C) x   // intersection type cast
(A & B & C)     // bitwise-and

The type of the expression is determined by what follows the closing parenthesis. This is a shift-reduce conflict as the parser can not decide whether to shift an identifier onto a type list, or reduce to a bitwise-and expression.


This post was just a brief rundown of the parsing problems we encountered in implementing Java parsing using a generated LALR parser. The explanations were very short and I mostly skipped describing how to solve the challenges. This was done to save space and time. Nevertheless, I hope that this was somewhat useful and/or interesting.

Personally I think the parsing problems I have described here are interesting because they show how very specific and complex parsing problems can be. It is amazing to me that the lambda expression, which is syntactically very flexible, has broken so little in the Java grammar.

Sky Improvements

I’ve been meaning to improve the sky rendering in Chunky for a long time, and finally I have started to work on this project. I had first just planned on adding support for sky maps with full 180° vertical resolution, as opposed to the current 90° sky maps:

Skymap Resolution

The increased vertical resolution means that the entire sky can be textured, without mirroring the sky map at the horizon. Most scenes do not specifically need to render the sky below the horizon; however, it is not uncommon that some part of the render shows a bit of sky mirroring happening:

Distracting sky mirroring at horizon.

The mirroring is quite distracting, and detracts from the overall quality of the render. I think supporting full 360 by 180 degree sky maps is an important improvement.
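For reference, looking up a full 360 by 180 degree (equirectangular) sky map only requires converting the ray direction to spherical angles. This is a generic sketch, not Chunky's actual implementation:

```java
public class SkyMapLookup {
    // Map a normalized ray direction (y up) to equirectangular
    // texture coordinates u, v in [0, 1].
    static double[] directionToUV(double dx, double dy, double dz) {
        double u = (Math.atan2(dx, dz) + Math.PI) / (2 * Math.PI);
        double v = Math.acos(Math.max(-1, Math.min(1, dy))) / Math.PI;
        return new double[] { u, v }; // v = 0 at zenith, 1 at nadir
    }

    public static void main(String[] args) {
        double[] uv = directionToUV(0, 1, 0); // looking straight up
        System.out.println(uv[1]); // prints 0.0 (the zenith row)
    }
}
```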

While I was adding support for higher-resolution sky maps I thought it would be cool if you could use spherical sky maps and skyboxes as well, so both can now also be used in Chunky. Spherical sky maps are pretty cool because it’s quite easy to make one yourself: just take a picture of a shiny metal sphere and you have a ghetto spherical sky map.

We used spherical sky maps, or light probes as they are called if they use a high dynamic range format, in the rendering course I took last semester. I thought that it would be a shame if I did not add support for HDR formats as well, so I did. Now images using the PFM or RGBE (.hdr) formats can be loaded by Chunky. This means that Chunky is now capable of image-based lighting. Image-based lighting really gives renders a much more realistic appearance. I found several free HDR sky maps online, so I’ve been testing some out for rendering:

Desert Vista

While studying the RGBE HDR image format, I also read a bit about the RADIANCE rendering system, and through some links I arrived at Mark Stock’s RADIANCE page and saw that he had also implemented the Preetham et al. sky model. I was not really satisfied with my own implementation, the one used in Chunky for the default sky, because the sky appeared too pink. My results differed slightly from the images in the paper by Preetham et al., A Practical Analytic Model for Daylight. When I saw Mark Stock’s results, they seemed much closer to the results in the original paper, so I decided to investigate why. He provides the sources for his implementation on his website. After comparing the implementations for a while I finally arrived at what I believe is the only functional difference: he uses a different matrix to transform from the XYZ color space to RGB! I switched to using the same transformation matrix and voilà! The sky looks much better, in my opinion! Here is a comparison video I made (original on the left, corrected version on the right):
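For reference, the commonly used sRGB (D65) XYZ-to-linear-RGB matrix looks like this. I am not claiming this is the exact matrix Mark Stock uses; it just illustrates the kind of transform involved:

```java
public class XyzToRgb {
    // Standard sRGB (D65) XYZ to linear RGB transform.
    static double[] xyzToRgb(double x, double y, double z) {
        return new double[] {
             3.2406 * x - 1.5372 * y - 0.4986 * z,
            -0.9689 * x + 1.8758 * y + 0.0415 * z,
             0.0557 * x - 0.2040 * y + 1.0570 * z
        };
    }

    public static void main(String[] args) {
        // The D65 white point should map to approximately (1, 1, 1).
        double[] rgb = xyzToRgb(0.9505, 1.0, 1.089);
        System.out.println(Math.abs(rgb[0] - 1) < 0.01
                && Math.abs(rgb[1] - 1) < 0.01
                && Math.abs(rgb[2] - 1) < 0.01); // prints true
    }
}
```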

Sky Color Variations from Jesper Öqvist on Vimeo.

Color Picker

This post is a bit of a rant about the current state of color pickers in different applications. It also talks a bit about a color picker I created myself, and how I decided to make it different from most other color pickers.

The Terrible JColorChooser

The default color picker interface in the Java Swing API is terrible. I have been using it in Chunky because there was really no simple alternative. Here is a screenshot of the Swing color picker:

Swing JColorChooser

Out of the three tabs in the dialog, the first one is the second least useful. I won’t even show the unspeakably bad third one. I hate the above tab because the color swatches in the palette are too small – it would have been better if that palette was just a continuous gradient of colors. I also hate that the palette has a grid in it with some kind of bevel effect – if there is some grid between the colors in a palette it should just be one solid achromatic color. With the current small color swatches the grid muddies up the perception of the individual colors. It’s a real shame that the above tab is the default one, it makes the JColorChooser much shittier than it would have been if the second one was the default tab.

So, let’s look at the second tab:

Swing JColorChooser

This second tab is better, but still bad. Here are a few problems I have with it:

  • I can not click on the hue bar. To change hue I have to either use that stupid little text field or the shitty slider that is clipped for some reason. Oddly, the nice big color palette to the left of the hue slider does not update while dragging the slider – only after you let go of the slider thumb. That makes it very annoying to find the correct slider position; it would have been so much better if the color palette updated immediately while tweaking the hue value.
  • The RGB display. I do not care what the RGB values for the current color are! Give me the hexadecimal number for the color instead!
  • The HSL text fields are pointless. I can not think of any situation where I’d want to input a color by hue, saturation, and lightness unless I want to copy a color from another application, in which case a hexadecimal input would have been much superior. What matters here are the colors, not the numbers.
  • The color preview area at the bottom is too cluttered with stuff. It should have just been one solid swatch of color. The idea of displaying the color side-by-side with black and white for comparing brightness is good, but here it just went overboard.

The Problem

I’ve been looking at other color pickers in different applications and most of the ones I’ve seen so far are awful. The main problem is usually that the interface is very fiddly, due to too small sliders or sliders that are otherwise difficult to adjust. There seems to be some weird idea that the components in a color picker should be small and compact, but the smaller they are the worse they are to use. As a user I want big palettes that can display a color gradient over a large area so I can get a better idea of the different nuances at a glance.

Another common problem is that color pickers lack a good color preview. The preview should be one large swatch of the color, uninterrupted by other crap.

The Fix

After being irritated that I had to use JColorChooser for far too long I finally decided I would invest the effort required to rid myself of JColorChooser for good. This is what I created:


The interface is really simple, and it has only the features that I needed, but I think it is superior to most other color choosers I have tried. What makes it so good?

  1. The sliders are nice and large, and you can click anywhere in the sliders and the thumb jumps directly there. The saturation and lightness sliders update directly whenever any one of the other sliders is changed.
  2. The preview swatch is huge. It shows the current color in all its glory.
  3. There are some smaller swatches with previously selected colors, and a series of colors generated pseudo-randomly from the current color. These give some interesting alternative colors. Again, these swatches are much larger than they would be in other color pickers.
  4. Click the big swatch when you are satisfied to close the dialog. If the dialog loses focus it is closed automatically. These are nice features which make the “Cancel” and “Done” buttons obsolete, however the buttons are still there in case you forget.


I’ve started working on entity support for Chunky. This means that Chunky will be able to render a number of different things that it can currently not render because they did not fit inside a single voxel (a single block in the Minecraft world).

From the beginning Chunky could only render perfectly cuboid objects such as stone blocks, grass blocks, etc. After a while I started adding support for non-cuboid things such as fence posts, stairs, torches, etc.

Fence and stair blocks

Most non-cuboid blocks still fit inside a voxel, like the stairs and fence in the above image, so I reused the same representation for both voxel-filling cuboid blocks and non-filling blocks. Without going into too much detail, this meant that although I could render things with more interesting shapes than pure cubes, they could not extend outside a 1x1x1 meter area. This is a problem for things such as paintings, cows, player models, beacon effects, and a bunch of other miscellaneous blocks and effects that are currently not rendered by Chunky.


The above image was rendered using the partial entity support I’ve been working on. Entity support is exciting because it allows rendering a lot of things that could not previously be rendered properly! Only paintings are fully handled right now, and there’s a bit more to do before this is ready for release. I hope to release it soon.