Java LR Parsing Challenges

I recently gave a talk at the Parsing@SLE workshop in Västerås about LR parsing problems in Java 8. I thought it would be nice to have a short list available of all parsing challenges including previous Java versions too, so that is what you will find in this post. I will mostly not talk about the different ways to solve the parsing challenges listed here, but that may be the topic of follow up posts.

Java Evolution

Obviously the Java language has evolved a lot from it’s original form. It is crazy to think that people used Java without generics once upon a time, yet that is how it started and now we have Lambdas, the shiny new feature that everyone will soon also take for granted.

Adding Java features does have a significant cost. Ideally this cost is paid only by the language designer. Java is pretty good at not breaking old code, which is usually one of the biggest costs when a programming language changes. Providing good backward compatibility is paid for instead by investing a lot in the design effort for new features. Personally I think that Java 8 is well designed, given the constraints of backward compatibility that were placed on the design. With that said though, there are definitely a whole bunch of new parsing problems that resulted from the evolution of Java. The Java grammar is no longer an LR(1) grammar, but with some massaging and scanner hacks you can get it to parse anyway.

LR Parsing Challenges

The Java parsing challenges that I am aware of are listed below. They are described in the context of an LR parser.

1. Right Angle Brackets Have Multiple Meanings

This is one of the most well known parsing problems in Java. Whenever two or three angle brackets are encountered right after each other a regular Java lexer will tokenize them as the bit shift operators >> or >>>. Consider the below example:

class X<A<B>> { }
class X<A<B> > { }

On the first line the right angle brackets are not tokenized individually, but on the second line they are. The parser needs to be able to parse both variations.

This is not too difficult to solve, it has been done many times before in different Java parsers. Usually the bracket nesting depth is encoded for in the parser productions, so that, for example, when parsing inside two or more nested brackets the parser can accept the signed right shift operator >>.

2. Conflict Between Generic Type Cast and Less-Than Expressions

Example of conflict:

(T<A>) x    // casting to generic type
(T<A)       // less-than expression

As you can see there is a common prefix between the two expressions. In LR parsing common prefixes are not a problem if they involve the same terminals or nonterminals. However in this case the same terminal must be reduced to two different nonterminals, RelationalExpression versus ReferenceType, and so we have a reduce-reduce conflict.

3. Conflict Between Generic Type Cast and Relational Expressions

Example of conflict:

(T<A<B>>) x   // generic type cast
(T<A<B)        // relational expressions

It is not possible to use relational operators with boolean value expressions, in other words the second line in the above example is really semantically invalid. However the JLS still allows that syntax – the expression is just discarded during type checking. Another way of putting it is that the relational operators are associative when they could really be non-associative.

The fact that relational operators are associative causes a conflict in the above example. Even if we disregard for a while the common-prefix reduce-reduce conflict you would still get a shift-reduce problem after encountering the second less-than because the parser does not know if it should reduce the previous to a relational expression or shift to build a type argument list. This is a shift-reduce conflict.

4. Ambiguous Grammar Specification for Lambda Expressions

An example of an ambiguous string:

(T) () -> a * b

This is a lambda expression being cast to some type T. The Java Language Specification (JLS) argues for lambda expressions wanting to consume as big expressions as they can, i.e., they have lowest precedence. However in the JLS grammar you can find the following productions (simplified):

Expression -> Lambda
Cast -> "(" Type ")" Lambda

Lambda being in the Expression production makes it live in the top of the expression hierarchy, meaning that it should have the lowest precedence. However, the Cast production turns this on its head and makes the above sample expression parseable as if the expression had the following logical form:

((T) () -> a) * b

This interpretation of the example is incorrect, as the JLS points out. However, the correct interpretation is also a valid parse:

(T) (() -> a * b)

If there are multiple valid parse trees of a single expression it means that the grammar is ambiguous.

5. Conflict Between the Lambda Parameter List and Parenthesized Less-Than Expressions

Example of conflicting expressions:

(T<A> x) -> x    // lambda expression
(T<A)            // less-than expression

This is essentially the same conflict as described in 2. This conflict was introduced in Java 8, as opposed to the reduce-reduce conflict discussed in 2, which was added in Java 5.

6. Reduce-Reduce Conflict for Modifiers

In Java 8 there are new nonterminals for different kinds of modifiers, for example there exist the new nonterminals InterfaceMethodModifier and ClassModifier. Both of these kinds of modifier may exist inside an interface declaration:

interface I {
    static class C { } // static is ClassModifier
    public void m();   // public is InterfaceMethodModifier
}

The problem is that both static and public are valid productions of both these nonterminals, so there is a reduce-reduce conflict whenever the parser encounters either of these modifiers, and a few other modifiers, inside an interface body.

You can solve this reduce-reduce conflict by letting all modifiers be parsed as the same nonterminal and discard incorrect modifier uses during semantic error checking, however then you will run into a shift-reduce conflict in the switch statement, which is really interesting. This could be a topic for a whole other blog post.

7. Conflict Between Method References and Relational Expressions

Example of conflicting expressions:

f(T<A, B>::m)   // method reference as argument
f(T<A, B>  m)   // two relational expression arguments

Note that the method reference is qualified by a parameterized type. This is probably a very uncommon use of method references, nevertheless a parser must be complete and correct so it is important to take care even of these seemingly unimportant cases.

As seen in the above example there is a common prefix between the method reference and the list of relational expressions. This case is different from the case in 2 in that the list can be arbitrarily long.

8. Conflict Between Intersection Type Cast and Bitwise-And Expressions

An intersection type cast is simply a cast expression that looks like this:

(A & B) x

There can be many types in the intersection type, and these type casts conflict with parenthesized bitwise-and expressions:

(A & B & C) x   // intersection type cast
(A & B & C)     // bitwise-and

The type of the expression is determined by what follows the closing parenthesis. This is a shift-reduce conflict as the parser can not decide whether to shift an identifier onto a type list, or reduce to a bitwise-and expression.

Conclusions

This post was just a brief rundown of the parsing problems we encountered in implementing Java parsing using a generated LALR parser. The explanations were very short and I mostly skipped describing how to solve the challenges. This was done to save space and time. Nevertheless, I hope that this was somewhat useful and/or interesting.

Personally I think the parsing problems I have described here were interesting because they show how very specific and complex parsing problems can be. It is amazing to me that the lambda expression, which is syntactically very flexible, has broken so little in the Java grammar.

Sky Improvements

I’ve been meaning to improve the sky rendering in Chunky for a long time, and finally I have started to work on this project. I had first just planned on adding support for sky maps with full 180° vertical resolution, as opposed to the current 90° sky maps:

Skymap Resolution

The increased vertical resolution means that the entire sky can be textured, without mirroring the sky map at the horizon. Most scenes do need to render the sky below the horizon specifically, however it is not uncommon some part of the render shows a bit of sky mirroring happening:

Distracting sky mirroring at horizon.

The mirroring is quite distracting, and detracts from the overall quality of the render. I think supporting full 360 by 180 degree sky maps is an important improvement.

While I was adding support for more sky maps I thought it would be cool if you could use spherical sky maps, or skyboxes as well. So, skyboxes and spherical sky maps can now also be used in Chunky. Spherical sky maps are pretty cool, because it’s quite easy to make one yourself, just take a picture of a shiny metal sphere and you have a ghetto spherical sky map.

We used spherical sky maps, or light probes as they are called if they use a high dynamic range format, in the rendering course I took last semester. I thought that it would be a shame if I did not add support for HDR formats as well, so I did. Now images using the PFM or RGBE (.hdr) formats can be loaded by Chunky. This means that Chunky is now capable of image-based lighting. Image-based lighting really gives renders a much more realistic appearance. I found several free HDR sky maps online, so I’ve been testing some out for rendering:

Desert Vista

While studying the RGBE HDR image format, I also read a bit about the RADIANCE rendering system and through some links I arrived at Mark Stock’s RADIANCE page and saw that he had also implemented the Preetham et. al. sky model. I was not really satisfied with my own implementation, the one used in Chunky for the default sky, because the sky appeared too pink. My results differed slightly from the images in the paper by Preetham et. al., A Practical Analytic Model for Daylight. When I saw Mark Stock’s results, they seemed much closer to the results in the original paper, so I decided to investigate why. He provides the sources for his implementation on his website. After comparing the implementations for a while I finally arrived at what I believe is the only functional difference: he uses a different matrix to transform from the XYZ color space to RGB! I switched to using the same transformation matrix and voila! The sky looks much better, in my opinion! Here is a comparison video I made (original on the left, corrected version on the right):

Sky Color Variations from Jesper Öqvist on Vimeo.

Color Picker

This post is a bit of a rant about the current state of color pickers in different applications. It also talks a bit about a color picker I created myself, and how I decided to make it different from most other color pickers.

The Terrible JColorChooser

The default color picker interface in the Java Swing API is terrible. I have been using it in Chunky because there was really no simple alternative. Here is a screenshot of the Swing color picker:

Swing JColorChooser

Out of the three tabs in the dialog, the first one is the second least useful. I won’t even show the unspeakably bad third one. I hate the above tab because the color swatches in the palette are too small – it would have been better if that palette was just a continuous gradient of colors. I also hate that the palette has a grid in it with some kind of bevel effect – if there is some grid between the colors in a palette it should just be one solid achromatic color. With the current small color swatches the grid muddies up the perception of the individual colors. It’s a real shame that the above tab is the default one, it makes the JColorChooser much shittier than it would have been if the second one was the default tab.

So, let’s look at the second tab:

Swing JColorChooser

This second tab is better, but still bad. Here are a few problems I have with it:

  • I can not click on the hue bar. To change hue I have to either use that stupid little text field or the shitty slider that is clipped for some reason. Oddly, though, the nice big color palette to the left of the hue slider does not update while dragging the slider – only after you let go of the slider thumb. That makes it very annoying to try to find the correct slider position, it would have been so much better if the color palette updated ASAP while tweaking the hue value.
  • The RGB display. I do not care what the RGB values for the current color are! Give me the hexadecimal number for the color instead!
  • The HSL text fields are pointless. I can not think of any situation where I’d want to input a color by hue, saturation, and lightness unless I want to copy a color from another application, in which case a hexadecimal input would have been much superior. What matters here are the colors, not the numbers.
  • The color preview area at the bottom is too cluttered with stuff. It should have just been one solid swatch of color. The idea of displaying the color side-by-side with black and white for comparing brightness is good, but here it just went overboard.

The Problem

I’ve been looking at other color pickers in different applications and most of the ones I’ve seen so far are awful. The main problem is usually that the interface is very fiddly, due to too small sliders or sliders that are otherwise difficult to adjust. There seems to be some weird idea that the components in a color picker should be small and compact, but the smaller they are the worse they are to use. As a user I want big palettes that can display a color gradient over a large area so I can get a better idea of the different nuances at a glance.

Another common problem is that color pickers lack a good color preview. The preview should be one large swatch of the color, uninterrupted by other crap.

The Fix

After being irritated that I had to use JColorChooser for far too long I finally decided I would invest the effort required to rid myself of JColorChooser for good. This is what I created:

color_picker

The interface is really simple, and it has only the features that I needed, but I think it is superior to most other color choosers I have tried. What makes it so good?

  1. The sliders are nice and large, and you can click anywhere in the sliders and the thumb jumps directly there. The saturation and lightness sliders update directly whenever any one of the other sliders is changed.
  2. The preview swatch is huge. It shows the current color in all its glory.
  3. There are some smaller swatches with previously selected colors, and a series of colors generated pseudo-randomly from the current color. These give some interesting alternative colors. Again, these swatches are much larger than they would be in other color pickers.
  4. Click the big swatch when you are satisfied to close the dialog. If the dialog loses focus it is closed automatically. These are nice features which make the “Cancel” and “Done” buttons obsolete, however the buttons are still there in case you forget.

Entities

I’ve started working on entity support for Chunky. This means that Chunky will be able to render a number of different things that it can currently not render because they did not fit inside a single voxel (a single block in the Minecraft world).

From the beginning Chunky would only render perfectly cuboid objects such as stone blocks, grass blocks etc. After a while I started adding support for non-cuboid things such as fence posts, stairs, torches, etc.

Fence and stair blocks

Most non-cuboid blocks still fit inside a voxel, like the stairs and fence in the above image, so I reused the same representation for both voxel-filling cuboid blocks and non-filling blocks. Without going into too much detail, this meant that although I could render things with more interesting shapes than pure cubes, they could not extend outsize a 1x1x1 meter area. This is a problem for things such as paintings, cows, player models, beacon effects, and a bunch of other miscellaneous blocks or effects that are currently not be rendered by Chunky.

Gallery

The above image was rendered using the current partial entity support I’ve been working on. Entities support is exciting because it allows rendering a lot of things that could not previously be rendered properly! Only paintings are fully handled right now, and there’s a bit more to do before this is ready for release. I hope to release it soon.

Improving the Chunky 2D map

The past couple of days I have been polishing the 2D map rendering in Chunky. The 2D map is a feature that has been mostly neglected since I started working on the 3D rendering.

Originally Chunky was only a 2D mapping program. I created Chunky to explore my own small Minecraft worlds and look for hidden treasures near my mineshafts. The default rendering mode was the “layer” mode, in which a single layer of blocks is rendered top-down. In order to allow quickly switching between layers I designed Chunky store most of the loaded chunk data and only throw it away when out of memory. This was done to avoid reading from the hard disk whenever a new layer was loaded.

The original layer mode

The original layer mode

Nowadays Chunky is used quite differently and the requirements for the 2D map have changed. The 2D map is now mostly a tool for selecting chunks to be rendered in 3D. This means that the layer mode is seldom used — the default “surface” mode is mainly used for chunk selection. In addition much larger views have to be rendered by the 2D map because when selecting a portion of the map to render you usually want a good overview of the entire area to be rendered.

Rendering a Minecraft map covering a large area is a significant challenge, especially for interactive applications. The number of chunks that need to be loaded is immense. If the map is zoomed out as far as possible (equivalent to one chunk per pixel) Chunky, at the default window size, will display about half a million chunks. Simply loading all those chunks takes significant time even if you are loading from a fast SSD drive. The memory requirements for storing the map are also massive. Just storing color values for each block in such a map would require 480 megabytes. There are additional overheads which bring it up to about one gigabyte of data stored for just the 2D map.

The red square is 32x32 chunks (each chunk is 16x16 Minecraft blocks)

The red square is 32×32 chunks (each chunk is 16×16 Minecraft blocks)

One thing I did to improve the efficiency of the 2D map was to remove caching of chunk data. The chunk data includes information about each block in a chunk, biome metadata, block metadata etc. The memory required for the complete chunk data set is much larger than just the 2D map rendering. Removing this caching may lead to many extra disk reads when creating a 3D scene, but it noticeably improved the loading speed for the 2D map.

I also removed pre-rendering of background maps such as the layer map, biome map (seen above), or cave map. Only the currently viewed map is stored by Chunky. This means that chunks need to be re-loaded if the map mode is changed, however this happens so seldom that the memory savings are likely worth more than the extra cost of switching to another map mode.

Surface map mode

Surface map mode

In addition to removing the caching of chunk data and background maps, I improved the way the map is updated. The 2D map is interactive, meaning that it is redrawn whenever a chunk is loaded for the first time or whenever a chunk is re-loaded because someone changed it on disk. It used to be that most of the map was redrawn if something changed but now it is supposed to redraw only the region or chunk that was updated, depending on the zoom level. To test the smarter redrawing I made the map renderer highlight each pixel that is drawn with a color indicating in which rendering pass it was drawn. It looked kinda funky:

Each chunk is colored based on when it was last redrawn

Each chunk is colored based on when it was last redrawn

During this process I was able to fix some tearing issues that were caused by bugs in the previous map redrawing code.

Another not so noticeable but important feature of the 2D map is that it loads some chunks outside of the current map view, this gives a margin of already-loaded chunks around the current view which makes panning behave a lot smoother than it otherwise would.

The 2D map should be more responsive in the next release. I have implemented several smaller usability improvements in addition to the above performance improvements.

Future Work

There are still improvements that could be made to the 2D map to make it even more responsive. These have not been implemented yet, but I might return to it in the future and add some of these features:

  • Cache each region at max zoom level on disk
  • Don’t load heightmap data at max zoom level
  • Use bloom filters instead of chunk sets
  • Fine tune surface rendering mode

Stochastic Progressive Photon Mapping

The last assignment in the rendering course was to create your own scene and implement some rendering technique not covered by the previous assignments. I chose to implement depth of field because it was the simplest thing I could think of doing. My scene was a pile of LEGO bricks in different colors. In order to make the depth of field work well with my photon mapper I also implemented stochastic progressive photon mapping:

LEGO

Since the stochastic variant of my renderer re-runs the eye pass for each photon pass I implemented a more efficient BVH construction algorithm using Morton codes. The BVH construction time was reduced by an order of magnitude with the more efficient implementation!