Competitive Coding 2018-03-27

The past few weeks I have been solving problems on the online competitive coding judge Kattis. Kattis is a site where you can create an account and solve problems for points. You submit your code to Kattis, then Kattis compiles/runs it to check it against hidden test cases.

The problems vary in difficulty from trivial to very hard. I have done mostly hard ones so far, though that is not an effective way to gain points. I tend to get stuck on the hard problems, and then I can't let a problem go until I have solved it. Sometimes it takes me a few days to finally find a solution. During this time it is tempting to think about the problem constantly, even when I'm not actively working on it. This process often makes me feel stupid and frustrated, though the feeling of relief when I finally solve a difficult problem is incredible and addictive.

I am not really interested in the competitive part of competitive programming. I mostly think it is fun to solve new and interesting problems. Fundamentally, I think that problem solving is what makes programming fun. By solving these problems I might also become better at solving other problems that arise in my normal programming work.

An example of a problem that I was stuck on for a long time is Around the Track, in which you have to find the shortest looping path between two polygons.

To aid in debugging my solution, I made a program that renders an image for the computed shortest path. Below is one such image (the red path is the shortest path, the inside of the blue area is the outer polygon, and the outside of the green area is the inner polygon).


LuxColorPicker: a better JavaFX color picker 2018-02-19

LuxColorPicker is a simple HSV color picker for JavaFX with a highly usable interface. The main features of LuxColorPicker are:

  • Only one color selection mode: Hue/Saturation/Value.
  • Large components improve usability.
  • Minimalistic design: no unnecessary components or text to distract users.

LuxColorPicker is easy to integrate into new JavaFX projects. The code is on GitHub.

This is a screenshot of the color picker in action:

Why LuxColorPicker?

JavaFX already has a built-in color picker component, so you might wonder why LuxColorPicker is needed. In my opinion the standard JavaFX color picker is functional but flawed: I don’t like its default color selection mode, and it has unnecessarily small components which make it fiddly to use.

If you agree with these points, then LuxColorPicker might be the right color picker for you.

Only HSV

Unlike many other color pickers, LuxColorPicker has only one color selection mode: Hue/Saturation/Value (HSV). In my opinion HSV is the best selection mode, making all the others unnecessary.
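To illustrate the model, here is a minimal sketch of how HSV coordinates map to colors: hue selects the base color, saturation its purity, and value its brightness. I use the standard `java.awt.Color.HSBtoRGB` conversion (HSB is another name for HSV) purely for illustration; LuxColorPicker itself is a JavaFX component and does not use this class.

```java
import java.awt.Color;

public class HsvDemo {
    // Converts HSV coordinates (each in [0, 1]) to a hex RGB string.
    static String hsvToHex(float hue, float saturation, float value) {
        int rgb = Color.HSBtoRGB(hue, saturation, value);
        return String.format("#%06X", rgb & 0xFFFFFF);
    }

    public static void main(String[] args) {
        System.out.println(hsvToHex(0f, 1f, 1f)); // pure red: #FF0000
        System.out.println(hsvToHex(0f, 0f, 1f)); // zero saturation: #FFFFFF
    }
}
```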

Large Components

Small interface components are fiddly and less visible than large components. The smaller a component is, the longer it takes to hit accurately with the mouse (unless the component touches the edge of the screen). This is not just my opinion; it follows from Fitts's law.

In LuxColorPicker, all components are oversized compared to most other color pickers. This not only makes the components easier to use, but also increases the visibility of the colors on display. In particular, the hue selection bar is much larger than any I have seen in other programs. I made it this large to keep the available hues clearly visible.

The preview color swatch is large for the same reason: it is important to clearly see the current selected color.


There is a “Save” button to confirm the current color selection, but the user does not need to click it. LuxColorPicker is non-modal and focus-shy: simply clicking outside the color picker closes it and confirms the selection. Pressing escape cancels the current selection and closes the color picker. The buttons are there just for users who have not yet learned the optimal interactions.

History and Random Swatches

The only nonessential embellishments in LuxColorPicker are the color swatches below the preview swatch. These are the history and random swatches. The random swatches show random colors close to the current selected color, and the history swatches contain the previously selected colors.

The random colors are useful when you don’t know exactly what color you want to pick – maybe one of the random colors will look nice.


The Flattening of Minecraft 2018-02-10

Minecraft is undergoing a major internal change called The Flattening. I decided to write this overview and explanation of The Flattening because I think it is interesting from a programming and system design perspective. I hope you find it interesting, too!

Block IDs

To explain what The Flattening means, I first have to describe how the first versions of Minecraft represented blocks in the world. As you probably know, the game world in Minecraft is made up of blocks like stone, grass, and dirt. Even air is represented as a type of block.

In the beginning of Minecraft development, Notch decided to represent each type of block by an integer ID. Air was given ID zero, stone was one, grass two, dirt three, and so on. This worked great, because there were not that many different kinds of blocks. Notch decided to use a one-byte encoding for blocks, meaning that 256 different block types (including air) could exist in the game.


A chunk, in Minecraft, is a vertical slice of the game world. Chunks contain 16x16x256 blocks each in current Minecraft versions. Previously, chunks were only 128 blocks tall.

Blocks are stored in a chunk as an array of IDs. In Minecraft Alpha 1.1, chunks contained 32k blocks, so the total block data in a chunk took 32kB of memory. However, when stored on disk, chunks were compressed to about 5kB or less using the DEFLATE algorithm.
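The layout can be sketched as a flat byte array indexed by block coordinates. This is my own illustration of the general idea (class and method names are mine, and the actual index order used by the game may differ):

```java
// Alpha-era-style chunk block storage: one byte per block,
// so a 16x16x128 chunk needs 16 * 16 * 128 = 32768 bytes (32 kB).
public class Chunk {
    static final int SIZE_X = 16, SIZE_Z = 16, HEIGHT = 128;
    final byte[] blocks = new byte[SIZE_X * SIZE_Z * HEIGHT];

    // One possible linear layout, with Y varying fastest.
    static int index(int x, int y, int z) {
        return (x * SIZE_Z + z) * HEIGHT + y;
    }

    byte getBlock(int x, int y, int z) { return blocks[index(x, y, z)]; }
    void setBlock(int x, int y, int z, byte id) { blocks[index(x, y, z)] = id; }

    public static void main(String[] args) {
        Chunk chunk = new Chunk();
        chunk.setBlock(5, 64, 7, (byte) 1);           // place a stone block (ID 1)
        System.out.println(chunk.blocks.length);      // 32768
        System.out.println(chunk.getBlock(5, 64, 7)); // 1
    }
}
```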


To draw different blocks, different textures are needed. In Minecraft Alpha 1.1, textures were stored in a texture atlas. Several textures are packed together into one large texture, and then parts of the large texture are used to draw different types of game objects.
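The lookup into such an atlas can be sketched as follows, assuming a 16x16 grid of equally sized tiles (256 squares total). The layout details here are my own illustration of the general technique, not the exact terrain.png layout:

```java
public class AtlasDemo {
    static final int TILES_PER_ROW = 16;

    // Returns the top-left (u, v) texture coordinate of tile i, in [0, 1).
    static double[] tileUV(int tileIndex) {
        int col = tileIndex % TILES_PER_ROW;
        int row = tileIndex / TILES_PER_ROW;
        double tileSize = 1.0 / TILES_PER_ROW; // each tile covers 1/16 of the atlas
        return new double[] { col * tileSize, row * tileSize };
    }

    public static void main(String[] args) {
        double[] uv = tileUV(17); // second row, second column
        System.out.println(uv[0] + ", " + uv[1]); // 0.0625, 0.0625
    }
}
```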

By looking at the texture atlas, we can estimate the number of block types in Minecraft Alpha 1.1 at somewhere around 60. Notice that some blocks, like gold blocks and chests, take up multiple texture squares.

There are several advantages to using texture atlases. An incidental one is that you can make a texture pack for the game by replacing just one file. That's exactly what the first Minecraft modders did to change the look of the game.

Of course, the texture atlas was not going to last. It only had room for 256 textures, and to make matters worse, many blocks needed multiple textures. Since Minecraft 1.5 the terrain.png file was completely replaced by individual texture files. By that point, Minecraft already had built-in support for loading custom texture packs (later known as resource packs).

Block Data and TileEntities

The first blocks in Minecraft only had one state each: stone was always just stone, dirt could be covered by grass but that was treated as a separate kind of block. However, Notch eventually added blocks that could have different states. For example, the “Wheat” block is a growing wheat field, with seven different states depending on how ripe the wheat is.

The wheat block could have been represented as seven different blocks: one for each growth stage. But that would have used up many of the limited block IDs, so instead Notch added a separate array of data for the blocks in a chunk. Notch figured that 4 bits of data per block would be enough. It doesn't seem like much, but it adds 50% more data to the basic chunk format.
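The 4-bit values can be packed two to a byte, which is why the data array is half the size of the block ID array (the 50% growth mentioned above). Here is a sketch of such "nibble" packing; the bit order (low nibble first) is my own choice and may differ from the game's:

```java
public class NibbleArray {
    final byte[] data;

    NibbleArray(int entries) { data = new byte[(entries + 1) / 2]; }

    // Reads the 4-bit value at index (two values per byte).
    int get(int index) {
        byte b = data[index / 2];
        return (index % 2 == 0) ? (b & 0x0F) : ((b >> 4) & 0x0F);
    }

    // Writes a 4-bit value, leaving the neighboring nibble untouched.
    void set(int index, int value) {
        int i = index / 2;
        if (index % 2 == 0) {
            data[i] = (byte) ((data[i] & 0xF0) | (value & 0x0F));
        } else {
            data[i] = (byte) ((data[i] & 0x0F) | ((value & 0x0F) << 4));
        }
    }

    public static void main(String[] args) {
        NibbleArray growth = new NibbleArray(32768); // one nibble per block
        growth.set(100, 7); // fully grown wheat: stage 7
        System.out.println(growth.data.length); // 16384 bytes: half of 32768
        System.out.println(growth.get(100));    // 7
    }
}
```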

The block ID range was at some point extended with an optional extra 4 bits, giving 4096 possible block types in total. Still, all the default Minecraft blocks were represented as single bytes. I'm not sure what the extended ID range was for; it may have been for mod blocks.

Some kinds of data were not well suited to a fixed-width format at all. For example, signs can have custom text, armor stands can hold many combinations of items, and so on. This data was stored in something called the TileEntities structure in the chunk. TileEntities have no size limit: they can store as much extra information as needed.

To summarize, here is all the data used to represent the blocks of a chunk in the current Minecraft release (1.12):

  • Blocks – block IDs (4096 bytes)
  • Add (optional) – additional block data (4 bits per block: 2048 bytes)
  • Data – block state information (4 bits per block: 2048 bytes)
  • TileEntities – additional block data

Together, this data uniquely identifies each block, mapping to a fixed set of block IDs.

The Flattening

With all that background information I can now explain The Flattening. The main change is that the fixed block IDs are being removed. Instead, blocks are defined by identifiers like "minecraft:stone", "minecraft:dirt", etc. The "minecraft:" prefix identifies the default Minecraft blocks. With these free-form identifiers there is an unlimited number of possible block types, which allows mods to add as many block types as they wish.

The new block identifiers are augmented with state information in a format like the TileEntities structure discussed above. This state information does not have a size limit.

The chunk format was changed so that each chunk has its own block palette (technically, each 16x16x16 section has a palette). I tried to illustrate it with this drawing:

The idea with the palette is similar to that of palettes in image compression. The palette contains a fixed number of entries, which can be indexed by a fixed-size number.

If there are multiple variants of the same block type with different state, then each different set of block states is represented with a separate palette entry.

The new format is clever: there can now exist a completely ludicrous number of different block types and state combinations in the game, but only the blocks that actually occur in a chunk are stored in its palette. As far as I have seen, untouched generated chunks often need fewer than 16 different palette entries. Chunks with player-made structures will generally use more combinations of blocks and block states.
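The palette idea can be sketched like this: each distinct block state is stored once, and the block array stores small indices into the palette. The identifiers follow the "minecraft:name" style discussed above, but the exact on-disk encoding the game uses differs in detail from this illustration:

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

public class PaletteDemo {
    final List<String> palette = new ArrayList<>();
    final Map<String, Integer> lookup = new HashMap<>();

    // Returns the palette index for a block state, adding it if new.
    int indexOf(String blockState) {
        return lookup.computeIfAbsent(blockState, s -> {
            palette.add(s);
            return palette.size() - 1;
        });
    }

    public static void main(String[] args) {
        PaletteDemo section = new PaletteDemo();
        System.out.println(section.indexOf("minecraft:stone")); // 0
        System.out.println(section.indexOf("minecraft:dirt"));  // 1
        System.out.println(section.indexOf("minecraft:stone")); // 0 again
        System.out.println(section.palette.size());             // 2 entries
    }
}
```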

World Size

An additional detail about the new format is that chunks with 16 or fewer kinds of blocks (including state variations) need 16 or fewer palette indexes, which can be stored in 4 bits per block. Even with a larger palette, the palette indexes often take less than 8 bits per block: the number of bits per block is the base-2 logarithm of the number of palette entries, rounded up. This in itself should save a fair bit of space. In addition, the separate 4 bits of block state per block were removed, and the optional 4 bits of extended block ID data are no longer needed.
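The index-width calculation described above can be sketched in a few lines (the floor of 1 bit for trivially small palettes is my own choice for the sketch):

```java
public class BitsDemo {
    // Bits per block = ceil(log2(paletteEntries)), minimum 1 bit.
    static int bitsPerBlock(int paletteEntries) {
        int bits = 32 - Integer.numberOfLeadingZeros(Math.max(paletteEntries - 1, 1));
        return Math.max(bits, 1);
    }

    public static void main(String[] args) {
        System.out.println(bitsPerBlock(16));  // 4 bits: fits in a nibble
        System.out.println(bitsPerBlock(17));  // 5 bits
        System.out.println(bitsPerBlock(256)); // 8 bits
    }
}
```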

I don’t know what the total change in world size is for typical worlds, but the palette system might save some space overall.


Rope and Ring Puzzle 2018-01-13

A few days ago I saw a puzzle that I recognized: a rope threaded through a large loop and a ring, with some large discs and beads. The puzzle is quite common; you have probably seen it. It is sometimes called the “Oliver Ring” puzzle, but I’m not sure what is the most common name or where it comes from.

I decided I wanted to have this puzzle, so I designed the parts in FreeCAD:

Then, I printed it on my 3D printer:

The puzzle worked well even though I didn’t know exactly which parts would need to fit inside the big loop for it to work. I used the image on this page as reference when designing the parts: rope puzzles. The ring part could have been much better – the first few layers of the print turned out pretty ugly. I might print a new ring, but otherwise I’m very happy with this small project.


Concurrent Reference Attribute Grammars 2017-12-12

In my PhD studies I have been working with something called Reference Attribute Grammars. They are a formalism used for building compilers. My most recent research in this field was about parallelizing attribute computations. It was a very fun and challenging project, and we got very exciting results. I also worked on using these concurrent attributes to parallelize an existing compiler, called ExtendJ.

I presented our research on concurrent attributes at the SLE 2017 conference. The talk was recorded and is now available on YouTube if you are interested:

I am currently focusing on my PhD work and don’t have time for other projects like Chunky, unfortunately. I will try to do some maintenance on Chunky during Christmas.


Missing JVM Optimization 2017-11-18

A professor at my department was benchmarking a small Java program to see what impact the type of a loop variable had on performance. While doing this, he found a case where changing the type of the loop bound had an unexpectedly large impact. My simplified version of his loop is below:

static void test() {
  int N = 10000; // long N is much slower!
  long sum = 0;
  for (int i = 0; i < N; i++) {
    sum += i;
  }
}

When the professor changed the type of N from int to long, the loop ran much slower. He measured this by running the test method a million times and measuring the total execution time.

His results showed that when N had type long, the loop ran orders of magnitude slower. That seemed very odd: N is constant during the loop, and he ran the program on a 64-bit machine, so comparing two 64-bit integers should not cost that much.

Here is the main method of the benchmark program if you want to try it for yourself:

public static void main(String args[]) {
  long start = System.currentTimeMillis();
  for (int i = 0; i < 1000000; i++) {
    test();
  }
  long time = System.currentTimeMillis() - start;
  System.out.format("time = %d ms%n", time);
}

When I ran the version where N has type int, using Java 8 (1.8.0_112) on a 64-bit Intel machine with Linux, it took 5-7 milliseconds. When I ran the version where N has type long, it took 10 seconds! At this point it was pretty obvious what was going on. You may have already noticed that test does not return the sum that the loop computes. This means that the loop is dead: it could be deleted from the program without affecting the observable output of the program.
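As a sketch of the flip side: returning the sum makes the loop's result observable, so the compiler can no longer delete it as dead code. (Benchmark harnesses like JMH use a Blackhole for the same purpose.) This is my own variant of the benchmark, not part of the professor's program:

```java
public class TestLive {
    static long testLive() {
        int N = 10000;
        long sum = 0;
        for (int i = 0; i < N; i++) {
            sum += i;
        }
        return sum; // the observable result keeps the loop alive
    }

    public static void main(String[] args) {
        System.out.println(testLive()); // 49995000
    }
}
```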

To confirm that the long N version is indeed suffering from lack of dead code elimination, I used a diagnostic JVM option to print the optimized assembly code for the test method:

java -XX:+UnlockDiagnosticVMOptions \
    -XX:CompileCommand=print,*.test TestI

This gave a lot of output. First it prints the C1-compiled version of the method twice; according to the HotSpot glossary, C1 is the fast, lightly optimizing compiler. Then it prints the C2-compiled version twice. When a method has run enough times, the JVM will try to optimize it using the more powerful (and slower to compile) C2 compiler. I don't know why the disassembly is printed twice for the C1 and C2 versions.

When looking at the C2 compiled version of the TestI program (with int N), it is easy to see that the loop inside the test method is entirely eliminated:

Compiled method (c2)      84   17       4       TestI::test (25 bytes)
[Entry Point]
[Verified Entry Point]
  # {method} {0x00007fdb68fc9410} 'test' '()V' in 'TestI'
  #           [sp+0x20]  (sp of caller)
  0x00007fdb6d111840: sub    $0x18,%rsp
  0x00007fdb6d111847: mov    %rbp,0x10(%rsp)    ;*synchronization entry
                                                ; - TestI::test@-1 (line 13)
  0x00007fdb6d11184c: add    $0x10,%rsp
  0x00007fdb6d111850: pop    %rbp
  0x00007fdb6d111851: test   %eax,0x1823a7a9(%rip)        # 0x00007fdb8534c000
                                                ;   {poll_return}
  0x00007fdb6d111857: retq

The first two instructions set up the stack frame, and the next two pop it. The test instruction compares a register to a value at a specific memory address; judging by the {poll_return} annotation, it is a safepoint poll.

I ran the same command again, but using TestL (with long N). Here is the disassembled C2 code:

Compiled method (c2)      77   16       4       TestL::test (30 bytes)
[Entry Point]
[Verified Entry Point]
  # {method} {0x00007f2df15cf428} 'test' '()V' in 'TestL'
  #           [sp+0x20]  (sp of caller)
  0x00007f2df5111ba0: sub    $0x18,%rsp
  0x00007f2df5111ba7: mov    %rbp,0x10(%rsp)    ;*synchronization entry
                                                ; - TestL::test@-1 (line 13)

  0x00007f2df5111bac: xor    %r10d,%r10d
  0x00007f2df5111baf: xor    %r11d,%r11d
  0x00007f2df5111bb2: mov    $0x1,%r8d
  0x00007f2df5111bb8: jmp    0x00007f2df5111bcc
  0x00007f2df5111bba: nopw   0x0(%rax,%rax,1)
  0x00007f2df5111bc0: mov    %r8d,%r9d
  0x00007f2df5111bc3: inc    %r9d
  0x00007f2df5111bc6: mov    %r8d,%r11d
  0x00007f2df5111bc9: mov    %r9d,%r8d          ;*iinc
                                                ; - TestL::test@23 (line 15)
  0x00007f2df5111bcc: movslq %r8d,%r9
  0x00007f2df5111bcf: movslq %r11d,%r11
  0x00007f2df5111bd2: add    %r11,%r10          ; OopMap{off=53}
                                                ; - TestL::test@26 (line 15)

  0x00007f2df5111bd5: test   %eax,0x1881f425(%rip)        # 0x00007f2e0d931000
                                                ; - TestL::test@26 (line 15)
                                                ;   {poll}
  0x00007f2df5111bdb: cmp    $0x2710,%r9
  0x00007f2df5111be2: jl     0x00007f2df5111bc0  ;*ifge
                                                ; - TestL::test@14 (line 15)
  0x00007f2df5111be4: add    $0x10,%rsp
  0x00007f2df5111be8: pop    %rbp
  0x00007f2df5111be9: test   %eax,0x1881f411(%rip)        # 0x00007f2e0d931000
                                                ;   {poll_return}
  0x00007f2df5111bef: retq

You can already see that this version is doing a lot more stuff. Notice the cmp and jl instructions near the end: those are for testing the loop condition and running another iteration. This confirms that the loop has not been eliminated in the long N case.

Why is the loop not eliminated? There could be several reasons. One case where it would not be safe to delete the loop is if N were a long parameter of the test method while the loop variable had type int: if N is larger than Integer.MAX_VALUE, the loop variable overflows and wraps around without ever reaching N, so the loop never terminates. A non-terminating loop is observable behavior, so it cannot be removed. It might be that the optimizer conservatively treats every loop with a long upper bound as this potential special case. The C2 compiler does not have a lot of time to do its work, so it may be overly conservative here to save time in similar cases.
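The special case can be demonstrated directly. This sketch (my own, not from the professor's benchmark) shows the wraparound that would make such a loop infinite:

```java
public class OverflowDemo {
    public static void main(String[] args) {
        long N = 1L + Integer.MAX_VALUE; // a bound no int can ever reach
        int i = Integer.MAX_VALUE;       // the largest value i can take
        System.out.println(i < N);       // true: the loop would continue
        i++;                             // overflow: wraps to a negative value
        System.out.println(i);           // -2147483648 (Integer.MIN_VALUE)
        System.out.println(i < N);       // still true: the loop never ends
    }
}
```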
