Missing JVM Optimization 2017-11-18

A professor at my department was benchmarking a small Java program to see what impact changing the type of a loop variable had. While doing this he found a case where changing the loop bound variable type had unexpected impact. My simplified version of his loop is below:

static void test() {
  int N = 10000; // long N is much slower!
  long sum = 0;
  for (int i = 0; i < N; i++) {
    sum += i;

When the professor changed the type of N from int to long, the loop ran much slower. He measured this by running the test method a million times and measuring the total execution time.

His results showed that when N had type long, the loop ran many orders of magnitude slower. That seemed very odd: N is constant during the loop, and he ran the program on a 64-bit machine so it should not cost that much to compare two 64-bit integers.

Here is the main method of the benchmark program if you want to try it for yourself:

public static void main (String args[]) {
  long start = System.currentTimeMillis();
  for (int i = 0; i < 1000000; i++) {
  long time = System.currentTimeMillis() - start;
  System.out.format("time = %d ms%n", time);

When I ran the version where N has type int, using Java 8 (1.8.0_112) on a 64-bit Intel machine with Linux, it took 5-7 milliseconds. When I ran the version where N has type long, it took 10 seconds! At this point it was pretty obvious what is going on. You may have already noticed that test does not return the sum that the loop computes. This means that the loop is dead: it could be deleted from the program without affecting the observable output of the program.

To confirm that the long N version is indeed suffering from lack of dead code elimination, I used a diagnostic JVM option to print the optimized assembly code for the test method:

java -XX:+UnlockDiagnosticVMOptions \
    -XX:CompileCommand=print,*.test TestI

This gave a lot of output. First it prints the C1 optimized version of the method twice. That’s lightly optimized bytecode according to the HotSpot glossary. Then it prints the C2 optimized version twice. When a method is run enough times, the JVM will try to optimize it using the more powerful (and slower to compile) C2 compiler. I don’t know why the disassembly is printed twice for the C1 and C2 versions.

When looking at the C2 compiled version of the TestI program (with int N), it is easy to see that the loop inside the test method is entirely eliminated:

Compiled method (c2)      84   17       4       TestI::test (25 bytes)
[Entry Point]
[Verified Entry Point]
  # {method} {0x00007fdb68fc9410} 'test' '()V' in 'TestI'
  #           [sp+0x20]  (sp of caller)
  0x00007fdb6d111840: sub    $0x18,%rsp
  0x00007fdb6d111847: mov    %rbp,0x10(%rsp)    ;*synchronization entry
                                                ; - TestI::test@-1 (line 13)
  0x00007fdb6d11184c: add    $0x10,%rsp
  0x00007fdb6d111850: pop    %rbp
  0x00007fdb6d111851: test   %eax,0x1823a7a9(%rip)        # 0x00007fdb8534c000
                                                ;   {poll_return}
  0x00007fdb6d111857: retq

The first two instructions are just setting up the stack frame, and then the next two instructions pop the stack frame. I don’t know what the purpose of the test instruction is here, but it just compares a register to some value at some specific memory address.

I ran the same command again, but using TestL (with long N). Here is the decompiled C2 code:

Compiled method (c2)      77   16       4       TestL::test (30 bytes)
[Entry Point]
[Verified Entry Point]
  # {method} {0x00007f2df15cf428} 'test' '()V' in 'TestL'
  #           [sp+0x20]  (sp of caller)
  0x00007f2df5111ba0: sub    $0x18,%rsp
  0x00007f2df5111ba7: mov    %rbp,0x10(%rsp)    ;*synchronization entry
                                                ; - TestL::test@-1 (line 13)

  0x00007f2df5111bac: xor    %r10d,%r10d
  0x00007f2df5111baf: xor    %r11d,%r11d
  0x00007f2df5111bb2: mov    $0x1,%r8d
  0x00007f2df5111bb8: jmp    0x00007f2df5111bcc
  0x00007f2df5111bba: nopw   0x0(%rax,%rax,1)
  0x00007f2df5111bc0: mov    %r8d,%r9d
  0x00007f2df5111bc3: inc    %r9d
  0x00007f2df5111bc6: mov    %r8d,%r11d
  0x00007f2df5111bc9: mov    %r9d,%r8d          ;*iinc
                                                ; - TestL::test@23 (line 15)
  0x00007f2df5111bcc: movslq %r8d,%r9
  0x00007f2df5111bcf: movslq %r11d,%r11
  0x00007f2df5111bd2: add    %r11,%r10          ; OopMap{off=53}
                                                ; - TestL::test@26 (line 15)

  0x00007f2df5111bd5: test   %eax,0x1881f425(%rip)        # 0x00007f2e0d931000
                                                ; - TestL::test@26 (line 15)
                                                ;   {poll}
  0x00007f2df5111bdb: cmp    $0x2710,%r9
  0x00007f2df5111be2: jl     0x00007f2df5111bc0  ;*ifge
                                                ; - TestL::test@14 (line 15)
  0x00007f2df5111be4: add    $0x10,%rsp
  0x00007f2df5111be8: pop    %rbp
  0x00007f2df5111be9: test   %eax,0x1881f411(%rip)        # 0x00007f2e0d931000
                                                ;   {poll_return}
  0x00007f2df5111bef: retq

You can already see that this version is doing a lot more stuff. Notice the cmp and jl instructions near the end: those are for testing the loop condition and running another iteration. This confirms that the loop has not been eliminated in the long N case.

Why is the loop not eliminated? There could be several reasons. One case where it would not be safe to delete the loop is if N was a parameter of the test method, and the loop variable had type int. In that case, the loop variable could in some cases never be larger or equal to N. It might be that the optimizer is conservatively treating all cases with a long upper bound as this potential special case. The C2 compiler does not have a lot of time to do its work, so it is possible that it’s overly conservative in this case to save time in other similar cases.

No Comments on Missing JVM Optimization
Categories: Uncategorized

Waterbot 2.0 2017-10-06

In 2016 I built an automatic watering system, to keep my plants alive for a couple of months. I used an oversized water tank because I was unsure how much water would be needed for two months when nobody would be in the apartment. It turned out that the water tank would have lasted for over a year, based on how much water was used during the summer.


The watering system took too much space on my living room floor, so I eventually dismantled it and rebuilt it as a much smaller system that could fit in the window sill. I changed the pump to a much smaller pump, and the water tank was replaced by a 1 liter ice cream box. The electronics still took up too much space. Here are the components the system used:

  • Arduino Uno to generate the pump control signal and measure soil moisture.
  • Breadboard to connect moisture sensors to the Arduino.
  • Prototype board for pump power switching.
  • Raspberry Pi 2B powering the Arduino, and providing remote login shell.
  • Two power plugs to power the Raspberry Pi and the water pump.

The moisture sensors were fun to play with but in practice they were not that useful, so I removed them. That removed the need for analog signal input which meant I could remove the Arduino and just use the Raspberry Pi to generate the pump control signal.

I bought a cheap buck converter so I could power the Raspberry Pi with the same power source as the water pump. Now I only need a single 12V supply to power the system. I redesigned the pump control board to integrate the converter to the same board, and the 5V output for the Raspberry Pi is connected using a scrap Micro USB connector.

With the new electronics design I could fit everything into a small plastic box. The water pump is hot glued to the same box, and power is connected by an XT60 connector so that I can easily disconnect and thread the power cable through obstacles if needed.

Water Schedule

Watering is scheduled with cron:

$ crontab -l
# Water for 20 seconds each day at 17:22:
22 17 * * * /home/pi/git/water/water 20

The control software is a very simple C program that controls a GPIO pin on the Raspberry Pi.

Raspberry Pi GPIO

The control signal for the pump is output via a GPIO pin on the Raspberry Pi. In recent Raspbian versions you can access the GPIO pins via the Linux file tree, so I wrote a small C program to interface with the GPIO pins. The code is very simple, but you can make it even simpler if you use a library like wiringPi. I wrote my own GPIO code for fun. You can reuse my code freely if you want to. Here is the full source:

No Comments on Waterbot 2.0
Categories: Uncategorized

Nixie Clock Project 2017-09-24

I have had a few sets of Nixie tubes lying around for a while, waiting for a use. I bought them on ebay with the plan of eventually building a clock out of them. I had entertained the idea of designing my own Nixie clock PCB, but I finally decided to buy a kit instead. I purchased the Nixie Clock QTC IN-14 kit from PV Electronics. It was fun to solder the kit and the finished clock is beautiful. I love the look of Nixie tubes – they have a very nice warm glow.

Here is a time-lapse video I made of me soldering the kit:

I drew the clock case in FreeCAD. The sides of the case were cut from acrylic sheets and the corner brackets were printed on my 3D printer.

No Comments on Nixie Clock Project
Categories: Uncategorized

Making Tools 2017-09-18

In The Mythical Man-Month, Fred Brooks describes his vision of an ideal software engineering team, called The Surgical Team. In this team, there is a dedicated toolmaker to help the lead programmer/system architect. The Surgical Team model hardly exists in practice nowadays. Instead, a software engineer is expected to fill multiple roles, including toolmaking. In my opinion, toolmaking is an essential skill for any productive programmer.

In my experience, it feels very fulfilling to building a tool to improve my workflow or solve a specific problem or need. I love programming because I love to solve problems, and building small tools is like an easier kind of programming with faster feedback. You also avoid a lot of the bureaucracy that comes with building a full application: writing tests, issue tracking, writing documentation, building releases, and so on. These things can be added later if the tool will be shared with others. However, I often make tools just for myself, and that simplifies the development process and makes it much more fast and fun.

When I think about programmer tools, I group them into three categories: simple one-off scripts, complex command-line applications, and GUI tools.

One-off Scripts

The most common type of tool that programmers build are small single-purpose scripts to perform a simple but repetitive task. Typically these small scripts are used to perform file-oriented tasks like batch renaming of files, uploading files to a server, or parsing data from a file and converting it into some other format.

Terminal emulators have simple scripting functions built-in so that you can compose scripts in a single command line. These are known as one-liners, and they often appear in Stack Overflow answers and tweets. There is even a website dedicated to Bash one-liners.

Simple scripts are great for automation. By using cron, you can make certain things happen at regular intervals. For example, I use cron scripts to regularly update websites when I commit to certain repositories. I also use cron to water my plants.

Command-line Tools

The natural evolution from a one-off script is to make a more complicated command-line utility. One of the main distinguishing factors is that command-line tools usually take input arguments, whereas a script usually does not.

A command-line tool may be a more reusable version of a script, that fits a more general purpose. In other cases, a command-line tool is used to solve more complex tasks that require much more code.

Interpreted languages like Python are popular for making command-line tools because they enable fast prototyping. With Python, you can test your tool without first compiling it. Python also has a REPL that allows testing small pieces of code quickly.

Here are some examples of tasks I have solved by making command-line tools:

  • Converting Markdown to HTML with additional template expansions.
  • Automated benchmarking for various applications.
  • Software release automation.

GUI Tools

Command-line tools can be very useful, but they are not ideal for data entry. Data can be provided to a tool via data files in a readable format, like JSON. However, for a tool that requires lots of user input it is often better to build a GUI for the tool. This enables modifying the input without restarting the tool.

It is not easy to build a GUI, especially if you are not familiar with a GUI toolkit. It may pay off in the long run to learn a GUI toolkit just to be able to build interactive tools for yourself. Simple web applications can work as a great platform to make tools with GUIs.

I often use JavaFx to build tools that require editable input. It is not the easiest toolkit to learn, but I use JavaFx in Chunky so I didn’t learn it specifically to make tools for myself. It works well for me because I prefer to use Java for most of my projects. Building tools in the same language makes it easy to integrate with the rest of the project.

I’ll show some examples of tools I made using JavaFx. The first one is a profiling tool:

This tool collects execution traces from a network socket. The data is displayed in interactive graphs, so that I can zoom in on interesting regions. The tracing data comes from applications that I want to profile. The tool helps me identify which parts of the applications take most time. The graphs give a great overview of the runtime behavior of an application.

The most recent GUI tool I made was a 3D model editor. I built this for Chunky development. In Chunky, all block models are made up of quadrilateral surfaces that are coded manually in Java. The spacial and texture coordinates of each surface must be precisely specified for it to render correctly. This works fine for simple models, but it becomes cumbersome for more complex models. For example, this is part of the code for the lever model:

The first, probably obvious, problem of using this format is figuring out which coordinates to plug in to the code. The second, and less obvious, problem is that I often have to restart Chunky to see what effect a change has. This takes a long time and is especially annoying when making tiny changes. It’s easy to get something wrong and have to tweak it multiple times, causing multiple restarts, to fix a problem.

To improve rapid prototyping I made a render testing tool that makes it easier to construct these 3D models in Java. The tool was written so that it could efficiently use hot code reloading in IntelliJ to reload code changes in a running instance of the tool. This helps a lot in building new models, but it was still very tedious to define 3D models manually in Java code. I usually resorted to drawing models on paper first to figure out the coordinates I needed.

Recently I was working on adding armor stand rendering, and I decided to build an interactive model editor to help with the modeling process. Here is a screenshot from the current version of the model editor:

The model editor is a great help in building new models, because I can immediately see the spacial relation between blocks and it takes away much of the tedium of manually defining quadrilaterals. The geometry in the editor is block-based, so the six individual faces of each block are automatically generated based on the parameters of one single block. Texturing the faces is also significantly easier using the tool, since I can directly see if any texture coordinate is incorrect.

No Comments on Making Tools
Categories: Uncategorized

Learning Coq 2017-06-29

I recently read the free book Software Foundations by Benjamin Pierce. The book starts by introducing basic proof theory and logic. In later chapters programming language topics like Hoare logic and type systems are taught. The Coq Proof Assistant is used in all examples and exercises in the book. In fact, the whole book consists entirely of Coq scripts written in a literate programming style.

Part of my motivation for reading the book was to learn how to use Coq. By solving all the examples I now feel pretty confident in using Coq to construct proofs for simple theorems.

Since this was the first time I have used a proof assistant, I thought I would share my thoughts about learning Coq.

Using Coq

It took me a long time to learn how to use Coq efficiently. The exercises in Software Foundations do a great job of introducing one small concept at a time and then practicing it in many different ways. The exercises do take a long time to solve, but I thought it was very fun to solve them. Each exercise consists of proving a theorem. It feels extremely rewarding to finally complete a difficult proof that took a long time to solve.

I got addicted to solving Coq exercises. Some exercises took me more than a week to solve. Some exercises kept me up late at night, because I could not stop thinking about them. To me it seemed like solving Coq exercises was very similar to solving puzzles in The Witness. The kick I got out of solving an exercise was very similar to the kick I got out of solving puzzles in The Witness, or other puzzle games.

The term “Proof Assistant” is deceptive. You get no assistance in figuring out how to prove properties by using Coq, however the tool assists you in checking that the proof is correct. This can be a huge help because you can be sure that you have not made a mistake, that the proof really proves what the theorem states. When I write an informal proof I always worry that I might have made a simple mistake that is not apparent to myself on inspection. This worry is erased almost entirely when using Coq. Instead, you only really have to check the theorem to see that the property being proven is actually the one you wanted to prove.

When I got deep into working on a proof, I sometimes lost track of what I was trying to prove, and how the particular subproof I was working on related to the overall theorem. It was very useful to sometimes take a break and try to think through everything from the high-level theorem again. Sometimes it was very useful to illustrate small examples on paper.

One of the exercises in Software Foundations that I found most difficult was the exercise to prove the pigeonhole principle. I was very happy with my proof when I finally completed it, so I tidied it up and published the proof on GitHub. My solution does not use the principle of the excluded middle, which is much more difficult than the version you have to prove in the original exercise.

Functional Programming

Part of learning Coq is to learn the programming language used to write proofs. The language, Gallina, is a very advanced dependently typed functional programming language. Most concepts are familiar if you have already programmed in a functional language, which I had. The part I had most difficulty with was the way inductively defined types are declared and used in induction and case analysis.

Some aspects of functional programming can be very difficult to work with if you are mainly used to imperative programming languages. The exercises on church numerals were the most difficult for me.

Proof Theory

I was not confident in my ability to write mathematical proofs before reading Software Foundations. I am now much more confident that I can prove simple theorems. I think this is mainly due to working through the exercises in the first six chapters of the book. These chapters introduce general techniques used to construct proofs.

The most important concepts in mathematical proofs are direct proof, case analysis, and induction. I had learned a bit about these concepts from my undergraduate studies, but I developed a much more systematic way of using them by working through the exercises in the intro chapters of Software Foundations.

Type Systems

The later parts of the book become very advanced, but it was easier to work through the exercises after becoming familiar with Coq.

After the introduction to proof theory, logic, and functional programming, Software Foundations continues on to Hoare logic and type systems. Some of the exercises are about proving that simple programs work according to some specification. A system for constructing such proofs in a systematic manner is introduced in the Hoare logic chapters. The chapters on type systems are all about proving that a well-typed program always makes progress (does not crash).

This was the first time I have read about type systems and Hoare logic. The book seemed to give a good introduction to these topics. I really enjoyed the exercises that were about constructing correctness proofs for simple programs.

The best thing about the book is that all the exercises can be run in interactive mode in the Coq IDE, to see how everything works. This made it fun to read the chapters about type systems.

1 Comment on Learning Coq
Categories: Uncategorized

Uninteresting Code Metrics 2017-05-25

I was browsing the GitHub repository rapid7/metasploit-framework and noticed that the README displayed a “code climate” badge. The badge displayed a number of unknown significance: 2.7.

It was clear from the name that the badge had something to do with static analysis. Clicking on the badge leads to this page:

Yikes! 3694 seems like a pretty large issue count. It leads me to believe that the developers either don’t care about their Code Climate score (but then why would they display the badge in the README?), or that they don’t find the Code Climate issues useful.

During my first internship at Google I worked on the static analysis team. One thing I learned from the team was that static analysis findings are only useful if they are actionable and have a low false positive rate. This paper from the team lists these as the first two requirements for selecting static analyses:

I looked at the issues Code Climate reported for the repository in question, and it seems Code Climate is reporting mostly non-actionable issues regarding code complexity. The other type of issues are duplicated code. I looked at several of the reported duplicated code fragments, and they were all pretty simple code blocks with no more than five lines, and most had only two occurrences. The duplicated code issues seemed nitpicky.

Complexity Score

Code complexity metrics (including line counts) are not useful for fixing real problems in code. If you try to enforce some maximum complexity you will end up with pointless rewrites just to try to fix the complexity score. This in itself can lead to needless bugs being introduced from trying to “fix” your complexity issues.

Another issue is that the complexity score in itself is entirely arbitrary. What does it mean that a method has a complexity of 300? Is that too much? Where do you draw the line? If you make the computation of the complexity score more sophisticated, it just gets more difficult to figure out what part to change to minimize the score.


Sometimes it can be useful to find that some replicated piece of code can be refactored into a common method. However, this often leads to introducing a more abstract method that handles more cases, increasing your code complexity. Often, a new parameter is added to the method, and some conditional statements as well. Consider for example Figure 1 from Narasimhan (2015):

In my opinion the above refactoring is not useful. The more general method will run slower than the original method due to the new conditional branch, unless it is inlined by the compiler. The more general method is also harder to read and has introduced a boolean trap.

No Comments on Uninteresting Code Metrics
Categories: Uncategorized