LR Parser Generator 2014-10-11

A few weeks ago I decided that I wanted to write a parser generator. This is one of those projects I’ve wanted to do for a long time but never got started on. When I finally started I decided that it would be fun to learn Scala at the same time, so I wrote the parser generator in Scala.

One of the things that motivated me to get started writing a parser generator is some ideas I have about how parser generator diagnostic messages can be improved. Currently, LR parser generators provide only the most limited information about a shift-reduce or reduce-reduce conflict. For example, the Beaver parser generator can give you something like this:

Warning: Resolved Shift-Reduce conflict by selecting (PLUS: SHIFT; goto 3) over (PLUS: REDUCE E = E PLUS E) using precedence.

It can take a lot of time, ever for an expert, to figure out what the problem is and how to fix the grammar to remove the conflict. I think this can be significantly improved.

Currently my parser generator only generates an LR(0) parser, though my goal is to make it generate LR(1) parsers. The Scala code became surprisingly compact. For example, here is the main parse loop of the parser after the action and goto tables have been generated:

def parse(in: LookaheadBuffer[Symbol], goal: ItemSet) {
  val state = new scala.collection.mutable.Stack[ItemSet]
  val stack = new scala.collection.mutable.Stack[ParseTree]
  state push goal
  while (true) {
    while (reduceRules contains state.top) {
      val rule = reduceRules(state.top)
      val lhs = rule.lhs
      var children = List[ParseTree]()
      for (i <- 1 to rule.rhs.size) {
        state.pop
        children = children :+ stack.pop
      }
      stack push new ParseTree(lhs, children)
      state push map(state.top)(lhs)// goto
    }
    val sym = if (!in.eof) in.pop else EOFToken
    if (sym != WhitespaceToken) {// skip whitespace
      val action = actions(state.top)(sym)
      action match {
        case Accept =>
          println("accept")
          stack.pop.printTree
          return
        case shift: Shift =>
          stack push new ParseTree(sym)
          state push shift.next
      }
    }
  }
}

I wouldn’t be surprised if there was some way to shave ten more lines off that loop.

Categories: Programming Scala

Leave a Reply

Your email address will not be published. Required fields are marked *