Code Quarterly Challenge: Parsing with Python 1

Posted on September 16, 2010

Okay, so I’ve been messing around with a few approaches to the inaugural Code Quarterly challenge. In my previous two posts I talked a bit about my initial experiments with co-routines, which didn’t end up working out. From that experience I’ve assumed that co-routines don’t map well onto recursive-descent parsing techniques. So I’ve switched over to a generator-based solution and things are looking better. I can see my current design holding up as I move forward.

The one thing I am biting my nails over is the character scanner. I’ve taken a very bottom-up approach so I can learn as much as possible about parsing from the challenge. Obviously my first instinct was to use regular expressions instead of attempting to match patterns by hand, but I wanted to understand the entire machine. What I’m learning is that I’m spending a lot of time getting the lexer to work and be understandable, and not a lot of time parsing. I’ve read a Perl implementation that uses regular expressions and it’s really nice to read. One glance at the top of the file and you can tell which character patterns the lexer will tokenize. My implementation forces you to look at the spec or dig through the code and assemble a mental model of the patterns yourself. On the upside, it’s pretty fast and I’m learning a lot.
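
To make the contrast concrete, here is a minimal sketch of what a regex-table lexer could look like in Python. It is neither the Perl implementation mentioned above nor my hand-rolled scanner. The token names and patterns (`NUMBER`, `NAME`, `OP`, `SKIP`) are hypothetical, made up for illustration and not taken from the challenge spec. The point is only that the patterns sit in one table at the top of the file, and that the lexer is a generator, so it fits a generator-based design.

```python
import re

# Hypothetical token table: these names and patterns are assumptions
# for illustration only, not the tokens of the actual challenge.
TOKEN_SPEC = [
    ("NUMBER", r"\d+"),
    ("NAME",   r"[A-Za-z_]\w*"),
    ("OP",     r"[+\-*/=()]"),
    ("SKIP",   r"\s+"),          # whitespace, matched but not emitted
]

# Combine the table into one pattern with a named group per token kind.
MASTER = re.compile("|".join("(?P<%s>%s)" % pair for pair in TOKEN_SPEC))


def tokenize(text):
    """Lazily yield (kind, value) pairs, one token at a time."""
    pos = 0
    while pos < len(text):
        match = MASTER.match(text, pos)
        if match is None:
            raise SyntaxError(
                "unexpected character %r at position %d" % (text[pos], pos)
            )
        pos = match.end()
        # lastgroup is the name of the alternative that matched.
        if match.lastgroup != "SKIP":
            yield match.lastgroup, match.group()
```

With this table, `list(tokenize("x = 42 + y"))` gives `[("NAME", "x"), ("OP", "="), ("NUMBER", "42"), ("OP", "+"), ("NAME", "y")]`. The trade-off is the one described above: the table is easy to read at a glance, but the matching itself is hidden inside the regex engine.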

The question remains, however, whether I would have a working lexer by now, and be working on the parser instead, had I decided to use regexes rather than rolling my own character scanner.