Parsing with Python Coroutines: Fail

Posted on Sep 13, 2010

I had some time to hack on the Code Quarterly challenge last night. My initial idea to base the parsing process on coroutines failed to work out. I got as far as implementing a simple read-ahead scanner, but when I got to the lexing stage things started to get ugly. In the end I’ve decided to move away from coroutines and am having more success using generators instead.

I was having a problem consuming tokens using coroutines. As my lexer read characters from the scanner, it would note when a character was the beginning of a token. At that point, the lexer needed to drop into an inner loop and advance through the scanner’s stream until it had consumed the whole token to send off to its target. However, the continuation point within the lexer function would be set at the most recent yield statement. This meant that as soon as I dropped into an inner loop and called yield to get the next character from the scanner, I couldn’t break out of the loop, and the lexer would end up sending only one kind of token to its target.
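For illustration, here is a minimal sketch of the kind of push-style pipeline described above. It is not the code from the original attempt: the `coroutine` decorator, `collector`, `lexer`, `tokenize`, and the two token kinds are all hypothetical names chosen for the example. It shows the shape of the difficulty: the lexer is suspended at whichever `yield` it reached last, so the inner loop has to deal with the character that ends its token before control gets back to the top-level loop.

```python
import functools


def coroutine(func):
    """Prime a generator-based coroutine so it can receive send() right away."""
    @functools.wraps(func)
    def start(*args, **kwargs):
        gen = func(*args, **kwargs)
        next(gen)  # advance to the first yield
        return gen
    return start


@coroutine
def collector(tokens):
    """Hypothetical target: appends every token it is sent to a list."""
    while True:
        tokens.append((yield))


@coroutine
def lexer(target):
    """Push-style lexer: characters arrive via send(), tokens go to target."""
    while True:
        char = yield  # outer suspension point
        if char.isdigit():
            number = char
            while True:
                char = yield  # inner suspension point: send() now resumes here
                if not char.isdigit():
                    break
                number += char
            target.send(("NUMBER", number))
        # char is a non-digit here, possibly the one that ended a number.
        # It has already been consumed, so it must be handled now or it is lost.
        if not char.isspace():
            target.send(("SYMBOL", char))


def tokenize(text):
    """Play the scanner's role: push each character into the lexer."""
    tokens = []
    lex = lexer(collector(tokens))
    for char in text:
        lex.send(char)
    return tokens


# The trailing space flushes the last number, since a number is only
# emitted once a non-digit character arrives.
print(tokenize("12+3 "))
# → [('NUMBER', '12'), ('SYMBOL', '+'), ('NUMBER', '3')]
```

Even with two token kinds, the leftover-character handling and the need for a trailing sentinel hint at how quickly this style gets awkward as more token types are added.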

I had some ideas for getting it to work. The first I thought of was setting up the lexer as a trampoline for reader functions. The other idea I had was to add state to the lexer function so that I could wrap the inner loops in a while loop based on the state and break out of that to the top-level loop once the token was consumed. However, each approach proved too complicated.
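As a rough sketch of the second idea, assuming a deliberately tiny token set: the lexer below keeps an explicit `state` variable and has a single `yield`, so every character re-enters at the top-level loop. The names `stateful_lexer`, `collector`, and `run`, and the `NUMBER`/`SYMBOL` token kinds, are hypothetical and only serve the example.

```python
def collector(tokens):
    """Hypothetical target: appends every token it is sent to a list."""
    while True:
        tokens.append((yield))


def stateful_lexer(target):
    """Coroutine lexer with one yield and an explicit state variable."""
    state = "START"
    number = ""
    while True:
        char = yield  # the only suspension point
        if state == "NUMBER":
            if char.isdigit():
                number += char
                continue
            # A non-digit ends the number; emit it and fall through
            # so the same character is handled in the START state.
            target.send(("NUMBER", number))
            state, number = "START", ""
        if char.isdigit():
            state, number = "NUMBER", char
        elif not char.isspace():
            target.send(("SYMBOL", char))


def run(text):
    """Wire the pieces together and push each character into the lexer."""
    tokens = []
    target = collector(tokens)
    next(target)  # prime the target coroutine
    lex = stateful_lexer(target)
    next(lex)  # prime the lexer coroutine
    for char in text:
        lex.send(char)
    return tokens


print(run("12+3 "))
# → [('NUMBER', '12'), ('SYMBOL', '+'), ('NUMBER', '3')]
```

Every new token kind adds another state and another set of transitions to the one loop, which is one plausible reason this approach feels complicated in practice.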

I had hoped coroutines would make the process simple, but it turns out the simplest solution was plain old generators. I was just one step above the simplest solution and got stuck in wishful thinking for a while. All was not lost, however, as I learned quite a bit from the experience. Coroutines are good for iterative processing, and with trampolines they’re good at acting like lightweight threads. However, I cannot seem to get them to work recursively. I have a sneaking suspicion that it’s still possible, but for now the easier solution is the simpler, more obvious one.
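To make the contrast concrete, here is a minimal sketch of a pull-style lexer written as a plain generator. It is an illustration under assumptions rather than the actual challenge code: the function name `lex` and the `NUMBER`/`SYMBOL` token kinds are hypothetical. Because the lexer pulls characters itself, an inner loop can simply stop at the first character that does not belong to the token and leave it for the next iteration.

```python
def lex(text):
    """Pull-style lexer: a plain generator that yields (kind, value) tokens."""
    chars = iter(text)
    char = next(chars, None)  # one character of read-ahead
    while char is not None:
        if char.isdigit():
            number = ""
            # Inner loop: consume digits, stopping at the first non-digit,
            # which stays in `char` for the outer loop to handle.
            while char is not None and char.isdigit():
                number += char
                char = next(chars, None)
            yield ("NUMBER", number)
        elif char.isspace():
            char = next(chars, None)  # skip whitespace
        else:
            yield ("SYMBOL", char)
            char = next(chars, None)


print(list(lex("12+3")))
# → [('NUMBER', '12'), ('SYMBOL', '+'), ('NUMBER', '3')]
```

A parser can then consume tokens with an ordinary `for token in lex(text)` loop, and no trailing sentinel is needed because the end of input is visible to the lexer directly.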