Run Length Encoding in Lisp

Posted on May 23, 2011

Run length encoding in Lisp is kind of a trivial problem. Not one that would typically come up in most applications. I came across a treatment while reading the third chapter of ANSI Common Lisp by Paul Graham. I’ve read The Little Schemer and a friend just handed me a copy of ANSI Common Lisp. My style has been greatly influenced by The Little Schemer so when I encountered Paul Graham’s example, my nose crinkled a little and I wondered if I could write a more concise solution. I thought it would be a simple kata.

The “problem” goes like this. You have a list like:

(1 1 1 0 1 0 0 0)

And you want a more concise representation. Paul Graham suggests:

((3 1) 0 1 (3 0))

For example. Here’s Paul Graham’s solution. I hope I understand fair-use correctly:

(defun ansi-compress (lst)                                                           
  (if (consp lst)                                                                    
      (compr (car lst) 1 (cdr lst))                                                  
      lst))                                                                          

(defun compr (elt n lst)                                                             
  (if (null lst)                                                                     
      (list (n-elts elt n))                                                          
      (let ((next (car lst)))                                                        
        (if (eql next elt)                                                           
            (compr elt (1+ n) (cdr lst))                                             
            (cons (n-elts elt n)                                                     
                  (compr next 1 (cdr lst)))))))                                      

(defun n-elts (elt n)                                                                
  (if (> n 1)                                                                        
      (list n elt)                                                                   
      elt)) 

And here is my attempt at a more “Little Schemer” style solution:

(defun compress-b (p c lat)                                                          
  (cond ((null lat) (cons (cons c p) '()))                                           
        ((equal (first lat) p)                                                       
         (compress-b (first lat) (1+ c) (rest lat)))                                 
        (t (cons (cons c p) (compress-b (first lat) 1 (rest lat))))))                

(defun compress (lat)                                                                
  (cond ((null lat) '())                                                             
        (t (compress-b (car lat) 1 (cdr lat)))))

As you can see, it’s a few lines shorter. It’s also in many ways, functionally identical. However, my return data structure is a little different. Even if there is a single element, I return a cons. With a little extra code I could probably return an identical data structure. But it will require some way of emulating “n-elts.” However I think I like the uniformity, even if it trades off a tiny, tiny but of space. My function will always return a list of cons cells unless the input list is empty (in which case it will return an empty list).

Of course I’m still learning. I did a trace of the “compress-b” and “compr” functions to test my assumptions. When I first expanded the calls to “compress-b” I was slightly alarmed by the number of returns on the tail end of the function. However, I see that Paul Graham’s solution is the same. The question I have left is whether this is normal and desirable behavior. It seems to me that a single return of the final “chunk” would be better. If there are any Lisp experts in the crowd, would you care to enlighten me?