Here are two talks I presented at the Brisbane Functional Programming Group on 20 October 2009.
Just a short talk to note some of the themes and highlights of the International Conference on Functional Programming, 2009, with some links to further resources.
Introduction to functional data structures. The main points I wanted to convey here are:
I spent too much time talking about folds, and so I didn't get through all the material I had prepared. (Folds are somewhat tangential to the topic, but I think that understanding folds is important for developing a deeper appreciation of recursive data structures). The slides are all there, though they may be difficult to follow without the intended commentary. I'm happy to discuss any of the material, including the stuff I didn't get to talk about. Following are some further notes on the material presented.
At slide 24 (real-time queues), I had a mental blank on the purpose of the schedule in this structure. Thanks to Chris Okasaki's excellent book (and to Brad for lending me his copy), I can now explain how it works.
A real-time queue is a data structure which has worst-case constant-time (O(1)) enqueue and dequeue operations. This means that there is some constant t, so that every individual operation on the queue takes time less than t, regardless of the number of items in the queue.
enqueue :: Queue a -> a -> Queue a
dequeue :: Queue a -> Maybe (a, Queue a)
The underlying structure of the implementation consists of three lists: a forward list (f), a reverse list (r) and a schedule (s). The abstract contents of the queue consist of the forward list followed by the reverse list backwards (f ++ reverse r). The schedule does not form part of the abstract state, and is only used to ensure the worst-case constant-time operation of the queue.
data Queue a = Q [a] [a] [a]
Items can be enqueued in constant time by prepending to the reverse list, and can be dequeued in constant time by taking them from the front of the forward list. The main problem is finding a suitable method to migrate items from the reverse list to the forward list. We cannot use reverse, since it takes time proportional to the length of its input list before it yields any results. (Okasaki describes reverse as a monolithic function). Instead, Okasaki combines reversal and concatenation into a single function rotate, which performs its work incrementally.
rotate [] [y] ac = y:ac
rotate (x:xs) (y:ys) ac = x : rotate xs ys (y:ac)
There are several things to note about rotate. First, it is only well-defined when the length of the second parameter (ys) is exactly one greater than the length of the first parameter (xs). Later, we'll show that we only call rotate when this condition is satisfied.
Second, note that when it satisfies the above condition, it also satisfies the following equation, which makes it useful for implementing this queue structure:
rotate f r [] == f ++ reverse r
The third parameter (ac) is an accumulating parameter, used to build a reversed version of the second parameter (ys).
Third, rotate is lazy. (Remember, Haskell is a non-strict, or lazy language). That is, rotate can produce elements one step at a time, without ever having to perform a monolithic reverse. Note that for every item extracted from the first parameter (xs), rotate moves a single item from the second parameter (ys) to the accumulating parameter (ac). This means that rotate takes constant time to produce each element, if we assume that the arguments to rotate were already fully evaluated at the time rotate was called. Later, we'll show that this assumption is true for this queue implementation.
The remainder of the Queue implementation is as follows:
exec f r [] = let { fr = rotate f r [] } in Q fr [] fr
exec f r (x:s) = Q f r s
enqueue (Q f r s) x = exec f (x:r) s
dequeue (Q [] r s) = Nothing
dequeue (Q (x:f) r s) = Just (x, exec f r s)
We maintain an invariant on the lengths of f, r and s, to ensure correct operation: |f| = |r| + |s|.
In particular, if the forward list (f) is empty, then the reverse list (r) and schedule (s) must also be empty. This means that dequeue need only check the forward list to determine whether the queue is empty.
The purpose of the helper function exec is to ensure that the various conditions mentioned previously are satisfied. To do its job, it expects to be called with three parameters, which correspond to the three components of the Queue structure, but which must satisfy a slightly different precondition: |f| + 1 = |r| + |s|. Intuitively, exec creates and executes the schedule, which acts as a pointer into the unevaluated part of the forward list.
If the third argument to exec (s) is empty, then |f| + 1 = |r|, so it is necessary to perform a rotation to restore the length invariant. The length invariant is satisfied by installing the result of rotate in both the forward list and the schedule. Note that in this case, the precondition to exec also satisfies the first precondition to rotate, that the length of its second argument (ys) must be one greater than the length of the first (xs). If the third argument to exec is not empty, the length invariant is restored by simply dropping a single item from the schedule. In both cases, it is easy to see that exec executes in constant time, if we recall that rotate is lazy.
Looking at the definitions of enqueue and dequeue, we can see that the length invariant implies that the precondition to exec is satisfied, since we only call exec after either adding a single item to the reverse list (r) or taking a single item from the forward list (f).
It only remains to be shown that the arguments to rotate are fully evaluated at the time rotate is called. Since rotate is the only part of the implementation which is lazy, we just need to show that the previous call to rotate has been fully evaluated by the time the next call is made. This follows from the fact that exec only calls rotate when the previous schedule (which was the result of the previous call to rotate) has been fully traversed.
During the talk (slide 20), I showed on the whiteboard how to derive an efficient implementation of depth-first in-order traversal of a binary tree from an inefficient implementation. This is a good example of equational reasoning, so I'll reproduce it here.
First, the definition of the Tree structure and its foldTree function:
data Tree a = Leaf | Branch a (Tree a) (Tree a)
foldTree :: (a -> b -> b -> b) -> b -> Tree a -> b
foldTree f z = fold where
fold Leaf = z
fold (Branch x l r) = f x (fold l) (fold r)
Following is an inefficient (O(n2)) implementation of a depth-first in-order traversal. It is inefficient because lists of items from left subtrees are repeatedly concatenated as the traversal proceeds up the tree.
inorder :: Tree a -> [a]
inorder = foldTree (\x l r -> l ++ [x] ++ r) []
We want to obtain an efficient version of this function. We begin by defining a different but related function:
inapp :: Tree a -> [a] -> [a]
The property we want inapp to satisfy is as follows:
inapp t k == inorder t ++ k
Then, once we've defined inapp, we can obtain the new version of inorder as follows:
inorder t = inapp t []
At this point, it's hard to see why we've begun this way, but it will become clear before too long. For the moment, just accept that it's a standard trick that needs to be learned (Google for difference lists).
The property we specified is enough to define the implementation of inapp. As usual, we proceed by cases:
inapp Leaf k
= inorder Leaf ++ k
= [] ++ k
= k
inapp (Branch x l r) k
= inorder (Branch x l r) ++ k
= (inorder l ++ [x] ++ inorder r) ++ k
= inorder l ++ ([x] ++ (inorder r ++ k))
= inorder l ++ ([x] ++ (inapp r k))
= inorder l ++ (x : inapp r k)
= inapp l (x : inapp r k)
In these derivations, I've made use of the original (inefficient) implementation of inorder, the defining property of inapp, and various properties of list concenataion, most notably that list concatenation is associative. Indeed, it is the reorganisation of the grouping of the list concatenations which will make our new implementation of inorder efficient. Also, we can now see why we added the k parameter to our specification of inapp: it allowed us to begin the process of elimiating all of the list concatenations from our definition of inapp.
We now express inapp in terms of foldTree. First, see that we can write inapp in a form that closely resembles the definition of foldTree:
inapp Leaf = id
inapp (Branch x l r) = \k -> inapp l (x : inapp r k)
With a bit more refactoring, it is an exact match, but with specialized versions of f and z:
inapp Leaf = id
inapp (Branch x l r) = (\x l r k -> l (x : r k)) x (inapp l) (inapp r)
Thus:
inapp = foldTree (\x l r k -> l (x : r k)) id
And using the original defining property of inapp:
inorder = foldTree (\x l r k -> l (x : r k)) id []
One can use a similar derivation to express foldl on lists in terms of foldr.