<?xml version='1.0' encoding='UTF-8'?><?xml-stylesheet href="http://www.blogger.com/styles/atom.css" type="text/css"?><feed xmlns='http://www.w3.org/2005/Atom' xmlns:openSearch='http://a9.com/-/spec/opensearchrss/1.0/' xmlns:georss='http://www.georss.org/georss' xmlns:gd='http://schemas.google.com/g/2005' xmlns:thr='http://purl.org/syndication/thread/1.0'><id>tag:blogger.com,1999:blog-35609023</id><updated>2012-01-31T00:40:33.914+01:00</updated><category term='Haskell'/><category term='darcs'/><category term='Tutorial'/><category term='git'/><category term='Lisp'/><title type='text'>nominolo's Blog</title><subtitle type='html'></subtitle><link rel='http://schemas.google.com/g/2005#feed' type='application/atom+xml' href='http://nominolo.blogspot.com/feeds/posts/default'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/35609023/posts/default?max-results=100'/><link rel='alternate' type='text/html' href='http://nominolo.blogspot.com/'/><link rel='hub' href='http://pubsubhubbub.appspot.com/'/><author><name>Thomas Schilling</name><uri>http://www.blogger.com/profile/04274984206279511399</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><generator version='7.00' uri='http://www.blogger.com'>Blogger</generator><openSearch:totalResults>11</openSearch:totalResults><openSearch:startIndex>1</openSearch:startIndex><openSearch:itemsPerPage>100</openSearch:itemsPerPage><entry><id>tag:blogger.com,1999:blog-35609023.post-4138604717056548410</id><published>2010-04-30T17:25:00.005+02:00</published><updated>2010-04-30T18:07:51.894+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Haskell'/><title type='text'>Haskell Tip: Redirect stdout in Haskell</title><content type='html'>Have you ever wanted to make sure that a call to a library cannot print anything to &lt;code&gt;stdout&lt;/code&gt;?  The following does this except that it redirects stdout globally and not just across a library call.  This should be doable, but I haven't needed it yet.

&lt;pre&gt;import GHC.IO.Handle   -- yes, it's GHC-specific
import System.IO

main = do
  stdout_excl &lt;- hDuplicate stdout
  hDuplicateTo stderr stdout  -- redirect stdout to stderr
  
  putStrLn "Hello stderr" -- will print to stderr
  hPutStrLn stdout_excl "Hello stdout" -- prints to stdout&lt;/pre&gt;

The above code first creates a new handle to the standard output resource using &lt;code&gt;hDuplicate&lt;/code&gt;.  The call to &lt;code&gt;hDuplicateTo&lt;/code&gt; redirects any output to the &lt;em&gt;Haskell&lt;/em&gt; handle &lt;code&gt;stdout&lt;/code&gt; to go to the handle &lt;code&gt;stderr&lt;/code&gt;.  The Haskell handle &lt;code&gt;stdout_excl&lt;/code&gt; is now our only handle to the standard output resource.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/35609023-4138604717056548410?l=nominolo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nominolo.blogspot.com/feeds/4138604717056548410/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=35609023&amp;postID=4138604717056548410' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/35609023/posts/default/4138604717056548410'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/35609023/posts/default/4138604717056548410'/><link rel='alternate' type='text/html' href='http://nominolo.blogspot.com/2010/04/haskell-tip-redirect-stdout-in-haskell.html' title='Haskell Tip: Redirect &lt;em&gt;stdout&lt;/em&gt; in Haskell'/><author><name>Thomas Schilling</name><uri>http://www.blogger.com/profile/04274984206279511399</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-35609023.post-2474098222229030274</id><published>2008-05-12T08:22:00.009+02:00</published><updated>2008-05-13T12:34:06.896+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='darcs'/><category scheme='http://www.blogger.com/atom/ns#' term='git'/><title type='text'>The Thing That Should Not Be (Or: How to import 18500+ patches from Darcs into Git in less than three days)</title><content type='html'>&lt;p&gt;I like &lt;a href="http://darcs.net/"&gt;Darcs&lt;/a&gt;.  Really.  It is easy to learn and use and for smallish projects I never had any real problems.  Unfortunately, it still has some performance problems and it is likely that some operations will never be fast.&lt;/p&gt;

&lt;p&gt;An extreme example of where you run into those problems is the &lt;a href="http://haskell.org/ghc/"&gt;GHC&lt;/a&gt; repository.  It consists of over 18500 patches and spans over 12 years of history.  When I tried to build the latest version I ran into a linker error which I know I didn't get with the snapshot from one month ago.  As GHC builds take quite a while I wanted to use an efficient way to find which exact change introduced the problem.  More precisely I wanted &lt;code&gt;git bisect&lt;/code&gt;.&lt;/p&gt;

&lt;p&gt;I know that &lt;a href="http://www.cse.unsw.edu.au/~dons/"&gt;Don&lt;/a&gt; had converted Darcs repositories to Git in order to get &lt;a href="http://www.ohloh.net/"&gt;ohloh&lt;/a&gt; statistics, but he reported that this process was rather painful.  It took four weeks(!) to convert the GHC repository.&lt;/p&gt;

&lt;p&gt;So, I looked what tools were out there, and how to improve them.  I know that there is &lt;a href="http://progetti.arstecnica.it/tailor"&gt;Tailor&lt;/a&gt;, but I looked at &lt;a href="http://www.sanityinc.com/articles/converting-darcs-repositories-to-git"&gt;darcs-to-git&lt;/a&gt; by Steve Purcell first and found it very hackable.  I didn't like that it saved the Darcs patch ID in the Git commit message, so I changed that and I extended it to properly translate Darcs escape sequences.  I also added a parameter to only pull a number of patches at the same time, so that I can import a big repository in stages and I allowed custom mapping from committer names to other committer names.  I used this to map various pseudonyms to (a unique) full name and email address.  (I hope no one minds being credited with his or her full name. ;) )&lt;/p&gt;

&lt;p&gt;It worked rather well for smallish repositories (a bit less than 2000 patches) but I had serious problems to get it to work with GHC.&lt;/p&gt;

&lt;ul&gt;
&lt;li&gt;Darcs has a bug on case-insensitive volumes (which OS X uses by default), so Steve suggested using a case-sensitive sparse image.  This works, but it is probably a bit slower.  I tried running it on my FreeBSD home server, but it has only 256 MB of RAM (usually fine for a home file server) so Darcs ran out of space and eventually got killed by the OS.  (Getting Darcs to compile on my server was an adventure in itself--first a few hours to update the ports tree, then one more to compile GHC 6.8 which then just failed to install...)  Fortunately, my Laptop has 2 GB, so it works fine there.
&lt;/li&gt;
&lt;li&gt;At startup darcs-to-git reads the full Darcs patch inventory.  For such a big repo as GHC this takes over a minute (and lots of RAM).  Caching it in a file didn't seem to help much.  I could have lived with that, but there was a more serious problem:  the approach used by darcs-to-git (and, it seems, also by Tailor) doesn't work!
&lt;/li&gt;
&lt;li&gt;darcs-to-git pulls one patch at a time by giving it's ID to &lt;code&gt;darcs pull --match 'hash ...id...'&lt;/code&gt;, then &lt;code&gt;git add&lt;/code&gt;s the changes on the Git side and &lt;code&gt;git commit&lt;/code&gt;s it with the appropriate commit message.  The patches are pulled in the order in which they were applied in the source repo, so any dependencies should be fulfilled.  Nevertheless, Darcs refused to apply some patches -- silently.  Darcs just determined that I didn't want to pull any patches and didn't do anything.  This is most likely a Darcs bug, but I heard it was only a known bug for some development version of Darcs 2 (I used Darcs 1.0.9 at that time).  Anyways, that didn't work; it failed at about patch 30 of the GHC repository.
&lt;/li&gt;
&lt;li&gt;OK.  So instead of pulling patches by ID we could fake user interaction.  Something like this:
&lt;pre&gt;
$ echo "yd" | darcs pull source-repo&lt;/pre&gt;
The input corresponds to "&lt;strong&gt;Y&lt;/strong&gt;es, I want to pull this patch" and "Ok, I'm &lt;strong&gt;d&lt;/strong&gt;one and want to pull all the selected patches".  This works reliably and also has the advantage that we don't have to read the whole history up front but instead can just retrieve the info for the last applied patch via
&lt;pre&gt;
darcs changes --last 1 --xml&lt;/pre&gt;
&lt;/li&gt;
&lt;li&gt;By now you might have guessed, though, that this still didn't work very well.  It took about 60 seconds per patch (with about 1 second of this used by Git), resulting in estimated 13 days(!) CPU time for the full repository.
&lt;/li&gt;
&lt;li&gt;Interestingly, most of those 60 seconds are spent before any patch choice is displayed, so apparently Darcs is doing something to calculate which patches to show.  After that, displaying more choices is relatively quick.  Apparently, the startup time depends on the number of patches &lt;emph&gt;not yet pulled&lt;/emph&gt;.
&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;This leads to the following trick.&lt;/p&gt;

&lt;p&gt;We use two intermediate repositories.  We use one to pull several patches at a time from the source repository.  I use 15 patches, ie.:
&lt;pre&gt;
$ cd tmp/ghc.pull
$ echo "yyyyyyyyyyyyyyyd" | darcs pull /path/to/ghc&lt;/pre&gt;
We now could import from this intermediate repository into Git, since the startup time to pull from this repo is now much lower.  However, we'd like to already start and pull the next 15 patches into the temporary repository.  Pulling from and into the same repo at the same time doesn't work (Dars locks the repo), so we also need to mirror this temporary repository.  A &lt;code&gt;cp -r&lt;/code&gt; would work, but as the repository grows larger, this would do unnecessary work.  So I just pull the changes at once.
&lt;pre&gt;
$ cd /tmp/ghc.pull_mirror
$ darcs pull --all /tmp/ghc.pull  # this is pretty quick now&lt;/pre&gt;
Now we can import into our git mirror from there, and already start pulling the next 15 patches (proper term for this is "macro pipelining", I believe).
&lt;pre&gt;
$ cd /path/to/ghc.git
$ ./darcs-to-git /tmp/ghc.pull_mirror &amp;  # run in background
$ cd /tmp/ghc.pull
$ echo "yyyy..." # etc&lt;/pre&gt;
Of course, before pulling from the first mirror into the second mirror we have to make sure that &lt;code&gt;darcs-to-git&lt;/code&gt; has finished pulling from the second mirror.  I have implemented this as a shell script on top of darcs-to-git, but I may move it into darcs-to-git at some point.&lt;/p&gt;

&lt;p&gt;&lt;a href="http://github.com/nominolo/darcs-to-git"&gt;My fork of darcs-to-git&lt;/a&gt; as well as &lt;a href="http://github.com/purcell/darcs-to-git"&gt;Steve's main repo&lt;/a&gt; are both available at Github. I haven't pushed all of my local changes yet, but I plan to implement pulling dars patches "interactively" as a possible option for darcs-to-git, so maybe check the repo in a week or two.&lt;/p&gt;

&lt;p&gt;With this approach I am down to about 200 seconds per 15 patches or about 68 hours fo the 18500 patches of the GHC repo which is just below the promised three days.  (Of course, YMMV)&lt;/p&gt;

&lt;p&gt;So the moral of this story?  Darcs is very slow for biggish repositories, especially for rarely used border cases (such as pulling patches one by one).  It may be possible to fix them, but I doubt that this will be easy.  I tried using the new hashed format and the darcs-2 format, but converting the GHC repo didn't work for me.  I certainly hope that things get better, and I plan to help at least a little by submitting several bug reports in the next couple of days about the problems I ran into in the past days.  Let's see what happens.&lt;/p&gt;

&lt;p&gt;Oh, and Darcs needs a killer-app like &lt;a href="http://github.com/"&gt;Github&lt;/a&gt;!&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/35609023-2474098222229030274?l=nominolo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nominolo.blogspot.com/feeds/2474098222229030274/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=35609023&amp;postID=2474098222229030274' title='17 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/35609023/posts/default/2474098222229030274'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/35609023/posts/default/2474098222229030274'/><link rel='alternate' type='text/html' href='http://nominolo.blogspot.com/2008/05/thing-that-should-not-be-or-how-to.html' title='The Thing That Should Not Be (Or: How to import 18500+ patches from Darcs into Git in less than three days)'/><author><name>Thomas Schilling</name><uri>http://www.blogger.com/profile/04274984206279511399</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>17</thr:total></entry><entry><id>tag:blogger.com,1999:blog-35609023.post-4533934079118896517</id><published>2008-03-16T16:34:00.005+01:00</published><updated>2008-03-16T20:36:15.960+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Haskell'/><title type='text'>A short reminder</title><content type='html'>Folds and maps are Haskell's default iteration combinators.  Mapping is easy enough, but folds can often be rather messy, especially if nested.  For example, given a map of sets of some values, we want to write a function to swap keys and values.  The function's type will be:
&lt;pre&gt;&lt;code&gt;
type SetMap k a = Map k (Set a)

invertSetMap :: (Ord a, Ord b) =&gt; SetMap a b -&gt; SetMap b a
&lt;/code&gt;&lt;/pre&gt;
The resulting map should contain a key for each value of type b, occurring in any set in the original map.  The new values (of type &lt;code&gt;Set a&lt;/code&gt;) are all those original keys for which the new key occurred in the value set.  Intuitively, if the original set was representing arrows from values of type a to values of type b, this function should reverse all arrows.

We can easily implement this function using two nested folds.
&lt;pre&gt;&lt;code&gt;
invertSetMap sm = 
  M.foldWithKey 
      (\k as r -&gt; 
          S.fold (\a r' -&gt; M.insertWith S.union a (S.singleton k) r')
                 r
                 as)
      M.empty
      sm    
&lt;/code&gt;&lt;/pre&gt;
That's not pretty at all!

I had written quite a bit of this kind of code (and hated it each time), until I finally remembered a fundamental Haskell lesson.  Haskell uses lists to simulate iteration and specify other kinds of control flow.  In particular list comprehensions are often extremely cheap, since the compiler can automatically remove many or all intermediate lists and generate very efficient code.  So let's try again.
&lt;pre&gt;&lt;code&gt;
invertSetMap sm = M.fromListWith S.union
    [ (a, S.singleton k) | (k, as) &lt;- M.assocs sm
                         , a       &lt;- S.toList as ]
&lt;/code&gt;&lt;/pre&gt;
So much more readable!

A quick benchmark also shows that it's slightly faster (a few percent for a very big map).

Lesson to take home:  If your folds get incomprehensible consider list comprehensions.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/35609023-4533934079118896517?l=nominolo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nominolo.blogspot.com/feeds/4533934079118896517/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=35609023&amp;postID=4533934079118896517' title='6 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/35609023/posts/default/4533934079118896517'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/35609023/posts/default/4533934079118896517'/><link rel='alternate' type='text/html' href='http://nominolo.blogspot.com/2008/03/short-reminder.html' title='A short reminder'/><author><name>Thomas Schilling</name><uri>http://www.blogger.com/profile/04274984206279511399</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>6</thr:total></entry><entry><id>tag:blogger.com,1999:blog-35609023.post-7409687571168447757</id><published>2007-10-05T15:07:00.000+02:00</published><updated>2007-10-05T15:18:02.508+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Tutorial'/><category scheme='http://www.blogger.com/atom/ns#' term='Haskell'/><title type='text'>New Haskell Tutorial</title><content type='html'>&lt;a href="http://lisperati.com/"&gt;Conrad Barski&lt;/a&gt; recently made a new &lt;a href="http://lisperati.com/haskell/"&gt;Haskell Tutorial&lt;/a&gt; available.  A while ago I stumbled upon Conrad's excellent &lt;a href="http://lisperati.com/casting.html"&gt;Lisp Tutorial&lt;/a&gt; which was well-received in the community and actually lead to a pretty cool (Common) &lt;a href="http://www.lisperati.com/logo.html"&gt;Lisp-logo&lt;/a&gt;.  I haven't yet read the tutorial completely, but they are usually very well-written, newbie-friendly and based on interesting problems.  So, check it out!

&lt;p&gt;PS: Greetings from the second Hackathon 2007.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/35609023-7409687571168447757?l=nominolo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nominolo.blogspot.com/feeds/7409687571168447757/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=35609023&amp;postID=7409687571168447757' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/35609023/posts/default/7409687571168447757'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/35609023/posts/default/7409687571168447757'/><link rel='alternate' type='text/html' href='http://nominolo.blogspot.com/2007/10/new-haskell-tutorial.html' title='New Haskell Tutorial'/><author><name>Thomas Schilling</name><uri>http://www.blogger.com/profile/04274984206279511399</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-35609023.post-4085990236348928385</id><published>2007-05-21T22:29:00.001+02:00</published><updated>2009-04-05T16:38:56.768+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Haskell'/><title type='text'>Network.HTTP + ByteStrings</title><content type='html'>&lt;p&gt;&lt;strong&gt;Update:&lt;/strong&gt; I mixed some numbers.  I wrote about 375 MB, but it were 175 MB.  (Noone seemed to have noticed though.  Anyways, the argument still holds.)&lt;/p&gt;

Haskell's &lt;a href="http://www.haskell.org/http"&gt;Network.HTTP package&lt;/a&gt; isn't quite as good as it could be.  Well, to be precise, it is &lt;em&gt;not at all&lt;/em&gt; as good as it &lt;em&gt;should&lt;/em&gt; be.  In addition to API problems (for which I proposed a solution in &lt;a href="http://nominolo.blogspot.com/2007/05/towards-better-error-handling.html"&gt;my previous blog entry&lt;/a&gt;) there's also a major performance problem, due to strictness and use of regular list-based &lt;code&gt;String&lt;/code&gt;s.  A simple &lt;code&gt;wget&lt;/code&gt;-style program written in Haskell used like &lt;code&gt;./get http://localhost/file.big&lt;/code&gt; on a local 175 MB file almost locked my 1GB laptop due to constant swapping.  I had to kill it, as it was using up more than 500 MB of RAM (still swapping).  At this point it had run for 50 seconds at had written not a single byte to the output file.  At the same time a normal &lt;code&gt;wget&lt;/code&gt; completed after abound 10 seconds.  Since the file was retrieved from a local server I assume overall performance was inherently limited by disk speed (or the operating system's caching strategy).

The current implementation performed so badly for two reasons:
&lt;ul&gt;
&lt;li&gt;Since it uses list-based strings each retrieved byte will take up (at least) 8 byte in program memory (one cons cell, or tag + data + pointer to tail).&lt;/li&gt;
&lt;li&gt;It implements custom, list-based buffering.  The buffer size is 1000 characters/bytes, which is rather OK for line-based reading, but if the HTTP protocol requests to read a large block of data, this block will be read in 1000 byte chunks and then be &lt;em&gt;appended&lt;/em&gt; to the part that has alrady been read.  So if we read a block of 8000 bytes, the first block will be read and consequently be copied 8-times(!).  Let's not think about reading a block of 175000000 bytes.  Also because we already know the answer.&lt;/li&gt;&lt;/ul&gt;

But let's not flame the original author(s).  It's better than nothing and it gave me and my project partner &lt;a href="http://www.dtek.chalmers.se/~tox/site/"&gt;Jonas&lt;/a&gt; an interesting project topic.

So we decided to overcome the evil at its root and replace Strings using &lt;a href="http://www.cse.unsw.edu.au/~dons/fps.html"&gt;ByteString&lt;/a&gt;s--this way we would get buffering for free.  To give you a taste for what this accomplishes:

&lt;table&gt;
&lt;tr&gt;&lt;th&gt;Program&lt;/th&gt;&lt;th&gt;Runtime&lt;/th&gt;&lt;th&gt;Memory Use&lt;/th&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;wget&lt;/td&gt;&lt;td style="text-align:right"&gt;~10s&lt;/td&gt;&lt;td style="text-align:right"&gt;~0.5MB&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;./get using strict ByteStrings&lt;/td&gt;&lt;td style="text-align:right"&gt;~18s&lt;/td&gt;&lt;td style="text-align:right"&gt;~175MB&lt;/td&gt;&lt;/tr&gt;
&lt;tr&gt;&lt;td&gt;./get using lazy ByteStrings&lt;/td&gt;&lt;td style="text-align:right"&gt;~11s&lt;/td&gt;&lt;td style="text-align:right"&gt;~3MB&lt;/td&gt;&lt;/tr&gt;
&lt;/table&gt;

Adding strict ByteStrings was relatively straightforward.  &lt;code&gt;Network.HTTP&lt;/code&gt; already implements a simple Stream abstraction with a simple interface:
&lt;pre class="example"&gt;
class Stream x where 
    readLine   :: x -&gt; IO (Result String)
    readBlock  :: x -&gt; Int -&gt; IO (Result String)
    writeBlock :: x -&gt; String -&gt; IO (Result ())
    close      :: x -&gt; IO ()
&lt;/pre&gt;

Implementing this for strict ByteStrings is just a matter of calling the corresponding functions from the ByteStrings module.  With one small annoyance:  The HTTP parsing functions expect &lt;code&gt;readLine&lt;/code&gt; to return the trailing newline, which &lt;code&gt;hGetLine&lt;/code&gt; does not include, so we have to append it manually, which  in turn is an O(n) operation.

For simplicity, we also didn't convert the header parsing and writing functions to use ByteStrings, but instead inserted the appropriate calls to &lt;code&gt;pack&lt;/code&gt; and &lt;code&gt;unpack&lt;/code&gt;.  This could become a performance bottleneck if we have many small HTTP requests.  OTOH, we might soon have a &lt;a href="http://code.google.com/soc/haskell/appinfo.html?csaid=B97EF4562EF3B244"&gt;Parsec version that works on ByteStrings&lt;/a&gt;.

As could be seen from the above benchmarks, using strict ByteStrings still forces us to completely load a packet into memory before we can start using it, which may result in unnecessary high memory usage.  The obvious solution to this problem is to use lazy ByteStrings.

For lazy ByteStrings things work a bit differently.  Instead of calling &lt;code&gt;hGet&lt;/code&gt; and &lt;code&gt;hGetLine&lt;/code&gt; inside the stream API, we call &lt;code&gt;hGetContents&lt;/code&gt; when we open the connection.  This gives us a lazy ByteString which we store in the connection object and then use regular list functions on that string to implement the required API.

&lt;pre class="example"&gt;
openTCPPort uri port = 
    do { s &lt;- socket AF_INET Stream 6
       -- [...]
       ; h &lt;- socketToHandle s ReadWriteMode
       ; bs &lt;- BS.hGetContents h  -- get the lazy ByteString
       ; bsr &lt;- newIORef bs       -- and store it as an IORef
       ; v &lt;- newIORef (MkConn s a h bsr uri) 
       ; return (ConnRef v)
       }

readBlock c n =
        readIORef (getRef c) &gt;&gt;= \conn -&gt; case conn of
          ConnClosed -&gt; return (Left ErrorClosed)
          MkConn sock addr h bsr host -&gt;
              do { bs &lt;- readIORef bsr 
                 ; let (bl,bs') = BS.splitAt (fromIntegral n) bs
                 ; writeIORef bsr bs'  
                 ; return $ Right bl
                 }
   
    readLine c =
        readIORef (getRef c) &gt;&gt;= \conn -&gt; case conn of
          ConnClosed -&gt; return (Left ErrorClosed)
          MkConn sock addr h bsr host -&gt;
              do { bs &lt;- readIORef bsr
                 ; let (l,bs') = BS.span (/='\n') bs
                 ; let (nl,bs'') = BS.splitAt 1 bs'
                 ; writeIORef bsr bs''
                 ; return (Right (BS.append l nl)) -- add '\n'
                 }
        `Prelude.catch` \e -&gt; [...]
&lt;/pre&gt;

There are two main problems with this implementation, though:
&lt;ul&gt;
&lt;li&gt;ByteStrings currently only work on handles not on sockets.  Thus we have to turn sockets into handles using &lt;code&gt;socketToHandle&lt;/code&gt; which, according to the source code linked from the Haddock documentation will fail if we're in a multithreaded environment.  (search for "PARALLEL_HASKELL" in &lt;a href="http://darcs.haskell.org/packages/network/Network/Socket.hsc"&gt;Network.Socket's source&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Furthermore, after converting a socket to a handle we should no longer use this socket.  So we can't change any settings of the socket, but close it by calling &lt;code&gt;hClose&lt;/code&gt; on the handle.  

HTTP allows the user to specify whether a socket should be closed after the response has been received.  This is a bit more tricky when we use lazy ByteStrings since our request function will return immediately with a lazy ByteString as a result but no data has been read (except, maybe, on block).  We thus must not close the socket right away, but only after all its contents have been read.  So we must rely on &lt;code&gt;hGetContents&lt;/code&gt; to close our handle (and thus socket) -- which is does not!  From recent #haskell comments this seems to be bug.  In any case though we'd want to be able to specify the behavior, as we might as well keep the socket open.&lt;/li&gt;
&lt;/ul&gt;

There are further issues to consider.  E.g., can we rely on the operating system to buffer everything for us if we don't read it right away?  I don't know the details, but I assume this is handled by some lower layer, possibly dropping packages and re-requesting them if necessary.  That's just guessing though.

Unfortunately, I will not have the time to work out these issues anytime soon, as I will be busy with my Google Summer of Code project (cabal configurations).  There also is a &lt;a href="http://code.google.com/soc/haskell/appinfo.html?csaid=D4DEE221DAC4E810"&gt;SoC project to replace Network.HTTP with libcurl bindings&lt;/a&gt;, but it would probably be a good idea to still have a reasonable Haskell-only solution around.  So if anyone wants to pick it up, you're welcome!

You can get the sources for the lazy version with

&lt;code&gt;darcs get &lt;a href="http://www.dtek.chalmers.se/~tox/darcs/http"&gt;http://www.dtek.chalmers.se/~tox/darcs/http&lt;/a&gt;&lt;/code&gt;

and for the strict version

&lt;code&gt;darcs get &lt;a href="http://www.dtek.chalmers.se/~tox/darcs/http-strict"&gt;http://www.dtek.chalmers.se/~tox/darcs/http-strict&lt;/a&gt;&lt;/code&gt;

If you're interested you can take a look at &lt;a href="http://www.dtek.chalmers.se/~tox/site/http.php4"&gt;our project page&lt;/a&gt;.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/35609023-4085990236348928385?l=nominolo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nominolo.blogspot.com/feeds/4085990236348928385/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=35609023&amp;postID=4085990236348928385' title='10 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/35609023/posts/default/4085990236348928385'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/35609023/posts/default/4085990236348928385'/><link rel='alternate' type='text/html' href='http://nominolo.blogspot.com/2007/05/networkhttp-bytestrings.html' title='Network.HTTP + ByteStrings'/><author><name>Thomas Schilling</name><uri>http://www.blogger.com/profile/04274984206279511399</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>10</thr:total></entry><entry><id>tag:blogger.com,1999:blog-35609023.post-3847320652489280203</id><published>2007-05-07T12:45:00.000+02:00</published><updated>2007-05-07T15:59:01.358+02:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Haskell'/><title type='text'>Towards Better Error Handling</title><content type='html'>&lt;p&gt;A while ago Eric Kidd wrote a &lt;a href="http://www.randomhacks.net/articles/2007/03/10/haskell-8-ways-to-report-errors"&gt;rant about inconsistent error reporting mechanisms&lt;/a&gt; in Haskell.  He found eight different idioms, none of which were completely satisfying.  In this post I want to propose a very simple but IMO pretty useful and easy-to-use scheme, that works with standard Haskell.&lt;/p&gt;
&lt;p&gt;The Haskell &lt;a href="http://www.haskell.org/http/"&gt;HTTP Package&lt;/a&gt; is a good test case for such scheme.  The most immediate requirements are:&lt;/p&gt;
&lt;ul&gt;
&lt;li&gt;It should work from within any monad (not just &lt;code&gt;IO&lt;/code&gt;).&lt;/li&gt;
&lt;li&gt;It should be possible to catch and identify any kind of error that happened inside a call to a library routine.&lt;/li&gt;
&lt;li&gt;It should be possible to ignore the error-handling (e.g., for simple scripts that just die in case of error)&lt;/li&gt;
&lt;/ul&gt;
&lt;p&gt;So far, the public API functions mostly have a signature like&lt;/p&gt;
&lt;pre class="example"&gt;
type Result a = Either ConnError a

simpleHTTP :: Request -&amp;gt; IO (Result Response)
&lt;/pre&gt;
&lt;p&gt;This requires C-style coding where we have to check for an error after each call.  Additionally, we might still get an &lt;code&gt;IOException&lt;/code&gt;, and have to catch it somewhere else (if we want to).  A simple workaround is to write a wrapper function for calls to the HTTP API.  For example:&lt;/p&gt;
&lt;pre class="example"&gt;
data MyErrorType = ... | HTTPErr ConnError | IOErr IOException
instance Error MyErrorType where
    noMsg    = undefined  -- who needs these anyways?
    strMsg _ = undefined

instance MonadError MyErrorType MyMonad where ...

-- | Perform the API action and transform any error into our custom
--   error type and re-throw it in our custom error type.
ht :: IO (Result a) -&amp;gt; MyMonad a
ht m = do { r &amp;lt;- io m
          ; case r of
              Left cerr -&amp;gt; throwError (HTTPErr cerr)
              Right x   -&amp;gt; return x
          }

-- | Perform an action in the IO monad and re-throw possible
--   IOExceptions as our custom error type.
io :: IO a -&amp;gt; MyMonad a
io m = do { r &amp;lt;- liftIO $
                   (m &amp;gt;&amp;gt;= return . Right)
                   `catchError` (\e -&amp;gt; return (Left e))
          ; case r of
              Left e -&amp;gt; throwError (IOErr e)
              Right a -&amp;gt; return a
          }
&lt;/pre&gt;
&lt;p&gt;We defined a custom error type, because we can have only one error type per monad.  Exceptions in the IO monad and API error messages are then caught immediately and wrapped in our custom error type.&lt;/p&gt;
&lt;p&gt;But why should every user of the library do that?  Can't we just fix the library?  Of course we can!  Now, that we have a specific solution we can go and generalize.  Let's start by commenting out the type signatures of &lt;code&gt;ht&lt;/code&gt; and &lt;code&gt;io&lt;/code&gt; and ask &lt;code&gt;ghci&lt;/code&gt; what it thinks about the functions' types:&lt;/p&gt;
&lt;pre class="example"&gt;
*Main&amp;gt; :t io
io :: (MonadIO m, MonadError MyErrorType m) =&amp;gt; IO a -&amp;gt; m a
*Main&amp;gt; :t ht
ht :: (MonadIO t, MonadError MyErrorType t) =&amp;gt;
IO (Either ConnError t1) -&amp;gt; t t1
&lt;/pre&gt;
&lt;p&gt;Alright, this already looks pretty general.  There's still our custom &lt;code&gt;MyErrorType&lt;/code&gt; in the signature, though.  To fix this we apply the standard trick and use a type class.&lt;/p&gt;
&lt;pre class="example"&gt;
data HTTPErrorType = ConnErr ConnError | IOErr IOException

-- | An instance of this class can embed 'HTTPError's.
class HTTPError e where
    fromHTTPError :: HTTPErrorType -&amp;gt; e
&lt;/pre&gt;
&lt;p&gt;Our wrapper functions now have a nice general type, that allows us to move them into the library.&lt;/p&gt;
&lt;pre class="example"&gt;
throwHTTPError = throwError . fromHTTPError

ht :: (MonadError e m, MonadIO m, HTTPError e) =&amp;gt;
      IO (Result a) -&amp;gt; m a
ht m = do { r &amp;lt;- io m
          ; case r of
              Left cerr -&amp;gt; throwHTTPError (ConnErr cerr)
              Right a   -&amp;gt; return a
          }

-- | Perform an action in the IO monad and re-throw possible
--   IOExceptions as our custom error type.
io :: (MonadError e m, MonadIO m, HTTPError e) =&amp;gt;
      IO a -&amp;gt; m a
io m = do r &amp;lt;- liftIO $
                 (m &amp;gt;&amp;gt;= return . Right) 
                 `catchError` (\e -&amp;gt; return (Left e))
          case r of
            Left e  -&amp;gt; throwHTTPError (IOErr e)
            Right a -&amp;gt; return a
&lt;/pre&gt;
&lt;p&gt;After wrapping, all exported functions will have a signature of the form:
&lt;pre class="example"&gt;
f :: (MonadError e m, MonadIO m, HTTPError e) =&amp;gt;
     ... arguments ... -&amp;gt; m SomeResultType
&lt;/pre&gt;&lt;/p&gt;
&lt;p&gt;Now the user is free to choose whichever monad she wants (that allows throwing errors and I/O).  The only added burden is for the user to specify how to embed a &lt;code&gt;HTTPError&lt;/code&gt; in the respective error type of the monad.  We can already specify the instance for &lt;code&gt;IO&lt;/code&gt;, though.&lt;/p&gt;
&lt;pre class="example"&gt;
instance HTTPError IOException where
    fromHTTPError (IOErr e) = e
    fromHTTPError (ConnErr e) = userError $ show e
&lt;/pre&gt;
&lt;p&gt;This way, our modified API works nicely out of the box whenever we just use the &lt;code&gt;IO&lt;/code&gt; monad and we can use it in our custom monad by writing only one simple instance declaration.&lt;/p&gt;
&lt;pre class="example"&gt;
data MyErrorType = ... | HTTPErr HTTPErrorType

instance HTTPError MyErrorType where
    fromHTTPError = HTTPErr

test1 req = do { r &amp;lt;- simpleHTTP req
               ; putStrLn (rspCode r)
               } `catchError` handler
  where handler (HTTPErr (ConnErr e)) = putStrLn $ &amp;quot;Connection error.&amp;quot;
        handler (HTTPErr (IOErr e))   = putStrLn $ &amp;quot;I/O Error.&amp;quot;
        handler _                     = putStrLn $ &amp;quot;Whatever.&amp;quot;
&lt;/pre&gt;
&lt;p&gt;If we don't care about the error and thus don't want to implement the instance, we can still force our API to be in the &lt;code&gt;IO&lt;/code&gt; monad and thus reuse &lt;code&gt;IOException&lt;/code&gt; to embed possible HTTP errors.&lt;/p&gt;
&lt;pre class="example"&gt;
test2 req = do { r &amp;lt;- liftIO $ simpleHTTP req
               ; putStrLn (rspCode r)
               }
&lt;/pre&gt;
&lt;p&gt;I think this is a very simple but useful scheme.  I already implemented this with a friend in the HTTP package&amp;mdash;and it works (&lt;em&gt;without&lt;/em&gt; &lt;code&gt;-fglasgow-exts&lt;/code&gt;).&lt;/p&gt;
&lt;p&gt;In addition to the added type class, there is the further potential drawback that an &lt;code&gt;IOException&lt;/code&gt; will always be wrapped in an API-specific error type.  So when a program uses more than one API that uses this scheme, an &lt;code&gt;IOException&lt;/code&gt; may be wrapped in either, which may or may not be what is desired.  A more sophisticated system, that deals with this problem and provides additional features, is explain in Simon Marlow's paper &lt;a href="http://www.haskell.org/~simonmar/papers/ext-exceptions.pdf"&gt;&amp;quot;An Extensible Dynamically-Typed Hierarchy of Exceptions&amp;quot; (PDF)&lt;/a&gt;.&lt;/p&gt;
&lt;p&gt;Comments welcome.&lt;/p&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/35609023-3847320652489280203?l=nominolo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nominolo.blogspot.com/feeds/3847320652489280203/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=35609023&amp;postID=3847320652489280203' title='1 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/35609023/posts/default/3847320652489280203'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/35609023/posts/default/3847320652489280203'/><link rel='alternate' type='text/html' href='http://nominolo.blogspot.com/2007/05/towards-better-error-handling.html' title='Towards Better Error Handling'/><author><name>Thomas Schilling</name><uri>http://www.blogger.com/profile/04274984206279511399</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>1</thr:total></entry><entry><id>tag:blogger.com,1999:blog-35609023.post-1196155203145462688</id><published>2006-12-20T22:18:00.000+01:00</published><updated>2006-12-21T00:01:50.360+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Lisp'/><category scheme='http://www.blogger.com/atom/ns#' term='Haskell'/><title type='text'>More on Syntax</title><content type='html'>My last post &lt;a href="http://programming.reddit.com/info/utz4/comments"&gt;appeared on reddit&lt;/a&gt;--thanks dons!  This induced some comments I'd like to respond to.

First of all, there already is a macro system for Haskell, called (somewhat misleadingly) &lt;a href="http://www.haskell.org/th/"&gt;Template Haskell&lt;/a&gt;.  It already provides the capabilities to generate arbitrary Haskell code.  (More correctly, Haskell 98 code, since extensions like Generalized ADTs are not supported.)  It also provides the given quasi-quotation mechanism I used in my last post's mock-ups: &lt;code&gt;[| ... |]&lt;/code&gt;.

However it has two problems.  Firstly, macros are marked specially using the &lt;code&gt;$(&lt;em&gt;macro&lt;/em&gt; ...)&lt;/code&gt; syntax, which is not as seamless as it could be, although there might be good reasons to keep it, namely to make it easily recognizable, when macros are involved.  Secondly, its quasi-quotation syntax is very limited, i.e., you cannot introduce new bindings and it's hard to modularize code--but I might be wrong with here since I might not have pushed it as far as possible.  The problem is: when you cannot use the quasi-quotation syntax then you're left building up the quite complex Haskell parse tree yourself.  Due to limited documentation and Haskell's syntax rules you usually write your macros by first getting the AST of some sample code, e.g. using:
&lt;pre&gt;-- | print a human-readable representation of a given AST
printAST :: ExpQ -&gt; IO ()
printAST  ast = runQ ast &gt;&gt;= putStrLn . show

pp = printAST [| let x = $([|(4+)|]) in x 5 |]&lt;/pre&gt;
which then (reformatted) looks like this:
&lt;pre&gt;$ pp
LetE [ValD (VarP x_0)
           (NormalB (InfixE (Just (LitE (IntegerL 4)))
                            (VarE GHC.Num.+) Nothing))
                     []]
     (AppE (VarE x_0) (LitE (IntegerL 5)))&lt;/pre&gt;
Then you try to customize this for your purposes.  Not pretty.

My actual attempt was to take a type name as a parameter, inspect it, and then generate some boilerplate code.  Well, I tried but gave up after being unable to construct some type.  Maybe I didn't try hard enough.  Anyways, macro-writing should be that hard!

My proposed solution certainly is just a sketch of an idea, essentially pointing to prior art.  I don't claim that this will in fact work nicely or even that it will work at all.  I am pretty confident that it &lt;em&gt;might&lt;/em&gt;, though, and I am planning to give it a shot later on.  Maybe extending Template Haskell with features similar t o Scheme's &lt;code&gt;syntax-case&lt;/code&gt; might be enough, for a start.

And yet, I don't consider this a high-priority project, since a lot of uses for macros in Lisp can be solved differently in Haskell, as has also been mentioned in the comments to my previous post:

&lt;ul&gt;&lt;li&gt;Controlling the order of evaluation is not necessary in Haskell since, due to lazyness.  And if we have to control it somehow, we mostly use monads.&lt;/li&gt;
&lt;li&gt;The whole category of &lt;code&gt;(with-&lt;em&gt;something&lt;/em&gt; (&lt;em&gt;locally bound vars&lt;/em&gt;) ...)&lt;/code&gt; can be implemented almost as conveniently using &lt;code&gt;with&lt;em&gt;Foo&lt;/em&gt; \&lt;em&gt;locally bound vars&lt;/em&gt; -&gt; do ...&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;A lot of cases for special syntax can be achieved using clever operator and constructor naming.  E.g., in &lt;a href="http://wxhaskell.sourceforge.net/"&gt;wxHaskell&lt;/a&gt;: &lt;code&gt;t &amp;lt;- timer f [interval := 20, on command := nextBalls vballs p]&lt;/code&gt;, or, for an in-progress project of mine I simulate a convenient assembler syntax by allowing a notation like: &lt;code&gt;res &amp;lt;-- a `imul` c&lt;/code&gt;.  However, I was not able to use the &lt;code&gt;&amp;lt;-&lt;/code&gt; notation, since I have different scoping rules than Haskell and I'm not in a monad.&lt;/li&gt;
&lt;li&gt;Many cases of boilerplate code generation can be covered using generic programming, e.g. using &lt;a href="http://www.cs.vu.nl/boilerplate/"&gt;Scrap Your Boilerplate&lt;/a&gt;.&lt;/li&gt;
&lt;/ul&gt;

So where would (more usable) macros still make sense?

&lt;ul&gt;&lt;li&gt;Allow more flexible syntax for domain-specific embedded languages (DSELs), e.g. an XML library or a parser library might profit from this right now. (Yes, I think &lt;a href="http://www.cs.uu.nl/~daan/parsec.html"&gt;Parsec&lt;/a&gt; could be more readable).  Also, DSLs like Happy would be even nicer if embedded directly into Haskell.  Arrows and Monads were considered general enough concepts to introduce new syntax for them, but I think there's more out there that deserves it.  I also think that an upcoming project of mine might hit the limits of what's currently possible in Haskell.  &lt;a href="http://article.gmane.org/gmane.comp.lang.haskell.cafe/17735"&gt;Some people seem to agree&lt;/a&gt;.&lt;/li&gt;
&lt;li&gt;Speaking of ParseC there's still one common use for Lisp-macros:  optimizing at compile-time.  You can get quite far by carefully designing your combinators for your DSELs.  However, combining nice syntax and performance is very hard.  In Lisp, the &lt;code&gt;loop&lt;/code&gt; embedded language, for example, does quite heavy transformations on the given code.  Partial evaluation is probably the more general solution here, but it seems to be not quite ready for primetime, yet.&lt;/li&gt;
&lt;li&gt;The point, that a powerful enough system would essentially make syntactic sugar a library can be seen as a positive side effect, too.  But I think this doesn't have much practical significance.&lt;/li&gt;&lt;/ul&gt;

Bottom line:  There certainly are less uselful applications of macros in Haskell than, e.g. in Lisp, but there are serious enough arguments to at least consider them.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/35609023-1196155203145462688?l=nominolo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nominolo.blogspot.com/feeds/1196155203145462688/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=35609023&amp;postID=1196155203145462688' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/35609023/posts/default/1196155203145462688'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/35609023/posts/default/1196155203145462688'/><link rel='alternate' type='text/html' href='http://nominolo.blogspot.com/2006/12/more-on-syntax.html' title='More on Syntax'/><author><name>Thomas Schilling</name><uri>http://www.blogger.com/profile/04274984206279511399</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-35609023.post-1160951543470307842</id><published>2006-12-14T12:29:00.000+01:00</published><updated>2006-12-14T17:47:50.337+01:00</updated><title type='text'>Syntax</title><content type='html'>Coming to Haskell from Common Lisp, I soon started to miss macros, and tried out Template Haskell .. and gave up.  Haskell's syntax parse tree is way too complex to be usefully manipulated by Haskell functions.

You can get used to Lisp syntax, but you always have to justify it to outsiders and there certainly lies a lot of value in resembling mathematical syntax.  I certainly agree with &lt;a href="http://cgi.cse.unsw.edu.au/~dons/blog/2006/12/14#on-syntax"&gt;dons' post on this issue&lt;/a&gt;.  If only it weren't for the disadvantages!

Sure, syntactic extension have to be done carefully, and powerful abstraction mechanisms in the language always have to come first.  But having macros as an integral part of the language definition would result in desugaring being a library instead of a part of the compiler.  This is a very powerful way of syntactic abstraction.

I know that ML, being designed as a meta-language, has some ways of syntactic abstraction, though I haven't took a closer look at these, yet.  Let me however outline a way of achieving macros almost as powerful as general Lisp macros, which works for languages with less simple syntax.

This system is (partly) implemented in the &lt;a href="http://www.opendylan.org/"&gt;Dylan programming language&lt;/a&gt; and is called D-Expressions.  It is essentially an extension of the Lisp syntax.  

Lisp in its most primitive form it looks like this:

&lt;code&gt;Expr ::= Atom | ( Expr* )&lt;/code&gt;

This represents a simple tree.  However, why should we restrict ourselves to represent nested structure with parens?  We might as well use brackets or braces or keywords. So:

&lt;code&gt;Expr ::= Atom | ( Expr* ) | [ Expr* ] | { Expr* } | def Expr* end&lt;/code&gt;

This way we still retain the unambiguous nesting structure, but have a more varied syntax.  Reader macros in Common Lisp (and probably Scheme, too) do this and already perform some sort of desugaring, by translating &lt;code&gt;{&lt;/code&gt; ... &lt;code&gt;}&lt;/code&gt; into &lt;code&gt;(&lt;em&gt;some-macro ...&lt;/em&gt;)&lt;/code&gt;.  This however has one big problem: You can only have &lt;em&gt;one&lt;/em&gt; macro for &lt;code&gt;{&lt;/code&gt; ... &lt;code&gt;}&lt;/code&gt;.  Common Lisp "fixes" this by using some prefix to the parens, e.g. &lt;code&gt;#c(3 4)&lt;/code&gt; denotes a complex number.  This isn't exactly beautiful and doesn't scale well, though.

In fact we'd rather like to have the chance to use this more variable syntax &lt;em&gt;inside&lt;/em&gt; our macros and we'd need a way to make it scalable.  Dylan solves  this by allowing three kinds of macros:
&lt;ul&gt;&lt;li&gt;Definition-style macros have the form: &lt;code&gt;define &lt;em&gt;modifiers*&lt;/em&gt; &lt;em&gt;macro-name&lt;/em&gt; &lt;em&gt;Expr*&lt;/em&gt; end&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Function-style macros have the form: &lt;code&gt;&lt;em&gt;macro-name&lt;/em&gt;(&lt;em&gt;Expr*&lt;/em&gt;)&lt;/code&gt;&lt;/li&gt;
&lt;li&gt;Statement-style macros have the form: &lt;code&gt;&lt;em&gt;macro-name&lt;/em&gt; &lt;em&gt;Expr*&lt;/em&gt; end&lt;/code&gt;&lt;/li&gt;
&lt;/ul&gt;
Macros are defined using pattern matching rewrite rules, e.g.:
&lt;pre&gt;define macro when
  { when ?cond:expr ?body:body end }
    =&gt;  { if ?cond ?body else #f end }
end&lt;/pre&gt;
Here &lt;code&gt;?cond:expr&lt;/code&gt; states that the pattern variable &lt;code&gt;cond&lt;/code&gt; matches a form of the &lt;em&gt;syntactic category&lt;/em&gt; of an expression.  Similarly, for &lt;code&gt;?body:body&lt;/code&gt;.

Multiple patterns are possible and extending this to Lisp-style abstract syntax tree transformations is possible, too, as shown in this &lt;a href="http://people.csail.mit.edu/jrb/Projects/dexprs.htm"&gt;paper on D-Expressions&lt;/a&gt; .

Adding this feature to Haskell would probably require some small modifications to the syntax of Haskell, but I think we don't have to drop whitespace sensitivity.  This way embedding DSLs in Haskell should be even more seemlessly and &lt;a href="http://syntaxfree.wordpress.com/2006/12/12/do-notation-considered-harmful/"&gt;rants about the do-notation&lt;/a&gt; would not be needed.

Here's how the implementation of the syntactic sugar to list expressions could look like:
&lt;pre&gt;macro do {
  [| do { ?e:expr } |] =&gt; [| ?expr |]
  [| do { ?e:pat &lt;- ?e:expr; ??rest:* } |] 
    =&gt; [| ?expr &gt;&gt;= (\?pat -&gt; do { ??rest ... }) |]
  [| do { let ?pat = ?expr; ??rest:* } |] 
    =&gt; [| let ?pat = ?expr in do { ??rest ... } |]
 ...
}&lt;/pre&gt;
where &lt;code&gt;??rest:*&lt;/code&gt; matches anything up to the closing "}" (resulting the pattern variable &lt;code&gt;rest&lt;/code&gt; to be bound to a sequence of tokens) and &lt;code&gt;??rest ...&lt;/code&gt; expands all tokens in &lt;code&gt;rest&lt;/code&gt;.

Sure, there are a lot of open issues to be solved--e.g. type specifications representing a seperate sub-language of Haskell, and how to actually achieve whitespace sensitivie macros--but I think it would be very useful extension.

Comments welcome! :)

&lt;strong&gt;Edit:&lt;/strong&gt; The last example actually is just a mockup of some possible syntax and does not even make much sense in Dylan either.  But you get the idea (I hope).

I've been pointed to a similar idea, called &lt;a href="http://www.cs.uu.nl/people/arthurb/macros.html"&gt;Syntax Macros&lt;/a&gt;.  Seems to be along the same lines.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/35609023-1160951543470307842?l=nominolo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nominolo.blogspot.com/feeds/1160951543470307842/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=35609023&amp;postID=1160951543470307842' title='4 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/35609023/posts/default/1160951543470307842'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/35609023/posts/default/1160951543470307842'/><link rel='alternate' type='text/html' href='http://nominolo.blogspot.com/2006/12/syntax.html' title='Syntax'/><author><name>Thomas Schilling</name><uri>http://www.blogger.com/profile/04274984206279511399</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>4</thr:total></entry><entry><id>tag:blogger.com,1999:blog-35609023.post-7971137899027374881</id><published>2006-11-18T01:28:00.000+01:00</published><updated>2006-11-18T02:02:09.629+01:00</updated><title type='text'>Specification-based Testing</title><content type='html'>I just had the seldom opportunity to hear a live talk by &lt;a href="http://www.cs.chalmers.se/%7Erjmh/"&gt;John Hughes&lt;/a&gt; given to a small group of &lt;a href="http://www.chalmers.se/"&gt;Chalmers&lt;/a&gt; students which I happened to part of.  He was re-giving his talk he held one week ago at the &lt;a href="http://www.erlang.se/euc/06/"&gt;Erlang User Conference&lt;/a&gt; about specification-based testing of Ericsson software (written in &lt;a href="http://www.erlang.org/"&gt;Erlang&lt;/a&gt;).

In the following I'll try to give a summary of what I consider the most essential parts of his very interesting talk.  For further information see John's (et al.) &lt;a href="http://www.ituniv.se/program/sem_research/Publications/2006/AHJW06/"&gt;paper&lt;/a&gt; or the &lt;a href="http://lambda-the-ultimate.org/node/1827"&gt;LtU discussion&lt;/a&gt;.

Users of &lt;a href="http://www.md.chalmers.se/%7Erjmh/QuickCheck/"&gt;QuickCheck&lt;/a&gt; know what nice advantages specification-based testing has:  Instead of lots and lots of test cases one writes &lt;span style="font-style: italic;"&gt;properties&lt;/span&gt; of functions, for example (in Erlang):
&lt;pre&gt;  prop_reverse() -&gt;
    ?FORALL(Xs, list(int())),
    ?FORALL(Ys, list(int())),
      reverse(Xs++Ys) ==
        reverse(Xs) ++ reverse(Ys).&lt;/pre&gt;
In fact this specification is &lt;em&gt;wrong&lt;/em&gt; as a quick check (pun intended) shows us:
&lt;pre&gt;  Failed! After 12 tests.
  [-3,2]
  [3,0] 
  Shrinking....(10 times)
  [0][1]&lt;/pre&gt;
QuickCheck provides two important features:

&lt;ul&gt;&lt;li&gt;It provides the interface to a &lt;em&gt;controlled&lt;/em&gt; generation of test cases (as opposed to simply random ones, which usually are of little help).&lt;/li&gt;
&lt;li&gt;It generates and runs tests of the specified properties and--this might be new to users of the &lt;a href="http://www.haskell.org/ghc/"&gt;GHC&lt;/a&gt; package &lt;code&gt;Test.QuickCheck&lt;/code&gt;--&lt;em&gt;shrinks&lt;/em&gt; them to smaller test cases.  This is important as it often hard to find the actual cause of a bug, especially if test cases are very large.  (John mentioned that the Mozilla developers actually offer T-Shirts to people just &lt;em&gt;reducing&lt;/em&gt; test cases as they found this to be an extremely time consuming task.&lt;/li&gt;&lt;/ul&gt;
In our examples the error was in the specification and can be fixed by substituting &lt;code&gt;Xs&lt;/code&gt; and &lt;code&gt;Ys&lt;/code&gt; in the last line.
&lt;h2&gt;Advantages of Specification-based Testing&lt;/h2&gt;
The most obvious advantage of using QuickCheck over usual test cases is that it allows us to dramatically reduce the number of test cases.  In fact it also does a great job in finding corner cases.  In their field study at Ericsson, while spending only 6 days to implement all the test properties (and a library for state-machine-based testing), they found 5 bugs in an already well-tested soon-to-release product, one of which would have been really unlikely to be triggered by any test case and revealed 9 bugs in an older version of the project, of which only one has been documented at this time. (see the paper). Neil Mitchell also recently &lt;a href="http://neilmitchell.blogspot.com/2006/11/systemfilepath-automated-testing.html"&gt;posted about the merits of QuickCheck&lt;/a&gt; can be.

John added one note, though.  The bugs they found were very small (due to shrinking) and in fact would have never been caused in the real system, because the command sequences that lead to the failures would have never been triggered by the real controller.  However, they tested against the documented (400+ pages) specification.   This leads us to one other important advantage of specification-based testing.

QuickCheck forces (or allows) us to actually state the specification as executable properties. therefore we might not only find errors in the program but, as seen in our example, also in the specification.  This can be very useful and he gave a nice demonstration of buggy (or incomplete) specification for the Erlang functions &lt;code&gt;register/2&lt;/code&gt;, &lt;code&gt;unregister/1,&lt;/code&gt; and &lt;code&gt;whereis/1&lt;/code&gt; that provide a sort of name server for Erlang processes.

To do this he generated a sequence of Erlang calls and checked if their results corresponded to the model of a state machine based on the specification.  That is our current state consisted of a simple map, representing the (assumed) state of the name server.  Using preconditions he controlled the allowable sequences of generated commands and checked their outcome using postconditions.

This small experiment showed quite a couple of incompletenesses in the Erlang specification, e.g., it does state that &lt;code&gt;register/2&lt;/code&gt; will fail if you try to register the same process under different names, but it does not state that the process is not added to the list of registered processes (although sensible to assume--nevertheless, the specification is incomplete).  (I think he had some more serious example, but I can't remember what exactly it was.)
&lt;h2&gt;Remarks&lt;/h2&gt;
Having it used myself I am very convinced of the advantages of QuickCheck.  In reply to my question about testability of non-deterministic faults, caused by the effects of concurrent execution of processes, John remarked, that you can get quite deterministic behavior (thus causing reproducible test results) by running the programs on a single CPU and rely on the deterministic scheduler.  I am not so sure how far this reaches, but then again, you should use Erlang's behaviors as much as possible.&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/35609023-7971137899027374881?l=nominolo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nominolo.blogspot.com/feeds/7971137899027374881/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=35609023&amp;postID=7971137899027374881' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/35609023/posts/default/7971137899027374881'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/35609023/posts/default/7971137899027374881'/><link rel='alternate' type='text/html' href='http://nominolo.blogspot.com/2006/11/specification-based-testing.html' title='Specification-based Testing'/><author><name>Thomas Schilling</name><uri>http://www.blogger.com/profile/04274984206279511399</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry><entry><id>tag:blogger.com,1999:blog-35609023.post-116311816194011777</id><published>2006-11-10T01:14:00.000+01:00</published><updated>2006-11-11T14:34:16.597+01:00</updated><category scheme='http://www.blogger.com/atom/ns#' term='Haskell'/><title type='text'>Being Lazy</title><content type='html'>Lazy? Me? No. Noo. Never!

But here's a &lt;a href="http://programming.reddit.com/info/pylx/comments"&gt;Reddit discussion&lt;/a&gt; (warning: long!) that--even though blown up by an obvious troll--has some nice statements about performance, usability, and composability implications of lazy evaluation.  Quite interesting (if you filter out the noise).&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/35609023-116311816194011777?l=nominolo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nominolo.blogspot.com/feeds/116311816194011777/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=35609023&amp;postID=116311816194011777' title='2 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/35609023/posts/default/116311816194011777'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/35609023/posts/default/116311816194011777'/><link rel='alternate' type='text/html' href='http://nominolo.blogspot.com/2006/11/being-lazy.html' title='Being Lazy'/><author><name>Thomas Schilling</name><uri>http://www.blogger.com/profile/04274984206279511399</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>2</thr:total></entry><entry><id>tag:blogger.com,1999:blog-35609023.post-116015087119096114</id><published>2006-10-06T18:01:00.000+02:00</published><updated>2006-11-11T14:19:10.917+01:00</updated><title type='text'>Uno</title><content type='html'>Everything has a beginning and an ending.

So, this is the beginning of my humble blog--let's hope it's not going to see its end soon.

This blog will (presumably) be about programming languages, compilers, concurrency, maybe a bit about a human-computer-interaction, and about some general life-related stuff.  With time I'll also try to adopt to some &lt;span style="text-decoration: underline;"&gt;&lt;a href="http://www.useit.com/alertbox/weblogs.html"&gt;blog usability...&lt;/a&gt;
&lt;/span&gt;&lt;div class="blogger-post-footer"&gt;&lt;img width='1' height='1' src='https://blogger.googleusercontent.com/tracker/35609023-116015087119096114?l=nominolo.blogspot.com' alt='' /&gt;&lt;/div&gt;</content><link rel='replies' type='application/atom+xml' href='http://nominolo.blogspot.com/feeds/116015087119096114/comments/default' title='Post Comments'/><link rel='replies' type='text/html' href='http://www.blogger.com/comment.g?blogID=35609023&amp;postID=116015087119096114' title='0 Comments'/><link rel='edit' type='application/atom+xml' href='http://www.blogger.com/feeds/35609023/posts/default/116015087119096114'/><link rel='self' type='application/atom+xml' href='http://www.blogger.com/feeds/35609023/posts/default/116015087119096114'/><link rel='alternate' type='text/html' href='http://nominolo.blogspot.com/2006/10/uno.html' title='Uno'/><author><name>Thomas Schilling</name><uri>http://www.blogger.com/profile/04274984206279511399</uri><email>noreply@blogger.com</email><gd:image rel='http://schemas.google.com/g/2005#thumbnail' width='16' height='16' src='http://img2.blogblog.com/img/b16-rounded.gif'/></author><thr:total>0</thr:total></entry></feed>
