Thursday, December 14, 2006

Syntax

Coming to Haskell from Common Lisp, I soon started to miss macros, so I tried out Template Haskell ... and gave up. Haskell's syntax parse tree is way too complex to be usefully manipulated by Haskell functions. You can get used to Lisp syntax, but you always have to justify it to outsiders, and there certainly lies a lot of value in resembling mathematical syntax. I certainly agree with dons' post on this issue. If only it weren't for the disadvantages! Sure, syntactic extensions have to be done carefully, and powerful abstraction mechanisms in the language always have to come first. But having macros as an integral part of the language definition would make desugaring a library instead of a part of the compiler. This is a very powerful form of syntactic abstraction. I know that ML, being designed as a meta-language, has some means of syntactic abstraction, though I haven't taken a closer look at them yet.

Let me, however, outline a way of achieving macros almost as powerful as general Lisp macros, one that works for languages with less simple syntax. This system is (partly) implemented in the Dylan programming language and is called D-Expressions. It is essentially an extension of the Lisp syntax. Lisp in its most primitive form looks like this:

Expr ::= Atom | ( Expr* )

This represents a simple tree. However, why should we restrict ourselves to representing nested structure with parens? We might as well use brackets or braces or keywords. So:

Expr ::= Atom | ( Expr* ) | [ Expr* ] | { Expr* } | def Expr* end

This way we still retain the unambiguous nesting structure, but get a more varied syntax. Reader macros in Common Lisp (and probably Scheme, too) do this and already perform some sort of desugaring, by translating { ... } into (some-macro ...). This has one big problem, however: you can only have one macro for { ... }. Common Lisp "fixes" this by using a prefix before the parens, e.g. #c(3 4) denotes a complex number.
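To make the extended grammar concrete, here is a minimal sketch of it as a Haskell data type, together with a renderer back to surface syntax (the type and constructor names are my own invention):

```haskell
-- A sketch of the extended s-expression tree: nesting can be
-- delimited by parens, brackets, braces, or a def/end keyword pair.
data SExpr
  = Atom String        -- a symbol or literal
  | Parens   [SExpr]   -- ( Expr* )
  | Brackets [SExpr]   -- [ Expr* ]
  | Braces   [SExpr]   -- { Expr* }
  | DefEnd   [SExpr]   -- def Expr* end
  deriving Show

-- Render a tree back to surface syntax; every delimiter pair
-- still closes unambiguously, just like plain parens.
render :: SExpr -> String
render (Atom s)      = s
render (Parens xs)   = "(" ++ unwords (map render xs) ++ ")"
render (Brackets xs) = "[" ++ unwords (map render xs) ++ "]"
render (Braces xs)   = "{" ++ unwords (map render xs) ++ "}"
render (DefEnd xs)   = "def " ++ unwords (map render xs) ++ " end"

main :: IO ()
main = putStrLn (render (DefEnd [Atom "f", Parens [Atom "x", Atom "y"]]))
```

Running this prints `def f (x y) end`: the keyword-delimited form nests freely with the bracketed ones.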
This isn't exactly beautiful and doesn't scale well, though. In fact, we'd rather like the chance to use this more varied syntax inside our own macros, and we need a way to make that scale. Dylan solves this by allowing three kinds of macros:
  • Definition-style macros have the form: define modifiers* macro-name Expr* end
  • Function-style macros have the form: macro-name(Expr*)
  • Statement-style macros have the form: macro-name Expr* end
Macros are defined using pattern matching rewrite rules, e.g.:
define macro when
  { when ?cond:expr ?body:body end }
    =>  { if ?cond ?body else #f end }
end
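(As an aside, this particular macro is one Haskell would not even need: laziness lets an ordinary function do the job. A sketch — the name is mine, and Dylan's #f default becomes Nothing, since Haskell has no untyped false-as-default:)

```haskell
-- Dylan's `when` mimicked by a plain lazy function: the body is
-- only evaluated if the condition holds, otherwise we return
-- Nothing in place of Dylan's #f.
when' :: Bool -> a -> Maybe a
when' True  body = Just body
when' False _    = Nothing

main :: IO ()
main = do
  print (when' True  (2 + 2 :: Int))      -- Just 4
  print (when' False (undefined :: Int))  -- Nothing; body never forced
```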
Here ?cond:expr states that the pattern variable cond matches a form of the syntactic category of an expression; similarly for ?body:body. Multiple patterns are possible, and extending this to Lisp-style abstract syntax tree transformations is possible too, as shown in this paper on D-Expressions. Adding this feature to Haskell would probably require some small modifications to Haskell's syntax, but I think we wouldn't have to drop whitespace sensitivity. This way, embedding DSLs in Haskell would be even more seamless, and rants about the do-notation would no longer be needed. Here's what an implementation of the do-notation syntactic sugar could look like:
macro do {
  [| do { ?e:expr } |] => [| ?e |]
  [| do { ?p:pat <- ?e:expr; ??rest:* } |]
    => [| ?e >>= (\?p -> do { ??rest ... }) |]
  [| do { let ?p:pat = ?e:expr; ??rest:* } |]
    => [| let ?p = ?e in do { ??rest ... } |]
  ...
}
where ??rest:* matches anything up to the closing "}" (binding the pattern variable rest to a sequence of tokens) and ??rest ... expands all tokens in rest. Sure, there are a lot of open issues to be solved--e.g. type specifications forming a separate sub-language of Haskell, and how to actually achieve whitespace-sensitive macros--but I think it would be a very useful extension. Comments welcome! :)

Edit: The last example is actually just a mockup of some possible syntax and does not even make much sense in Dylan either. But you get the idea (I hope). I've been pointed to a similar idea, called Syntax Macros, which seems to be along the same lines.
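As a sanity check, the rewrite rules in that mockup can be mirrored by an ordinary Haskell function over a toy statement list, producing the desugared expression as a string (all names here are hypothetical, and real do-desugaring also handles pattern-match failure, which this sketch ignores):

```haskell
-- A toy model of do-notation statements; expressions and patterns
-- are kept as plain strings for simplicity.
data Stmt
  = Bind String String   -- pat <- expr
  | LetS String String   -- let pat = expr
  | Expr String          -- expr

-- Mirror the macro's rewrite rules: a final expression stays as-is,
-- a bind becomes >>= with a lambda, a let becomes let ... in over
-- the rest of the block.
desugarDo :: [Stmt] -> String
desugarDo [Expr e]          = e
desugarDo (Bind p e : rest) =
  e ++ " >>= (\\" ++ p ++ " -> " ++ desugarDo rest ++ ")"
desugarDo (LetS p e : rest) =
  "let " ++ p ++ " = " ++ e ++ " in " ++ desugarDo rest
desugarDo _                 = error "ill-formed do block"

main :: IO ()
main = putStrLn (desugarDo [Bind "x" "getLine", Expr "putStrLn x"])
```

This prints `getLine >>= (\x -> putStrLn x)`, which is exactly the expansion the second macro rule describes.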

4 comments:

  1. I'm a little new to Haskell myself but I have been playing with it more lately.

    I think that with lazy evaluation, the effect in Haskell is pretty close to using Lisp with nothing but macros. I noticed you can, for example, implement "if" in Haskell:

    > my_if True x _ = x
    > my_if False _ y = y

    Because Haskell won't actually evaluate the other argument, you get the ability to define a control structure. I tested it with some putStrLn's and it seemed to work. I could be missing something.
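    The non-evaluation is easy to make visible by passing undefined for the branch that should be skipped (a sketch restating the my_if above with a type signature added):

    ```haskell
    -- The lazy if from above: only the selected branch is ever evaluated.
    my_if :: Bool -> a -> a -> a
    my_if True  x _ = x
    my_if False _ y = y

    main :: IO ()
    main = do
      -- The untaken branch is undefined, yet this prints 1, because
      -- Haskell never forces the argument that my_if discards.
      print (my_if True (1 :: Int) undefined)
      my_if False undefined (putStrLn "else branch runs")
    ```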

    It's not true syntactic modification but it is something better than what you might expect in a strict language lacking macros.

    ReplyDelete
  2. I actually had this conversation with my professor the other day.

    His view was that since Haskell is statically typed, it would be a waste of time to design macros.

    In Lisp, macros are useful because Lisp is basically untyped and you may not know the types of your parameters and/or return types.

    In Haskell, everything is statically typed and therefore has a predetermined range, so you might as well just write the code straight up.


    However, we didn't get into what IO and monads could do with macros, and that is probably another matter altogether.

    ReplyDelete
  3. I'm interested to see how you refine your ideas in the future.

    I have similar interests in bringing the power and ease of Lisp's macro facility to a static language like Haskell. I blogged about it and probably will do so in the future (I hope you will too!).

    ReplyDelete
  4. Hi! I too have been learning Template Haskell, and I agree, it's very complicated. But I think that a macro system is, if anything, more important in a statically-typed language than in a dynamically-typed one: static typing can lead to some horribly verbose code in some cases. Lazy evaluation handles a lot of the easy cases for which you need macros, but it doesn't do much for the hard cases. I would say that TH is too complicated to be much use in hard cases either, but Haskellites seem to have an unlimited capacity for difficult things, so maybe they'll cope :-)

    ReplyDelete