Extending a Language — Writing Powerful Macros in Scheme
Table of Contents
The canonical HTML-formatted version of this document can be found at https://mnieper.github.io/scheme-macros/README.html. The interactive Org document is hosted at https://github.com/mnieper/scheme-macros.
1. Preface
This document is an introduction to writing powerful macros in Scheme. It was initially written on the occasion of a tutorial I give at the BOB2023 Konferenz in Berlin on 17 March 2023.
The macro facility, especially its built-in hygiene, is one of the fundamental pillars of the Scheme programming language. While more complicated than the simple token-replacing macros of other languages like C, Scheme macros can be written in a way that make them robust and so that the abstractions they offer seamless blend into the language and cannot be distinguished from syntactic forms built into the language. It is often felt that this expressiveness makes writing Scheme macros more complicated (even something of a black art) than writing C or Common Lisp macros, for example. One goal of this tutorial is to convince the audience otherwise.
While the Scheme macro facility has always been avant-garde (and this is one of the reasons why Scheme was chosen as the implementation language for this tutorial), a lot of what is said here also applies to languages that provide corresponding features. It is also a appeal to language designers that languages should include a macro facility as Scheme does, as this allows for small language cores and enables the user to provide their own syntactic abstractions.
Another reason why the Scheme language is used in this tutorial is that it has an exceptionally clear semantics, is a compact language, and is easy to learn.
The document can both be read as is or used interactively in an Emacs session. In the following section, a possible setup is described.
2. Prerequisites
2.1. Chez Scheme
We need a Scheme implementation. This tutorial assumes Chez Scheme, which is one of the most mature, standard-compliant Scheme implementations. You can get Chez Scheme from its homepage. On Debian-based GNU/Linux system like Ubuntu, it is prepackaged.
2.2. Emacs
We will use GNU Emacs as our development environment, which has great tooling for Scheme. The typical GNU/Linux system ships with it.
For GNU Emacs < 28, enable the NonGNU Emacs Lisp Package Archive in
your init file or customize the variable package-archives
.
Likewise, enable the NonGNU-devel ELPA Packages.
2.3. Org
Org Mode is a GNU Emacs major mode that allows to document, edit and execute source code.
The current versions of Org packaged with Emacs hide Scheme evaluation errors. This is fixed in the version in Org's git repository, for which Org provides installation instructions.
Familiarize yourself with how one works with source code in an Org document, especially how to edit and execute code.
2.4. Geiser
Geiser is a GNU Emacs package that allows to runs Scheme processes in GNU Emacs, and which is used by Org's Babel. To install it, it is enough to install the package geiser-chez using GNU Emacs' package menu. We need the most recent development version.
2.5. Paredit
Paredit, a tool for parenthetical editing in Emacs makes working with Scheme code a lot more pleasant. Like Geiser, it can be installed through GNU Emacs' package manager.
2.6. Initialization
After the GNU Emacs packages have been installed, we want to customize them for our needs. The following should go into your init file unless you want to execute the following code every time you start GNU Emacs.
(require 'compile) (autoload 'enable-paredit-mode "paredit" "Turn on pseudo-structural editing of Lisp code" t) (add-hook 'scheme-mode-hook #'enable-paredit-mode) (show-paren-mode t) (setq auto-mode-alist (cons '("\\.ss" . scheme-mode) auto-mode-alist)) (setq auto-mode-alist (cons '("\\.sls" . scheme-mode) auto-mode-alist)) (setq auto-mode-alist (cons '("\\.sps" . scheme-mode) auto-mode-alist)) (add-to-list 'compilation-error-regexp-alist '("^\\(Exception\\|Warning\\).*: .* \\(line \\([0-9]+\\), char \\([0-9]+\\) of \\(.*\\)\\)" 5 3 4 nil 2)) (setq geiser-default-implementation 'chez) (org-babel-do-load-languages 'org-babel-load-languages '((scheme . t))) (setq org-confirm-babel-evaluate nil)
3. The Scheme programming language
Scheme is programming language of the Lisp family. Its defining properties are its uniform parenthesized syntax (inherited from Lisp), first-class procedures and continuations, lexical scoping, dynamic typing, proper tail calls and hygienic macros. It is primarly a functional programming language but allows many other programming paradigms.
The Scheme programming language was developed in the 1970s by Guy L. Steele and Gerald Jay Sussman. Since then it has been refined and further developed through a series of de facto standards called the Revisedn Report(s) on the Algorithmic Language Scheme (R/n/RS). The two current standards are R6RS (2007) and R7RS-small (2013). Despite the versioning and the timeline, R6RS is the more detailed, more advanced and more modern standard1.
In this tutorial, we work with the macro facility of R6RS, which is far more powerful than the one of R7RS-small, and also discuss some proposed or implemented extensions. Such extensions to the Scheme programming language are often proposed, discussed and implemented using the Scheme Requests for Implementation process, where everyone can submit a SRFI extending the Scheme programming language. Whenever we speak of the Scheme language in this text, we default to the R6RS dialect.
For practical programming, one needs, of course, an implementation. Scheme is possibly the programming language with the highest number of implementations. The R6RS language has some very high-quality implementations, including Chez Scheme, GNU Guile, Loko Scheme, and Racket, so for any application area, there will be a suitable Scheme system.
4. Some simple macros
Let us call a combination an expression in Scheme of the form
(operator operand ...)
An example is given by the following expression evaluating to the answer of life:
(* 21 2)
42
Such a combination is usually evaluated by evaluating the operator and the operands in some unspecific order and by then calling the procedure resulting from the operator evaluation with arguments resulting from the operand evaluations.
Scheme, however, also possesses special forms, which do not follow
this evaluation strategy. An example is given by the conditional if
.
(if (number? 2)
'ok
(/ 1 0))
ok
If the conditional were a normal combination, the operands, and (/ 1
0)
in particular, would have been evaluated first (and
unconditionally). Scheme recognizes special forms through the
operator in first position, namely if it is a keyword (a special type
of identifier). The Scheme macro facility allows the programmer to
define their own keywords.
4.1. Incrementing a variable
Let us ignore for a moment that mutation is frowned upon in functional
programming and let us assume that we have to frequently increase the
value of variables in our program. Given a variable x
, this is done
in Scheme through the following expression:
(set! x (+ x 1))
That the variable x
is repeated in this expression is unpleasant
(and may be considered a violation of the DRY principle), so we want
an operator akin to C's pre/post-increment operator. Unfortunately,
Scheme does not provide such an operator, but, fortunately, it doesn't
have to because we can build one ourself.
Our first attempt could be to write a procedure (the primary means of abstraction in functional programming languages)2:
(define incr! (lambda (x) (set! x (+ x 1))))
This attempt, however, is failed:
(define x 1) (incr! x) x
1
The reason that it doesn't work — the variable's value is still 1
and not 2 — is that (incr! x)
is a normal combination as
introduced earlier. As the arguments are evaluated first and the
procedure is called with their values, in this example, incr!
is
called with the argument 1
. This is then bound to a new variable
x
locally to incr!
. It is this variable, which is increased by 1
and not the top-level variable.
The solution is, of course, to define incr!
not as a procedure3
but as a keyword. In the Scheme programming language, the
define-syntax
keyword can be used for it:
(define-syntax incr! (syntax-rules () ((incr! x) (set! x (+ x 1)))))
This definition says that incr!
is defined to be a new keyword,
implemented as a macro. The syntax-rules
line shall be viewed as
boilerplate for the moment (and we will come back to it later).
Important are the next two lines. The form (incr! x)
is a pattern
saying that the macro matches against a use of the form (keyword
form)
(where keyword
is necessarily incr!
). When the macro is
used, the pattern variable x
is bound to the form
. The form
(set! x (+ x 1))
is a template. When the macro is used, the pattern
variables in the template are replaced with the forms they are bound
to and the substituted template is then used in place of the macro.
In the following example, (incr! y)
is effectively substituted by
(set! y (+ y 1))
, so we have achieved what we wanted4:
(define y 10) (incr! y) y
11
As a side note, we see from the discussion that set!
is another
keyword (like if
, it cannot be a procedure for the same reasons why
our attempt to write incr!
as a procedure doesn't work).
As any other identifier in Scheme, the identifier set!
can also be
rebound as in the following example:
(let ([set! (lambda (x y) (+ x y))]) (define x 1) (set! x 2))
3
In the body of the let
form, set!
has lost its usual meaning and
is bound to a procedure adding its two arguments. It is most
interesting to see what happens when we use our incr!
macro, which
refers to set!
, in the body of the let
form:
(let ([set! (lambda (x y) (/ 1 0))]) (define x 1) (incr! x) x)
2
This example yields the correct result 2
, although calling set!
within the let
body would raise an exception. The reason for this
is the already mentioned hygiene of Scheme macros. The identifier
set!
in the output of the incr!
macro didn't occur in its input
but came from the macro definition. Scheme macro hygiene now ensures
that it still refers to the lexical binding it had where it occured in
the program source. Note that the C preprocessor — as an example
for a very simple, if not primitive macro facility — wouldn't have
ensured it. Whether a C macro works correctly or not often depends on
the lexical environment of the macro use site.
We say that hygienic Scheme macros are referentially transparent. This is already known from procedures in functional programming languages and lexical scoping:
(define f (let ([x 1]) (lambda () x))) (list (f) (let ([x 2]) (f)))
(1 1)
Wherever the procedure f
is called, it always evaluates to 1
.
We finish this subsection with another example of hygiene:
(let ([set! 2]) (incr! set!) set!)
3
The result, which is the increment of the original value of the
variable set!
by one, can again be explained by hygiene and by
distinguishing the identifier set!
that appears in the macro use and
the same-named identifier set!
appearing in the macro source.
Without distinguishing both, the macro use (incr! set!)
is
transcribed to (set! set! (+ set! 1))
. In this transcription, the
first set!
originates from the macro transformer and thus still
refers to the lexical binding it had at that place. The other two
occurrences of set!
are copies from the macro input and thus refer
to the lexical binding of set!
as a let-bound variable.
4.2. A tracing let
Simple loops are often written using the named let
form as in the following example:
(define fact (lambda (n) (let f ([n n] [a 1]) (if (zero? n) a (f (- n 1) (* a n))))))
In order to facilitate debugging, let us define a version of the named
let
form that prints the arguments with which the loop recursion is
entered and with which it is exited5. As let
is a special
form, this has to be a special form as well, so let us write our second macro:
;; A form of a named let that prints information about each recursive ;; call. (define-syntax trace-let (syntax-rules () [(trace-let name ([var expr] ...) body1 ... body2) (let f ([depth 0] [var expr] ...) (define name (lambda (var ...) (f (+ depth 1) var ...))) (indent depth) (display "(") (display 'name) (begin (display " ") (display var)) ... (display ")") (newline) (call-with-values (lambda () body1 ... body2) (lambda val* (indent depth) (fold-left (lambda (sep val) (display sep) (display val) " ") "" val*) (newline) (apply values val*))))])) ;; Helper procedure referenced by the macro output of the macro above. (define indent (let ([pattern "| "]) (lambda (depth) (do ([i 0 (+ i 1)]) ((> i depth)) (display (string-ref pattern (mod i 2)))))))
In this macro, the pattern is given by (trace-let name ([var expr]
...) body1 ... body2)
, while the template makes up the bulk of the
macro. Already in the pattern, we see a new syntax, the ellipsis
...
. It means that the subpattern preceding it may appear repeated
zero or more times in the input. When such a subpattern is matched,
the contained pattern variables represent lists of forms.
In the template, the ellipsis means to repeat the preceding subtemplate as many times as the pattern variables contained in it represent forms. For this to work, every such subtemplate has to contain at least one pattern variable, obviously, and all pattern variables contained in it have to represent lists of forms of the same length.
Note the occurrence of begin
in the macro. Normally, in a procedure
body, (begin expression ...)
is equivalent to the list of
expressions
, here, however, we have to use it. The reason is that
following ellipsis refers the immediately preceding subtemplate, so it
is crucial that the two display commands (which we both want to
repeated once per variable) appear in a single form.
When we run the following test, we see the given result printed.
(define fact (lambda (n) (trace-let f ([n n]) (if (zero? n) 1 (* (f (- n 1)) n))))) (fact 3)
|(f 3) | (f 2) | |(f 1) | | (f 0) | | 1 | |1 | 2 |6
We can demonstrate another facet of hygiene with this particular
macro. In the macro template, which is part of the macro's source,
the identifier f
is introduced and is bound by let
appearing next
to in the source. In the particular use of the macro above, the
pattern variable name
represents another identifier name f
, namely
the identifier with that name that appears in the macro use. Although
f
coming from the macro use is bound in the macro output within the
scope of the binding of f
coming from the macro text, it does not
shadow the other f
as this would be a violation of hygiene.
Instead, the identifier f
coming from the macro text is renamed by
the Scheme macro expander, at least conceptually (as it isn't inserted
as a free identifier, the precise name obviously doesn't matter).
The ellipsis can also be used to turn our incr!
macro into one that
accepts more than one variable to increment:
(define-syntax incr! (syntax-rules () ((incr! x ...) (begin (set! x (+ x 1)) ...))))
Let us briefly test our new extended macro:
(define x 10) (define y 20) (incr! x y) (list x y)
(11 21)
The role of begin
in the macro definition of the extended incr!
differs from the role in our previous use of begin
. Here it is used
to solve the problem that the template that prescribes the macro
output has to be a single form.
One can also write the multi-variable incr!
macro without the
ellipsis by letting the macro expand into itself. This is not
necessarily how one would do it, but here it serves as a demonstration
for further macro techniques:
(define-syntax incr! (syntax-rules () ((incr!) (values)) ((incr! x . x*) (begin (set! x (+ x 1)) (incr! . x*)))))
First of all, this is our first macro with two transcription rules,
where each rule consists of a pattern and of a template. The pattern
of the first rule is (incr!)
, the pattern of the second rule is
(incr! x . x*)
. Scheme's macro expander tries to match the macro
input against the patterns in the order in which the patterns appear
in the syntax-rules
form.
The second new thing is a a pattern of the form (incr! x . x*)
,
which matches an (improper) list of at least two elements, the first
being the macro keyword and the second one being bound to the pattern
variable x
. The rest arguments are bound as an (improper) list to
the pattern variable x*
.
Finally, this example demonstrates a recursive macro, that is a macro that transforms the input into an instance of itself. As long as the output of a macro use involves a new macro use (possibly with the same keyword), the Scheme expander continues with transcribing the macro.
Let us not forget to test the new version of the macro:
(define x 100) (define y 200) (incr! x y) (list x y)
(101 201)
4.3. Accessing vector locations through variables
A vector in Scheme is a collection of locations in the store that
can be linearly addressed. A new vector can be allocated with the
vector
procedure:
(define v (vector 1 2 3)) v
#(1 2 3)
Vector elements can be retrieved using vector-ref
and mutated using vector-set!
:
(vector-ref v 2)
3
(vector-set! v 1 4) v
#(1 4 3)
Assume that we want to use our incr!
macro to increase the value of
one vector element. As incr!
expects a variable as its argument, we
have make the locations associated to a vector accessible as if they
were backed up by a variable. Another feature of the (R6RS) macro
system comes to our rescue:
(define-syntax v1 (identifier-syntax [v1 (vector-ref v 1)] [(set! v1 expr) (vector-set! v 1 expr)])) (incr! v1) v
#(1 5 3)
This macro isn't written with syntax-rules
but uses
identifier-syntax
. This is used to declare a keyword, v1
in our
case, that is transcribed differently, depending on whether it appears
in the form v1
or in the form (set! v1 expr)
in the source code.
To access the zeroth or the second element of the vector v
, we could
define identifier macros v0
and v2
similar to v1
but this would
mean mostly duplicating code and violating the DRY principle. A
better approach is to use the Scheme macro system once more. We
define a macro that, when used, defines a customized macro6:
(define-syntax define-vector-reference (syntax-rules () [(define-vector-reference var vec-expr idx-expr) (begin (define vec vec-expr) (define idx idx-expr) (define-syntax var (identifier-syntax [var (vector-ref vec idx)] [(set! var expr) (vector-set! vec idx expr)])))]))
We can now use this macro as follows:
(define-vector-reference initial-element v 0) (incr! initial-element) v
#(2 5 3)
Note that the arguments vec-expr
and idx-expr
can stand for
arbitrary expressions. We evaluate these expressions once and store
their values in the variables vec
and idx
(which will be suitably
renamed by the macro expander so that they won't clash with user
defined identifiers with the same name). If we didn't do this but
used vec-expr
and idx-expr
everywhere in place where vec
and
idx
appeared in the defined macro, the vector and the index
expressions would be evaluated every time, the vector reference
variable would be accessed.
5. Syntax objects
The Scheme reports define hygiene and referential transparency for macros as follows:
- If a macro transformer inserts a binding for an identifier (variable or keyword) not appearing in the macro use, the identifier is in effect rename throughout its scope to avoid conflicts with other identifiers.
- If a macro transformer inserts a free reference to an identifier, the reference refers to the binding that was visible where the transformer was specified, regardless of any local bindings that may surround the use of the macro.
The examples of the previous section make it hopefully a bit clear what is meant by these two points. Nevertheless, one may think that there still must be some magic at work and that it will be impossible to prove anything about these macros. The purpose of this section is to disassemble everything and to explain what is going on under the hood.
5.1. Identifiers
The Lisp languages, and thus Scheme as well, are homoiconic programming languages, which means that if the program's internal representation is a datum of the language. In first approximation, the internal representation of a Scheme expression (as of a Scheme program) is a Scheme datum value. For example, the program (expression)
(let ([x 1])
(+ x 2))
is represented by a list whose first element is the symbol let
,
whose second element is a list of a list with two elements and whose
third element is a list of the three data +
, x
, and 2
.
Due to existence of hygienic macros we have to amend this traditional picture. Consider the following example.
(let ([set! 10]) (incr! set!) set!)
To evaluate the let
expression, the macro use of incr!
has to be
expanded first. After the expansion, the expression would look like
(let ([set! 10]) (set! set! (+ set! 1)) set!)
if Scheme expressions were represented by Scheme datum values and
within, identifiers were represented by symbols. It is obvious that
this cannot be how the Scheme expander works because there would be no
way to tell which copy of the symbol set!
refers to which binding.
The point is that identifiers cannot be represented by symbols, which
only have a symbolic name. Instead, to an identifier both a
symbolic name and a lexical context are associated. When the binding
of an identifier is looked up, it is looked up in the lexical context
associated with it.
In Scheme, symbols are first-class values. The can be created using
the syntax (quote name)
, which can be abbreviated to 'name
:
'red
red
The same is true for identifiers. They are created just like symbols
but use the syntax (syntax identifier)
, which can be abbreviated to
#'identifier
, instead:
#'x
#<syntax x>
(The format of the output, #<syntax x>
, is implementation-specific,
because identifiers are not Scheme datum values and thus have no
standardized or faithful written representation.)
Evaluating of the form (syntax x)
(or #'x
) means the following for
the Scheme system: construct and return an identifier with the
symbolic name x
and with the lexical context at the place of the x
appearing in the syntax
form. We have to be aware of that the term
identifier
can be used in two (slightly) different contexts: When we
refer to set!
as an identifier in the example above, we speak about
a token being part of the code. When we refer to the expression #'x
evaluating to an identifier, we speak about a value of the language.
The expression #'x
contains an identifier in the first sense
(speaking about the language) and evaluates to an identifier (as a
value of the language).
The procedure syntax->datum
can be used to convert an identifier to
a symbol, namely its underlying symbolic name:
(syntax->datum #'x)
x
There are no standard procedures that allow us to look up the binding
of an identifier, but we can compare identifiers. Scheme defines two
equivalence relations, realized by the predicates bound-identifier=?
and free-identifier=?
. Two identifiers are "bound-identifier=?
"
if they are interchangeable when they appear bound in the output of a
macro. Two identifiers are "free-identifier=?
" if they are
interchangeable when they appear free in the output of a macro.
Neither equivalence implies the other. It will become clearer in the
course of this tutorial what this means, but some experiments will
already give some understanding:
(list (bound-identifier=? #'x #'x) (bound-identifier=? #'x #'y))
(#t #f)
The two identifiers to which the two evaluations of #'x
in the first
argument to list
evaluate are therefore "bound-identifier=?
" while
the differently named identifiers #'x
and #'y
(more precisely: the
identifiers returned by these expressions) are not. It is tempting to
say that the two (or three) instances of #'x
evaluate to the same
identifier, but for this to make sense, some equivalence relation
would have had to be fixed earlier.
Let us now consider two simple examples for free-identifier=?
:
(let ([x 1])
(free-identifier=? #'x #'x))
#t
If the identifiers to both instances of #'x
evaluate were inserted
in the code as free identifiers they both would refer to the variable
binding of the identifier x
introduced by let
.
The second example is a bit more interesting:
(let ([x 1]
[y 1])
(free-identifier=? #'x #'y))
#f
The answer is #f
(for false) because although the values of the two
variables x
and y
are both initialized to 1
they are bound to
different locations in the store (which can be exhibited by mutating
one of the two variables.
So far, in all examples bound-identifier=?
seems to give the same
result as free-identifier=?
. That this is not true is shown in the
next example.
(let ([x 1]) (define outer-x #'x) (let ([x 2]) (define inner-x #'x) (list (bound-identifier=? outer-x inner-x) (free-identifier=? outer-x inner-x))))
(#t #f)
Inserting inner-x
as a free identifier would not be equivalent to
inserting outer-x
because the former would refer to the binding of
the variable with value 2
and the latter to the binding of the
variable with value 1
. Thus identifiers that are
"bound-identifier=?
" are not necessarily "free-identifier
". We
hope that the connection of free-identifier=?
to the second hygiene
condition, the one about inserting free references to an identifier,
is apparent.
Again so far, it seems that identifiers are "bound-identifier=?
" if
and only if they have the same symbolic name. One implication is
correct, namely that identifiers that are interchangeable as bound identifiers
must have the same symbolic name, but the other implication is not. To show this, we have to employ a macro:
(let ([x 1]) (let-syntax ([outer-x (identifier-syntax #'x)]) (define inner-x #'x) (list (bound-identifier=? outer-x inner-x) (free-identifier=? outer-x inner-x))))
(#f #t)
Two remarks about the example code are in order before we discuss the
result. The binding form let-syntax
is to let
as define-syntax
is to define
; in other words, it allows us to locally bind keywords
to macro (transformers). Furthermore, we employ a short form of
identifier-syntax
here, which defines no set!
semantics but just
replaces an occurrence of the keyword outer-x
with #'x
.
Both the identifier x
in the definition of the macro outer-x
and
the identifier x
in the definition of the variable inner-x
refer
to the binding of x
introduced by the outer let
, which explains
that the values of outer-x
and inner-x
are "free-identifier=?
".
But they are not "bound-identifier=?
", so this example shows that
identifiers that "free-identifier=?
" need not necessarily be
"bound-identifier=?
".
The reason why they cannot be "bound-identifier=?
" is that the first
hygiene condition about inserting bindings for an identifier would be
violated otherwise. Consider the following example:
(let-syntax ([add1 (syntax-rules () [(add1 y) (let ([x 1]) (+ x y))])]) (let ([x 2]) (add1 x)))
3
The identifier x
appearing in the macro template is inserted as a
bound identifier in the macro output and thus is in effect renamed to
avoid conflict with the identifier x
appearing in the macro use.
Renaming means that the two identifiers named x
cannot be
"bound-identifier=?
" because they would otherwise be interchangeable
as bound identifiers.
Scheme implements this hygiene condition by assigning to identifiers besides their symbolic name and their lexical context another property, namely their historic context (or just history)7. The history of an identifier is the information when the identifier was first introduced in the program. All identifiers in the program source have the same history — they were already there when the program was started. An identifier introduced by a macro transformation (as part of its output) has a different history than identifiers that were already present in the program source. Identifiers introduced by different macro transformations have different histories and all identifiers introduced by the same macro transformation have the same history.
Let us take another view at this example:
(let ([x 1]) (let-syntax ([outer-x (identifier-syntax #'x)]) (define inner-x #'x) (list (bound-identifier=? outer-x inner-x) (free-identifier=? outer-x inner-x))))
(#f #t)
The identifier x
appears three times in the source. All three
identifiers have the same history. When the macro outer-x
is
expanded, the identifier x
is introduced in the macro output (as
part of the expression #'x
) and this particular identifier was not
part of the macro input, so the introduced identifier x
has a
different history than the identifier to which inner-x
is bound.
We are now in a situation to give alternative definitions for
bound-identifier=?
and free-identifier=?
: Two identifiers are
"bound-identifier=?
" if they have the same symbolic name and the
same history. Two identifiers are "free-identifier=?
" if they refer
to the same binding in their respective lexical contexts. (An unbound
identifier is, by definition, "free-identifier=?
" to another
identifier if the other identifier is also unbound and has the same
symbolic name.)
Scheme also allows to fudge identifiers. The procedure
datum->syntax
can turn a symbol into an identifier with that
symbolic name. For that, the user has to provide a lexical context
and a history. This is done by giving a "template" identifier from
which the context is taken.
(let ([x 1]) (define outer-x #'x) (let ([x 2]) (define outer (datum->syntax outer-x 'x)) (list (bound-identifier=? outer-x outer) (free-identifier=? outer-x outer))))
(#t #t)
In this example, the identifier outer
is an identifier with the
symbolic name x
and with the context as if it was introduced where
x
appears in the definition of outer-x
.
In the following example, the fudged identifier with the symbolic name
y
has the same history as the identifier x
appearing the macro use
of as-y
, and thus the same history as the identifier y
appearing
in the call to bound-identifier=?
.
(let-syntax ([as-y (syntax-rules () [(as-y x) (datum->syntax #'x 'y)])]) (bound-identifier=? #'y (as-y x)))
#t
5.2. Constructing syntax objects
In the previous section we learned that Scheme code cannot be represented by a Scheme datum value (a Scheme value that has a written representation like a list, a number, or a symbol), at least not during the expansion process, as identifiers cannot be represented by symbols.
The objects that do represent Scheme forms are called syntax objects. The basic idea is that a syntax object is like a datum value but with identifiers instead of symbols. So a list of identifiers or a vector of a number and an identifier, or a single string or identifier are all syntax objects. Moreover, there can be a wrap around a nonidentifier syntax object.
Formally, syntax objects can inductively be defined as follows:
- A nonpair, nonvector, or nonsymbol value is a syntax object.
- A pair of syntax objects is a syntax object.
- A vector of syntax objects is a syntax object.
- An identifier is a syntax object.
- A wrapped nonpair, nonvector, or nonsymbol value is a syntax object.
- A wrapped pair or vector of syntax objects is a syntax object.
To each syntax object corresponds a (datum) value by stripping all
wraps and converting all identifiers to their symbolic names. The
Scheme procedure that does this conversion is syntax->datum
. We
have already seen it converting identifiers to symbols. It is also
used in effect by the quote
special form: When Scheme evaluates an
expression like (quote (1 2 foo))
, the (internal) procedure
responsible for expanding or evaluating this expression will receive a
syntax object whose underlying datum value is (1 2 foo)
and will
evaluate to this underlying value.
We can construct the syntax object in the above example as a Scheme value:
(list 1 2 #'foo)
(1 2 #<syntax foo>)
It is a syntax object because it is a list of syntax objects (and Scheme lists are built from pairs and the empty list) and it has the expected corresponding (datum) value:
(syntax->datum (list 1 2 #'foo))
(1 2 foo)
The predicate identifier?
is a Scheme procedure that can be used to
test whether a syntax object is an identifier or not:
(list (identifier? 1) (identifier? (list #'x)) (identifier? #'x))
(#f #f #t)
In the previous section, we saw how to use syntax
keyword
(abbreviated by #'
) can be used to create identifiers. In fact, the
argument to the syntax
keyword does not have to be symbol but can be
any datum, so a syntax
expression can be used to build more
complicated syntax objects:
(syntax (1 2 foo))
#<syntax (1 2 foo)>
As the result shows, this is a wrapped syntax object, namely a wrapped list (of syntax objects). The Scheme system uses the wrap to attach source location information to the syntax object (facilitating debugging), and the expander makes use of the fact that syntax objects can be opaque (wrapped) to provide optimal algorithmic complexity for the expansion process.
Whether wrapped or not, we can apply syntax->datum
on this syntax object:
(syntax->datum #'(1 2 foo))
(1 2 foo)
Here, we used again the abbreviation #'
for syntax
.
5.3. Destructing syntax objects
The syntax object returned by #'(1 2 foo)
cannot be destructed using
list procedures like car
and cdr
although it represents a list as
it is wrapped. Scheme offers a special form, syntax-case
to
destruct syntax objects. A syntax-case
form contains clauses, each
consisting of a pattern of the form we already saw in connection with
syntax-rules
and an expression. An input syntax object is matched
against the patterns in order and the expression corresponding to the
first pattern that matches is evaluated:
(syntax-case #'(1 2 foo) ()
[(a b) 'case-1]
[(a b (c d)) 'case-2]
[(2 b c) 'case-3]
[(a b c d e ...) 'case-4]
[(a b c) 'case-5]
[x 'case-6])
case-5
(The empty list ()
appearing in the second argument of syntax-case
will be explained soon and plays the same role as the empty list we
saw in our syntax-rules
examples.)
The pattern of the last clause would have also matched but the matching ends as soon as a matching clause (the fifth in this example) is found. (The system will raise an exception if no match can be found.)
Let us try to distinguish the syntax objects returned by #'(1 2 foo)
and #'(1 2 bar)
.
(syntax-case #'(1 2 foo) ()
[(a b bar) 'bar]
[(a b foo) 'foo])
bar
That we don't get the expected (or hoped for) result is because
syntax-case
(as syntax-rules
) does treat every identifier
appearing in a pattern as a pattern variable by default. Thus, in the
first pattern, bar
is not matched against foo
but bar
is bound
to foo
. We can change this behavior by adding the identifiers that
we want to match literally to the list that appeared as the empty list
so far:
(syntax-case #'(1 2 foo) (bar foo)
[(a b bar) 'bar]
[(a b foo) 'foo])
foo
The equivalence predicate that syntax-case
uses to compare an input
identifier against a literal identifier is free-identifier=?
. In
the case of the example, both bar
and foo
are unbound and we
recall that unbound identifiers are "free-identifier=?
" if and only
if they have the same symbolic name. The next example demonstrates
how the binding comes into play:
(let ([foo 1]) (define input (let ([foo 2]) #'(1 2 foo))) (syntax-case input (foo) [(a b foo) 'match] [(a b c) 'no-match]))
no-match
We have now the tool to dispatch on the structure of a syntax object,
but what we also need is a way to get hold of the individual
components of a syntax object. This is done with pattern variables
(a
, b
, and c
in the example above). We said above that a
pattern variable is bound to the syntax object it is matched against.
This scope of this binding is the expression following the pattern in
the syntax-case
clause. Just a keywords are not ordinary variables,
pattern variables are neither. They may only be referenced inside the
syntax
form as in the following example:
(syntax-case #'(1 x) ()
[(1 y) #'y])
#<syntax x>
Here, #'y
does not resolve to the identifier y
(because y
is
bound to a pattern variable) but to the syntax object to which y
is
bound, which is the value of #'(1 x)
.
Mixing of pattern variables and non-pattern variable identifiers in
the same syntax
expression also works:
(syntax-case #'(1 x) ()
[(1 a) #'(b a)])
(#<syntax b> #<syntax x>)
As one can see, the result is not a wrapped syntax object but a list
of two syntax objects. This is no coincidence. When a pattern
variable appears in a syntax
template, all the substructure in which
the pattern variable is replaced by what it was matched against, is
unwrapped, so ordinary list and vector accessor procedures can be
used. The following is another example:
(syntax-case #'(1 2 3) ()
[(1 x ...) #'(a x ... b c)])
(#<syntax a> #<syntax 2> #<syntax 3> . #<syntax (b c)>)
As can be seen, the pattern variable x
is matched against the list
of syntax objects consisting of 2
and 3
. Up to the part (and
including it) where x
is substituted, the syntax object is
unwrapped. The ellipsis in the syntax
template works as the
ellipsis in the syntax-rules
templates (we will see below why this
is no coincidence).
In particular, we can use list procedures to reference individual elements or to calculate lengthes:
(define syntax-length (lambda (stx) (syntax-case stx () [(x ...) (length #'(x ...))]))) (syntax-length #'(a b c d))
4
We have already seen how literals in syntax-case
can be used for
literal matching of identifiers (using free-identifier=?
).
Otherwise, syntax-case
only matches per structure. If we want to
match structural element using special rules, fenders can be used as
in the following example:
(syntax-case #'(define 3 (+ 1 2)) () [(define id expr) (identifier? #'id) 'ok] [_ 'error])
error
The fender is the expression between the pattern and the final
expression in the first clause of syntax-rules
. If present, it is
evaluated when the pattern matches. If the evaluation yields #f
,
this clause is skipped and matching is continued with the next clause.
The scope of the pattern variables of a pattern includes a fender if
present.
The (sub)pattern _
matches anything (like a pattern variable) but
does not bind a pattern variable.
6. Syntax-case macros
6.1. Macro transformers
We started this tutorial with writing macros and discussing a number
of some example of such macros. Somehow, we seemed to have deviated
by talking about identifiers, syntax objects, and their construction
and destruction. In this section we will see how syntax-case
and
syntax
can be employed to write powerful macros. In fact, they are
the building blocks of macro transformers.
To make use of the forms syntax-case
and syntax
, we have to
understand what actually goes into a define-syntax
definition. The
general form of a syntax definition is (define-syntax identifier
transformer-expression)
(the analogous holds for bindings in a
let-syntax
expression). When the Scheme expanders encounters a
define-syntax
definition, it evaluates the transformer-expression
,
which is an ordinary Scheme expression. It's value must be a macro
transformer, which is then bound to the keyword given by identifier
.
Now, a macro transformer is just an ordinary Scheme procedure taking one argument, a syntax object, and returning one value, another syntax object. The input syntax object represents the macro use form, the output syntax object represents the transcribed macro use. Let us check this:
(let ([x 41]) (define-syntax always-42 (lambda (stx) (syntax (+ 1 x)))) (+ always-42 (always-42 400)))
84
Independently of how the macro is used — that is, independently of
what stx
is —, the macro transformer of this example always
returns the expression (+ 1 x)
(evaluating to 42
). Note that we
could have equivalently written #'(+ 1 x)
instead of syntax
.
If we want to make the macro output dependent on the macro input, we
have to employ syntax-case
to destruct the input syntax object. Let
us first define a macro transformer that uses syntax-case
:
(define f (lambda (stx) (syntax-case stx () [(_ x ...) (list #'quote (list (length #'(x ...)) #'(x ...)))])))
We can test this procedure as any other procedure:
(f #'(q a b c))
(#<syntax quote> (3 (#<syntax a> #<syntax b> #<syntax c>)))
The output is thus a syntax object of the form (quote (n x ...))
where the x
denote the arguments following the head element of the
syntax object argument to f
and n
is the number of these
arguments. The expression that yields the syntax object in the
procedure f
above is not very readable. Because of that, Scheme
also offer a quasisyntax
form (abbreviated with #`
), which is to
syntax
as quasiquote
is to quote
:
(define f (lambda (stx) (syntax-case stx () [(_ x ...) #`(quote (#,(length #'(x ...)) x ...))])))
Even more readable becomes the expression if pattern variables are
used, which can not only be bound by syntax-case
but also by
with-syntax
, which is for pattern variables what let
is for
ordinary variables:
(define f (lambda (stx) (syntax-case stx () [(_ x ...) (with-syntax ([n (length #'(x ...))]) #'(quote (n x ...)))])))
In fact, with-syntax
is not a primitive form but can be expressed in
terms of syntax-case
:
(define-syntax with-syntax (syntax-rules () [(with-syntax ([p e0] ...) e1 ... e2) (syntax-case (list e0 ...) () [(p ...) (let () e1 ... e2)])]))
In whatever way we write the procedure f
, we can then use it to
define an actual macro:
(define-syntax quote/length f)
Of course, instead of naming the macro transformer and just
referencing to it in the right hand side of define-syntax
, we could
have equally well written the transformer procedure expression inline.
The advantage of the former is that the transformer procedure can then
be easily tested using the usual tools, the advantage of the latter is
that it is more compact8.
Let's test our macro:
(quote/length a b c)
(3 a b c)
It should be noted that the calculation of the length, 3
in this
case, happens at expand-time (so in the compiler if we use one). In
fact, a macro can be understood as a compiler for a sublanguage and
that is be plugged into the Scheme system to extend the language.
We now have amassed enough knowledge to give the definition of
syntax-rules
. As the right hand side of define-syntax
expects a
procedure expression, a syntax-rules
form must evaluate to a
procedure. And, in fact, syntax-rules
can be defined as follows:
(define-syntax syntax-rules (lambda (stx) (syntax-case stx () [(_ (lit ...) [(k . p) t] ...) (for-all identifier? #'(lit ... k ...)) #'(lambda (x) (syntax-case x (lit ...) [(_ . p) #'t] ...))])))
The syntax
expression following the fender of the syntax-case
clause shows that a syntax-rules
expression evaluates to a
procedure. There is another instance of syntax
(#'
) within the
template of the outer syntax
expression. This is because procedure
to which a syntax-rules
expression evaluates outputs itself a syntax
object.
One more thing is remarkable: Each syntax-rules
pattern is of the
form (k . p)
; more precisely, it can only match (syntax) pairs whose
head element is an identifier, that is macro uses of exactly this
form. Notably, the pattern variable k
isn't referenced in the
output. This is because a syntax-rules
pattern ignores the pattern
variable that corresponds to the keyword position. In particular, the
following two syntax definitions are equivalent:
(define-syntax incr! (syntax-rules () [(incr! x) (set! x (+ x 1))]))
(define-syntax incr! (syntax-rules () [(_ x) (set! x (+ x 1))]))
This is in contrast to a syntax-case
expression, which doesn't tread
the keyword position in a special way. This is the reason why we
often use _
at the keyword position in syntax-case
expressions for
macro transformers.
It is a good time to finally give the definition of our initial
incr!
macro in terms of syntax-case
:
(define-syntax incr! (lambda (stx) (syntax-case stx () [(_ x) (identifier? #'x) #'(set! x (+ x 1))])))
It is instructive to go through the above definition of the
syntax-rules
keyword and see how the earlier definition using
syntax-rules
expands into the later definition using syntax-case
.
The only line that is not present with the syntax-rules
definition
is the fender (identifier #'x)
, which has no equivalent for
syntax-rules
. This fender ensures that a syntax error is reported
early if the user tries to use this macro in a non-sensible form like
in (incr! 2)
.
6.2. A fluid let
We should finally move past the incr!
macro. We already remarked
that mutation (which incr!
does) is frowned upon. To be more
precise, what makes problems is mutation with unlimited extent.
Mutation with dynamic extent, on the other hand, can be used to
implement dynamically scoped variables, which are also called fluids
and do not have all the problems associated with unbound mutation.
It is probably best to explain it with an example. For this, we
define a new binding-like construct, named fluid-let
9:
(define-syntax fluid-let (lambda (stx) (syntax-case stx () [(_ [(x e)] b1 ... b2) (identifier? #'x) #'(let ([y e]) (define swap! (lambda () (let ([t x]) (set! x y) (set! y t)))) (dynamic-wind swap! (lambda () b1 ... b2) swap!))])))
Let us briefly check the output of the following expression:
(let ([x 1]) (define show (lambda () (display x) (newline))) (show) (fluid-let ([x 2]) (show)) (show))
The dynamic-wind
procedure takes three thunks (procedures that take
no arguments) as arguments. When dynamic-wind
is called, it calls
the three thunks in that order and finally returns the results of the
call to the second, the middle, thunk. The reason why we didn't write
(begin (swap!) ((lambda () b1 ... b2)) (swap!))
is that
dynamic-wind
arranges for calling the enter and exit thunk even in
the presence of non-local control flow10.
The variable y
is used by the macro to store the old value of var
in it before the latter is mutated. As y
does not come from the
macro input, it won't conflict with the definition of an identifier
named y
surrounding the use of fluid-let
. Likewise, the temporary
variable t
won't conflict regardless of what variable the pattern
variable x
stands for.
Our fluid-let
can "bind" exactly one variable. If we want to change
more than value, say two, we have to rewrite our macro:
(define-syntax fluid-let (lambda (stx) (syntax-case stx () [(_ [(x1 e1) (x2 e2)] b1 ... b2) (for-all identifier? #'(x1 x2)) #'(let ([y1 e1] [y2 e2]) (define swap! (lambda () (let ([t x1]) (set! x1 y1) (set! y1 t)) (let ([t x2]) (set! x2 y2) (set! y2 t)))) (dynamic-wind swap! (lambda () b1 ... b2) swap!))])))
(let ([a 1] [b 2]) (define show (lambda () (display (list a b)) (newline))) (show) (fluid-let ([a 3] [b 4]) (show)) (show))
This is, of course, a non-solution because we still can't pass three variables and have also lost the ability of just passing one variable. Possibly, the ellipsis can help as in the following attempt:
(define-syntax fluid-let (lambda (stx) (syntax-case stx () [(_ [(x e) ...] b1 ... b2) (for-all identifier? #'(x ...)) #'(let ([y e] ...) (define swap! (lambda () (let ([t x]) (set! x y) (set! y t)) ...)) (dynamic-wind swap! (lambda () b1 ... b2) swap!))])))
However, this won't quite work. The problem is that there is only one
identifier y
introduced and not one identifier per each fluid
variable. The canonical solution Scheme offers here is the
generate-temporaries
procedure, which takes a list or a syntax
object representing a list and returns a list of as many identifiers,
each with its unique history so that they won't be pairwise
"bound-identifier=?
" or to any other identifier:
(with-syntax ([(x y) (generate-temporaries '(a b))]) (list (identifier? #'x) (identifier? #'y) (bound-identifier=? #'x #'y)))
Here, the list (a b)
has two elements, so generate-temporaries
creates two identifiers, which we bound using with-syntax
to the
pattern variables x
and y
.
With this tool at our disposal, we can finally write a version of
fluid-let
that works with an arbitrary number of variables:
(define-syntax fluid-let (lambda (stx) (syntax-case stx () [(_ [(x e) ...] b1 ... b2) (for-all identifier? #'(x ...)) (with-syntax ([(y ...) (generate-temporaries #'(x ...))]) #'(let ([y e] ...) (define swap! (lambda () (let ([t x]) (set! x y) (set! y t)) ...)) (dynamic-wind swap! (lambda () b1 ... b2) swap!))]))))
(let ([a 1] [b 2]) (define show (lambda () (display (list a b)) (newline))) (show) (fluid-let ([a 3] [b 4]) (show)) (show))
6.3. Implementing a variant type in Scheme
It is time to demonstrate more involved macros to highlight some features of the Scheme macro system and how it leads to extensibility of the language.
To have some use case at hand, let us assume that we deal with binary trees that carry a value at each (internal) node and at each leaf. We can use the Scheme record facility to provide the necessary data types, implementing an abstract tree interface:
(define-record-type node (fields left value right)) (define-record-type leaf (fields value))
We can build a tree using the constructors defined by the above record-type definitions:
(define t (make-node (make-node (make-leaf 4) 2 (make-leaf 1)) 8 (make-leaf -1)))
While creating a tree by hand in this way is doable, it is not very neat. It would be nice if we could give the tree above in simple, parenthesized syntax as follows:
(((4) 2 (1)) 8 (-1))
In other words, (internal) nodes are given by lists of three elements, and leafs by lists of one element. To achieve this, one might want to write a procedure as the following one:
(define make-tree (lambda (e) (define n (length e)) (cond [(= n 3) (make-node (make-tree (car e)) (cadr e) (make-tree (caddr e)))] [(= n 1) (make-leaf (car e))] [else (assert #f)])))
We can then build our tree t
as follows:
(define t (make-tree '(((4) 2 (1)) 8 (-1))))
The quote
(necessary so that Scheme does not try to evaluate our
tree description as an expression) is not optimal, but we can write a
macro that inserts the quote for us:
(define-syntax tree (syntax-rules () [(tree datum) (make-tree 'datum)]))
With this macro, we can now build our tree with the following syntax:
(define t (tree (((4) 2 (1)) 8 (-1))))
While this is optimal as far as the flexibility in syntax is
concerned, the solution is inferior to our original approach of
building the tree by calling the constructors make-node
and
make-leaf
by hand. The point is that the procedure make-tree
,
which is called in the output of the macro tree
, walks the tree
expression at run time and so is not as efficient as the original
approach. What we want is that the tree expression is analyzed during
compile time. As macros are nothing but small compilers, it is no
surprise that a macro will help. All we have to do is to rewrite the
tree
macro that it doesn't output a call to make-tree
but that it
directly outputs calls to make-node
and make-leaf
:
(define-syntax tree (syntax-rules () [(tree (left value right)) (make-node (tree left) value (tree right))] [(tree (value)) (make-leaf value)]))
The tree can be built as before:
(define t (tree (((4) 2 (1)) 8 (-1))))
Now let us do something with the tree. For example, we can ask for the sum of all values in the tree nodes, internal and leaf nodes:
(define tree-accumulate (lambda (t) (cond [(node? t) (+ (tree-accumulate (node-left t)) (node-value t) (tree-accumulate (node-right t)))] [(leaf? t) (leaf-value t)] [else (assert #f)])))
We can test the procedure with our example tree:
(tree-accumulate t)
We have used Scheme's general cond
expression to dispatch on the two
possible types of trees. Compared to pattern matchers of other
languages, this also does not deserve the attribute neat. What we
would like is to have a syntax so that we can write tree-accumulate
as follows:
(define tree-accumulate (lambda (t) (tree-case t [(node left value right) (+ (tree-accumulate left) value (tree-accumulate right))] [(leaf value) value])))
Obviously, this calls for another macro!
(define-syntax tree-case (lambda (stx) (define parse-clause (lambda (cl) (syntax-case cl (node leaf) [[(node left value right) e1 ... e2] #'[(node? tmp) (let ([left (node-left tmp)] [value (node-value tmp)] [right (node-right tmp)]) e1 ... e2)]] [[(leaf value) e1 ... e2] #'[(leaf? tmp) (let ([value (leaf-value tmp)]) e1 ... e2)]] [_ (syntax-violation 'tree-case "invalid clause syntax" stx cl)]))) (syntax-case stx () [(_ tree-expr clause ...) (with-syntax ([(clause ...) (map parse-clause #'(clause ...))]) #'(let ([tmp tree-expr]) (unless (tree? tmp) (assertion-violation 'tree-case "invalid tree argument" tmp)) (cond clause ... [else (assertion-violation 'tree-case "unhandled tree argument" tmp)])))] [_ (syntax-violation 'tree-case "invalid syntax" stx)]))) (define tree? (lambda (obj) (or (node? obj) (leaf? obj))))
In the above macro, we used the procedure syntax-violation
defined
by Scheme to report syntax errors when the macro is misused. It is
always a good idea to report syntax violations as early and as precise
as possible.
The two identifiers, the Scheme reports speak of auxiliary syntax,
node
and leaf
are matched using free-identifier=?
. Both of
these identifiers are bound (they were bound by our record-type
definitions of the node
and the leaf
type). Thus, when the macro is used in the form
(tree-case t [(n l v r) ---] ---)
the binding of the identifier n
is compared to the binding of the
identifier node
(in the lexical context of the macro transformer).
In general, it is a good idea to use bound identifiers as literals in
syntax-case
(or syntax-rules
). Even if the code surrounding a
macro use of, say, tree-case
binds node
to something else, the
library system of Scheme allows to import another identifier that is
bound to the original binding of node
so the tree-case
macro can
still be used with the other identifier in place. This does not work
when free-identifier=?
compares unbound identifiers by name.
With our tree-case
macro, we can finally define and test our newly
written tree-accumulate
:
(define tree-accumulate (lambda (t) (tree-case t [(node left value right) (+ (tree-accumulate left) value (tree-accumulate right))] [(leaf value) value]))) (tree-accumulate t)
14
We have solved our binary-tree-use case but we can still do better.
Assume that the problem we have to solve the next day does not involve
binary trees but abstract syntax trees of a programming language, for
which we have to write an interpreter or compiler. Instead of
(internal) nodes and leaves, we would have, say, expressions,
statements, definitions, programs. When walking an abstract syntax
tree, one has to dispatch again on the possible types of an abstract
syntax tree. So, instead of tree-case
we want ast-case
. We could
copy and suitably modify the tree-case
macro but this would violate
DRY.
The answer is, instead, to write a macro, once and for all, that
generates macros like tree-case
. Here it is:
(define-syntax define-destructor (lambda (stx) (syntax-case stx () [(_ name [keyword predicate-expr accessor-expr ...] ...) (for-all identifier? #'(keyword ...)) (with-syntax ([(pred-id ...) (generate-temporaries #'(predicate-expr ...))] [((acc-id ...) ...) (map generate-temporaries #'((accessor-expr ...) ...))] [((var ...) ...) (map generate-temporaries #'((accessor-expr ...) ...))]) #'(begin (define pred-id predicate-expr) ... (define acc-id accessor-expr) ... ... (define-syntax name (lambda (stx) (define parse-clause (lambda (cl) (syntax-case cl (keyword ...) [[(keyword var ...) e1 (... ...) e2] #'[(pred-id tmp) (let ([var (acc-id tmp)] ...) e1 (... ...) e2)]] ... [_ (syntax-violation 'name "invalid clause syntax" stx cl)]))) (syntax-case stx () [(_ expr clause (... ...)) (with-syntax ([(clause (... ...)) (map parse-clause #'(clause (... ...)))]) #'(let ([tmp expr]) (cond clause (... ...) [else (assertion-violation 'name "unhandled argument" tmp)])))] [_ (syntax-violation 'name "invalid syntax" stx)])))))] [_ (syntax-violation 'define-destructor "invalid syntax" stx)])))
A few explanations are in order. First of all, we see nested ellipses
in the code above. Using syntax-case
we can match a syntax object
of the form ((a b c) (1 2))
against a pattern of the form ((x ...)
...)
. The pattern variable x
will then represent a list of two
lists; the first list will contain the elements a
, b
, and c
, the
second list will contain the elements 1
and 2
. In syntax
templates, the pattern variable can be used as long as least two
ellipses follow. For example, the template ((x ...) ...)
gives back
((a b c) (1 2))
, while (x ... ...)
gives (a b c 1 2)
.
We also have to explain the occurrences of (... ...)
. In the
definition of define-destructor
, the outer syntax form has to
evaluate into a syntax object that contains ellipses, so we have to
keep the outer syntax form from interpreting these ellipses that
should be in the output syntax object, in other words, we have to
escape them. The Scheme way of doing this, is to write (... x)
. If
syntax
sees a sub-template like this one, it processes x
and
returns the result but gives the ellipsis in x
the status of an
ordinary identifier.
Coming back to our tree example, the define-destructor
syntax can be
used as follows:
(define-destructor tree-case [node node? node-left node-value node-right] [leaf leaf? leaf-value])
Now we can redefine tree-accumulate
and test it:
(define tree-accumulate (lambda (t) (tree-case t [(node left value right) (+ (tree-accumulate left) value (tree-accumulate right))] [(leaf value) value]))) (tree-accumulate t)
14
7. Breaking hygiene
Scheme macros written with syntax-rules
are hygienic. This is also
true by default for macros written with the more general
syntax-case~/~syntax
combination. Hygiene — although it may take
some time to understand — is one of the selling points of Scheme
macros and one (of many) reasons why Scheme macros are so more
powerful than, say, macros in C or even in Common Lisp.
Sometimes, however, we want to break hygiene explicitely. We give a number of concrete examples:
7.1. A classical loop macro
A classical example for this is a loop
macro that provides a loop
that evaluates the code enclosed in it repeatedly until a
corresponding break
command is evaluated. A typical use looks like
the following (again, not necessarily a good example for functional
programming!):
(let ([i 0]) (loop (when (= i 5) (break)) (display i) (newline) (incr! i)))
Our first attempt to implement the loop
construct with a macro is
the following syntax definition:
(define-syntax loop (lambda (stx) (syntax-case stx () [(_ e ...) #'(call-with-current-continuation (lambda (break) (let f () e ... (f))))])))
Here we make use of the fact that Scheme has first-class
continuations. The call to call-with-current-continuation
captures
the continuation of the named let
expression.
Nevertheless, our example code that is supposed to print the numbers
zero to four won't work with this version of the loop
keyword. Our
Scheme system will tell us that break
is an undefined identifier (or
refer to a predefined top-level identifier with this name). Although,
it won't say that, but hygiene is to be blamed for it.
As the identifier break
bound in the output of loop
does not
come from the input of the loop
form, it has a different history
than the identifier break
appearing in the body of the loop form.
Identifiers with different histories do not shadow each other, so the
break
in the loop body cannot reference the binding of break
coming from the template in the loop
macro.
One way to solve it is to provide break
as an explicit argument to
the loop macro (we put a star to the name to mark the new syntax):
(define-syntax loop* (lambda (stx) (syntax-case stx () [(_ break e ...) #'(call-with-current-continuation (lambda (break) (let f () e ... (f))))])))
With this modification, everything works:
(let ([i 0]) (loop* break (when (= i 5) (break)) (display i) (newline) (incr! i)))
This solution has one more advantage besides that it actually works — it allows us to specify the name we want to use for the expression breaking out of the loop. For example, it allows us to easily nest two of the loops:
(let ([i 0]) (loop* break-outer (loop* break-inner (when (= i 5) (break-outer)) (when (= i 2) (break-inner)) (display i) (newline) (incr! i)) (display "-\n") (incr! i)))
(Note how hygiene again helps to make this possible. Both macro
instances bind the identifier f
, but the occurrences of f
correspond to different histories so they don't shadow each other.)
While the version with an explicit break
argument to the loop*
macro has its advantages, sometimes we still may want the more terse
syntax with an implicit break
parameter. To make our original
version of loop
work, we must not introduce a break
identifier
with a different history. Instead, we must output break
as if it
appeared as an argument to the macro use. In other words, we have to
forge an identifier and datum->syntax
was the tool to do this:
(define-syntax loop (lambda (stx) (syntax-case stx () [(k e ...) (with-syntax ([break (datum->syntax #'k 'break)]) #'(call-with-current-continuation (lambda (break) (let f () e ... (f)))))])))
Here, for the first time, we make use of the keyword identifier of the
macro use, which we bound to the pattern variable k
. The call to
datum->syntax
then returns an identifier named break
as if it
appears where the macro use keyword appears, that is with the same
history and the same lexical context.
Let us test our example with this new version!
(let ([i 0]) (loop (when (= i 5) (break)) (display i) (newline) (incr! i)))
7.2. Convenience syntax to bind implicit identifiers
Above, we used the datum->syntax
procedure together with
with-syntax
explicitly to inject an identifier as if it appeared at
the macro use site into the template. Chez Scheme provides a
syntactic form with-implicit
that abstracts from this low-level
approach. While the with-implicit
form is non-standard, thanks to
Scheme's macro system we can define it in any standard system:
(define-syntax with-implicit (lambda (x) (syntax-case x () [(_ (k x ...) e1 ... e2) (for-all identifier? #'(k x ...)) #'(with-syntax ([x (datum->syntax #'k 'x)] ...) e1 ... e2)] [_ (syntax-violation 'with-implicit "invalid syntax" x)])))
With it, we can rewrite our loop
macro as follows:
(define-syntax loop (lambda (stx) (syntax-case stx () [(k e ...) (with-implicit (k break) #'(call-with-current-continuation (lambda (break) (let f () e ... (f)))))])))
7.3. Definitions that make the bound name accessible
For those who didn't like the loop
example because it is mostly
useful in imperative programming, we have provided another example,
that we will describe in this subsection.
User-friendly procedures check their arguments so that errors are reported early:
(define reverse-append (lambda (head tail) (unless (list? head) (assertion-violation 'reverse-append "invalid list argument" head)) (let f ([head head] [tail tail]) (cond [(null? head) tail] [(pair? head) (f (cdr head) (cons (car head) tail))] [else (assertion-violation 'reverse-append "concurrent modification detected")]))))
Just a brief test:
(reverse-append '(1 2 3) '(4 5 6))
(3 2 1 4 5 6)
The Scheme procedure that is used here to report an error is
assertion-violation
. Its first formal argument is called who
and
(if not #f
) should be a string or symbol naming the procedure where
the error occurs.
One can make the point that the code above again violates some
instance of the DRY principle because we had to type the name of the
procedure, reverse-append
in this case, three times. The following,
non-hygienic, macro, which can also be found in the source code of
Chez Scheme and in one of Racket's libraries, helps:
(define-syntax define/who (lambda (x) (define out (lambda (k f e) (with-syntax ([k k] [f f] [e e]) (with-implicit (k who) #'(define f (let ((who 'f)) e)))))) (syntax-case x () [(k (f . u) e1 ... e2) (identifier? #'f) (out #'k #'f #'(lambda u e1 ... e2))] [(k f e) (identifier? #'f) (out #'k #'f #'e)] [_ (syntax-violation 'define/who "invalid syntax" x)])))
With define/who
we can define a variable (or procedure) as with
define
. Moreover, the identifier who
(with a lexical and historic
context as the keyword define/who
in the macro use) is bound to the
name of the variable (or procedure) being defined.
With define/who
, the definition of reverse-append
looks like:
(define/who reverse-append (lambda (head tail) (unless (list? head) (assertion-violation 'who "invalid list argument" head)) (let f ([head head] [tail tail]) (cond [(null? head) tail] [(pair? head) (f (cdr head) (cons (car head) tail))] [else (assertion-violation 'who "concurrent modification detected")]))))
We can compare who
with the predefined identifier __func__
that
can be found in the C99 standard. With Scheme and its macro system,
however, this becomes a library feature and need not be a language
feature.
7.4. Definitions of constants
In Scheme, we can use define
to, well, define a variable. This
variable can be set!
by other parts of the code, possibly
accidentally. So we may want to define a variable-like object that
behaves more like a constant. One option is to use the
identifier-syntax
form, we already saw at the beginning of the
tutorial:
(define-syntax pi (identifier-syntax 3.14159)) pi
3.14159
If we tried to mutate the "variable" pi
now, the Scheme system would
raise an exception.
This is a good point to give the actual definition of
identifier-syntax
. Like syntax-rules
, it can be defined by the
more primitive forms syntax-case
and syntax
:
(define-syntax identifier-syntax (syntax-rules (set!) [(_ e) (lambda (x) (syntax-case x () [id (identifier? #’id) #’e] [(_ x (... ...)) #’(e x (... ...))]))] [(_ (id exp1) ((set! var val) exp2)) (and (identifier? #’id) (identifier? #’var)) (make-variable-transformer (lambda (x) (syntax-case x (set!) [(set! var val) #’exp2] [(id x (... ...)) #’(exp1 x (... ...))] [id (identifier? #’id) #’exp1])))]))
This definition can be found exactly in this form in the R6RS,
describing the Scheme language and its standard libraries. Again, we
see the occurrence of the quoted ellipses (... ...)
, which is
necessary because of the nesting of templates (remember that the right
hand side of syntax-rules
rules are syntax
templates).
We also note the two patterns within the first syntax-case
. The
first pattern is of the form id
where id
is an identifier, the
second pattern is of the form (_ x ...)
where x
is an arbitrary
form. The first pattern will match if the identifier, pi
in our
example, is not used in head-position of a combination; the second
pattern will match if pi
is used in the form (pi x ...)
. The
latter does not make sense for pi
, but we see that
identifier-syntax
allows us to define procedure-like identifiers
that behave differently when the directly applied or when referenced.
In the second part of the definition of identifier-syntax
, the
procedure make-variable-transformer
is used. This turns a macro
transformer given by a procedure (mapping syntax objects to syntax
objects) into a variable-transformer. A variable-transformer does
the same mapping between syntax objects but will also be called by the
expander when it processes an expression of the form (set! id form)
where id
is the keyword bound to a variable-transformer.
Now, 3.14159 is not the most precise value of pi
. We get a value
whose precision is adapted to the precision of Schemes inexact real
numbers by using the formula (* 2 (atan 1 0))
. This directly leads to the
following attempt of redefining pi
:
(define-syntax pi (identifier-syntax (* 2 (atan 1 0)))) pi
3.141592653589793
This is not a good solution, though. Every time, we reference pi
,
Scheme replaces it by the expression (* 2 (atan 1 0))
, so unless we
can rely on a sufficiently optimizing compiler, we will have pi
recalculated every time we use it. A better approach is to calculate
the value once, store this is a variable and expand into a reference
to it:
(define-syntax define-constant (lambda (stx) (syntax-case stx () [(_ id expr) (identifier? #'id) #'(begin (define val expr) (define-syntax id (identifier-syntax val)))])))
Again, we have macro-defining macro. Thanks to hygiene, the variable
val
cannot be accessed outside the macro. Let's test our new macro:
(define-constant pi (* 2 (atan 1 0))) pi
3.141592653589793
The number pi
is just a single constant, so let us now turn to a use
case where not only more than one constant but many constants are
needed. For concreteness, let us assume that want to develop a
library handling ELF files. We could start with defining all the
magic constants:
(define-constant et-none #x00) (define-constant et-rel #x01) (define-constant et-exec #x02) ... (define-constant pt-null #x00000000) (define-constant pt-load #x00000001) ...
This is not much different from what we would do in C. However, it has the same problem. It pollutes our top-level namespace. In Scheme, this is mitigated a bit due to the library system (which allows one to confine these constants in a module); nevertheless a library that exports myriads of identifiers (and where the exact set of identifiers may depend on the version of the ELF format) is not a good idea.
A way out is — you will already have guessed it — the Scheme macro
system. We will implement a macro define-constants
that can be used
as follows:
(define-constants elf-constant (et-none #x00) (et-rel #x01) ...) (elf-constant et-none) ; => 0x01
Moreover, we want this definition to bind the identifier
elf-constants
(note the "s") to a procedure returning an association
list of the form
((et-none . #x00) (et-rel . #x01) ...)
For this, we first need a procedure that takes an identifier like
elf-constant
and constructs a new identifier, elf-constants
in
this case, from it:
(define/who construct-name (lambda (k . arg*) (unless (identifier? k) (assertion-violation who "invalid template identifier argument" k)) (datum->syntax k (string->symbol (apply string-append (map (lambda (x) (cond [(string? x) x] [(identifier? x) (symbol->string (syntax->datum x))] [else (assertion-violation who "invalid string or identifier argument" x)])) arg*))))))
This procedure takes a template identifier k
and a sequence of
strings and identifiers to forge and return an identifier with the
same lexical and historic context as k
and whose name is given by
the concatenation of the sequence of strings and identifier (names).
With it, we can define our define-constants
easily:
(define-syntax define-constants (lambda (x) (syntax-case x () [(_ t (n c) ...) (and (identifier? #'t) (for-all identifier? #'(n ...))) (with-syntax ([ts (construct-name #'t #'t "s")] [(e ...) (generate-temporaries #'(c ...))]) #'(begin (define e c) ... (define-syntax t (lambda (x) (syntax-case x () [(_ y) (identifier? #'y) (cond [(assq (syntax->datum #'y) (list (cons 'n #'e) ...)) => cdr] [else (syntax-violation 't "unknown constant" x #'y)])] [_ (syntax-violation 't "invalid syntax" x)]))) (define ts (lambda () '([n . c] ...)))))])))
The macro is programmed so that the lookup of the constant happens at expand-time and not at run-time. Let us test it:
(define-constants color (salmon #xFA8072) (light-green #x90EE90) (cornflower-blue #x6495ED)) (colors)
((salmon . 16416882) (light-green . 9498256) (cornflower-blue . 6591981))
(color cornflower-blue)
6591981
7.5. A pitfall
Although the macros loop
and define/who
defined above are
non-hygienic, they are only so in a controlled sense. They behave as
if the user has provided an explicit break
or who
identifier, so
do not really differ from a hygienic macro, which makes reasoning
about them still easy.
Nevertheless, there is still a potential pitfall, we are going to
explain now. Consider the following definition of the macro
define-logging/who
:
(define-syntax define-logging/who (syntax-rules () [(define-logging/who (name . formals) body1 ... body2) (define/who (name . formals) (display "log: entering procedure ") (display who) (newline) body1 ... body2)]))
The macro defines a "logging procedure", a procedure that prints a log message when it is called:
(define-logging/who (hello)
(display "Hello!\n"))
(hello)
log: entering procedure hello Hello!
The define-logging/who
macro's output is an instance of the
define/who
macro from earlier. The define-logging/who
macro makes
use of the implicitly defined who
identifier.
We named the macro define-logging/who
with the suffix /who
because
the idea is that the user of the define-logging/who
macro can also
refer to the procedure's name through the implicitly bound identifier
who
. This, however, is not the case as the following test shows:
(let ([who 'outer])
(define-logging/who (return-who)
who)
(return-who))
outer
The reason is that historic context of the identifier who
is the
same as the historic context of the identifier define/who
that
occurs in the syntax template in the definition of
define-logging/who
. The historic context of the identifier
define/who
, which does not come from the macro input in the use of
define-logging/who
, is therefore not the same as those of the
identifiers who
appearing in the source of our test.
This problem cannot be easily mitigated bar explicitly define who
a
second time with the historic context of the final macro use. One
could think a possible solution would be the following rewrite:
(define-syntax define-logging/who (lambda (stx) (syntax-case stx () [(k (name . formals) body1 ... body2) (with-implicit (k define/who who) #'(define/who (name . formals) (display "log: entering procedure ") (display who) (newline) body1 ... body2))])))
Here, the identifier define/who
is put in the macro output of
define-logging/who
with the same history as the keyword in the macro
use of define-logging/who
. This will create who
with the historic
context of the macro use of define-logging/who
(and that's why we
had to add who
to the list of implicit identifiers as well), but it
will only work if define/who
is also bound (to the correct macro) at
the use site of the macro define-logging/who
. Such an assumption,
however, should not be made. (In the section after next we are going
to finally give a solution that works, but it needs an extension which
is not in R6RS.)
Our other example for a macro with implicit (unhygienic) identifiers
was define-constant
where the plural form of the name of the defined
macro was a forged identifier. That identifier's history, however,
was not derived from the macro keyword in the macro use but from the
(singular) name of the defined macro. This also helps mitigating the
problem of nested macro invocations described above.
8. Phasing
Phasing is an issue that is not specific to the particular macro system of Scheme or hygienic macros but occurs with procedural macros when the system distinguishes between run-time and expand-time. The latter distinction is important, for example, for the possibility of ahead-of-time compilation. As Scheme allows to evaluate code at run time, which then has to be expanded first, run-time and expand-time can be interleaved. The latter can also be due to the library system of Scheme; a library may need to be run first before another library can be expanded because the macro transformers may reference the code of the first library.
8.1. (Relative) phases
When the expressions in a program or library are evaluated, the evaluation happens at a specific relative phase. These relative phases are non-negative integers. The top-level expressions are evaluated at relative phase 0. The right-hand sides of top-level variable definitions are also evaluated at relative phase 0. The right-hand sides of top-level syntax definitions are evaluated at relative phase 1 (which means: "earlier"). The right-hand side of a variable definition appearing within an expression evaluated at relative phase n is also evaluated at relative phase n. The right-hand side of a syntax definition appearing within an expression evaluated at relative phase n is evaluated at relative n + 1.
In other words, define-syntax shifts the phase by one for its right-hand side.
The following code should make this clearer:
(begin (define x 4) (define-syntax foo (lambda (stx) (define-syntax bar (lambda (stx) #'3)) (+ 1 bar))) (+ x foo))
Assume that this expression is evaluated at phase n (0 if appearing
at the program top-level). The right-hand side of the definition of
x
, the reference to x
and the use of the macro foo
are evaluated
at relative phase n. The transformer expression of foo
is
evaluated at relative phases n + 1, and the transformer expression
of bar
is evaluated at relative phase n + 2.
8.2. Identifier references at different phases
A variable (an identifier bound to a location holding a value) can be referenced by an expression if the expression is evaluated at the same relative phase as the initializing expression of the variable. A keyword (an identifier bound to a macro transformer) can be referenced by an expression if the expression is evaluated at the same or higher relative phase than the phase of the transformer expression of the keyword minus one.
The following code is erroneous, for example:
(let ([counter 0]) (define-syntax count! (lambda (stx) (syntax-case stx () [(_) (begin (set! counter (+ 1 counter)) #'(values))]))) (count!) counter)
The variable counter
can only be referenced at relative phase 0,
thus not in the right-hand side of the syntax definition of count!
,
which is evaluated at relative phase 1. One can understand this
restriction as follows: The variable counter
only exists at
run-time, not at compile-time of the program, but the transformer
associated to count!
us used at compile-time.
On the other hand, the following example is correct code:
(let-syntax ([foo (lambda (stx) #'1)]) (define-syntax bar (lambda (stx) (define-syntax quux (lambda (stx) foo)) quux)) bar)
If a procedure needs to be used in different phases, in Scheme the library system can be used. If a library exports a variable (bound to the procedure) or any other identifier, the identifier can be imported at any relative phase.
9. Extensions
In this section, we will describe three extensions to Scheme macro's system that are not yet standardized in one the reports. All three extensions are supported by Chez Scheme, so we can experiment with them.
9.1. Aliases
Given an identifier x
, we may want to use it under a different name
as well. The first attempt of defining an alias for x
may look
like:
(let ([x 'old]) (define-syntax y (identifier-syntax [_ x] [(set! _ e) (set! x e)])) (set! y 'new) x)
new
This solution has no run-time overhead because y
is a keyword and
not a variable. Whenever we access y
, Scheme's expander rewrites it
into an access of x
. In a lot of cases, this is all that we need.
Still, y
is not a true alias to x
. This is demonstrated by the
following test:
(let ([x 'old]) (define-syntax y (identifier-syntax [_ x] [(set! _ e) (set! x e)])) (free-identifier=? #'x #'y))
#f
The reason that the result of the free-identifier=?
test is #f
is
that x
and y
are bound differently. The identifier x
is bound
to a location holding a value (x
is a variable); the identifier y
is bound to a macro transformer (y
is a keyword).
The alias
form described in SRFI 212 allows to define true aliases.
The syntax is (alias y x)
, which can be used wherever definitions
are allowed. It arranges that y
has the same binding as x
:
(let ([x 'old]) (alias y x) (set! y 'new) (list (free-identifier=? #'x #'y) x))
(#t new)
(The reason why alias
is not called define-alias
is that it does
not define a new binding; it just gives an existing binding (the one
of x
) a new name (y
).)
With the alias
form, there is also a general solution to the general
problem of nested unhygienic macros that we exhibited with
define-logging/who
:
(define-syntax define-logging/who (lambda (stx) (syntax-case stx () [(k (name . formals) body1 ... body2) (with-syntax ([(tmp-id) (generate-temporaries #'(tmp))]) (with-syntax ([local-define/who (construct-name #'k #'tmp-id)]) (with-implicit (k who) #'(begin (alias local-define/who define/who) (local-define/who (name . formals) (display "log: entering procedure ") (display who) (newline) body1 ... body2)))))])))
Here, we generate a fresh identifier and construct from its name an
identifier with the context of the keyword of the use of
define-logging/who
. This identifier is then aliased to define/who
but is used instead so that the transformer bound to define/who
(and
now also bound to define-logging/who
forges the who
identifier
with the right context.
Let us test our new version:
(let ([who 'outer])
(define-logging/who (return-who)
who)
(return-who))
return-who
9.2. Syntax parameters
Syntax parameters, which are like aliases also described in a SRFI, namely SRFI 139, provide an alternative to unhygienic macros when implicit macro parameters are needed.11
In what follows, we use Chez Scheme's implementation so that we can readily test our examples.
Chez Scheme defines the fluid-let-syntax
form, whose syntax is
equivalent to let-syntax
, but which is to let-syntax
as
fluid-let
is to let
. In other words, it rebinds a keyword for the
dynamic extent of the expansion of the body of fluid-let-syntax
:
(let-syntax ([x (identifier-syntax 'outer)]) (define-syntax y (identifier-syntax x)) (list (fluid-let-syntax ([x (identifier-syntax 'inner)]) y) y))
(inner outer)
Compare this to let-syntax
:
(let-syntax ([x (identifier-syntax 'outer)]) (define-syntax y (identifier-syntax x)) (list (let-syntax ([x (identifier-syntax 'inner)]) y) y))
(outer outer)
With the use of syntax parameters (keywords that are rebound by
fluid-let-syntax
), we can give a definition of our loop
macro as a
hygienic macro:
(define-syntax break (lambda (stx) (syntax-violation 'break "invalid use outside of loop form" stx))) (define-syntax loop (syntax-rules () [(loop e ...) (call/cc (lambda (k) (fluid-let-syntax ([break (syntax-rules () [(break) (k)])]) (let f () e ... (f)))))]))
Let us test this new version:
(let ([i 0]) (loop (when (= i 5) (break)) (display i) (newline) (incr! i)))
0 1 2 3 4
In this example, datum->syntax
does not appear so the macro defined
above is indeed hygienic. The identifier break
was not newly
introduced by the macro use but already existed in the lexical context
of the macro use.
It should be noted that the new loop
macro has a different semantics
than the unhygienic loop
macro from earlier. In our original loop
macro, the expression (break)
ends the loop when break
is
"bound-identifier=?
" to the implicit identifier forged by the first
version of the loop
macro. In the version with syntax parameters,
the expression (break)
ends the loop when break
is
"free-identifier=?
" to the global identifier named break
. Which
semantics is the better one depends on the use case. On the one hand
side, hygienic macros are preferable to unhygienic ones. On the other
hand side, syntax parameters have the same problem associated to
variables with dynamic scope: Their change of the behavior of code is
not lexically confined.
The Chez Scheme User's Guide contains a very interesting use of syntax parameters, which we want to reproduce here:
(define-syntax define-integrable (syntax-rules (lambda) [(_ name (lambda formals form1 form2 ...)) (begin (define xname (fluid-let-syntax ([name (identifier-syntax xname)]) (lambda formals form1 form2 ...))) (define-syntax name (lambda (x) (syntax-case x () [_ (identifier? x) #'xname] [(_ arg (... ...)) #'((fluid-let-syntax ([name (identifier-syntax xname)]) (lambda formals form1 form2 ...)) arg (... ...))]))))]))
The define-integrable
keyword can be used as follows:
(define-integrable count (lambda (ls) (if (null? ls) 0 (+ (car ls) (count (cdr ls)))))) (list (count '(1 2 3)) (procedure? count))
(6 #t)
This definition binds count
to a keyword that behaves like an
immutable variable bound to a procedure. However, when count
is
used in the form (count '(1 2 3))
the procedures body is inlined at
the macro use site, allowing for more local optimizations. We chose
the example of a recursive procedure because it demonstrates a
difficulty: If count
in the procedure's body were also inlined and
so on, we would get an infinite macro expansion. Instead, the
implementation of define-integrable
uses a syntax parameter to
rebound count
in the procedure body so that it is not further
inlined.
9.3. Identifier properties
It is often the case that two different macros have to communicate.
In its simplest form we already saw it, namely in the case of macros
whose behavior depends on the presence of auxiliary syntax (another
macro) in their inputs. A typical example is Scheme's cond
keyword
(implementable as a macro in terms of if
) that uses the else
auxiliary syntax.
Such an auxiliary keyword can function as a yes/no flag. Sometimes, however, we may be interested in more than a boolean value. This can be done with identifier properties, which are implemented in Chez Scheme and also described in SRFI 213.
An identifier property are superficially similar to symbol properties of Lisp, but there are important differences making them work with Scheme's macro system. Identifier properties are associated with an existing binding of an identifier and thus automatically lexically scoped. Each property is keyed by the binding of another identifier, so also property keys are lexically scoped.
The define-property
form, which can be used where ever a definition
can be used, associates properties with identifiers:
(define add1 (lambda (x) (+ 1 x))) (define key) (define-property add1 key #'"value")
If the define-property
appears in a context evaluated at relative
phase n, the very right-hand side of define-property
is evaluated
at relative phase n + 1, much like the right-hand side of
define-syntax
. This means that the property value, "(syntax "value")"
in
this example, is accessible at expand-time, but not a run-time.
Regardless of the identifier property attached to add1
, the
identifier is still a variable resolving to a procedure adding one to
its argument:
(add1 2)
3
Macro transformers can retrieve the values of identifier properties.
If the need to do so, they have to return a procedure instead of a
syntax object. The returned procedure must accept one argument,
lookup
and should return the syntax object which is the result of
the transformation. The lookup
procedure takes two arguments id
and key
, which must be bound identifiers, and returns the value of
the identifier property associated to id
and keyed by key
, or #f
if there is none:
(let-syntax ([x (lambda (stx) (lambda (lookup) (syntax-case stx () [(_ key) (or (lookup #'add1 #'key) #'"no-value")])))]) (define other-key) (list (x key) (x other-key)))
("value" "no-value")
While this example theoretically describes how identifier properties
work, it doesn't show their usefulness. A more practical example is
given by another loop macro, modeling general for
, we are going to
present:
(define key) (define-syntax define-iterator (lambda (stx) (syntax-case stx () [(define-iterator name parser-expr) (identifier? #'name) #'(begin (define-syntax name (lambda (stx) (syntax-violation 'name "invalid use of for keyword" stx))) (define-property name key (let ([parser parser-expr]) (unless (procedure? parser) (assertion-violation 'define-iterator "invalid parser" parser)) parser)))]))) (define-syntax for (lambda (stx) (lambda (lookup) (define parse-clause (lambda (cl) (syntax-case cl () [(formals keyword . arg) (identifier? #'keyword) (let ([keyword #'keyword]) (define parser (lookup keyword #'key)) (unless (procedure? parser) (syntax-violation 'for "invalid for iterator" stx keyword)) (let-values ([(outer-var* var* loop-var* outer-expr init-expr test-expr loop-expr step-expr) (parser stx cl)]) (list outer-var* var* loop-var* outer-expr init-expr test-expr loop-expr step-expr)))]))) (syntax-case stx () [(_ (clause ...) command ...) (with-syntax ([(((outer-var ...) (var ...) (loop-var ...) outer-expr init-expr test-expr loop-expr step-expr) ...) (map parse-clause #'(clause ...))]) #'(let-values ([(outer-var ...) outer-expr] ...) (let-values ([(var ...) init-expr] ...) (let f ([var var] ... ...) (unless (or test-expr ...) (let-values ([(loop-var ...) loop-expr] ...) command ... (let-values ([(var ...) step-expr] ...) (f var ... ...))))))))])))) (define-iterator in-list (lambda (stx cl) (syntax-case cl () [(var _ list-expr) (identifier? #'var) (values #'() #'(tmp) #'(var) #'(values) #'list-expr #'(null? tmp) #'(car tmp) #'(cdr tmp))]))) (define-iterator in-range (lambda (stx cl) (syntax-case cl () [(var _ start-expr end-expr) (identifier? #'var) (values #'(end) #'(i) #'(var) #'end-expr #'start-expr #'(>= i end) #'i #'(+ i 1))])))
The public API for our new loop facility consists of the for
keyword
and the define-iterator
defining keyword. Moreover, we defined two
iterator forms, one for going through a list and the other one for going
through a numeric range. The loop facility is extensible because the
user can define more iterator forms using define-iterator
.
A typical use of a for
loop can look like the following:
(for ([x in-list '(a b c)] [i in-range 0 10]) (display (list x i)) (newline))
(a 0) (b 1) (c 2)
In a language without a powerful macro system as Scheme possesses it, if it doesn't ship a suitable looping construct for our needs, we can't do anything except hoping for a future version of the language that includes more looping constructs. In Scheme, on the other hand, a small and simple core suffices as we can build syntactic abstractions as much as we can build procedural abstractions ourselves. Many advertised "brand new" features of en-vogue or not-so-en-vogue languages may sound like a big yawn to a Schemer.
In our implementation of the for
macro above, we used identifier
properties on the iterator keywords to communicate with the main for
macro. This hints at how we can build powerful, extensible
sub-languages into Scheme.
10. Complex examples
10.1. An LR(1) parser generator implemented as a Scheme macro
The power of Scheme's procedural macros enables us to stay within Scheme and within one Scheme process all the time. Let us consider a parser generator like GNU Bison as a case study. A classical parser generator is a separate executable that takes a grammar (for some formal language) interspersed with semantic actions and outputs source code implementing a parser for that language.
The process is rather fragile as it depends a lot on text substitutions and the parser generator is ignorant about the embedded semantic code in the target language. Moreover, an external tool is needed.
In Scheme, on the other hand, we can write a macro that takes a grammar (described using Scheme's ordinary lexical syntax) and semantic expressions and replaces it at compile-time with the definition of a parser of this languages, obeying, among other things. the lexical scoping of the embedded semantic expressions.
We have written such a macro that implements an LR(1)-parser generator in roughly 1000 lines to demonstrate a highly non-trivial macro (or sub-language). The source code can be found in the library directory of this tutorial's GitHub repository.
It is written as an R6RS library, which we can readily import:
(import (languages))
The following gives an example of how a grammar together with semantic actions is defined:
(define-language (make-parser token token?) (nonterminals exp term factor) (terminals NUM "+" "-" "*" "/" "(" ")") (rules [(exp -> exp "+" term) (+ exp term)] [(exp -> exp "-" term) (- exp term)] [(exp -> term)] [(term -> term "*" factor) (* term factor)] [(term -> term "/" factor) (/ term factor)] [(term -> factor)] [(factor -> NUM)] [(factor -> "(" exp ")") exp]) (start exp))
This definition binds make-parser
to a thunk (a procedure taking no
arguments) that when called returns a parser for the defined
language. A parser is a procedure. Calling it with one value, a
token, pushes this token in the parser. Calling it with zero
values, signals an end of the input and the procedure returns the
semantic value of the sentences consisting of the token pushed so far.
A convenience procedure parse
is defined that pushes a fixed number
of tokens and is used in the following example:
(parse (make-parser) (token NUM 3) (token "+") (token NUM 2) (token "*") (token NUM 7) (token "-") (token NUM 1))
16
11. Exercises
- Write a macro
push!
such that(push! list-variable expression)
prepends the value ofexpression
to the list bound to thelist-variable
. - Write a macro
when-all
such that(when-all test-expression ... expression)
evaluatesexpression
only if alltest-expressions
evaluate to#f
. The macro should short-circuit the evaluation thetest-expressions
as soon as one evaluates to#f
. - Write a macro
alist
such that(alist key1 value key2 value2 ... )
expands into a literal expression of the form'((key1 value1) (key2 value2) ...)
. - Let
x
be a pattern variable that represents a list of syntax objects andy
a pattern variable that represents a list of lists of syntax objects. Find out the result of#'(((x y) ...) ...)
. - Write a macro
timestamp
such thattimestamp
expands into a number literal counting the number of uses oftimestamp
. - Rewrite
fluid-let
as a recursive macro that does not usegenerate-temporaries
. - Write a procedure
symbolic-identifier=?
so that(symbolic-identifier=? id1 id2)
returns#t
if and only if the two identifiersid1
andid2
have the same symbolic name. - Extend the
loop
macro so that evaluating(continue)
skips the rest of the current loop iteration and the loop continues with the next iteration. - Modify the
for
macro into a functional form supporting user-definable accumulators whose final result are returned by thefor
expression. - Write a pattern matching Scheme macro.
- Make the parser generator more user friendly. It should provide more information at compile-time when shift/reduce and reduce/reduce conflicts are reported and at run-time when syntax errors are reported. Implement operator precedence and associativity rules to handle some shift/reduce and reduce/reduce conflicts automatically.
- Write a version of the lex scanner generator as a Scheme macro.
Footnotes:
This may change when the R7RS-large standardization effort is finished. Both, R6RS and R7RS-small, are successors (and extensions) to R5RS (1998), but R7RS-small was never meant to be seen in isolation as a successor to R6RS. Time will tell whether the R7RS large language will be able to replace R6RS when it is finally done. It is planned to include the R6RS macro facility and the extensions discussed here in R7RS-large.
While Scheme does not forbid mutation (like ML but unlike
Haskell), the pitfalls of impure code are well-understood. Therefore,
the names of Scheme procedures and syntax that modifies locations in
the store (the Scheme model of the computer's memory) end with a !
by convention.
More precicely: as a variable holding a procedure value.
When interactively testing our procedures and syntax (keywords), one has to be careful that we have given both the procedure and the keyword the same name. Evaluating one of the definitions will overwrite the meaning of the other one.
Such a form is predefined in Chez Scheme, but is not part of the Scheme standard. In fact, Chez Scheme's version properly handles tail calls, which our simple version doesn't.
Note that this cannot be done with C preprocessor macros.
The term "history" of an identifier is not an established one but was invented for this presentation.
When used in programs or libraries, there is one problem in the first approach because of so-called phasing issues. We will come back to this.
In fact, the standard binding constructs in Scheme let
,
let*
, or letrec
can also be defined as macro keywords in terms of
the more primitive lambda
using the macro facility described in this
report.
Remember that Scheme has first-class continuation. But this is a topic for a different tutorial.
In-depth advanced information can be found in the paper Keeping it Clean with Syntax Parameters.