Stanza's Macro System

(This chapter is still a work in progress - May 31, 2022)

This chapter teaches you about Stanza's macro system: what macros are, how to write your own, and some examples of using them.

Stanza's macro system results from the combination of three separate concepts and subsystems:

  1. an s-expression-based programmatic code transformation system,
  2. an extensible grammar system, and
  3. a template-based code generation utility.

What Is a Macro?

A macro is a syntactic shorthand for some longer piece of code. Stanza's core design relies heavily upon macros, and we have already seen many of them.

As a simple example, the following shows off the while macro that is included with Stanza's core library:

var counter:Int = 0
while counter < 10 :
  println("counter = %_" % [counter])
  counter = counter + 1

If you don't want to use the while loop, you can also type the following instead:

var counter:Int = 0
defn* loop () :
  if counter < 10 :
    println("counter = %_" % [counter])
    counter = counter + 1
    loop()
loop()

It would work just the same. It's just slightly more verbose.

Here is another example. This is the syntax that we typically use to call the do function.

for i in 0 to 10 do :
  println("This is inside the loop.")
  println("i is equal to %_" % [i])

But, similarly, you could also type the following instead:

do(
  fn (i) :
    println("This is inside the loop.")
    println("i is equal to %_" % [i])
  0 to 10)

It looks slightly uglier, but it works just the same.

The key point is that macros are just a syntactic abbreviation. A user is always free to choose not to use a macro if they are willing to type out the longer form of the code instead.

Defining and Using Your First Macro

When debugging, it is common to write code that looks like this:

println("DEBUG: x = %~" % [x])

It prints out the current value of a variable.

Let's write a macro that will allow us to type this instead:

PROBE(x)

and have it automatically expand into the code above.

Creating the Macro File

Create a new file called debugmacros.stanza containing the following contents:

defpackage debugmacros :
  import core
  import collections
  import stz/core-macros

defsyntax mydebugmacros :
  import exp4 from core

  defrule exp4 = (PROBE(?myvariable)) :
    val format-string = to-string("DEBUG: %_ = %%~" % [name(unwrap-token(myvariable))])
    val form = qquote(println(~ format-string % [~ myvariable]))
    parse-syntax[core / #exp](form)

This file defines a new syntax package called mydebugmacros, which contains the definition of a new macro called PROBE.

Extending the Compiler

To use the new macro, we first need to extend the Stanza compiler with it.

Open your terminal and type in the following:

stanza extend debugmacros.stanza -o myextendedstanza

This will result in a new Stanza compiler called myextendedstanza that now supports the new syntax.

Using Your New Macro

Create a new file called trymacros.stanza containing the following contents:

#use-added-syntax(mydebugmacros)
defpackage trymacros :
  import core
  import collections

defn main () :
  val x = 10
  val y = "Hello world"
  val z = x * 10
  PROBE(x)
  PROBE(y)
  PROBE(z)

main()

And use your new extended compiler to compile and run it:

./myextendedstanza trymacros.stanza -o trymacros
./trymacros

It should print out:

DEBUG: x = 10
DEBUG: y = "Hello world"
DEBUG: z = 100

That's a useful utility! Notice that this abbreviation is something that could only be written as a macro. It is not possible to write a function that behaves like PROBE.
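
To see why, here is a minimal sketch of a function-based attempt. The package name probefn and the function name probe-fn are our own, chosen only for illustration. A function receives the value of its argument and nothing else, so it has no way to learn that the caller's variable was called x:

```stanza
defpackage probefn :
  import core

;Hypothetical function-based attempt at PROBE.
;The parameter holds only the value that was passed in;
;the name of the caller's variable is not available here.
defn probe-fn (value) :
  println("DEBUG: ??? = %~" % [value])

defn main () :
  val x = 10
  probe-fn(x)

main()
```

Compiled with the standard compiler, this prints DEBUG: ??? = 10. The macro, by contrast, operates on the code PROBE(x) itself and can therefore read the symbol x.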

Exploring Further

Our debugmacros.stanza file introduces a number of new concepts: defsyntax, defrule, parse-syntax. Let's explore each in turn.

Programmatic Code Transformations

The body of the defrule construct is allowed to contain arbitrary Stanza code.

Add the following prints to the code:

defpackage debugmacros :
  import core
  import collections
  import stz/core-macros

defsyntax mydebugmacros :
  import exp4 from core

  defrule exp4 = (PROBE(?myvariable)) :
    println("Implementation of PROBE macro.")
    println("myvariable = %~" % [myvariable])

    val format-string = to-string("DEBUG: %_ = %%~" % [name(unwrap-token(myvariable))])
    println("format-string = %~" % [format-string])

    val form = qquote(println(~ format-string % [~ myvariable]))
    println("form = %~" % [form])

    val result = parse-syntax[core / #exp](form)
    println("result = %~" % [result])
    println("\n")

    result

Now rebuild the extended compiler by running the stanza extend command from before, and then use it to compile our trymacros.stanza file again:

./myextendedstanza trymacros.stanza -o trymacros

You should see, during compilation, the following messages being printed out:

Implementation of PROBE macro.
myvariable = x
format-string = "DEBUG: x = %~"
form = (println (@do "DEBUG: x = %~" % (@tuple x)))
result = ($do println ($do modulo "DEBUG: x = %~" ($tuple x)))


Implementation of PROBE macro.
myvariable = y
format-string = "DEBUG: y = %~"
form = (println (@do "DEBUG: y = %~" % (@tuple y)))
result = ($do println ($do modulo "DEBUG: y = %~" ($tuple y)))


Implementation of PROBE macro.
myvariable = z
format-string = "DEBUG: z = %~"
form = (println (@do "DEBUG: z = %~" % (@tuple z)))
result = ($do println ($do modulo "DEBUG: z = %~" ($tuple z)))

Notice that we haven't run the trymacros executable yet. These messages are printed out during the compilation of trymacros.stanza. Macros execute at compile time.

Let's focus on the messages printed out just for the expression PROBE(x).

The line

defrule exp4 = (PROBE(?myvariable))

defines a new syntax rule for Stanza expressions. The PROBE(?myvariable) is the definition of the pattern. In this case, our pattern matches any code that looks like PROBE(...), where a single s-expression is allowed within the ellipsis.

The question mark in front of ?myvariable indicates that it is a pattern variable. Within the body of the defrule, myvariable will refer to whatever s-expression the user provided within PROBE(...). For the usage PROBE(x), myvariable will take on the symbol x.

This can be observed in the message:

myvariable = x

Next, we use some basic string manipulation to construct the format string. The message

format-string = "DEBUG: x = %~"

shows us the final constructed string.

Finally, we use the qquote utility to construct an s-expression containing the code that we want the macro to expand into. This results in the form:

form = (println (@do "DEBUG: x = %~" % (@tuple x)))

Recall that the @do and @tuple symbols are inserted by the lexer. If we write the above form using the same notation that the lexer uses, it becomes:

println("DEBUG: x = %~" % [x])

which is exactly the final code that we want the macro to expand into.

The final step satisfies a requirement of the Stanza macro system: each macro must return its result as fully-expanded core forms. To do that, we call parse-syntax to expand any macros that remain in the code. The fully-expanded form is shown in the message:

result = ($do println ($do modulo "DEBUG: x = %~" ($tuple x)))
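
Putting the steps together, the three PROBE calls in the original trymacros.stanza behave as if we had typed the println calls ourselves. The following is our own hand expansion of that file, not compiler output. It uses no added syntax, so it compiles with the standard compiler:

```stanza
defpackage trymacros :
  import core
  import collections

defn main () :
  val x = 10
  val y = "Hello world"
  val z = x * 10
  ;Hand-written equivalents of PROBE(x), PROBE(y), and PROBE(z).
  println("DEBUG: x = %~" % [x])
  println("DEBUG: y = %~" % [y])
  println("DEBUG: z = %~" % [z])

main()
```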

Bugs in Macros

Our macro actually contains some errors in its implementation. Let's see what happens when it crashes.

Try changing the trymacros.stanza file to the following:

#use-added-syntax(mydebugmacros)
defpackage trymacros :
  import core
  import collections

defn main () :
  val x = 10
  val y = "Hello world"
  val z = x * 10
  PROBE((x + z))

main()

And try compiling it again.

./myextendedstanza trymacros.stanza -o trymacros

Our system crashes with the following printout:

Implementation of PROBE macro.
myvariable = (x + z)
FATAL ERROR: No appropriate branch for arguments of type (FullList).
  in core/print-stack-trace
    at core/core.stanza:329.14
  in core/print-stack-trace
    at core/core.stanza:335.2
  in core/fatal
    at core/core.stanza:382.2
  ...

This is caused by the call to:

name(unwrap-token(myvariable))

name is a function that can only be called on Symbol objects, but in this case myvariable is a List.

So be cautious. When a macro crashes, it causes the entire compiler to crash.

We can fix this by adding the following check:

defrule exp4 = (PROBE(?myvariable)) :
  println("Implementation of PROBE macro.")
  println("myvariable = %~" % [myvariable])

  ;Check that PROBE is called correctly.
  if unwrap-token(myvariable) is-not Symbol :
    throw(Exception("%_: Incorrect usage of PROBE(x). \
                     The argument to PROBE must be a symbol." % [
                     closest-info()]))

  val format-string = to-string("DEBUG: %_ = %%~" % [name(unwrap-token(myvariable))])
  println("format-string = %~" % [format-string])

  val form = qquote(println(~ format-string % [~ myvariable]))
  println("form = %~" % [form])

  val result = parse-syntax[core / #exp](form)
  println("result = %~" % [result])
  println("\n")

  result

With this additional guard, the system will print out the following.

[WORK IN PROGRESS]

Building an Optimized Compiler

We have been using the following command to extend the compiler:

stanza extend debugmacros.stanza -o myextendedstanza

You might have noticed that the extended compiler runs a bit slower than you're used to. This is because the extended compiler is compiled without optimizations, while the standard Stanza compiler is compiled in optimized mode.

When you are confident in the implementation of your macros, you can compile an optimized version of the compiler using this command:

stanza extend debugmacros.stanza -o myextendedstanza -optimize

Be cautious of using this before your macros have been fully debugged though. Optimized mode removes many safety checks for detecting errors early, and an incorrect program may behave strangely.

The DefSyntax System: A Small Experiment Framework

The defsyntax system is Stanza's built-in parsing mechanism for s-expressions. It is the underlying system that the macro system uses to extend the syntax of the language, and it can also be used as a standalone utility.

To allow us to explore the defsyntax system we will build a small framework that allows us to quickly try out different syntax definitions.

A Small Syntax

Create a new directory and create a file called myparser.stanza containing:

defpackage myparser :
  import core
  import collections

defsyntax my-experimental-language :

  public defproduction sentence: String
 
  defrule sentence = (the quick red fox) :
    "Sentence about foxes"
   
  defrule sentence = (the lazy brown dog) :
    "Sentence about dogs"

  defrule sentence = (the 3 "friendly" lions) :
    "Sentence about lions"

This file contains the definition of the my-experimental-language syntax.

Parsing Using the Syntax

Now create another file called test-myparser.stanza containing:

defpackage test-myparser :
  import core
  import collections
  import reader
  import myparser

defn main () :
  val forms = read-file("test-input.txt")
  println("PARSING:\n%_\n\n" % [forms])

  try :
    val parsed = parse-syntax[my-experimental-language / #sentence](forms)
    println("RESULT:\n%_\n\n" % [parsed])
  catch (e:Exception) :
    println("Could not parse forms.")
    println(e)

main()

This file reads in the s-expressions contained within a text file and asks the parsing system to interpret the forms as a sentence as defined in the my-experimental-language syntax.
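
For quick experiments it can be convenient to skip the input file. The sketch below assumes that the reader package's read-all function accepts a String and returns the list of forms it contains, in the same way that read-file does for a file. The package name test-inline is our own.

```stanza
defpackage test-inline :
  import core
  import collections
  import reader
  import myparser

defn main () :
  ;Assumption: read-all lexes a string into a list of s-expressions.
  val forms = read-all("the lazy brown dog")
  println("PARSING:\n%_\n\n" % [forms])

  try :
    val parsed = parse-syntax[my-experimental-language / #sentence](forms)
    println("RESULT:\n%_\n\n" % [parsed])
  catch (e:Exception) :
    println("Could not parse forms.")
    println(e)

main()
```

If this is saved as test-inline.stanza (a file name we made up), it takes the place of test-myparser.stanza when running the framework.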

Initial Experiments

Now create our input test file, test-input.txt, containing:

the lazy brown dog

And run our system like this:

stanza run myparser.stanza test-myparser.stanza

You should see the following printed out:

[WORK IN PROGRESS]

We can try out the other recognized sentences too:

If we fill test-input.txt with:

the quick red fox

then the program prints out:

[WORK IN PROGRESS]

If we fill test-input.txt with:

the 3 "friendly" lions

then the program prints out:

[WORK IN PROGRESS]

If we fill test-input.txt with an unrecognized sentence:

the quick blue fox

then the program prints out:

[WORK IN PROGRESS]

The Pattern Language

Using the above framework, we can now learn about the different patterns that the defrule construct supports.

Literals

Our patterns so far consist of "literals". A literal pattern matches only an s-expression that is exactly equal to it.

The literal pattern:

quick

matches against the s-expression:

quick

The literal pattern:

3

matches against the s-expression:

3

The literal pattern:

"friendly"

matches against the s-expression:

"friendly"

It does not match against the s-expression:

friendly

The literal pattern:

3L

matches against:

3L

It does not match against the s-expression:

3

Wildcards

The wildcard pattern _ matches against any single s-expression.

The pattern:

_

matches against all of the following s-expressions:

3
"friendly"
(a b c)
3L
Pumbaa

Note that (a b c) is a single s-expression.

Concatenation

Multiple patterns can be concatenated together to form a longer pattern.

The pattern:

a b

matches against the following s-expressions:

a b

The pattern:

a _ _ x

matches against all of the following s-expressions:

a y z x
a 1 2 x
a (1 2 3) (1 2 3) x
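
To try literals, wildcards, and concatenation in the experiment framework, myparser.stanza could be replaced with a sketch like the following. The rules and their result strings are our own illustrations.

```stanza
defpackage myparser :
  import core
  import collections

defsyntax my-experimental-language :

  public defproduction sentence: String

  ;Four literals concatenated: matches only `the quick red fox`.
  defrule sentence = (the quick red fox) :
    "Four literals"

  ;Literals and wildcards concatenated: matches `a y z x`,
  ;`a 1 2 x`, and `a (1 2 3) (1 2 3) x`.
  defrule sentence = (a _ _ x) :
    "Two wildcards between a and x"
```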

Lists

The list pattern (...) matches against list s-expressions.

The pattern:

()

matches against the following s-expression:

()

The pattern:

(a)

matches against the following s-expression:

(a)

The pattern:

(a 3 "x")

matches against the following s-expression:

(a 3 "x")

The pattern:

(a 3 (y))

matches against the following s-expression:

(a 3 (y))

The pattern:

(a _ (_ y))

matches against all of the following s-expressions:

(a b (x y))
(a 3 ("hello" y))
(a "world" (x y))
(a (x y z) ((x y z) y))

Ellipsis

An ellipsis pattern matches zero or more occurrences of a single pattern.

The pattern:

a ...

matches against all the following s-expressions:

a
a a
a a a a
a a a a a a a a a

It even matches against the empty input, which contains no s-expressions at all.

The pattern:

3 ...

matches against all of the following s-expressions:

3
3 3
3 3 3 3
3 3 3 3 3 3 3 3 3

The pattern:

_ ...

matches against all of the following s-expressions:

a
a a a
x y z w
x y (1 2 3) (1 2 3)
() () () ()

And, like any ellipsis pattern, it also matches against the empty input.

So effectively, the pattern _ ... can match against anything at all.

The pattern:

(x _ z) ...

matches against all of the following s-expressions:

(x y z)
(x y z) (x w z) (x h z)
(x 0 z) (x 1 z) (x 2 z)
(x () z) (x (0) z) (x (0 0) z)
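
As a sketch of how ellipsis patterns look inside rules, the following version of myparser.stanza can be dropped into the experiment framework. The keywords animals and triples and the result strings are our own inventions.

```stanza
defpackage myparser :
  import core
  import collections

defsyntax my-experimental-language :

  public defproduction sentence: String

  ;`animals` followed by zero or more s-expressions of any kind.
  ;Matches `animals`, `animals dog`, and `animals dog lion (big cat)`.
  defrule sentence = (animals _ ...) :
    "A list of animals"

  ;`triples` followed by zero or more lists of the form (x _ z).
  ;Matches `triples` and `triples (x 0 z) (x 1 z)`.
  defrule sentence = (triples (x _ z) ...) :
    "A list of triples"
```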

Splice-Ellipsis

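A "splice ellipsis" pattern can only be used following a list pattern. It is similar to the normal ellipsis pattern in that it matches zero or more occurrences of the pattern, but the repetition applies to the subpatterns within the list rather than to the list itself. The matched input is therefore a flat sequence with no enclosing parentheses.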

The pattern:

(x y) @...

matches against all the following s-expressions:

x y
x y x y
x y x y x y x y

It does not match against the s-expressions:

x y x

It does not match against the s-expression:

(x y)

The pattern:

(x _ z) @...

matches against all the following s-expressions:

x 0 z
x 0 z x 1 z x 2 z x 3 z
x () z x (0) z x (0 0) z x (0 0 0) z
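
A rule using a splice-ellipsis pattern might look like the sketch below. The keyword pairs and the result string are our own, chosen for illustration.

```stanza
defpackage myparser :
  import core
  import collections

defsyntax my-experimental-language :

  public defproduction sentence: String

  ;`pairs` followed by zero or more pairs written flat, without
  ;parentheses. Matches `pairs` and `pairs a 1 b 2 c 3`.
  ;It does not match `pairs a 1 b`, which ends in half a pair.
  defrule sentence = (pairs (_ _) @...) :
    "A flat list of pairs"
```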

Examples of Combining Patterns

The list and ellipsis patterns are very powerful, and can be combined to form very expressive patterns. Here are some examples.

The pattern:

(x y ...) @...

matches against all of the following s-expressions:

x
x x
x x x x
x y
x y y y
x y y x y y y x y y y y y

The pattern:

begin ((_ . _) @...) ... end

matches against all of the following s-expressions:

begin (x . int) end
begin (x . int y . string) end
begin (x . int y . string w . int) end
begin (x . int y . string w . int) (x . int) (x . string) end

Escaping

Most symbols that appear in a pattern, e.g. myname, myconstruct, x, y, z, etc. are interpreted as simple literal patterns. There are a small number of special symbols that have special meanings, such as ... and @....

So which pattern would actually match against the following s-expressions?

a ... @... b

In this case, use the "escape" operator, ~, to specify that the next s-expression in a pattern should be interpreted as a simple literal.

The pattern:

a ~ ... ~ @... b

matches against the s-expressions:

a ... @... b

The pattern:

a ~ ~ b

matches against the s-expressions:

a ~ b
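
Inside a rule, an escaped symbol is written the same way. The sketch below is our own example for the experiment framework. It is meant to match the input to be continued ... where the final ... is an ordinary symbol in the input rather than an ellipsis pattern.

```stanza
defpackage myparser :
  import core
  import collections

defsyntax my-experimental-language :

  public defproduction sentence: String

  ;The `~` escapes the `...` that follows it, so the `...`
  ;is matched as a literal symbol instead of acting as an ellipsis.
  defrule sentence = (to be continued ~ ...) :
    "Sentence that trails off"
```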

Understanding the Lexer Shorthands

To make writing code convenient and increase readability, Stanza's lexer automatically provides a small set of abbreviations. These abbreviations are fixed and cannot be modified by the user:

{x}          is an abbreviation for          (@afn x)

[x]          is an abbreviation for          (@tuple x)

f(x)         is an abbreviation for          f (@do x)

f{x}         is an abbreviation for          f (@do-afn x)

f[x]         is an abbreviation for          f (@get x)

f<x>         is an abbreviation for          f (@of x)

?x           is an abbreviation for          (@cap x)

`sexp        is an abbreviation for          (@quote sexp)

a b c :      is an abbreviation for          a b c : (d e f)
  d e f

The abbreviations work as follows:

  1. Curly brackets ({}) expand to a list with the @afn symbol as its first item.
  2. Square brackets ([]) expand to a list with the @tuple symbol as its first item.
  3. An s-expression followed immediately by an opening parenthesis inserts the @do symbol as the first item in the following list.
  4. An s-expression followed immediately by an opening curly bracket ({) inserts the @do-afn symbol as the first item in the following list.
  5. An s-expression followed immediately by an opening square bracket ([) inserts the @get symbol as the first item in the following list.
  6. An s-expression followed immediately by an opening angle bracket (<) inserts the @of symbol as the first item in the following list.
  7. A question mark followed immediately by a symbol expands to a list with the @cap symbol as its first item.
  8. A backquote followed by an s-expression expands to a list with the @quote symbol as its first item.
  9. A line-ending colon automatically wraps the next indented block in a list.

Commas are treated identically to whitespace. Thus:

(x, y, z)

is an abbreviation for:

(x y z)

These abbreviations need to be taken into consideration when writing patterns.

As an example, the pattern:

(@tuple _ ...)

matches against all of the following s-expressions:

(@tuple)
(@tuple x)
(@tuple x y z z z)
[]
[x]
[x y z z z]
[x, y, z, z, z]

The pattern:

plus (@do _ _)

matches against all of the following s-expressions:

plus (@do x y)
plus (@do 1 2)
plus(x y)
plus(1 2)
plus(x, y)
plus(1, 2)

The pattern:

while x : (println)

matches against the following s-expressions:

while x : (println)

and it matches against these s-expressions:

while x :
  println

but it does not match against these s-expressions:

while x : println

Patterns with Lexer Shorthands

Note that the lexer shorthands apply identically to patterns as well.

Thus the pattern:

[_ ...]

is identical to the pattern:

(@tuple _ ...)

And the pattern:

while x :
  println

is identical to the pattern:

while x : (println)

It is customary to use lexer shorthands in the pattern definitions to improve readability.

Thus the pattern:

[_ ...]

matches against the following s-expressions:

[x, y, z, z, z]

The pattern:

plus(_, _)

matches against all the following s-expressions:

plus(x, y)
plus(1, 2)

The pattern:

while x :
  println

matches against the following s-expressions:

while x :
  println

but it does not match against:

while x : println
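
The same shorthands can be used directly in rule definitions. The following sketch for the experiment framework is our own. It restates two of the patterns above as rules, and the result strings are made up.

```stanza
defpackage myparser :
  import core
  import collections

defsyntax my-experimental-language :

  public defproduction sentence: String

  ;Written with lexer shorthands; identical to the pattern
  ;`plus (@do _ _)`. Matches `plus(1, 2)` and `plus(x y)`.
  defrule sentence = (plus(_, _)) :
    "A call to plus with two arguments"

  ;Identical to the pattern `(@tuple _ ...)`.
  ;Matches `[]`, `[x]`, and `[x, y, z, z, z]`.
  defrule sentence = ([_ ...]) :
    "A tuple of any length"
```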

Productions and Rules

What is a Rule?

A rule is a combination of a pattern and a block of code to execute if the pattern matches. A rule is specified for a production. Here is an example rule that we used in our experiment framework:

defrule sentence = (the quick red fox) :
  "Sentence about foxes"

The above syntax specifies the following:

  1. This is a new rule for the sentence production.
  2. The pattern is the quick red fox. So this rule matches any s-expressions that match this pattern.
  3. If the s-expressions match, then the rule returns the string "Sentence about foxes".

What is a Production?

A production is a named set of rules. Our experiment framework defined a single production called sentence:

public defproduction sentence: String

The above specifies:

  1. This is a new production called sentence.
  2. The rules for this production must return a String if they match.
  3. This production is public and is visible to users of this syntax package.

Recall that to use our syntax package to parse some s-expressions we used the following:

val parsed = parse-syntax[my-experimental-language / #sentence](forms)

The above specifies:

  1. Parse the s-expressions contained in the variable forms.
  2. Parse using the rules associated with the sentence production in the my-experimental-language syntax package.
  3. On a successful match the rule that matched will execute and return a String. This string will be stored into the parsed variable.

Defining Multiple Productions

A syntax package can contain as many productions as we like.

Let's introduce one more production to our experimental syntax package. Here is the new myparser.stanza:

defpackage myparser :
  import core
  import collections

defsyntax my-experimental-language :

  public defproduction sentence: String
 
  defrule sentence = (the quick red fox) :
    "Sentence about foxes"
   
  defrule sentence = (the lazy brown dog) :
    "Sentence about dogs"

  defrule sentence = (the 3 "friendly" lions) :
    "Sentence about lions"

  public defproduction animal: String
  defrule animal = (fox) : "little fox"
  defrule animal = (dog) : "loyal dog"
  defrule animal = (lion) : "regal lion"

Let's now modify the test program to try parsing some s-expressions using both productions. It will first attempt to parse the contents as a sentence, and then attempt to parse the contents as an animal.

defpackage test-myparser :
  import core
  import collections
  import reader
  import myparser

defn main () :
  val forms = read-file("test-input.txt")
  println("PARSING:\n%_\n\n" % [forms])

  try :
    val parsed = parse-syntax[my-experimental-language / #sentence](forms)
    println("RESULT:\n%_\n\n" % [parsed])
  catch (e:Exception) :
    println("Could not parse forms as sentence.")
    println(e)

  try :
    val parsed = parse-syntax[my-experimental-language / #animal](forms)
    println("RESULT:\n%_\n\n" % [parsed])
  catch (e:Exception) :
    println("Could not parse forms as animal.")
    println(e)

main()

If we now fill test-input.txt with:

lion

and run the test program it will print out:

[WORK IN PROGRESS]

Notice that the s-expression could not be parsed as a sentence, but it can be successfully parsed as an animal.

Order of Rule Matching

When searching for a match, the rules for a production are tested one at a time until the system reaches the first rule that matches.

Here is an example of a production with multiple rules:

public defproduction sentence: String

defrule sentence = (one big chance) :
  "One big chance"

defrule sentence = (one _ chance) :
  "One ??? chance"

defrule sentence = (_ big chance) :
  "??? big chance"

defrule sentence = (_ _ _) :
  "Default case"

If we try to parse the following input:

one big chance

using the above production, the system will try out the first pattern one big chance. Since this pattern matches, the system will return "One big chance" and skip testing the rest of the rules.

If we try to parse:

one small chance

then the system will return:

"One ??? chance"

because the one _ chance pattern is the first pattern that matches.

If we try to parse:

my big chance

then the system will return:

"??? big chance"

Finally, if we try to parse:

my big break

then the system will return:

"Default case"

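Because the first matching rule wins, the order in which rules are written matters. The sketch below is our own variation on the production above, with the catch-all rule moved to the top. Assuming that rules are tried in the order they appear, as described in this section, the more specific rule beneath it can never be selected.

```stanza
defpackage myparser :
  import core
  import collections

defsyntax my-experimental-language :

  public defproduction sentence: String

  ;Listed first, so it matches every input made of three
  ;s-expressions, including `one big chance`.
  defrule sentence = (_ _ _) :
    "Default case"

  ;Shadowed: any input that this rule matches has already
  ;been matched by the rule above.
  defrule sentence = (one big chance) :
    "One big chance"
```

For this reason, specific rules are best placed before general ones.
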
Failure Rules

As mentioned above, by default the system automatically tries rules one at a time until it finds the first rule that matches.

Sometimes, to keep behaviour predictable, it is important to prevent the system from continuing the search if we can determine early that something has gone wrong. To handle this case, we can use a fail-if rule.

Here is an example:

public defproduction sentence: String

defrule sentence = (one big chance) :
  "One big chance"

fail-if sentence = (one red chance) :
  Exception("Sentence doesn't make sense. A chance cannot be red.")

defrule sentence = (one _ chance) :
  "One ??? chance"

defrule sentence = (_ big chance) :
  "??? big chance"

defrule sentence = (_ _ _) :
  "Default case"

Let's try to parse the following input:

one red chance

Our test program will print out:

[WORK IN PROGRESS]

Note that this means that the input did not successfully parse. As soon as the system detects that the input matches the pattern one red chance it halts the entire parse.

The general form of a fail-if rule has this structure:

fail-if production = (pattern) :
  exception-body

which says:

  1. If the system is parsing the production production,
  2. and the system detects that the input matches the pattern pattern,
  3. then the entire parse is a failure. The exception-body is executed to compute an Exception object that represents the cause of the failure.

This description is quite abstract, but we will use this construct later in a larger example that will show off the practical situations when fail-if rules are useful.

Closest Info

Within both defrule and fail-if rules, a special function called closest-info can be used to retrieve the file name and line number where the rule first matched. It returns a FileInfo object if there is file information attached, or false otherwise.

It is most often used in a fail-if rule to provide the location of the error.

Let's alter the fail-if rule above to the following:

fail-if sentence = (one red chance) :
  match(closest-info()) :
    (info:FileInfo) : Exception(to-string("%_: Sentence doesn't make sense. A chance cannot be red." % [info]))
    (f:False) : Exception("Sentence doesn't make sense. A chance cannot be red.")

Now if we parse the following input:

one red chance

Our test program will print out:

[WORK IN PROGRESS]

Referencing Productions in Patterns

The full expressive power of productions is realized only when we refer to a production from within a pattern. To refer to a production, we put the pound character '#' before the production name.

Here is an example:

public defproduction sentence: String

defrule sentence = (A #animal is an animal) :
  "Sentence about animals"

defrule sentence = (I am a #animal) :
  "Sentence about what I am"

defproduction animal: String
defrule animal = (dog) : "Dogs"
defrule animal = (lion) : "Lions"
defrule animal = (meerkat) : "Meerkats"
defrule animal = (warthog) : "Warthogs"

The pattern:

I am a #animal

consists of three literals (I, am, and a) that must match exactly, followed by one production #animal that matches only if one of the animal rules matches.

If we try to parse the following input:

A dog is an animal

the system prints out:

[WORK IN PROGRESS]

If we try to parse the following input:

A meerkat is an animal

the system prints out:

[WORK IN PROGRESS]

If we try to parse the following input:

I am a lion

the system prints out:

[WORK IN PROGRESS]

If we try to parse the following input:

I am a cat

the system prints out:

[WORK IN PROGRESS]
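
A pattern may also refer to more than one production. The sketch below is our own example for the experiment framework. The color production and all of the result strings are invented for illustration.

```stanza
defpackage myparser :
  import core
  import collections

defsyntax my-experimental-language :

  public defproduction sentence: String

  ;One literal followed by two production references.
  ;Intended to match, for example, `the red fox` and `the brown dog`.
  defrule sentence = (the #color #animal) :
    "Sentence about a colored animal"

  ;Hypothetical production for colors.
  defproduction color: String
  defrule color = (red) : "Red"
  defrule color = (brown) : "Brown"

  defproduction animal: String
  defrule animal = (fox) : "Foxes"
  defrule animal = (dog) : "Dogs"
```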

Binders in Patterns

The previous section showed that we can refer to other productions from within a pattern. But it is unsatisfying that parsing both of the following:

A dog is an animal
A lion is an animal

outputs the same parsing result:

"Sentence about animals"

How would we know which specific animal we're talking about?

To handle this case, we can use a binder to store the intermediate result of parsing the #animal.

Make the following change to the definition of our rules:

defrule sentence = (A ?a:#animal is an animal) :
  to-string("Sentence about animals. Specifically about %_." % [a])

defrule sentence = (I am a ?a:#animal) :
  val len = length(a)
  val singular = a[0 to len - 1]
  to-string("Sentence about what I am. I am a %_." % [singular])

Now if we parse:

A dog is an animal

the system prints out:

[WORK IN PROGRESS]

If we parse:

A meerkat is an animal

the system prints out:

[WORK IN PROGRESS]

If we parse:

I am a lion

the system prints out:

[WORK IN PROGRESS]

If we parse:

I am a cat

the system prints out:

[WORK IN PROGRESS]

Recall that the animal production was declared like this:

defproduction animal: String

This means that the result of parsing an animal is a String.

Using the following syntax within a rule:

?a:#animal

means that we would like to use the variable a to refer to the result of parsing the animal from within the body of the rule.

If the #animal matches against meerkat, then a will contain the string "Meerkats". If the #animal matches against lion, then a will contain the string "Lions".

Binders can be used to refer to the result of any pattern. An especially common and useful use case is for ellipsis patterns.

Let's try adding one more rule for the sentence production.

defrule sentence = (The following are all animals : (?xs:#animal ...)) :
  to-string("Sentence listing different animals: %," % [xs])

Within the body of the rule, the xs variable will refer to the result of parsing the pattern #animal .... Since the result of parsing #animal is a String, the result of parsing #animal ... will be List<String>.

If we parse the following:

The following are all animals :
  dog
  lion
  meerkat
  warthog

the system prints out:

[WORK IN PROGRESS]

More generally, you can use a binder to bind to the result of any pattern.

?a:PATTERN

There is a quick shorthand that allows you to omit the pattern entirely:

?a

This is synonymous with binding to the wildcard pattern:

?a:_
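
As a small sketch of the shorthand, the rule below binds whatever single s-expression follows the word is. The rule and its wording are our own and are written for the experiment framework.

```stanza
defpackage myparser :
  import core
  import collections

defsyntax my-experimental-language :

  public defproduction sentence: String

  ;`?thing` is shorthand for `?thing:_`, so it binds any single
  ;s-expression, such as `pizza`, `42`, or `(a b c)`.
  defrule sentence = (My favorite thing is ?thing) :
    to-string("Favorite thing: %_" % [thing])
```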

Advanced Binding Patterns

The system properly handles binders when they are nested within list, ellipsis, and splice-ellipsis patterns. Here is an example of a nested binder:

defrule sentence = (Mean animals :
                      (The ?animals:#animal is mean) @...) :
  to-string("%, are mean." % [animals])

If we parse the following:

Mean animals :
  The dog is mean
  The lion is mean
  The warthog is mean

then the system prints out:

[WORK IN PROGRESS]

Here is a sophisticated example involving multiple nested binders:

defrule sentence = (Friendships:
                      (The ?animals:#animal likes (?friend-lists:#animal ...)) @...) :
  val buffer = StringBuffer()
  println(buffer, "Friendships between animals: ")
  for (animal in animals, friends in friend-lists) do :
    println(buffer, " %_ likes (%,)." % [animal, friends])
  to-string(buffer)

If we parse the following:

Friendships :
  The dog likes (dog, lion)
  The meerkat likes (warthog)
  The warthog likes (warthog, lion)

then the system prints out:

[WORK IN PROGRESS]

Note that the variable friend-lists has type List<List<String>> within the body of the rule. The pattern #animal ... has result type List<String>, and when it is nested within the splice-ellipsis pattern @..., the final result type becomes List<List<String>>.

Binders can appear under arbitrary levels of nesting, but we recommend keeping it to below two levels in order to keep the code readable.

Guard Predicates

A guard predicate allows the user to use an arbitrary Stanza function to place further conditions on whether a rule matches or not.

Suppose that we have a predicate that determines whether a given s-expression might be a string that represents a name. We define a name to be any string that contains exactly a single space, and is made up of letters otherwise.


defn name? (x) -> True|False :
  match(unwrap-token(x)) :
    (s:String) :
      val num-spaces = count({_ == ' '}, s)
      val num-letters = count(letter?, s)
      num-spaces == 1 and
      num-letters + 1 == length(s)
    (x) :
      false

Suppose we have another predicate that determines whether a given s-expression might be a string that represents an address. We define an address to be any string that contains at least one space, one letter, and one digit.


defn address? (x) -> True|False :
  match(unwrap-token(x)) :
    (s:String) :
      val num-spaces = count({_ == ' '}, s)
      val num-digits = count(digit?, s)
      val num-letters = count(letter?, s)
      num-spaces > 0 and
      num-digits > 0 and
      num-letters > 0 and
      (num-spaces + num-digits + num-letters) == length(s)
    (x) :
      false

Now we can use these predicates in the following rules.

defrule sentence = (Detail for ?a:#animal: ?detail) when name?(detail) :
  to-string("The name of %_ is %_." % [a, detail])

defrule sentence = (Detail for ?a:#animal: ?detail) when address?(detail) :
  to-string("The %_ lives at address %_." % [a, detail])

fail-if sentence = (Detail for ?a:#animal: ?detail) :
  Exception("Unsupported detail for %_." % [a])

Parsing the following:

Detail for dog: "134 Varsity Avenue"

results in the following:

[WORK IN PROGRESS]

Parsing the following:

Detail for dog: "Rummy Li"

results in the following:

[WORK IN PROGRESS]

Parsing the following:

Detail for dog: "105; DROP TABLE Animals"

results in the following:

[WORK IN PROGRESS]

In the above rules the guard predicates place additional conditions on whether a rule matches. The first rule matches only if detail is bound to an s-expression that passes our name? predicate. The second rule matches only if detail is bound to an s-expression that passes our address? predicate. Finally, the last fail-if rule specifies explicitly that it is an error if detail does not pass either predicate.
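
The same technique works for other kinds of checks. The sketch below is our own example. The predicate integer-literal? is hypothetical, and we assume that it can be defined at the top level of the package that contains the defsyntax block.

```stanza
defpackage myparser :
  import core
  import collections

;Hypothetical predicate: true if the s-expression is an Int.
defn integer-literal? (x) -> True|False :
  unwrap-token(x) is Int

defsyntax my-experimental-language :

  public defproduction sentence: String

  ;Matches, for example, `I have 3 pets`.
  defrule sentence = (I have ?n pets) when integer-literal?(n) :
    to-string("Number of pets: %_" % [n])

  ;Reached only when the guard above fails,
  ;for example on `I have many pets`.
  fail-if sentence = (I have ?n pets) :
    Exception("The number of pets must be an integer.")
```

The fail-if rule is listed after the guarded rule so that, as in the example above, it applies only to inputs that the guard rejected.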

Importing Productions

[WORK IN PROGRESS]

The Stanza Core Macros

[WORK IN PROGRESS]