Metacza: Syntax


File

Input files of Metacza are always encoded as UTF-8, UTF-16 or UTF-32. Both little endian and big endian are supported.

Output files are always UTF-8 encoded. All output streams (in particular, the error stream) are also taken to be UTF-8 encoded.

A Metacza file always starts with #!, optionally preceded by a Unicode BOM (U+FEFF) in UTF-8, UTF-16 or UTF-32 encoding. The input encoding is inferred from these leading bytes.
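
How such an inference could work is sketched below in C++. This is not Metacza's actual implementation: the names Encoding and detect_encoding are hypothetical, and the fallback for files without a BOM (looking at how the leading # is laid out) is an assumption based on the rule that every file starts with #!.

```cpp
#include <cstddef>

// Encodings a Metacza input file may use, per the text above.
enum class Encoding { Utf8, Utf16LE, Utf16BE, Utf32LE, Utf32BE, Unknown };

// Hypothetical helper: guess the encoding from the first n bytes of a file.
constexpr Encoding detect_encoding(const unsigned char* p, std::size_t n) {
    // 1. Byte order marks. UTF-32 LE is checked before UTF-16 LE because
    //    FF FE 00 00 begins with the UTF-16 LE mark FF FE.
    if (n >= 3 && p[0] == 0xEF && p[1] == 0xBB && p[2] == 0xBF) return Encoding::Utf8;
    if (n >= 4 && p[0] == 0xFF && p[1] == 0xFE && p[2] == 0x00 && p[3] == 0x00) return Encoding::Utf32LE;
    if (n >= 4 && p[0] == 0x00 && p[1] == 0x00 && p[2] == 0xFE && p[3] == 0xFF) return Encoding::Utf32BE;
    if (n >= 2 && p[0] == 0xFF && p[1] == 0xFE) return Encoding::Utf16LE;
    if (n >= 2 && p[0] == 0xFE && p[1] == 0xFF) return Encoding::Utf16BE;
    // 2. No mark (assumed fallback): inspect the layout of '#' (0x23) in "#!".
    if (n >= 4 && p[0] == 0x23 && p[1] == 0x00 && p[2] == 0x00 && p[3] == 0x00) return Encoding::Utf32LE;
    if (n >= 4 && p[0] == 0x00 && p[1] == 0x00 && p[2] == 0x00 && p[3] == 0x23) return Encoding::Utf32BE;
    if (n >= 2 && p[0] == 0x23 && p[1] == 0x00) return Encoding::Utf16LE;
    if (n >= 2 && p[0] == 0x00 && p[1] == 0x23) return Encoding::Utf16BE;
    if (n >= 2 && p[0] == 0x23 && p[1] == 0x21) return Encoding::Utf8;
    return Encoding::Unknown;
}
```

The order of the checks matters only for the two little endian BOMs; all other patterns are mutually exclusive.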

The first line must contain the string metacza. A valid first line would be:

#! this is ignored metacza

Command line options may follow the metacza keyword on the first line:

#! metacza --namespace superlib

Comments are C++ style line comments, i.e., they start with //:

// This is a comment

The body of a file is a sequence of statements.

The interpretation of the file can be ended before the actual end of the file by using a special token:

__END__

Statements

Prototypes

f(x,y,z);
g(a,b...);

Defaults may be given:

int2str(n, base = 10, len = 1)

As an abbreviation, classifications may be used instead of identifiers in the argument list. In that case, var is optional.

f(x: raw(int), y: _(_), const z, const w: _(_));

This is equivalent to:

let
    var x: raw(int)
    var y: _(_)
    const z
    const w: _(_)
in
f(x,y,z,w)

If the kind is a pack, i.e., it ends in ..., then the variable is also expanded using ... in the argument list. For example:

data string(ch: raw(char)...)

This is equivalent to:

let
    var ch: raw(char)...
in
data string(ch...)

Function Definition

Functions may be defined using multiple clauses, which may even be distributed across different files.

fib(n) = fib(n-1) + fib(n-2);
fib(0) = 0;
fib(1) = 1;

The first use of a function is taken to be prototypical; the following uses are pattern matches. The prototypical usage can be declared outside the current file by using a const declaration.
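
For readers coming from C++, the split between a prototypical clause and pattern-matching clauses resembles a primary template with explicit specializations. The sketch below shows the fib example in that style. It is an analogy only: it is plain hand-written C++ using an int parameter, and it does not claim to be the code Metacza generates.

```cpp
// Analogy only, not Metacza output.
// Primary template: corresponds to the prototypical clause fib(n).
template <int N>
struct fib {
    static constexpr int value = fib<N - 1>::value + fib<N - 2>::value;
};

// Explicit specializations: correspond to the pattern matches fib(0) and fib(1).
template <>
struct fib<0> {
    static constexpr int value = 0;
};

template <>
struct fib<1> {
    static constexpr int value = 1;
};

static_assert(fib<10>::value == 55, "fib(10) is 55");
```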

Each function definition clause may be optionally preceded by a let block. This can be used for local classifications of the parameters, but definitions are also allowed and will be local to the function.

let
    const a: _;
in
f(a) = 5;

As with prototypes, function heads may contain embedded classifications.

f(const red) = green
g(car(x: _(_))) = true

This is equivalent to:

let
    const red
in
f(red) = green

let
    var x: _(_)
in
g(car(x)) = true

Data Definition

Data definitions look like prototypes with data prefixed.

data red;
data complex(_,_);

Inheritance can be specified using a colon:

data foo;
data bar : foo;

A let block is possible for local definitions (e.g. to add a tag):

let
    tag = 5
in
data myint(_);

Preprocessor Statements

The following preprocessor statements are supported:

#if ...
#ifdef ...
#ifndef ...
#else
#elif ...
#endif
#define ...
#undef ...
#include ...
#warning ...
#error ...
#line ...
#pragma ...

Evaluate Once Statement

The evaluate once statement is only allowed at top level. All statements following it until the end of the file will be protected against being evaluated more than once.

pragma once;

Namespace Statements

Metacza supports the following namespace related statements with the same syntax as C++:

namespace myScope { STMT... }
namespace newName = oldScope::oldName;
using namespace myScope;

Further, Metacza supports some extended syntax:

namespace someScope::myScope { STMT... }
namespace someScope::newName = oldScope::oldName;

Assertions

Assertions look similar to C, but take an additional argument for the failure message string.

assert(EXPR, "Some Failure Message");

As an extension, the message may be left out. In that case, a generic assertion failure message will be generated in the C++ output.

assert(EXPR);

Classification

A classification starts with either var or const, followed by an identifier. An optional colon and kind specification may follow. It is terminated with a semicolon.

var a;
const b;
var c: _;
const d: _(_);

Raw C++ Code

Raw C++ code that will be copied to the output file unmodified can be put between %{ and %}. Metacza keeps track of string and character constants and line comments in the raw C++ code so that a closing %} may be used there without closing the Metacza statement. Note that C style comments are not supported even in raw C++ code and will cause an error.

%{ static int const value = 10; %}

Expressions

Expressions generally follow C++ syntax, but a few things are a little different. This mainly concerns handling of operator precedence (I personally hate this in C/C++/Perl), the if-else construction, and let blocks.

Operators are translated to invocations of meta template functions just like any funcall. This section also lists the template names that are used when compiling in Boost.MPL mode. In those cases where this differs from native Metacza mode, the differences are listed.

In raw mode (i.e., for expressions marked with a raw() functor), all operators are translated to plain C/C++ syntax.

Constant Literals

Metacza's basic literals are booleans, integers, and strings.

Integer literals are always unsigned: a preceding minus is parsed as a prefix operator and not part of the literal.

Integer literals may be given in decimal, octal, hexadecimal and binary notation and may contain underbar characters for improving readability (just like in Perl).
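
The notations above can be read as in the following C++ sketch. The function names are hypothetical and the prefixes (0x, 0b, leading 0 for octal) are taken from the examples in the table of this section. Accepting upper case hexadecimal digits and returning -1 for malformed input are assumptions of this sketch, not documented Metacza behaviour.

```cpp
#include <cstddef>

// Value of one digit character, or -1 if it is not a digit in any base.
constexpr int digit_value(char c) {
    if (c >= '0' && c <= '9') return c - '0';
    if (c >= 'a' && c <= 'f') return c - 'a' + 10;
    if (c >= 'A' && c <= 'F') return c - 'A' + 10;  // assumption: upper case allowed
    return -1;
}

// Parse a NUL-terminated unsigned literal; returns -1 on malformed input.
constexpr long long parse_int_literal(const char* s) {
    int base = 10;
    std::size_t i = 0;
    if (s[0] == '0' && s[1] == 'x') { base = 16; i = 2; }
    else if (s[0] == '0' && s[1] == 'b') { base = 2; i = 2; }
    else if (s[0] == '0' && s[1] != '\0') { base = 8; i = 1; }
    long long value = 0;
    bool seen_digit = false;
    for (; s[i] != '\0'; ++i) {
        if (s[i] == '_') continue;          // underbars only improve readability
        const int d = digit_value(s[i]);
        if (d < 0 || d >= base) return -1;  // not a digit of this base
        value = value * base + d;
        seen_digit = true;
    }
    return seen_digit ? value : -1;
}
```

A minus sign is deliberately not handled here, since it is a prefix operator and not part of the literal.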

Strings come in three flavours, depending on their desired encoding in the output file: UTF-8, UTF-16, and UTF-32. They follow C++11 syntax. Strings without prefix are UTF-8 encoded just like strings with u8 prefix.

// Expression      // Boost MPL                     // native (if different)
///////////////////////////////////////////////////////////////////////////////
true               // true_
false              // false_
10                 // int_<10>                      PInt<10>
50                 // int_<50>                      ...
2_542              // int_<2542>
0x100              // int_<256>
0b1010_1011_1111   // int_<2751>
0777               // int_<511>
"hello"            // string<'hell','o'>            string<'h','e','l','l','o'>
u8"hello"          // string<'hell','o'>            string<'h','e','l','l','o'>
u"ab"              // vector<char16_t,u'a',u'b'>    string16<u'a',u'b'>
U"ab"              // vector<char32_t,U'a',U'b'>    string32<U'a',U'b'>

Identifiers

Metacza identifiers have the same syntax as in C++, except that they must not contain two or more consecutive underbars.

foo
bar10
x123_456

The special identifier _ is always a new anonymous identifier.

_ = print(5);

Funcalls

EXPR ( EXPR, ..., EXPR )

Funcall Like Operators

print(EXPR)
raw(EXPR)

Unary Prefix Operators

// Expression   // Translated As
////////////////////////////////////////////
+ EXPR          // identity
- EXPR          // negate
* EXPR          // treated specially: unlambda
! EXPR          // not_
~ EXPR          // bitxor_<~0,...>

Binary Infix Operators

// Expression            // Translated As
////////////////////////////////////////////////////////
EXPR + EXPR              // plus
EXPR - EXPR              // minus
EXPR * EXPR              // times
EXPR / EXPR              // divides
EXPR % EXPR              // modulus
EXPR << EXPR             // shift_left
EXPR >> EXPR             // shift_right
EXPR == EXPR             // equal_to
EXPR != EXPR             // not_equal_to
EXPR < EXPR              // less
EXPR > EXPR              // greater
EXPR <= EXPR             // less_equal
EXPR >= EXPR             // greater_equal
EXPR & EXPR              // bitand_
EXPR | EXPR              // bitor_
EXPR ^ EXPR              // bitxor_
EXPR && EXPR             // and_
EXPR || EXPR             // or_

Unary Suffix Operators

EXPR...

As an abbreviation, the following expressions are equivalent:

_...
...

Lambda Expressions

{ EXPR }
{ (x,y)  = EXPR }                    // with a parameter list
{ (x...) = EXPR }                    // and many others.
{ let STMT... in (x,y,z) = EXPR }    // with local definitions and params

Special Expressions

( EXPR )
let STMT... in EXPR
EXPR if EXPR else EXPR    // translated as eval_if  (in raw mode: ?:)
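
Translating the if expression to eval_if rather than to a plain selection matters because only the chosen branch is evaluated. The C++ sketch below illustrates that laziness for an expression like 1 if n == 0 else n * fact(n - 1). The templates int_, eval_if_sketch, fact_step and fact are hypothetical stand-ins written for this example; they are neither Boost.MPL nor Metacza output.

```cpp
#include <type_traits>

// Minimal integer wrapper; its ::type is itself, as with metafunction results.
template <int N>
struct int_ {
    static constexpr int value = N;
    using type = int_;
};

// Stand-in for eval_if: first select a branch, then ask only that branch
// for its ::type. The other branch is named but never instantiated.
template <typename Cond, typename Then, typename Else>
struct eval_if_sketch {
    using type = typename std::conditional<Cond::value, Then, Else>::type::type;
};

template <int N>
struct fact;

// The else branch, n * fact(n - 1), wrapped so that it is evaluated on demand.
template <int N>
struct fact_step {
    using type = int_<N * fact<N - 1>::type::value>;
};

// fact(n) = 1 if n == 0 else n * fact(n - 1)
template <int N>
struct fact {
    using type = typename eval_if_sketch<
        std::integral_constant<bool, N == 0>, int_<1>, fact_step<N>>::type;
};

static_assert(fact<5>::type::value == 120, "fact(5) is 120");
```

With an eager selection, fact<0> would have to instantiate fact_step<0> and the recursion would not terminate.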

Operator Precedence

In general, any sequence of operators in Metacza needs to be disambiguated by parentheses. This means that operator precedence usually is not used or needed, because Metacza forces you to use parentheses anyway.

In a few cases, precedence is exploited and parentheses are not needed. This section lists these cases.

Sequences of summation operators + and - may be used without parentheses. This includes both prefix and infix operators:

-1 + 5 + -8 - +9

Sequences of commutative, associative operators may be used without parentheses.

1 * 2 * 3 * 4 * 5
1 | 2 | 4 | 8 | 16
1 ^ 2 ^ 3 ^ 4 ^ 5
1 & 3 & 7 & 15
(foo == 10) && !bar && true
(foo == 10) || !bar || false

Funcalls in the same expression as prefix operators may be used without parentheses. The funcall takes precedence:

+f(5)          // same as +(f(5))  (NOT: (+f)(5))

Sequences of if expressions may be used without parentheses.

a if x else b if y else c

The condition in an if expression needs no parentheses.

a if n == 1 else b

Further, syntax not part of expressions (but looking like an operator) may be used together with any operator without parentheses.

f = 5 + x;     // no need for parens here: = is not an expression operator

Apart from that, all uses of multiple operators must use parentheses.

a + b * c                     // ERROR
a + (b * c)                   // GOOD
(a + b) * c                   // GOOD
a && b || c                   // ERROR
a && (b || c)                 // GOOD
(a && b) || c                 // GOOD
let a = b in a + b            // ERROR
let a = b in (a + b)          // GOOD
(let a = b in a) + b          // GOOD
a + b if c != 10 else d       // ERROR
a + (b if c != 10 else d)     // GOOD
(a + b) if c != 10 else d     // GOOD
-a * b                        // ERROR
-(a * b)                      // GOOD
(-a) * b                      // GOOD
+a...                         // ERROR
+(a...)                       // GOOD
(+a)...                       // GOOD syntax, but error otherwise
+a if c else -a               // ERROR
(+a) if c else -a             // GOOD
+(a if c else -a)             // GOOD
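
The infix part of these rules can be summarised as a small check, sketched here in C++. The names Op and chain_needs_no_parens are hypothetical. The sketch assumes that a chain of commutative, associative operators must repeat the same operator, which matches the examples above (a && b || c is an error) but is not stated explicitly. Prefix operators, if expressions and let blocks are not covered.

```cpp
#include <cstddef>

// A subset of the binary infix operators, enough for the illustration.
enum class Op { Plus, Minus, Times, Divides, BitAnd, BitOr, BitXor, And, Or };

constexpr bool is_additive(Op op) {
    return op == Op::Plus || op == Op::Minus;
}

constexpr bool is_commutative_associative(Op op) {
    return op == Op::Times || op == Op::BitAnd || op == Op::BitOr ||
           op == Op::BitXor || op == Op::And || op == Op::Or;
}

// ops holds the n infix operators of a flat chain such as a + b - c.
// Returns true if the chain may be written without parentheses.
constexpr bool chain_needs_no_parens(const Op* ops, std::size_t n) {
    if (n <= 1) return true;  // zero or one operator is never ambiguous
    bool all_additive = true;
    bool all_same = true;
    for (std::size_t i = 0; i < n; ++i) {
        all_additive = all_additive && is_additive(ops[i]);
        all_same = all_same && ops[i] == ops[0];
    }
    return all_additive || (all_same && is_commutative_associative(ops[0]));
}
```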

Kinds

_            // some value (will become 'typename' in C++)
_()          // nullary function
_(_)         // unary function
_(_...)      // function w/ arbitrarily many arguments
_(_,_)       // binary function
             // other functions work accordingly
raw(int)     // C++ type 'int'
             // other C++ types work accordingly, but currently,
             // only (qualified) identifiers are allowed in raw().

As an abbreviation, the following kinds are equivalent:

_...
...

Layout Rules

Metacza uses a mild set of layout rules so that semicolons separating statements are usually not needed.

For statements, you may drop the semicolon and start a new statement without the separator if the token starting the new statement

  1. is the first token on its line (ignoring whitespace and comments), and
  2. is located at or to the left of the current reference column.

A reference column is defined by a previous statement: if a statement is the first statement on its line, its first column becomes the new reference column.

  stmt1(x)
  |
  |____ reference column

With several statements, only the first one on a line defines the reference column:

  stmt1(x) ; stmt2(x)      // only the first stmt on a line defines the reference column
  |
  |___ reference column

By these rules, the following few lines are well-formed; the semicolon usually ending a statement can be dropped when the next statement starts at the reference column.

stmt1(x)             // no semicolon needed because
stmt2(x)             // this starts on the same column
stmt3(x); stmt4(x)   // nothing prevents you from using semicolons
stmt5(x); stmt6(x)   // after stmt4, none is needed: stmt5 is further left
|
|___ reference column

This rule has consequences: expression parsing stops at any token before which a semicolon may be dropped. A suffix or infix operator that is the first token on its line (ignoring whitespace and comments) must therefore be further right than the start of the statement; otherwise it is not recognised as part of the expression.

f(x) = x
+ 5                  // ERROR: + is not further right than reference column

f(x) = x
 + 5                 // OK: strictly right of reference column
|
|___ reference column

Note that if some structure must be parsed as the next token, then it will be parsed regardless of the layout rule:

f(x) = x +
5                    // OK, since after +, there must be an expression
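
Taken together, the decision whether a token ends the current statement can be sketched as a single predicate. The C++ below is an illustration under assumptions: the name ends_statement and its parameters are hypothetical, and how a real parser tracks these three facts is not described here.

```cpp
// Decide whether a token implicitly ends the current statement, i.e. whether
// the parser may act as if a semicolon had been written in front of it.
//
// first_on_line     : the token is the first one on its line,
//                     ignoring whitespace and comments
// column            : column at which the token starts
// reference_column  : reference column of the enclosing statement
// operand_required  : the grammar still needs something here, e.g. the
//                     right operand after an infix +
constexpr bool ends_statement(bool first_on_line, int column,
                              int reference_column, bool operand_required) {
    if (operand_required) return false;  // parsed regardless of layout
    return first_on_line && column <= reference_column;
}
```

For the examples above: a + at column 0 with reference column 0 ends the statement (and is then an error), a + at column 1 continues it, and a 5 at column 0 directly after a trailing + also continues it.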

The reference column is set recursively, i.e., inner blocks set the reference column only locally. When the inner block closes, the reference column before the block is active again.

f(x) =
|
|___ outer reference column
    let
        y = 5
        |
        |___ inner reference column
    in
|
|___ outer reference column
    (x + y)

December 7th, 2011
Comments? Suggestions? Corrections? You can drop me a line.
zpentrabvagiktu@theiling.de