Artifact [0ad6708830]

Not logged in

Artifact 0ad670883040e05d4284814feea879b0a592f0cb:

Wiki page [Specification Syntax] by andy 2014-10-11 12:07:00.
D 2014-10-11T19:07:00.409
L Specification\sSyntax
N text/x-markdown
P d27884444a5f6ffe7caa5c3e3b5415bb5ec34106
U andy
W 5002
Arend's specification syntax is based on Prolog, but is more flexible, with fewer predefined operators.

#Lexical syntax

Arend's lexical syntax is largely defined in terms of Unicode character categories. Only a few characters are treated specially.

Whitespace is ignored, but required to separate some tokens. Comments normally count as whitespace.

Comments come in two forms:

    // Single-line comment
    /* Multi-line comment */

Tokens are broken into three groups:

  *  *Atomic* tokens start with a lowercase alphabetic (Ll or Lo) or numeric (Nd, Nl, or No) character, and are followed by any contiguous sequence of zero or more non-separator (see below) characters. As a special case, the empty list `[]` is considered to be an atomic token.

  *  *Variable* tokens start with an *uppercase* (Lu) alphabetic character or an underscore, and are followed by any contiguous sequence of zero or more non-separator (see below) characters.

  *  *Symbolic* tokens are those that start with a character in one of the following classes: Sc, Sm, Sk, So, Pc, Pd, Po. This includes most symbolic and punctuation characters. This can be followed by any contiguous sequence of zero or more non-separator characters. (Grammatically, all symbolic tokens are assumed to be infix operators.)

  *  *Separator* tokens are the following: `:-` `(` `)` `[` `]` `{` `}` `:` `,` and `.` This list may be extended to include all bracketing (Ps, Pe) and paired quoting (Pi, Pf) characters.

A *raw* token is any other type of token enclosed in backquotes. A raw token is exactly equivalent to the unquoted token, except that for symbolic tokens it is not assumed to be infix. Thus, the following are equivalent (assuming the term-based parsing described below):

    a + b

#Grammatical syntax

Although Arend's specification language has only a handful of built-in operations, it's syntax includes support for a number of common extensions, under the assumption that specifications will want to use them in their natural forms. Thus, e.g., arithmetic operators are parsed according to the usual precedence, even though there are no built-in arithmetic operations.

The specification parser outputs a list of *terms*, from the `terms` module. Atomic and variable tokens are translated into `Atoms` and `Variables`, respectively. Compound terms of the form `head(args...)` or the infix form `lhs OP rhs` are translated into `Compounds`; for infix operators, the `OP` becomes the head. 

Bracket-enclosed lists are parsed specially. The internal syntax for lists is
`a:b:c:[]`, being equivalent to `[a,b,c]`. The elements of the list can be any term. 

Curly-braces are currently reserved for future extension, as are single- and double-quoted strings. 

The grammatical syntax of Arend's specification language is roughly

    specification = definition* 

    definition = head ":-" body "."
               / head "."

    head = term

    body = term

    term = atomicTerm
         / infixTerm
         / prefixTerm
         / listTerm
         / "(" term ")"

    atomicTerm = atom

    infixTerm = term OP term   // oversimplification

    termList = (term ("," term)*)?

    prefixTerm = atom "(" termList ")"

    listTerm = "[" termList "]"

Note that, due to the ambiguity with `,`, infix terms with `,` as their head inside lists must either be enclosed in parentheses or use the raw form of `,`.

## Operator precedence

Infix operators are parsed according to the following precedence table, from lowest to highest. The second column lists associativity.

    Operator          Assoc.  Description
    ================= ======= ==============================
    h(...)   (t)              Compounds, parenthesized terms

    - !                       Prefix minus, not
    ^                 right   Exponentiation
    * / %             left    Multiplication, division, mod
    + -               left    Addition, subtraction
    < <= > >= == !=   non     Comparison
    :                 right   Cons

    <user-defined infix operators>

    =                 other* Unification
    , &               r,l    Conjunction
    | ;               left   Disjunction
    :-                left   Predicate definition
Note that in the first group of operators, the descriptions are just the "standard" interpretations. These operators have no predefined meaning in Arend. The second group of operators includes those which have some semantic meaning in Arend. 

Associativity of unification: `A = B = C` is parsed as `A = B, B = C`, except that the entire unification is setup all at once, so it's slightly more efficient.

Also note that while `,` is *right* associative, `&` is *left* associative.

Note that, in its current state, Arend's grammar *requires* memoization to be enabled in order to avoid exponential parsing time in some cases. Hopefully this will be remedied in the future.
Z 635d06ae593a64c5ad37f2d78e30bf57