kestrel picture
  

Definition

The Kestrel Programming Language

Part of the documentation for the Kestrel language
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Index


BNF Notation

The notation used here, BNF. There is no consensus on what BNF stands for. John Backus developed the notation, and Peter Naur simplified it, both while working on the Algol 60 report, published in 1963. Naur named it Backus Normal Form, but it is not, formally speaking, a normal form. Later, Don Knuth suggested that it should be called Backus–Naur Form. There have been many variations on this theme, so we must define our notation here:

Symbols used in the description of a language are divided into two classes:

Terminal symbols actually occur in the language. Terminal symbols in the grammar are enclosed in quotation marks. Thus, 'rock' and "stone" might be terminal symbols in a grammar for the English language. Double and single quotes are both used in order to allow quotation marks themselves to be quoted, as in "'".

Nonterminal symbols are used to describe constructs in the language. Thus, in a grammar of English, you might find <noun phrase> used to describe such strings as "the big red house" and <adverb> used to describe the word "quickly". Angle brackets surround the name of every nonterminal.

Some BNF variants do not require quotation marks around terminal symbols and angle brackets around nonterminal symbols. Here, we use them even when their omission does not create ambiguity.

The grammar consists of a set of Production rules that describe how the terminal symbols of the language combine. Every production rule consists of a designated nonterminal, on the left hand side, and a list of terminal and nonterminal symbols that may be combined to form an instance of the designated nonterminal. For example, the following production rule might occur in a grammar for the English language:

        <noun phrase> ::= <article> <noun>

The phrase "the house" follows this rule, so long as "the" is accepted as an <article> and "house" is accepted as a <noun>.

The punctuation "::=" is used to separate the left and right sides of the production rules. Peter Naur seems to have introduced this mark because it is distinctly different yet still easy to type. Some BNF variants use just a colon or just an equals sign, but here, we use the full classical form.

Multiple production rules may be combined. Consider these two rules:

        <noun phrase> ::= <article> <noun>
        <noun phrase> ::= <article> <adjective> <noun>

These may be combined and written as:

        <noun phrase> ::= <article> <noun>
                       |  <article> <adjective> <noun>

Optional syntactic elements may be enclosed in square brackets. This allows a more compact expression of the above:

        <noun phrase> ::= <article> [ <adjective> ] <noun>

Repeated syntactic elements may be enclosed in curly braces. This allows expressions such as:

        <noun list> ::= <noun> [ { "," <noun> } "and" <noun> ]

This compact expression stands in for the following production rules:

        <noun list> ::= <noun>
                     |  <noun> "and" <noun>
                     |  <noun> <comma noun list> "and" <noun>
        <comma noun list> ::= "," <noun>
                           |  "," <noun> <comma noun list>

Finally, the mark --, occurring without quotation, is used as a comment indicator, with the comment extending to the end of line.

Over the years, numerous other extensions and variants have been proposed for use in BNF grammars. Here, we will limit ourselves to the above.

Character Set

Kestrel is defined over the 7-bit ASCII character set, with provisions for extensions to UTF-8, the standard 8-bit encoding of Unicode. The following characters and classes of characters are defined:

        <tab>     ::= HT   -- 09
        <newline> ::= LF   -- 0A
        <break>   ::= VT   -- 0B
                   |  FF   -- 0C
                   |  CR   -- 8D
        <space>   ::= " "  -- 20
        <digit>   ::= "0"  -- 30
                   |  "1"  -- 31
                  ...
                   |  "9"  -- 39
        <letter>  ::= "A"  -- 41
                   |  "B"  -- 42
                  ...
                   |  "Z"  -- 5A
                   |  "a"  -- 61
                   |  "b"  -- 62
                  ...
                   |  "z"  -- 7A

The comments above give the hexidecimal code for the corresponding ASCII character. Note that ASCII is a strict subset of UTF-8. The <letter> category may be extended to accept the accented letters in the Unicode Latin-1 Supplement, Greek, and Cyrillic alphabets. Extensions to support alphabets Hebrew, Arabic or other alphabets that do not render left to right is more problematic.

Additionally, the <space>, <newline> and <break> categories may be extended to include such things as Unicode's reverse linefeed (008D - a form of line break), no-break-space (00A0 - a form of space). Unicode is vast, and the appropriate use of all of its obscure and scattered control characters is difficult to determine.

In some contexts any character is acceptable, and in others, any character with some exceptions is acceptable. We will use the following nonterminals for these:

        <anything>       ::= NUL  -- 00
                         ...
                          |          FF
        <anything but "> ::= NUL  -- 00
                         ...
                          |  "!"  -- 21
                          |  "#"  -- 23
                         ...
                          |          FF

Here, <anything> may be any character at all, while Here, <anything but "> may be any character excepting the double quote mark. Other exclusions are defined similarly, for example Here, <anything but '>, any character other than single quote, and <anything but newline>, any character other than a newline. Note that the definition of these <anything> collections include not only the 128 characters defined by ASCII but also the 128 additional 8-bit codes with the high bit set.

Kestrel Lexical Structure

At the lexical level, a Kestrel program consists of a sequence of lexemes intermixed with arbitrary amounts of white space and terminated by an end of file:

        <kestrel program> ::= { { <white space> } <lexeme> }
                               { <white space> } <end of file>

In what follows, however, we will cheat and view the end of file mark as a kind of lexeme, one that occurs only once in any program and only at the end. This allows us to describe a Kestrel program as follows:

        <kestrel program> ::= { { <white space> } <lexeme> }

White space consists of spaces, tabs, newlines, comments and other format effectors classified here as breaks:

        <white space> ::= <space>
                       |  <tab>
                       |  <newline>
                       |  <comment>
                       |  <break>

Comments in Kestrel begin with a double dash and continue to the end of line:

        <comment> ::= "--" { <anything but newline> } <newline>

For error reporting purposes, newlines may be counted in order to determine the line number in the program. Aside from counting newlines and serving to separate lexemes, the white space in a program has no significance. Thus, while the Kestrel definition may recommend prettyprinting rules, nothing in the definition of Kestrel enforces such rules.

Lexemes include all of the symbols of the programming language, including the end of file:

        <lexeme> ::= <identifier>
                  |  <number>
                  |  <string>
                  |  <punctuation>
                  |  <end of file>

As in most programming languages, an identifier consists of a letter followed by any number of letters or digits

        <identifier> ::= <letter> { <letter> | <digit> }

There is no limit on the length of an identifier, and all characters within an identifier are significant. Note that the formal definition given here is ambiguous! Because white space between consecutive lexemes is optional, this definition permits ab to be read as two consecutive identifiers, a followed by b. This ambiguity is resolved informally by what is usually called the greedy rule: Where there is a possible ambiguity, the interpretation packing the most characters into a single chunk is accepted. In this case, we consume the longest possible identifier before declaring that the lexeme has ended. Thus, where consecutive identifiers are intended, they must be separated by non-empty white space.

Numbers may be specified in any number base from 2 to 32. Decimal numbers are used by default, and radix specification is always given in decimal:

        <number> ::= <decimal number> [ '#' <extended number> ]
        <decimal number> ::= <digit> { <digit> }
        <extended number> ::= <extended digit> { <extended digit> }
        <extended digit> ::= <digit> | <letter>

The # (number sign) character is used to designate numbers in radices other than 10. The decimal number to the left of the number sign gives the radix, while the string of extended digits to the right of the number sign gives the value. This definition of numbers introduces the same ambiguity that is present with identifiers, and it is resolved the same way: Consecutive numbers or identifiers must be separated by white space.

The extended digit set encodes the values of digits using Crockford's Base-32 encoding system:

digit 0  1   2  3   4  5   6  7 
value 0 1 3 3 4 5 6 7
digit89AaBb CcDdEeFf
value 8 9 10 11 12 13 14 15
digitGgHhIiJj KkLlMmNn
value 16 17 1 18 19 1 20 21
digitOoPpQqRr SsTtUuVv
value 0 22 23 24 25 26 32 27
digitWwXxYyZz
value 28 29 30 31

Digits with values greater than or equal to the radix are, of course, illegal. The end of a string of extended digits making up an extended number does not depend on the radix. For bases such as 8 and 16, this encoding is identical to common octal and hexadecimal. Upper and lower case are equivalent, and for larger bases, the possibility of transcription errors is significantly reduced by interpreting the letter O as zero and the letters I (capital i) and l (lower case L) as one. The primary use for such larger number bases is to permit compact encoding of large constants such as are used for cryptographic keys and pseudorandom number generation.

Quoted strings may be enclosed in either double quotes or single quotes. There is no significance to the different types of quotes.

        <string> ::= '"' { <anything but "> } '"'
                  |  "'" { <anything but '> } "'"

The following punctuation marks occuring in Kestrel programs are classified as lexemes:

        <punctuation> ::= ";"   -- statement terminator/separator
                       |  "="   -- assignment and comparison
                       |  ":"   -- definition
                       |  "("   -- grouping parameters and subscripts
                       |  "["   --     "
                       |  "{"   --     "
                       |  ")"   --     "
                       |  "]"   --     "
                       |  "}"   --     "
                       |  ","   -- separator within groupings
                       |  "@"   -- pointer definition and use
                       |  ".."  -- subrange constructor
                       |  "<>"  -- comparison (inequality)
                       |  ">"   -- comparison
                       |  ">="  -- comparison
                       |  "<"   -- comparison
                       |  "<="  -- comparison
                       |  "+"   -- arithmetic
                       |  "-"   -- arithmetic (unary or binary)
                       |  "*"   -- arithmetic
                       |  "/"   -- arithmetic
                       |  "&"   -- logic
                       |  "|"   -- logic
                       |  "~"   -- logical not or one's complement
                       |  "."   -- field selection

As the language is refined, meanings may be given to additional punctuation marks (notably, exclusive or and mod are missing above).

Note that several of the punctuation marks above are compound marks made up of several characters. Above the lexical level, these are always seen as single lexemes. Thus, for example, ">=" is a single lexeme quite distinct from "> =". The latter is seen as two lexemes, ">" followed by "=", the latter having no defined meaning.

Also note the possibility of natural Unicode extensions supporting, for example, "×" as an alternative to "*" and "∧" as an alternative to "&". Pretty printing tools and smart text-editors that make such subsitutions automatically would be useful, since typing the full Unicode character set is nearly impossible.

The following reserved words are recognized as identifiers by the BNF definitions for the lexical level given above, but are treated as special cases in the Kestrel syntax and may not be used except as dictated by the syntax:

        <reserved word> ::= "end"       -- terminates many constructs
                         :  "const"     -- various definitions
                         :  "type"
                         :  "exception"
                         :  "var"
                         :  "procedure"
                         :  "function"
                         :  "private"   -- mark a definition as private
                         :  "external"  -- mark a function as external
                         :  "ref"       -- indicate parameter by reference
                         :  "enum"      -- various types
                         :  "array"
                         :  "of"
                         :  "record"
                         :  "if"        -- statement
                         :  "then"
                         :  "else"
                         :  "select"    -- statement
                         :  "from"
                         :  "case"
                         :  "while"     -- statement
                         :  "do"
                         :  "until"
                         :  "for"       -- statement
                         :  "in"
                         :  "try"       -- statement
                         :  "handle"
                         :  "raise"     -- statement
                         :  "null"      -- a distinguished pointer value

As the language is refined, meanings may be given to additional reserved words, but this should be done sparingly.

Kestrel Programs

        <kestrel program> ::= <block> [ "end" ] <end of file>

A Kestrel program consists of a single block. Execution of the program proceeds by allocating a record in memory to hold the components of the block, executing the initializer for the block, and deallocation of the record.

Private variables and functions declared in the block making up a Kestrel program are purely local to that program. Public variables and functions are visible to the linker. With care, this permits linkage with variables that are declared in external code with which Kestrel programs are linked. Notably, this allows linkage with variables declared in support libraries written in languages like C.

Blocks

        <block> ::= { <block element> [ ";" ] }

        <block element> ::= <declaration>
                         |  <statement>

A block consists of a sequence of declarations and statements. Semicolons separating or terminating these are entirely optional, although using semicolons may lead to more useful syntax errors.

The declarations in a block are elaborated in the sequence they occur. Declarations introduce identifiers into the block on a strict definition-before-use basis. All identifiers declared in a particular block must be distinct.

The statements in a block are executed in their order of appearance. The statements in a block serve to initialize the record associated with the block and perform other computations.

Declarations

        <declaration> ::= [ "private" ] <constant declaration>
                       |  [ "private" ] <type declaration>
                       |  [ "private" ] <exception declaration>
                       |  [ "private" ] <variable declaration>
                       |  [ "private" ] <procedure declaration>
                       |  [ "private" ] <function declaration>

By default, all declarations in a block are public, that is, visible outside the block, but declarations may be marked private, preventing outside access.

Constant Declarations

        <constant declaration> ::= <identifier> ":" "const" <expression>

A constant declaration binds the given identifier to the value of an expression. The expression must have a constant value, which is to say, all the operands must be constants and the operators must be computable at compile time. The type of the constant is determined from the expression.

Type Declarations

        <type declaration> ::= <identifier> ":" "type" <type>

A type declaration binds an identifier to a type.

Exception Declarations

        <exception declaration> ::= <identifier> ":" "exception"

An exception declaration binds an identifier to a new exception, distinct from all other exceptions used in the program.

Variable Declarations

        <variable declaration> ::= <identifier> ":" "var" <type>

A variable declaration binds an identifier to a newly allocated chunk of memory of size sufficient to hold a value of the indicated type.

Procedure and Function Declarations

        <procedure declaration> ::= <identifier> ":" "procedure"
                                    [ <formal parameter list> ]
                                    <body>
        <function declaration> ::= <identifier> ":" "function" <type>
                                    [ <formal parameter list> ]
                                    <body>

        <body> ::= "external" | ( <block> "end" )

        <formal parameter list> ::= "(" <parameter> { [","] <parameter>} ")"
                              |  "[" <parameter> { [","] <parameter>} "]"
                              |  "{" <parameter> { [","] <parameter>} "}"
        <parameter> ::= [ "ref" ] <identifier> ":" <type>

A procedure or function declaration binds an identifier to a constructor for a new block that, when constructed, inherits from the enclosing block. In effect, the procedure name identifies a new type, comparable to a record type, where the only permitted way to create an instance of that type is to call the procedure. The declarations in the formal parameter list, if present, are part of this block. When a procedure is called, the constructor is executed, allocating storage for the parameters and variables in the block and executing the associated code.

Parameters qualified by the keyword ref are passed by reference. In that case, the formal parameter will be a reference to the actual parameter when the call is made. Otherwise, the formal parameter will be a copy of the actual parameter.

A procedure call is a type of statement. Functions may be called only from within expressions. The type of the return value is given by the type given immediately following the keyword function.

Kestrel subsets may restrict this type reference to a simple identifier, eliminating the possibility of function types such as package.type.

A procedure with an external body refers to a procedure, typically written in some other language such as C, which can be called using the standard calling sequence of the host system and which some kind of linker can bind to the object code produced by the Kestrel compiler. Not all C routines can be called from Kestrel programs. Notably, routines such as scanf() and printf() that have variable numbers of parameters of uncertain types cannot be called.

Types

        <type> ::= <reference>
                |  <enumeration>
                |  <subrange>
                |  <pointer>
                |  <array>
                |  <record>

The reference used to define a type is usually just a type identifier, but it may refer to a visible (non-private) type defined in the body of a block that is visible in the current context. The referenced type must have been previously defined. Reference to an already defined type does not define a new type.

New types may be constructed using any of the following mechanisms:

Enumeration Types

        <enumeration> ::= "enum" "(" <identifier"> { [","] <identifier>} ")"
                       |  "enum" "[" <identifier"> { [","] <identifier>} "]"
                       |  "enum" "{" <identifier"> { [","] <identifier>} "}"

Each enumeration type is a new scalar type and is not interchangable with any preexisting type. The identifiers listed in the creation of an enumeration type are the constants of that type. An ordering relationship is established over the enumeration such that the first constant is less than the second, which is less than the third, and so on. Each enumerated type introduces a new base type.

Subrange Types

        <subrange> ::= <expression> ".." <expression>

Subrange types may be defined over any scalar type. A subrange establishes upper and lower bounds on values of that subrange type. The two expressions giving the bounds of the subrange must have constant scalar values, where the values are defined over the same base type, and the first value must be less than or equal to the second. A value is in a particular subrange if it is greater than or equal to the first or lower bound and less than or equal to the second or upper bound.

All practical integer types are subranges of the abstract integer type, which has an infinite range of values. The predefined type char can be considered to be a subrange defined over the alphabet, from NUL to DEL, for ASCII, or to NUL+255 for UTF-8.

Subranges declared with the upper and lower bound equal are, for practical purposes, new void types, in that an object of such a type occupies no memory.

Pointer Types

        <pointer> ::= '@' <type>

Pointer types give the memory address of an object of the indicated type. Although memory addresses may be implemented as an integer type, the only comparison operator permitted on objects of a pointer type is comparison for equality.

Array Types

        <array> ::= "array" <type> "of" <type>

Array types require an index type, the type of the array index, and an element type, the type of each array element. The index type must be a finite scalar type. The range of values of the index type indicates the number of instances of the element that will be allocated when the array type is elaborated.

Record Types

        <record> ::= "record" <block> "end"

When a variable of a record type is allocated, declarations in the block associated with that record type are elaborated and the code of that block is executed to initialize the record. The persistence of the resulting record depends on where the record was allocated. Note that record types may include procedural components; these may be viewed as methods when applied to instances of that type. As a result, record types are classes and instances of record types are objects from the point of view of the terminology commonly used for object-oriented programs.

A record type with an empty bock, for example, record end, is a void type. Instances of void records occupy no memory and have no useful value.

Statements

        <statement> ::= <if>
                     |  <case>
                     |  <loop>
                     |  <handler>
                     |  <raise>
                     |  <procedure call>
                     |  <assignment>

Statements direct the computation. Conditional, selection and iteration statements determine which computations are performed, while procedure calls direct the execution of the block of code associated with a procedure and assignment statements direct changes to the values of variables.

If Statements

        <if> ::= "if" <expression> [ "then" ]
                     <block>
               [ "else"
                     <block> ]
                 "end"

If statements begin by evaluating the controlling expression. This expression must have a Boolean value. If the expression evaluates to true, the block in the then clause is executed. If the expression evaluates to false, the block in the else clause is executed, if present. Execution of the controlled blocks proceeds by first allocating storage for any variables declared in the block, and then elaborating the declarations and executing the statments of the block. Both subsidiary blocks inherit from the enclosing block.

Case Statements

        <case> ::= "select" <expression> [ "from" ]
               { "case" <case label> { [ "," ] <case label> } [ ":" ]
                     <block> }
               [ "else"
                     <block> ]
                 "end"

        <case label> ::= <expression> [ ".." <expression> ]

Select-case statements begin by evaluating the selection expression. This expression must have a scalar value. If the value of the expression matches one of the case labels, the labeled block is executed. If no case label matches and if there is an else clause, the block in the else clause is executed. The subsidiary blocks operate in exactly the same way as the subsidiary blocks in an if statement.

The expression or expressions making up a case label must be constant values and their types must be compatible with the type of the controlling expression. Furthermore, no two case labels under one select clause may have the same value, nor may the ranges overlap.

Loop Statements

        <loop> ::= <while loop>
                 |  <until loop>
                 |  <for loop>

        <while loop> ::= "while" <expression> [ "do" ]
                             <block>
                         "end"

        <until loop> ::= "do"
                             <block>
                         "until" <expression>

        <for loop> ::= "for" <identifier> "in" <type> [ "do" ]
                           <block>
                       "end"

The expressions controlling while and until loops must have Boolean values. The while loop tests the value of the expression before each iteration of the block making up the loop body, and terminates as soon as this expression is false. The until loop evaluates the expression after each iteration of the loop body and terminates the loop as soon as this expression is true.

For loops declare a variable named by an identifier as being of a finite scalar type and then iterate the block making up the loop body over every value of that type. The order of the enumeration is indeterminate.

As with if and select-case statements, the block making up the loop body inherits from the enclosing block. In the case of for loops, the implicit declaration of the loop control variable may be considered to be in a block that inherits from the enclosing block and from which the loop body inherits. The declarations elaborated during any particular iteration of the block making up the loop body have an effect limited to that iteration. Any communication from one iteration to the next must, therefore, be through variables declared in the block(s) that enclose the loop.

Exception Handlers

        <handler> ::= "try"
                          <block>
                    { "handle" <reference> { [ "," ] <reference> } [ ":" ]
                          <block> }
                    [ "else"
                          <block> ]
                      "end"

The try block is attempted, and if it executes to completion without raising an exception, the handle and else clauses are ignored. If an unhandled exception is raised within the try block, or within any code called from within the try block, execution of that code is abandoned. The references in the handle clauses must name distinct exceptions. If the exception raised in the try block is named by the refrence on one of the handle clauses, that block is executed as an exception handler before control exits the try statement.

The else clause is equivalent to a handle block carrying the names of all other exceptions (even if those names are invisible in this context). If there is no handler in this try statement for the exception that was raised, this block is also abandoned.

Procedure Calls

        <procedure call> ::= <reference>

The reference given in a procedure call must refer to a declared procedure and end with a constructor for the actual parameter list of that procedure. Execution proceeds as follows:

Assignment Statements

        <assignment> ::= <reference> "=" <expression>

The reference given on the left-hand side of an assignment statement must identify a variable. The expression on the right-hand side of the assignment statement must have a type that is compatible with the type of that variable.

For the purpose of assignment, a value is compatible with a variable if the two are of exactly the same type or, in the case of a scalar type, if they are of the same base type and the value falls within the subrange constraint of the variable. For the case when the subrange constraint on the value is not contained in the subrange constraint on the variable, this requires run-time checking, and a range exception may be raised if the range constraint is violated.

Expressions

        <expression> ::= <comparand> [ <comparing operator> <comperand> ]
        <comparand> ::= <term> { <adding operator> <term> }
        <term> ::= <factor> { <multiplying operator> <factor> }

Multiplying operators take precedence over adding operators, which take precedence over comparison operators. Aside from those three levels of precedence, operators are evaluated from left to right.

In every context where an expression or sub-expression is required, the context may constrain the expression to be of a particular type, for example, boolean, or it may be constrained to be scalar. In any event, the expression itself returns a value of a particular type determined by its operands.

Wherever an operand is an array of a single element, when the context requires the operand to be of the same type as the array element type, the array element will be used as an operand. Thus, while the string constant "a" is interpreted as a constant array of one character, it may be used in any context where a character constant is required, for example, in the expression "a"+1.

Comparing Operators

        <comparing operator> ::= "="  -- equality
                              |  "<>" -- inequality
                              |  ">"  -- greater
                              |  ">=" -- greater or equal
                              |  "<"  -- less
                              |  "<=" -- greater or equal

The comparing operators always give a Boolean result, and the operands must be comparable.

Any two expressions of the same type are comparable for equality and inequality. Pointers are equal if and only if they refer to the same object. Comparison of pointers to variables of distinctly different types has a result that is a foregone conclusion: They are unequal and the expression is considered to have a constant value. Any pointer may be compared with null. Comparing two instances of the same record type for equality is equivalent to the logical and of a equality tests on each of the fields of the record. Comparing two instances of the same array type operates similarly.

Two scalar types are comparable if they have the same base type, for example, integer or a particular enumeration. If the subranges do not overalp, the result of a comparison is a foregone conclusion: They are unequal and the expression is considered to have a constant value.

Adding Operators

        <adding operator> ::= "+"  -- addition
                           |  "-"  -- subtraction
                           |  "|" -- logical or

Integers may be added to scalars to produce a result of a comparable scalar type (note that integers are themselves scalars). Thus, for example, 1+1=2 and "a"+1="b". When adding to a subrange type, the bounds on the subrange of the result are the sums of the bounds on the subranges of the operands. If the result is outside the bounds of the base type of the scalar, a range exception is raised.

Two operands of comparable scalar types may be subtracted to produce an integer result, and an integer may be subtracted from a scalar to produce a result of the same scalar type. Thus, for example, "b"-"a"=1 and "b"-1="a". The bounds on the subrange of the result are determined from the bounds on the operands. If the result is outside the bounds of the base type, a range exception is raised.

The logical or operator applies to boolean operands. In this case, if either operand is true, the other operand may or may not be evaluated.

Multiplying Operators

        <multiplying operator> ::= "*"  -- multiplication
                                |  "/"  -- division
                                |  "&" -- logical and

Integers may be multiplied to produce an integer product. The subrange of the result depends on the subranges of the operands. If the result is outside the bounds of the base integer type, a range exception is raised.

Integers may be divided to produce an integer quotient. The subrange of the result depends on the subranges of the operands. If the result is outside the bounds of the base integer type, a range exception is raised.

The logical and operator applies to boolean operands. In this case, if either operand is false, the other operand may or may not be evaluated.

Note that integer division involving negative numbers is problematic. A common definiton of integer division states that

This is trivially true for real numbers, but if the quotient is constrained to be an integer, it is problematic. A mature language definiton must grapple with this issue.

Factors

        <factor> ::= [ "-" | "~" ] <value>
        <value> ::= <number>
                 |  <reference>
                 |  <constructor>
                 |  <string constant>
                 |  "null"

The unary negation operator, if present, applies only to values with an integer or Boolean base type. For integers, it has the usual meaning, two's complement. The range of the result of applying the negation operator to an integer is computed simply by inverting the bounds on the original range. Note that the two's complement binary numbers in the range -2n-1 to 2n-1-1, the result will be in the range -2n-1-1 to 2n-1; two's complement representations in this range require n+1 bits, so the assignment i=-i, for example, may produce a value that is out of bounds, raising a range exception.

The unary "~" is logical not operator when applied to Boolean terms. When applied to integers, it is a one's complement operator, inverting all of the bits of the binary representation. The range of the result of applying the one's complement operator to an integer is problematic. Here, noting that

we declare that and we compute the range of the one's complement by one's complementing the bounds on the range of the operand.

The type of a numeric value is a subrange of the base integer type with bounds equal to the value. So, for example, the constant 4 is of type 4..4.

When a value consists of a reference, where that reference names a variable or constant, the value of that variable or constant is taken. Where the reference is to a function, the function is called and the return value is taken.

When a factor consists of a constructor, that constructor must have only one component and the value of that component is taken. This may later be generalized to support use of constructors to build instances of record types in contexts where a record type is permitted.

The special value null is a distinguished value that is assignment compatible and comparable with all pointer types. It is designated by a reserved word so that it may not be redefined.

String constants

        <string constant> ::= <string> { <string> | <expression> }

String constants are constant arrays of characters with an integer index type with bounds between zero and the size of the string. So, for example, the constant "Hello" is of type array 0..4 of char. Note that an array of one element may be used in any context where an object of the element's type is expected, so "a" may be used as a character constant as well as being an array of one character.

Adjacent strings separated by white space are viewed as a single string constant. An expression with a constant character value separated from a string by white space is considered to extend that string. This allows

"Hello world!" CR LF
to be seen as a single string.

Constructors

        <constructor> ::= "(" <expression> { [ "," ] <expression> } ")"
                       |  "[" <expression> { [ "," ] <expression> } "]"
                       |  "{" <expression> { [ "," ] <expression> } "}"

A constructor found in a context where a scalar value is expected permits only a single expression. That expression must deliver a scalar value.

A constructor following a reference to a procedure or function will construct the actual parameter list for a call to that procedure or function. In that case, number of expressions found in the constructor must match the number of formal parameters in the procedure or function declaration, and each of the successive parameters must be of a type compatible with the declared types of the formal parameters. In the case of parameters passed by value, the actual parameter must be assignment compatible with the formal parameter. In the case of parameters passed by reference, the actual parameter must be of the same type as the formal parameter, or of a subtype, as follows: Where the parameter is of an array type, the bounds of the actual parameter array may be a subrange of the bounds declared for the formal parameter, so long as the array elements are of identical type.

References

        <reference> ::= <identifier>
                     |  <reference> "@"
                     |  <reference> "." <identifier>
                     |  <reference> <constructor>

References name items declared in some block. All references begin with an identifier that must be bound in the current block or in a block from which the current block inherits. The reference, at this point, refers to whatever that identifier is bound to.

A reference followed by an at-sign such as p@ is only legal in contexts where the initial reference p names a variable of a pointer type and the value of that variable is not null. In that case, the resulting reference refers to the memory location referenced by that pointer, and the type of the result is the type of the object stored in that location, a type that can be statically inferred from the declaration of the pointer variable.

A reference followed by a dot such as v.f, is legal in contexts where the initial reference v refers to a variable of a record type. In this case, the identifier f must be declared in that type and may refer to any public declaration in that record.

In addition, v.min and v.max are legal in contexts where v refers to an array. In this case, they give the lower and upper bounds on the array subscript.

A reference followed by a dot such as t.f, is legal in contexts where the initial reference t refers to a record type (including the activation record types of procedures and function), and f refers to a public type (including activation record types), constant or exception declared in that record. Note that if v is a variable of type t, references to v.f and t.f are equivalent when referencing constants, and exceptions. Record types, cannot be instantiated except in the context of an instance, and procedures and functions cannot be called except in the context of an instance, so where t.f references a procedure, function or type, the reference is not permitted.

In addition, t.min and t.max are legal in contexts where t refers to a scalar type. In this case, they give the minimum and maximum values for variables of that type. For example, int16.max can be used instead of 32767.

A reference followed by a constructor may have several meanings. If the reference names a variable of type array, the constructor will be asked to deliver a single value of the index type of that array, and that value will be used to select an element of the array. In the event that the referenced array element is itself an array, additional elements are permitted in the constructor. As a result, a(x,y) is equivalent to a(x)(y).

Where a reference to a procedure or function is followed by a constructor, the constructor will be asked to deliver the actual parameters for a call to that procedure. For procedures, the value resulting from the reference is void. For functions, the value is the return value of the function. In the event that this is a pointer, the dereferencing may continue. Calls are executed as described under the heading Procedure Calls".

Where a reference to a record type is followed by a constructor, an instance of that record type is allocated, its initializer is run, and then the public fields are assigned values from the successive expressions in the constructor.

The Standard Prologue

The following Kestrel code produces definitions that can be considered to be a standard prologue for every Kestrel program, although entering these definitions as code does not necessarily define values exactly equivalent to the predefined values.

        exception range;

        boolean: type enum( false, true );

        NUL:  const "@"-64; -- for these, character in quotes is
        SOH:  const "A"-64; -- the control code used to type it.
        STX:  const "B"-64;
        ETX:  const "C"-64;
        EOT:  const "D"-64;
        ENQ:  const "E"-64;
        ACK:  const "F"-64;
        BEL:  const "G"-64;
        BS :  const "H"-64;
        TAB:  const "I"-64;
        LF :  const "J"-64;
        VT :  const "K"-64;
        FF :  const "L"-64;
        CR :  const "M"-64;
        SO :  const "N"-64;
        SI :  const "O"-64;
        DLE:  const "P"-64;
        DC1:  const "Q"-64;
        DC2:  const "R"-64;
        DC3:  const "S"-64;
        DC4:  const "T"-64;
        NAK:  const "U"-64;
        SYN:  const "V"-64;
        ETB:  const "W"-64;
        CAN:  const "X"-64;
        EM :  const "Y"-64;
        SUB:  const "Z"-64;
        ESC:  const "["-64;
        FS :  const "\"-64;
        GS :  const "]"-64;
        RS :  const "^"-64;
        US :  const "_"-64;
        DEL:  const NUL+127;

        char: type NUL .. NUL+255;

        int8:   type -128 .. 127;
        uint8:  type 0 .. 255;
        int16:  type -32768 .. 32767;
        uint16: type 0 .. 65535;
        int32:  type -16#80000000 .. 16#7FFFFFFF;
        uint32: type 0 .. 16#FFFFFFFF;

        file: type @ record end;

        input:  var file;
        output: var file;
        errors: var file;

        putchar:   procedure( c:char, f:file ) external;
        getchar:   function char ( f:file ) external;
        eof:       function boolean ( f:file ) external;
        putstring: procedure( ref s:array uint16 of char, f:file ) external;
        getstring: procedure( ref s:array uint16 of char, f:file ) external;

Note that NUL

Note that putchar, getchar and eof correspond to fputc, fgetc and feof in the C standard library. The parameters have the same meanings and appear in the same order; a small stub may be required in the Kestrel support library to change the name of the function.

The putstring and getstring procedures correspond to fputs and fgets in the C standard library, but the calling sequence for passing arrays by reference in kestrel is not compatible with that of C, so the function will likely require more a less trivial interface routine.