1. Notation and Lexical Structure

Part of the SMAL Manual
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Index

Notation
Lexical Structure
Identifiers
Numbers
Quoted Strings
Comments and Line Ends

1.1. Notation

The syntactic rules governing the formation of correct SMAL32 programs will be given in an extended form of BNF using the following special symbols:

< and >: Angle brackets enclose names of syntactic elements. These will be defined somewhere in the text unless the meaning is obvious.
{ and }: Curly braces enclose optional syntactic elements that may be repeated an indefinite number of times.
[ and ]: Square brackets enclose optional syntactic elements that may not be repeated.
|: Vertical bars separate alternative groups of syntactic elements that may be used interchangeably.
::=: The BNF "assignment symbol" is used to define the name to the left as being replaceable by the group of syntactic elements to the right.

Technically, newlines are not significant in BNF, but for readability, major alternatives (separated by |) are frequently set off from each other by newlines.
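For readers more familiar with regular expressions, the bracketing symbols map onto regular expression operators: { x } corresponds to (x)*, [ x ] to (x)?, and | to |. The following Python sketch illustrates this with two hypothetical rules, <sign> and <signed>, which are not part of SMAL and are invented here only to exercise the notation; the function name matches_signed is likewise an assumption.

```python
import re

# Hypothetical rules, used only to illustrate the notation (not SMAL syntax):
#   <sign>   ::= + | -
#   <signed> ::= [ <sign> ] <digit> { <digit> }
#
# [ <sign> ]    becomes (\+|-)?     optional, not repeated
# { <digit> }   becomes ([0-9])*    optional, repeated any number of times
SIGNED = re.compile(r"(\+|-)?[0-9]([0-9])*")


def matches_signed(text):
    """Return True if the whole text can be derived from <signed>."""
    return SIGNED.fullmatch(text) is not None
```

Under these rules, -42 and 7 can be derived from <signed>, while a bare + or an empty string cannot, because the one <digit> outside the curly braces is mandatory.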

1.2. Lexical Structure

SMAL programs may be described in two ways: as sequences of lexical elements with no structure, or in terms of syntactic structures. At the lexical level, SMAL programs have the following structure:

<program> ::= { { <space> } <lexeme> } { <space> }
              <line end> <end of file>

That is, a program is a file containing a sequence of lexemes; lexemes may be separated by any number of blanks (or tabs), which have no significance, but lexemes may not contain blanks (except in quoted character strings). The legal lexemes are as follows:

<lexeme> ::= <identifier>
           | <number>
           | <quoted string>
           | <line end>
           | :   -colon, suffix on labels and substring notation.
           | .   -dot, refers to the location counter.
           | ,   -comma, separates macro parameters.
           | =   -equals, assignment, comparison, and macro parameters.
           | >   -greater than, comparison and shifting.
           | <   -less than, comparison and shifting.
           | +   -plus, addition and as a unary sign.
           | -   -minus, subtraction and as a unary sign.
           | *   -times, multiplication.
           | /   -divide by, division.
           | \   -backslash, unary not.
           | ~   -a synonym for \.
           | &   -ampersand, the and operator.
           | !   -exclamation point, the or operator.
           | |   -a synonym for !.
           | (   -begin paren, in a group or list.
           | )   -end paren, ends a group or list.

The comments above describe the use of the single character lexemes. The following additional punctuation marks are used within some lexemes:

             #   -number sign, numbers in arbitrary radixes.
             "   -double quote, quoted strings.
             '   -single quote, quoted strings and macros.
             ;   -semicolon delimits comments at line ends.

The following ASCII punctuation marks are not used in the SMAL32 assembly language:

             [ ] { } ` $

The following ASCII punctuation marks are not used, but are likely to be used in future revisions:

             ^   -may be used as a prefix for extended quotes.
             @   -may be used to name registers.
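The lexical structure described in this section can be sketched as a small line-at-a-time tokenizer. The following Python is an illustration only, not the assembler's implementation. The names TOKEN and tokenize, the category names, and the choice of ASCII letters are assumptions; the comment that begins a line end is reported here as a single "comment" token.

```python
import re

# One alternative per lexeme class. Order matters: numbers are tried before
# identifiers so that a radix prefix such as 16# is not split apart.
# Assumption: letters are ASCII A-Z and a-z.
TOKEN = re.compile(r"""
    (?P<space>[ \t]+)
  | (?P<comment>;.*)
  | (?P<string>"[^"]*"|'[^']*')
  | (?P<number>[0-9]*\#[0-9A-Za-z]+|[0-9]+)
  | (?P<identifier>[A-Za-z_][A-Za-z0-9_]*)
  | (?P<punct>[:.,=><+\-*/\\~&!|()])
""", re.VERBOSE)


def tokenize(line):
    """Split one source line into (category, text) pairs, dropping spaces."""
    tokens = []
    pos = 0
    while pos < len(line):
        m = TOKEN.match(line, pos)
        if m is None:
            # Covers unused marks such as $ and strings with no end quote.
            raise ValueError(f"unexpected character {line[pos]!r} at column {pos}")
        if m.lastgroup != "space":
            tokens.append((m.lastgroup, m.group()))
        pos = m.end()
    return tokens
```

For example, tokenize("LOOP: X = X + #1F ; bump") yields an identifier, a colon, three more lexemes for X = X, a plus, the number #1F, and finally the comment. A line containing one of the unused punctuation marks, such as $, is rejected.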

1.3. Identifiers

Identifiers begin with a letter followed by any number of letters or digits. All characters in an identifier are significant.

<identifier> ::= <letter> { <letter or digit> }

Identifiers may not contain spaces and may not cross from one line to the next, so any limit on line length will limit the length of identifiers. 80 character lines will always be permitted. For the purpose of defining identifiers, the underscore character is a letter. Note that identifiers must be separated from any following decimal number by at least one space.

The following are legal identifiers:

    I
    NEXT
    THISISONLYONEIDENTIFIER
    A1235_B42XY

The following are not legal identifiers:

    76TROMBONES
    THIS_ISN'T.ONE-IDENTIFIER

The particular set of letters allowed may vary between implementations; some may support only one case, but most will support both upper and lower case and some may support additional alphabets. Where both cases are supported, case differences will be significant in comparing identifiers.
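The identifier rule can be checked mechanically. The following Python sketch assumes the letter set is ASCII upper and lower case plus the underscore; as noted above, a given implementation may allow a different set. The names IDENTIFIER and is_identifier are hypothetical.

```python
import re

# <identifier> ::= <letter> { <letter or digit> }
# Assumption: letters are A-Z, a-z and underscore; digits are 0-9.
IDENTIFIER = re.compile(r"[A-Za-z_][A-Za-z0-9_]*")


def is_identifier(text):
    """Return True if the whole text is a single legal identifier."""
    return IDENTIFIER.fullmatch(text) is not None
```

Applied to the examples above, is_identifier accepts I, NEXT, THISISONLYONEIDENTIFIER and A1235_B42XY, and rejects 76TROMBONES (it begins with a digit) and THIS_ISN'T.ONE-IDENTIFIER (it contains punctuation).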

1.4. Numbers

Numbers may take two basic forms: they may be simple decimal numbers, or they may be numbers in another radix (signified by a leading radix specification). A single character radix specification is provided for hexadecimal numbers; other bases may be used by specifying the base as a decimal number, followed by the radix character, followed by the number in that base.

<number> ::= <decimal number>
           | <radix> <extended number>
<radix> ::= #
          | <decimal number> #
<decimal number> ::= <digit> { <digit> }
<extended number> ::= <letter or digit> { <letter or digit> }

The digits A through Z are used for the numerical values 10 through 35 in numbers with a radix over 10. Thus, the number A₁₆ is equal to 10₁₀ and is represented as either #A or just 10 in SMAL. For bases other than 10 and 16, the base must be specified, so, for example, 12₁₅, which has the value 17₁₀, is represented as 15#12 in SMAL.

The radix, if explicitly specified, must be between 2 and 36 (inclusive), and no digit may ever be greater than or equal to the radix. The assembler will raise the "bad radix" or "bad digit in number" errors when these rules are violated. The assembler uses a 32 bit internal representation; use of larger numbers will result in a "value out of bounds" error. Note that numbers between 2³¹ and 2³² are interpreted as negative numbers in some situations, for example, when performing comparisons. Note that decimal numbers must be separated from any following hexadecimal, octal, or binary numbers by at least one space to distinguish the decimal number from a radix specification.

The following legal numbers are all equal:

    32767
    #7FFF
    16#7FFF
    8#77777
    32#VVV
    2#111111111111111

The following are not legal numbers:

    3276A
    2#10120
    40#00

The letters allowed as digits greater than 9 begin with A which has the value 10, and continue up to Z with the value 35. The letter Z is only a valid digit in base 36. On systems allowing both upper and lower case letters, only upper case letters should be used as extended digits.
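The number rules can be summarized in a short Python sketch. This is an illustration under stated assumptions, not the assembler's code: the names parse_number, as_signed32 and SmalNumberError are hypothetical, only the three error messages are taken from the text, and the treatment of an empty digit string is a guess, since the manual does not say which error applies.

```python
DIGITS = "0123456789ABCDEFGHIJKLMNOPQRSTUVWXYZ"  # upper case only, values 0..35


class SmalNumberError(ValueError):
    """Hypothetical error type carrying the assembler's message text."""


def parse_number(text):
    """Convert one <number> lexeme to its unsigned 32 bit value."""
    if "#" in text:
        radix_text, digits = text.split("#", 1)
        if radix_text == "":
            radix = 16                      # a bare # means hexadecimal
        elif all(ch in DIGITS[:10] for ch in radix_text):
            radix = int(radix_text)         # the radix itself is decimal
        else:
            raise SmalNumberError("bad radix")
        if not 2 <= radix <= 36:
            raise SmalNumberError("bad radix")
    else:
        radix, digits = 10, text
    if digits == "":
        # Assumption: the manual does not name the error for this case.
        raise SmalNumberError("bad digit in number")
    value = 0
    for ch in digits:
        d = DIGITS.find(ch)
        if d < 0 or d >= radix:
            raise SmalNumberError("bad digit in number")
        value = value * radix + d
        if value >= 2**32:
            raise SmalNumberError("value out of bounds")
    return value


def as_signed32(value):
    """Reinterpret a 32 bit value as signed, as in comparisons."""
    return value - 2**32 if value >= 2**31 else value
```

With this sketch, each of the six legal forms listed above parses to 32767, 15#12 parses to 17, and the three illegal forms raise "bad digit in number", "bad digit in number" and "bad radix" respectively.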

1.5. Quoted Strings

Quoted strings may be formed with either single or double quotes, and may be of any length that fits on one line.

<quoted string> ::= " { <character> } "
                  | ' { <character> } '

Constraints on the length of a quoted string may be imposed by the context in which it is used. A "missing end quote" error will be raised if the trailing quote is missing. It should be noted that the single quote (apostrophe) has a special meaning in macro definitions, and should be used with care. The following are legal quoted strings:

    "GREAT'STRING'(("
    "X"
    ''
    ')+"5"+('

The following are not legal strings:

    'DON''T DO IT'
    "UNBALANCED'

Note especially that it is not possible to include a quote in a string by repeating it, as in some high level languages.
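A scanner for quoted strings therefore only needs to search for the next occurrence of the opening quote character. The following Python sketch shows this; the function name scan_quoted_string and its return convention are assumptions made for illustration, and only the "missing end quote" message comes from the text.

```python
def scan_quoted_string(line, start=0):
    """Scan the quoted string beginning at line[start].

    Returns (contents, index just past the closing quote).
    """
    quote = line[start]
    if quote not in "\"'":
        raise ValueError("not at a quote")
    # The string ends at the very next matching quote; there is no escape
    # mechanism and no doubling convention.
    end = line.find(quote, start + 1)
    if end < 0:
        raise ValueError("missing end quote")
    return line[start + 1:end], end + 1
```

Scanning 'DON''T DO IT' from its first character returns the contents DON and stops at the second quote, which is why repeating a quote does not embed it in the string.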

1.6. Comments and Line Ends

All lines may end with a comment. Comments begin with a semicolon and may include any text. Comments are completely ignored in the normal assembly process, but they are stored as part of the text of a macro, and macro parameter substitution does apply to them.

<line end> ::= [ ; { <character> } ] <end of line>

Note that, unlike the use of semicolons in high level languages descended from Algol, the convention in SMAL is to space the semicolon away from any immediately preceding commands and to align the semicolons on successive lines, making a column that separates useful text from comments.
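Separating a line into its useful text and its comment can be sketched as follows. This Python is an illustration only; the name split_comment is hypothetical. It assumes that a semicolon inside a quoted string belongs to the string rather than starting a comment, which is an inference from the lexical grammar and is not stated explicitly in this section.

```python
def split_comment(line):
    """Split one line into (text, comment); comment is None if absent."""
    quote = None                    # the open quote character, if any
    for i, ch in enumerate(line):
        if quote is not None:
            if ch == quote:         # closing quote ends the string
                quote = None
        elif ch in "\"'":
            quote = ch              # opening quote starts a string
        elif ch == ";":
            # First semicolon outside a string begins the comment.
            return line[:i], line[i + 1:]
    return line, None
```

For example, split_comment("X = 5      ; set X") returns the text "X = 5      " and the comment " set X", while a line such as A = ";" has no comment at all under the assumption above.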