On the Hawk machine, the natural size for a variable is one word, but there are many applications where this is not the appropriate size. If you look at the Pascal programming language, for example, users are free to declare new types with arbitrary ranges, for example:
type funny = -352 .. 517;This says that variables of type funny may have values ranging from -352 to 517. The predefined type integer in Pascal has a size that depends on the host machine's wordsize, and the programmer can find out what that allows because the compiler provides a predefined constant called maxint giving the maximum legal integer value.
In C, the programmer can't arbitrarily set the upper and lower bounds on user-defined integer types. Instead, a fixed selection of integer types is provided. The standard integer types in C are listed below, along with the minimum range of values each should be able to store:
signed char -- -127 to +127 unsigned char -- 0 to +255 short int -- -32767 to +32767 unsigned short int -- 0 to +65535 long int -- -2147483647 to +2147483647 unsigned long int -- 0 to +4294967295The ANSI standard says that every C compiler should provide the actual limits for each integer type in the header file <limits.h>. As in Pascal, the most commonly used types, char and int, are not in the above list! Compilers are free to use either signed or unsigned characters for variables of type char, and they are free to use either short int or long int variables when the user asks for int (but the compiler should disclose its choice in <limits.h>).
In both Pascal and C, this freedom, on the part of the compiler, is a major source of portability problems when programs are moved from one machine to another or even from one compiler to another, but this freedom gives compilers a licence to take advantage of whatever size and sign conventions work best on each machine!
On the hawk machine, we have the following storage units:
byte -- -128 to +127 or 0 to +255 halfword -- -32768 to +32767 or 0 to +65535 word -- -2147483648 to +2147483647 or 0 to +4294967295When a variable is stored in memory, you can't tell whether it is signed or unsigned, it is only a bit pattern! The decision to interpret the most significant bit of the variable as a sign bit is made when the program manipulates that variable, and the sequence of instructions used to do that manipulation determines whether the sign is there or not!
Obviously, a C compiler should use bytes for char types, halfwords for short int types, and words for long int types, but what about types int and char? Because fetching and storing words is fast, while fetching and storing halfwords is slower, a C compiler written for the Hawk should use words for variables of type int, but the question of signed versus unsigned characters requires looking deeper into the peculiarities of the architecture.
When a fractional word variable is fetched into a register on the Hawk machine, it is extended to a 32 bit value, because the Hawk machine does not support short registers. By default, this extension involves sign extension, because the EXTB and EXTH instructions sign extend their operands. Consider the following example:
LOADS R4,R3 EXTB R4,R4,R3This fetches, into r[4], the byte pointed to by r[3]. If the byte in memory was:
m[r[3]] = 01100001The value loaded in r[4] will be:
r[4] = 00000000000000000000000001100001If the byte in memory was:
m[r[3]] = 10100011The value loaded in r[4] will be:
r[4] = 11111111111111111111111110100011This sign extension implies that C compilers for the Hawk machine should, by default, implement type char as signed char and not unsigned char.
So, what if you want unsigned characters or unsigned halfwords? You must add instructions to your program to explicitly truncate the loaded value to 8 or 16 bits! There are many ways to do this:
r[4] = 11111111111111111111111110100011 and 00000000000000000000000011111111 --------------------------------------- result 00000000000000000000000010100011Thus, the following sequence of instructions will fetch an unsigned byte into a register
LOADS R4,R3 EXTB R4,R4,R3 LIL R5,#0000FF AND R4,R5The constant FF, while only 8 bits, was loaded using the LIL instruction because the LIS instruction sign extends its operand! Thus, LIS with the constant #FF would load #FFFFFFFF and not #000000FF. If there are many references to unsigned bytes, it would be sensible to load the constant #FF in some register globally, or at least outside any loops that reference many unsigned bytes, and then use it as needed.
LOADS R4,R3 EXTH R4,R4,R3 SL R5,16 SRU R5,16Note that, with a 4 bit field on the shift instructions used to indicate how many places the register should be shifted, it would seem that the count of 16 would be illegal, but this is not so! Shifting a register zero places is of no particular use, so the shift instructions interpret a count of 0 as meaning 16, and coding a shift count of 16 in the assembly language is quite safe; the least significant 4 bits of this value are 0 and the extra bits above this are discarded!
Using shifting to truncate bytes is not recommended on the Hawk machine because this requires two successive 24 bit shifts, but the Hawk only allows 16 bits of shift per shift instruction. As a result, it takes two shift instructions to move 24 bits (how the job is divided between these instructions makes no difference at all), as follows:
LOADS R4,R3 EXTB R4,R4,R3 SL R5,16 SL R5,8 SRU R5,12 SRU R5,12In the above, two successive shifts wer
LOADS R4,R3 EXTB R4,R4,R3 TRUNC R4,8This TRUNC instruction zeros out all but the least significant 8 bits of the loaded register. TRUNC 16 would zero out all but the least significant 16 bits, and if someone needed it, TRUNC 5 would zero all but the least significant 5 bits. Adding a corresponding SXT instruction would allow sign extending a bit pattern of arbitrary length. These instructions will be added to the Hawk shortly, but it is an interesting exercise to try to write macros that do the same thing using the instructions currently avaialble on the machine. (Hint: consider using sequences of shifts!)
Some applications require big variables! Modern cryptographic algorithms, for example, require computation on 512 and sometimes 1024 bit values, and conversion between calendar systems, for example, between the Jewish mixed solar-lunar calendar and the Gregorian calendar we use today, requires very high precision arithmetic (some wonderful research on calendars, done by Ed Reingold at Illinois, has led to algorithms for converting between the Hindu, Jewish, approximate Moslem, Julian, Gregorian and ancient Maya calendars; only the approximate Moslem date can be computed because the Moslem calendar relies on actual observation of the new moon to determine when each month begins).
So, what do you do if you need high precision? Most machines provide some way to build high precision arithmetic out of low precision pieces. This is essential on machines with short words, but the examples above suggest how useful it is on machines with longer words. Languages like C and Pascal provide essentially no help in this area, since the fundamental problem is propagation of carry infromation from one add or subtract to the next, and it is extremely difficult to find the value of the carry bit after an add in C, and even harder in Pascal! (It can be done, if it couldn't, emulators such as the Hawk emulator couldn't be written in C or Pascal!)
At the machine language level, the typical solution to the carry propagation problem is to have a special instruction to add the carry from one add into another. Suppose, for example, we have an application that needs to add two 96 bit integers, one stored in R1-3, and the other stored in R4-6, with R1 and R4 holding the least significant 32 bits, and R3 and R6 holding the most significant 32 bits. On the Hawk, we would do this as follows:
ADD R1,R1,R4 ; add low bits ADDC R2,R5 ; add middle bits ADDC R3,R6 ; add high bitsThe ADDC (add with carry) instruction adds its source operand to its destination operand, and at the same time, adds one extra if the C bit is set. Similarly, there is an SUBB (subtract with borrow) instruction that subtracts from its destination operand and in addition, subtracts one more if a borrow is needed (signalled by the carry bit being reset!).
The ADDC and SUBB instructions correctly report the condition codes for the entire long operation, not just the last 32 bits. Thus, after the above 96 bit operation, N will be set if the entire result was negative, Z if the entire result was zero, V if the entire result had an overvlow, and C if there was a carry out of the entire result.
If very high precision is needed, so that not all operands can be fetched into registers before the add or subtract is started, it becomes essential that the load and store operations used to move the component words to and from memory not meddle with the carry bit between steps! This is one reason the Hawk architecture provides both LOAD and LOADCC -- the former does not meddle with the condition codes, the latter does.