2. Data Representation
Part of CS:2630, Computer Organization Notes
When we write computer programs, we always deal with representations. For example, we don't deal with text the way a calligrapher or author deals with text; instead, we deal with a specific character set that is used to represent text. Instead of dealing with abstract integers or real numbers, we deal with finite precision representations of integers and with floating point approximations of real numbers.
Well designed programming languages allow the programmer to treat these representations as if they were identical to the abstractions they represent, but this identity is never complete. Thus, for example, our machine representations of integers behave exactly like abstract integers until a result greater than the machine precision is produced; at that point, either the behavior of integer variables becomes very strange (the usual consequence in C and C++) or the program raises an exception.
If speed is no issue, arbitrary precision arithmetic packages are available in some languages such as C and Java. These packages, while slow, automatically switch to larger representations every time the result of a computation overflows the representation previously in use. In languages where speed is of only secondary importance, such as Python, all integers are represented with such arbitrary precision packages.
Just about every introduction to assembly language or machine language
programming ever written begins with a section like this. The
IBM 7040/7044 Data Processing System Student Text (IBM, 1963) begins
with a chapter on binary arithmetic that is as good as anything more
recent, and the Introduction to Programming for the PDP-8 (DEC, 1973)
has a nice "Number System Primer" that covers this ground equally well!
All modern computers use binary representations for their data, and all use fixed size containers for data. There is no reason that computers could not be built that use other representations. The Russians built a production run of 50 ternary (base 3) computers in the 1960s, known as the Setun. The pioneering ENIAC (1945) used decimal, as did the IBM 650 (1953) and 1401 family (1959-1968). There is no compelling benefit to using non-binary representations, though, and the decimal machines of the 1950s and 1960s were built using binary logic and used binary-coded-decimal (BCD) internally.
The bit is the fundamental unit of information. A bit has two values, conventionally called zero and one. We could as easily name the values on and off, high and low, or true and false; in fact, all of those names are used in different contexts, and the truth is that all such names are arbitrary. The representation of a bit inside the computer is also arbitrary. The presence or absence of holes in a punched card or strip of paper tape was once a common representation of bits, while inside the computer, the two values of a bit are usually represented using different voltages. The specific voltages used do not generally matter to the computer architect; they are left to the engineering level of the design.
Binary encoding of information dates back to the 19th century with the development of the punched card, but it only grew to maturity in the second half of the 20th century, with the dawn of the computer age. Over the years, bits have been grouped in many different ways, and these groupings have been given many names; words, bytes, characters and octets are but a few examples.
Here is the classic C hello world program, only slightly changed since it was first published in The C Programming Language by Kernighan and Ritchie (Prentice Hall, 1978). Typically, you'd store this program in a file called hello.c, and once you did that, you could compile it with the shell command cc hello.c. Once compiled this way, you can run the program with the shell command a.out.
/* hello.c -- the classic hello world program */
#include <stdio.h>

int main( void )
{
    printf( "Hello World\n" );
    return 0;
}
The first line of the program is a comment; programmers should put a header comment in each file that identifies, at the very least, the file and its function. The second line is an include directive telling the C compiler (or more specifically, the C preprocessor) to go look for a file called stdio.h in the system library and use the definitions there. That particular header file contains the definitions for the standard input-output functions, one of which is called printf.
In C, the main program must be declared as an integer function called main, and the return value should be zero when the program terminates normally, or some nonzero value (conventionally 1) to indicate failure.
The printf function is used to output text to the standard output
stream. It is a function, in the sense that it has an integer return value,
the number of characters that were output. C allows function return values
to be discarded, and in this program, we do not need this value.
The use of printf here is trivial: it is
being used to output a constant string. Printf is a bizarre function, part of
a legacy that dates back to the original FORTRAN in 1957; it is the root
cause of numerous security bugs, yet it is the primary way that C programs
produce text output.
/* limits.c -- a program to explore the limits of C variables */
#include <stdio.h>

int main( void )
{
    int i = 1;  /* holds successive powers of two */
    int oldi;   /* previous value of i after first iteration */
    do {
        printf( "i = %d\n", i );
        oldi = i;
        i = i + i;
    } while (oldi < i);
    return 0;
}
The limits.c program given here is designed to explore the limits of C variables. Here, the variables i and oldi are declared to be of type int, meaning integer. Unlike Python, where variables are created when you assign values to them, C requires, as Java does, that variables be explicitly declared and given a data type before use.
This program would be an infinite loop if you wrote it in Python, where variables are elastic and simply expand to hold whatever value you place in them. In C and Java, variables have a fixed capacity. In both of those languages, if you try to increment a variable beyond its maximum possible value, the result rolls over, much like the mechanical odometer of an old car.
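The rollover is easy to see for yourself. The following sketch uses an unsigned type, since rolling a signed integer past its limit is, strictly speaking, undefined behavior in modern C; the file name rollover.c is our own:

/* rollover.c -- a sketch of the odometer-style rollover described above */
#include <stdio.h>

int main( void )
{
    unsigned char x = 255;   /* the largest value an 8-bit unsigned char holds */
    x = x + 1;               /* rolls over to 0, like an odometer passing its limit */
    printf( "x = %d\n", x ); /* prints x = 0 */
    return 0;
}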
Like Java and Python, C has a number of looping constructs. The do-while loop used in limits.c always executes the loop body at least once, testing the while condition after each iteration; this is called a post-test loop. Like Java and Python, C also has a while loop, a pre-test loop that executes the body only while the while condition is true.
The variables i and oldi are local variables of main; i is initialized at the point where it is declared, but oldi is not initialized until the loop body. If the program used a pre-test loop, oldi would be uninitialized where it is tested in the loop condition, but because the program uses a post-test, there is no problem with using oldi in the while condition.
Actually, C global and static variables have default initial values; for integer variables, the default is zero. Local variables such as oldi enjoy no such guarantee: they are genuinely uninitialized until assigned, holding whatever bits happen to be in memory, and even where a default of zero does apply, it is not always the correct starting value.
Finally, printf has two parameters in this program. The first parameter is the format string. The text of this string will be output, but wherever there is a % (percent sign) in the string, an additional parameter is required; the letter after the percent gives the type of this additional argument. In the example, we used %d to request that a decimal integer be output, and we provided a second parameter, an integer. Format strings for printf can be very complex; we have only scratched the surface here.
In summary, this program starts with i set to one and then repeatedly doubles i, adding it to itself and printing the result, until i rolls over. It is worth trying this with a variety of different integer types to see the range of values each supports. This program does not find the maximum possible value; it only finds the largest power of two below the limit. Writing a program to find the exact maximum value each type supports is harder, particularly for large data types.
signed char a;
char b;
unsigned char c;
short int d;
unsigned short int e;
int f;
unsigned int g;
int h;
unsigned long int i;
long int j;
A few of the standard C declarations look just like Java declarations. C variables of type short int are usually 16 bits, type int is usually 32 bits, and type long int is usually 64 bits. Java's int is the same, and Java's long corresponds to C's long int. In most C compilers, type signed char is an 8-bit integer type, unlike Java's char, which holds a 16-bit UTF-16 character.
Unfortunately, usually is not the same as always. The original C compiler defined char as 8 bits, int as 16 bits and long int as 32 bits, and C compilers for some small microcontrollers still use these definitions. Furthermore, there is no requirement that computers use 8, 16, 32 and 64 bit chunks. C compilers were written in the 1970s for 36-bit machines.
It took some time for the community maintaining the C language to come to grips with the problems posed by the slippery definition of C's variable sizes. Eventually, in C99, a set of standard (but optional) types was provided. These are defined in the inttypes.h header file.
#include <inttypes.h>
uint8_t  a;   /*  8 bits, unsigned */
int8_t   b;   /*  8 bits, signed */
uint16_t c;   /* 16 bits, unsigned */
int16_t  d;   /* 16 bits, signed */
uint32_t e;   /* 32 bits, unsigned */
int32_t  f;   /* 32 bits, signed */
uint64_t g;   /* 64 bits, unsigned */
int64_t  h;   /* 64 bits, signed */
a) Create, compile and run the Hello World program.
b) Use code based on limits.c to find the largest power of two that you can represent in a C int, and then modify the code so that it also reports the value that C computes when doubling that maximum value.
c) Modify the code so that it computes the exact largest value that C can represent as a long int. Note that to output a long int, the format string in printf must be changed from %d to %ld. This is an algorithms problem: if you try to write simple counting code, your solution will take ages to finish.
An assembly language is a language for assembling data and instructions into memory. The most primitive operations in an assembly language place individual data items such as bytes, words or halfwords into memory. Other more complex operations fill memory with instructions; initially, we will ignore these, but it is important to understand that assemblers do not understand the meanings of any instructions, they merely place them in the computer's memory as successive chunks of data.
In most assembly languages, each line of text specifies one data item of some kind to be placed in memory. Consecutive lines of text specify the contents of consecutive memory locations. The assembly directive or instruction on each line specifies how to place data in memory, while the operands of the directive or instruction specify what data to place in memory.
Our assembly language is called SMAL, because it is a Symbolic Macro Assembly Language. The central purpose of any assembly language is to assemble data into memory. SMAL contains direct support for 8-bit bytes, 16-bit halfwords and 32-bit words, using the B, H and W directives. (There is also a T directive for the sometimes useful 24-bit triple-byte quantity.) Each of these assembly directives takes an operand giving the value of one byte, halfword or word. One word of memory can be filled with a W directive, or with two consecutive H directives, or four B directives. (Or, more obscurely, a T and then a B directive.)
W 2#10101010110011001111000011111111

H 2#1111000011111111
H 2#1010101011001100

B 2#11111111
B 2#11110000
B 2#11001100
B 2#10101010
In this example, we have filled 3 words of memory so that each 32-bit word holds the identical binary value. The first line of code assembles a word as a single operation. The next 2 lines assemble the 2 halfwords of a word in 2 operations, while the final 4 lines assemble the 4 bytes of a word in 4 separate operations.
The values in this example were all given in binary, indicated by the leading 2# on each word. The SMAL assembler supports other data representations, to be discussed later. When assembling code for the Hawk or for the Intel 8086/Pentium family, the least significant byte or halfword is always assembled first. So, the first H directive in the example places binary 1111000011111111 in the least significant 16 bits of the next word and the first B directive in the example places binary 11111111 in the least significant 8 bits of the next word. If you look at the first line, where an entire word is assembled in one shot, you should note that the least significant 8 bits are 11111111.
Some other computers put the most significant byte first when breaking a word down into multiple bytes; IBM's 360 through Enterprise Server family and their Power architecture do this. The least-significant-first order is commonly called little-endian, and the most-significant-first order big-endian. There is no particular advantage of one scheme over the other, so different computers take one or the other approach more or less arbitrarily. This variety can be a source of confusion for novice programmers, although the issue usually only comes up when writing low-level system code.
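A few lines of C will reveal which convention your machine follows. This sketch (the file name endian.c is our own invention) stores a word with a different value in each of its 4 bytes and then prints the bytes in memory order:

/* endian.c -- a sketch that reveals the byte order of the host machine */
#include <stdio.h>
#include <stdint.h>

int main( void )
{
    uint32_t word = 0x04030201;   /* a distinct value in each byte */
    unsigned char* byte = (unsigned char*)&word;
    int i;
    for (i = 0; i < 4; i++) {
        printf( "byte %d holds %d\n", i, byte[i] );
    }
    /* prints 1 2 3 4 on a little-endian (Hawk or Pentium style) machine,
       and 4 3 2 1 on a big-endian (IBM style) machine */
    return 0;
}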
How many values can be coded in one bit? Two! We'll call them 0 and 1 for
now, but in other contexts, it makes more sense to call them true
and false. The names we use are arbitrary.
How many values can be coded by 2 bits? By 3 bits? By 4 bits?
Here is a systematic listing of all possible combinations of 4 bits:
0 0 0 0        1 0 0 0
0 0 0 1        1 0 0 1
0 0 1 0        1 0 1 0
0 0 1 1        1 0 1 1       4 bits gives 2^4 = 16 possibilities
0 1 0 0        1 1 0 0
0 1 0 1        1 1 0 1
0 1 1 0        1 1 1 0
0 1 1 1        1 1 1 1
The table above lists all combinations of 2 bits, in the rightmost 2 bits of each quadrant, and all combinations of 3 bits, in the rightmost 3 bits of each column. In general, to represent n distinct values, it takes ⌈log₂(n)⌉ bits (log₂ is the logarithm to the base 2; ⌈i⌉ means i rounded up to the next integer).
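The ⌈log₂(n)⌉ formula translates directly into code. Here is a minimal C sketch (the function name bits_needed and file name bits.c are our own) that counts doublings rather than calling a logarithm function, so no floating point arithmetic is involved:

/* bits.c -- a sketch of computing the number of bits needed for n values */
#include <stdio.h>

/* return the ceiling of the base-2 logarithm of n, that is, the number
   of bits needed to represent n distinct values; assumes n > 0 and that
   n fits comfortably in an unsigned int */
int bits_needed( unsigned int n )
{
    int bits = 0;
    unsigned int combinations = 1;   /* 2 to the power bits */
    while (combinations < n) {
        combinations = combinations + combinations;
        bits = bits + 1;
    }
    return bits;
}

int main( void )
{
    printf( "%d %d %d\n", bits_needed( 16 ), bits_needed( 17 ),
            bits_needed( 2 ) );   /* prints 4 5 1 */
    return 0;
}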
How we associate values with bit combinations is our business. We could represent Tuesday as 0000 and five as 0001, or we could be systematic, associating 0000 with a, 0001 with b, and so on. The need for compatibility with older encodings while extending them to represent additional values frequently leads to bizarre results that do not seem entirely rational unless you understand the history.
Character encodings can be very arbitrary and have varied widely over the years. Consider the archaic character set used by ILLIAC I, a computer built at the University of Illinois in the early 1950s. Despite the early date, this character set inherits its odd order from even older technology. The table below was transcribed from a typewritten original in The ILLIAC Miniature Manual by John Halton (U. of Illinois Digital Computer Lab file no 260, Oct 22 1958, page 3):
Tape     Characters    n for 92
Holes    F/S    L/S     Orders
-------------------------------
  o       0      P         2F
  o  O    1      Q        66F
  o O     2      W       130F
  o OO    3      E       194F
  oO      4      R       258F
  oO O    5      T       322F
  oOO     6      Y       386F
  oOOO    7      U       450F
 Oo       8      I       514F
 Oo  O    9      O       578F
 Oo O     +      K       642F
 Oo OO    -      S       706F
 OoO      N      N       770F
 OoO O    J      J       834F
 OoOO     F      F       898F
 OoOOO    L      L       962F

Tape     Characters              n for 92
Holes    F/S         L/S          Orders
------------------------------------------
O o      Delay       Delay           3F
O o  O   $(Tab)      D              67F
O o O    CR/LF       CR/LF         131F
O o OO   (           B             195F
O oO     L/S=Letter-Shift          259F
O oO O   ,           V             323F
O oOO    )           A             387F
O oOOO   /           X             451F
OOo      Delay       Delay         515F
OOo  O   =           G             579F
OOo O    .           M             643F
OOo OO   F/S=Figure-Shift          707F
OOoO     '           H             771F
OOoO O   :           C             835F
OOoOO    x           Z             899F
OOoOOO   Space       Space         963F
This character set is coded using
5 bits per character, with two special codes, L/S
or letter shift and F/S or figure shift used to
shift between the two different and wildly disorganized sets of 32 characters.
The code CR/LF or carriage return -- line feed was used to
mark the end of each text line.
In the table, the letters O and o are used to signify
holes punched in the paper tape that the ILLIAC I system used for input and
output. These tapes could be punched or printed using Teletype machines.
The jumbled organization of this character code is based in part on the QWERTY layout of the keys on a typewriter keyboard, originally developed for the Sholes and Glidden typewriter of 1873. Beyond that, the order is something of a mystery! The tape is punched with a hole for each one in the binary code, while no hole stands for the digit zero. The row of smaller holes punched along the tape does not code any data; these holes engage the sprocket that pulls the tape through the tape reader. In reading the characters from the tape, the sprocket holes serve somewhat like a decimal point.
d) Show the string "HELLO WORLD." as it would have been punched on paper tape for the ILLIAC I. Remember, you don't know if the previous text left the system in figure-shift or letter-shift state. (Use capital O for data holes in the tape, use small o for sprocket holes. Work from top to bottom, as in the table defining this code.)
e) What is encoded on the snippet of ILLIAC I tape given above?
The ASCII character set was proposed in the early 1960s as a sensible response to the limits of the 6-bit character sets common at the time. ASCII (the American Standard Code for Information Interchange, later called ISO-7) was intended for use on new computers with 8-bit bytes, but it only used 7 bits and it had an upper-case only subset that fit in 6 bits:
              |                    left 3 bits
              |  000   001   010   011   100   101   110   111
--------------+------------------------------------------------
 right  0000  |  NUL   DLE   SP     0     @     P     `     p
 4      0001  |  SOH   DC1    !     1     A     Q     a     q
 bits   0010  |  STX   DC2    "     2     B     R     b     r
        0011  |  ETX   DC3    #     3     C     S     c     s
        0100  |  EOT   DC4    $     4     D     T     d     t
        0101  |  ENQ   NAK    %     5     E     U     e     u
        0110  |  ACK   SYN    &     6     F     V     f     v
        0111  |  BEL   ETB    '     7     G     W     g     w
        1000  |  BS    CAN    (     8     H     X     h     x
        1001  |  HT    EM     )     9     I     Y     i     y
        1010  |  LF    SUB    *     :     J     Z     j     z
        1011  |  VT    ESC    +     ;     K     [     k     {
        1100  |  FF    FS     ,     <     L     \     l     |
        1101  |  CR    GS     -     =     M     ]     m     }
        1110  |  SO    RS     .     >     N     ^     n     ~
        1111  |  SI    US     /     ?     O     _     o    DEL
Spaces are encoded with the SP code, 010 0000.
The function of the other named codes will
be given later. Using this, we can encode the string
"Hello World!" in ASCII as follows:
100 1000   H          101 0111   W
110 0101   e          110 1111   o
110 1100   l          111 0010   r
110 1100   l          110 1100   l
110 1111   o          110 0100   d
010 0000   (SP)       010 0001   !
The ASCII character set, standardized internationally as ISO 646 and later extended into the 8-bit ISO 8859 (Latin 1) character sets, is the basis of the character sets in common use today and forms the base from which Unicode was developed. From the very start, ASCII text has been encoded in 8-bit bytes by adding an extra bit on the left, so the letter 'a' is coded as either 0110 0001 or 1110 0001 instead of being coded as 110 0001. Some vendors opted to set the extra bit to 0 while others set it to 1. Fortunately, this variation ended in the 1980s. In all modern 8-bit character sets, this bit is zero to indicate the classic ASCII character set and one to indicate one or another extended character set.
Whether ancient or modern, all practical character codes
contain several non-printing codes.
In ASCII, these have standard 2 and 3 letter names.
Except for space, SP, these are
known as control characters.
Most ASCII control characters are rarely used. In some
cases, they were included in ASCII to support applications
that are now obsolete and some of them were included to support
applications that never caught on.
In the C programming language and its relatives,
special representations are used
for several control characters. For example,
HT (horizontal tab) is represented with \t and FF (form feed)
with \f.
NUL        null, intended to be ignored, commonly abused as end-of-string
SOH        start of heading
STX        start of text
ETX        end of text
EOT        end of transmission
ENQ        enquire (about the status of a remote terminal?)
ACK        acknowledge (an answer to ENQ?)
BEL   \a   bell, used to sound an alert on the remote terminal
BS    \b   backspace, commonly abused as erase
HT    \t   horizontal tab, still used as originally intended
LF    \n   linefeed, frequently used as end-of-line
VT    \v   vertical tab, commonly used as a reverse linefeed
FF    \f   formfeed, on printers, frequently causes page eject
CR    \r   carriage return, frequently used to mean enter
SO         shift out (change text color?)
SI         shift in (undo SO)
DLE        data link escape
DC1        device control 1 (for devices on a remote terminal)
DC2        device control 2
DC3        device control 3
DC4        device control 4
NAK        negative acknowledge (an answer to ENQ?)
SYN        synchronize (for synchronous data links)
ETB        end transmission block
CAN        cancel
EM         end of medium
SUB        substitute
ESC        escape (a prefix altering the use of what follows)
FS         file separator
GS         group separator
RS         record separator
US         unit separator
SP         space
DEL        delete, intended to be ignored
Pure Unicode uses 21 bits to encode each
character, although the characters from the common alphabets of the world
can be coded using the 16-bit subset of Unicode.
When Unicode is represented in the UTF-8 encoding, the binary codes
0000 0000 to 0111 1111 are used for the 128 characters of the
classic ASCII character set. Each of the
Unicode codes from 128 to 2047 can be coded in two bytes in UTF-8.
The two-byte UTF-8 codes include complete support for
the Latin, Cyrillic, Greek, Arabic and Hebrew alphabets, including a wide
range of accent marks. For codes from 2048 to 65535, the UTF-8 encoding
uses three bytes per character.
These three byte codes suffice for most purposes, but the UTF-8 extension
scheme can be continued to encode up to 36 bits of data as a string of seven
8-bit bytes. The Unicode standard never needs more than 4 bytes to encode a
21-bit value in UTF-8.
Range of Values     Encoding                            Number of Bytes
    0 -   127       0xxx xxxx                           1 (classic ASCII)
  128 -  2047       110x xxxx 10xx xxxx                 2
 2048 - 65535       1110 xxxx 10xx xxxx 10xx xxxx       3
In the UTF-8 scheme illustrated above, each bit of the pure Unicode character is shown as an x, while bits added in the UTF-8 encoding are given explicitly as ones and zeros. The order of the Unicode bits remains unchanged when the consecutive bytes of the UTF-8 encoding are written out from left to right.
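The table translates directly into shift-and-mask code. The following C sketch (the names utf8_encode and utf8.c are our own, and it covers only the 1- to 3-byte rows of the table, not the full 4-byte Unicode encoding) packs the bits of a code point into UTF-8 bytes:

/* utf8.c -- a sketch of the 1- to 3-byte UTF-8 rules tabulated above */
#include <stdio.h>

/* encode code point c (at most 65535 here) into buf, returning the
   number of bytes used */
int utf8_encode( unsigned int c, unsigned char buf[] )
{
    if (c < 128) {                        /* 0xxx xxxx */
        buf[0] = c;
        return 1;
    } else if (c < 2048) {                /* 110x xxxx 10xx xxxx */
        buf[0] = 0xC0 | (c >> 6);
        buf[1] = 0x80 | (c & 0x3F);
        return 2;
    } else {                              /* 1110 xxxx 10xx xxxx 10xx xxxx */
        buf[0] = 0xE0 | (c >> 12);
        buf[1] = 0x80 | ((c >> 6) & 0x3F);
        buf[2] = 0x80 | (c & 0x3F);
        return 3;
    }
}

int main( void )
{
    unsigned char buf[3];
    int i, len = utf8_encode( 0x20AC, buf );   /* the euro sign */
    for (i = 0; i < len; i++) printf( "%x ", buf[i] );
    printf( "\n" );                            /* prints e2 82 ac */
    return 0;
}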
Note that not all possible 21-bit values are legal Unicode characters. We will not cover the full complexity here, but it is worth noting that UTF-16 allows all valid Unicode characters to be represented in 1 or 2 consecutive 16-bit words, with most characters in the common alphabets represented in just 1 word.
One other character set from the 1960s is still with us: EBCDIC, the Extended Binary Coded Decimal Interchange Code. This is an 8-bit character set originally devised by IBM and still used on the IBM enterprise servers used by many large corporations. EBCDIC extends BCD, an older 6-bit code.
f) Represent "The 4th of July." (without quotes) in 7-bit ASCII, in binary.
g) What range of Unicode values is encoded in 4 consecutive bytes in the UTF-8 encoding?
h) If the DEC PDP-8 made a comeback, we might want a Unicode UTF-12 encoding for that machine's 12 bit word. Following the coding scheme of UTF-8, what range of Unicode values would you encode in one 12-bit word? In two 12-bit words? In three 12-bit words?
C uses single quotes for character constants and double quotes for character strings. Characters are no different from 8-bit integers, so they may be added and subtracted. So, for example, the boolean expression ('a'+1)=='b' is true. When writing character and string constants, the backslash character has a special meaning. The character constant '\n', for example, is newline or the ASCII control character LF. If the backslash is followed by a quote mark or a backslash, it means 'include this character literally,' so '\'' is a character constant representing a single quote. If the backslash is followed by digits, you are giving the numerical code for that character; since BEL is character code 7, the expression '\a'=='\7' is true. Unfortunately, the digits are in octal (base 8), not decimal, so '\n'=='\12' is true, where you might expect the code for newline to be written as 10.
/* alphabet.c -- a program demonstrating character manipulation */
#include <stdio.h>

int main( void )
{
    int i;
    const int capitalize = ('A' - 'a');
    char* lowercase = "abcdefghijklmnopqrstuvwxyz";
    char uppercase[27];

    printf( "lowercase = %s\n", lowercase );
    for (i = 0; i < 26; i++) {
        uppercase[i] = lowercase[i] + capitalize;
    }
    uppercase[26] = '\0';
    printf( "uppercase = %s\n", uppercase );
    return 0;
}
The alphabet.c program given here works with two character strings: lowercase, which is initialized to a string constant containing the 26 lower case letters, and uppercase, which is an uninitialized array of 27 characters. The declaration char* creates a string pointer or string handle, and in C, every array of characters is potentially a string, if you use it as one. To output a string, you use %s in the format string of printf and include the string pointer in the parameter list.
The program uses a for loop to iterate over the 26 letters in each string. In C (prior to the C99 standard), the loop control variable must be declared outside the loop. The loop body copies the letters, one at a time, adding an integer constant to each letter. The difference is an integer because, while the codes for ASCII characters all lie in the range 0 to 127, the difference between two arbitrary characters can lie anywhere in the range –127 to +127. The value is a constant: we could look up the characters in the ASCII table, work out their values, take the difference and then use that constant value in the code, but this program lets the compiler do that clerical work.
Finally, note that while the string constant pointed to by lowercase holds only 26 characters, the array uppercase is 27 characters long, indexed from 0 to 26. The string constant actually has a 27th character, a final NUL (or '\0') that marks the end of the string. This must be present for printf to work correctly when using the %s format conversion.
Our assembly language, SMAL, contains two features that directly support
the ASCII character set. The first is its support for characters as operands
for assembly into memory as bytes, halfwords or words (although the most common
use is as bytes). The second feature is the ASCII directive for
storing the consecutive bytes of a string in memory. As a result, we could
store the text Hello in three different ways, as illustrated below:
B 2#01001000
B 2#01100101
B 2#01101100
B 2#01101100
B 2#01101111

B "H"
B "e"
B "l"
B "l"
B "o"

ASCII "Hello"
This example shows the same 5-character string being encoded in three different ways, producing a total of 15 consecutive bytes of data. First, there is a sequence of 5 bytes with values given in binary, followed by a sequence of 5 bytes with values given as quoted characters, followed by an ASCII directive to place a 5-character string in memory.
We could have encoded this with a single line reading ASCII "HelloHelloHello". There is no difference at all between the data loaded in memory by these three different approaches. Once it is loaded in memory, all data becomes nothing but patterns of bits with no inherent meaning. Nothing in the result records how the data got into memory.
Unlike C, SMAL does not distinguish between single and double quotes. If you want to create a character constant or ASCII string containing a single quote, use double quotes around it, and vice versa. SMAL also does not put a trailing null on strings. You must put that there yourself, as we will discuss later.
SMAL has no direct support for control characters other than SP, space, although as we will see, it is possible to define identifiers with any desired value. We can therefore define each of the control characters by name, for example, giving NUL the meaning 0000 0000 and LF the meaning 0000 1010.
In the text up to this point, we've informally assumed that you know something about binary numbers, or at least, that you understand that a series of bits that make up a computer word can be expressed as a number. Now, it is time to be very explicit about number representations.
It is possible to associate arbitrary bit patterns with numeric values. For example, nothing prevents us from deciding to represent zero as 1101 and one as 1000. Using such a mapping from binary values to the integers makes arithmetic extremely difficult -- in effect, this arbitrary mapping forces us to use table-lookup schemes for all arithmetic operations.
Table lookup has been used as the basis of all arithmetic on at least one computer, the IBM 1620, codenamed the Cadet during development in the late 1950s. Later, someone decided that the name should be interpreted as the acronym "Can't Add, Doesn't Even Try," because the Cadet used table-lookup to compute the results of arithmetic operations. Some programmers even loaded new tables so they could do arithmetic in bases other than 10, the base for which the machine was designed.
With few exceptions, computers today use variations on the binary
number system for integer arithmetic. The exceptions include a few
decimal computers and even fewer ternary (base 3) machines.
The binary system has its roots in ancient China, where it was used to
order the 64 combinations of the I-Ching sticks (coins could also be used;
what matters is that each stick or coin could be in
one of two states; the pattern resulting from a throw of the sticks or coins
was used for fortune telling).
 0 = 000000    16 = 010000    32 = 100000    48 = 110000
 1 = 000001    17 = 010001    33 = 100001    49 = 110001
 2 = 000010    18 = 010010    34 = 100010    50 = 110010
 3 = 000011    19 = 010011    35 = 100011    51 = 110011
 4 = 000100    20 = 010100    36 = 100100    52 = 110100
 5 = 000101    21 = 010101    37 = 100101    53 = 110101
 6 = 000110    22 = 010110    38 = 100110    54 = 110110
 7 = 000111    23 = 010111    39 = 100111    55 = 110111
 8 = 001000    24 = 011000    40 = 101000    56 = 111000
 9 = 001001    25 = 011001    41 = 101001    57 = 111001
10 = 001010    26 = 011010    42 = 101010    58 = 111010
11 = 001011    27 = 011011    43 = 101011    59 = 111011
12 = 001100    28 = 011100    44 = 101100    60 = 111100
13 = 001101    29 = 011101    45 = 101101    61 = 111101
14 = 001110    30 = 011110    46 = 101110    62 = 111110
15 = 001111    31 = 011111    47 = 101111    63 = 111111
We must distinguish between the representation of a number and its abstract value. When we say ten, we are naming an abstract value, without reference to any number system. In English, we have simple names for all values up to twelve. When we write "12", we are using a decimal representation for the value we call twelve. This twelve is itself a representation, a string of italic lower-case letters encoding a sound. This sound is related to its abstract value only by the cultural conventions of the English speaking world.
In what follows, we will use a programming language notation, where
unquoted numerals stand for abstract integer values, subject to arithmetic
operations, while quoted strings stand for concrete representations of those
values.
Thus, the string "011100" is just a string of symbols, with no inherent
meaning. It could be equivalent to eleven thousand one hundred if we
decided to interpret it as a decimal number, or it
could be the binary representation of the value twenty eight.
Consider, first, the formula that relates representations of decimal numbers to their values:
              digits(S)-1
  value(S) =       Σ       v(S[i]) × 10^i
                  i = 0
Users of the decimal system usually speak of the one's place, the ten's place, the hundred's place and so on. To illustrate the use of this formula, consider the following:
value("1024") = v("1") × 103 + v("0") × 102 + v("2") × 101 + v("4") × 100 |
In thinking about the above example, it is important to distinguish between "1", the numeral, which is a character, v("1"), the application of the numeric value function v to this numeral, and one, the actual value computed by the application of this function. As humans, we confuse these all the time, rarely distinguishing between the numeral, a symbol, and the abstract value it represents. Inside a computer, this distinction is crucial.
The formula given above can be directly reduced to code in a high level programming language. For example, the following C code will convert from decimal numbers, passed as a string, to the corresponding binary value, encoded in a variable of type int:
/* value.c -- a program to compute the value of a textual number */
#include <stdio.h>
#include <string.h>
#include <math.h>

int v( char d )
/* given character d, a decimal digit, return the int value */
{
    return d - '0';
}

int value( char S[] )
/* given string S holding a decimal number, return the int value */
{
    int i;
    int a = 0;
    for (i = 0; i <= strlen( S ) - 1; i++) {
        a = a + v( S[i] ) * (int)pow( 10, i );
    }
    return a;
}

int main( void )
/* test the value function */
{
    printf( "value = %d\n", value( "1024" ) );
}
The main function in this program calls the value function which, in turn, calls the v function. Initially, we will focus on the value function, taking for granted that the v function computes the value of each successive digit. If you run this program on a binary computer, it computes values in base 2 before the call to printf converts the result back to decimal. If you could run it on a ternary computer, it would do its computations in base 3.
Notice that when you declare a C function, as in C++ and Java, you must declare the types of each parameter. The function v is declared to take the character d as a parameter, while the function value is declared to take a character string or undimensioned array of characters S. In C, the declarations char* S and char S[] are equivalent; they declare S to be a pointer to or the handle for an undimensioned array of characters.
As written, the value function tries, as much as possible, to follow exactly the summation formula given above. The for loop corresponds exactly to the summation in the formula, with the exact same bounds. Where we used digits(S) in the formula, we use strlen(S), and where the formula says 10^i, we write pow(10,i). Our program needs more #include directives in order to use strlen and pow. The file string.h is included to allow use of strlen to measure the length of a string, and math.h is included to allow use of pow for exponentiation.
The pow function returns a floating point result, so we had to convert the result back to an integer using the type casting operator (int). The only good reason to use this approach is to make the code conform to the algebraic formula. The primary reason to avoid using pow here is that conversion back and forth between integer and floating point representations is expensive.
If you try to compile the program above, you may get a long error message saying something like /usr/bin/ld: ... undefined reference to `pow'. This isn't an error message from the compiler, but from ld, the linker for C and C++ programs. It runs after the compiler, which found math.h just fine. The problem is, nobody told the linker where to look for the pow function. You solve this with the -lm command line option on the cc command for compiling C programs.
If you run the program above, you will quickly find out that it has a major flaw. The output for value("1024") is 4201 — the digits are backward! It would work just fine if the text were incorporated into Arabic or Hebrew text, where the text is printed from right to left, but we are using the Roman character set here.
i) Fix the code of value so it avoids re-evaluating the string-length with each iteration.
j) Fix the code of value so it avoids using an exponential function, but instead starts with a multiplier of 1 and multiplies this by 10 for each iteration.
k) Fix the code of value so it evaluates the digits of the
number in their conventional left-to-right order.
The formula given above applies equally well for computing the abstract value of a binary number, with the simple restriction that each bit or binary digit may take on only two values, 0 and 1, and that the base of the number system is 2 instead of 10. The change of base means that we use 2^i where we used to use 10^i.
              digits(S)-1
  value(S) =       Σ       v(S[i]) × 2^i
                  i = 0
If we evaluate this formula using decimal arithmetic, as we would
typically do with pencil and paper, it is an excellent formula for
converting from binary to decimal. If we evaluate it using binary
arithmetic on a binary computer, it is a useful way to convert from
the textual representation of a binary number to the representation
of that number in a single binary integer variable. If we had
ternary computers, it would convert from the textual binary representation
to a single base 3 integer variable.
Users of decimal work with the one's place, the ten's place and the hundred's place. Similarly, users of binary work with one's place, the two's place, and the four's place. The list of powers of 2 will become very familiar as you study binary numbers:
  i    2^i        i     2^i
  0      1        8     256
  1      2        9     512
  2      4       10   1,024
  3      8       11   2,048
  4     16       12   4,096
  5     32       13   8,192
  6     64       14  16,384
  7    128       15  32,768
  8    256       16  65,536
  i    2^i
 10    1,024              ≈ 1,000              = 1K (kilo)
 20    1,048,576          ≈ 1,000,000          = 1M (mega)
 30    1,073,741,824      ≈ 1,000,000,000      = 1G (giga)
 40    1,099,511,627,776  ≈ 1,000,000,000,000  = 1T (tera)
Note that each power of two is twice the previous power of two, and note the convenient coincidence that 2^10 = 1,024 is within a few percent of 10^3 = 1,000. This allows quick estimates of the magnitude of binary numbers -- if you have a binary number with 10 bits, it will be close to a 3-digit decimal number, while if it has 20 bits, it will be close to a 6-digit decimal number.
l) What is 2^24? Several machines were once built with 24-bit words.
m) What is 2^32, in decimal, exactly? This is relevant on machines with a 32-bit word.
n) The maximum number that can be represented in 6 bits in binary is 63, while 2^6 is 64. The maximum number that can be represented in 2 decimal digits is 99, while 10^2 is 100. What is the maximum number that can be represented in n bits, as a function of n?
To convert to decimal with pencil and paper, write the place values over each bit position, in decimal, then sum the product of each place value times the digit under it. For binary, this is easy because we only multiply by 0 or 1. For example, to convert 10101011₂ to decimal, we proceed as follows:
128   64   32   16    8    4    2    1
  1    0    1    0    1    0    1    1
10101011₂ = 128 + 32 + 8 + 2 + 1 = 171
To convert decimal to binary with pencil and paper, repeatedly divide the number by two, writing the integer quotient below the number and the remainder to the right. Repeat this until the quotient is 0, and the column of remainders will be the binary number, reading up from the bottom. So, we can check our work from the previous example and convert 171 back to binary, hoping to get the binary value we started with, as follows:
171
 85   1
 42   1
 21   0
 10   1
  5   0
  2   1
  1   0
  0   1      10101011₂
o) Flip a coin 8 times (or use any other random process) in order to make an 8-bit binary number. Convert it to decimal. Convert the result back to binary to check your work.
p) Write a little C function that takes two positive integer parameters n and b and outputs the digits of n in base b. In C, the / and % operators return the quotient and remainder.
How do we relate the characters used to represent digits to their abstract numeric values? When working with pencil and paper, we use rules we learned in elementary school. These may seem so natural that they pass unnoticed. With computers, we need to be very explicit. In the code given above, we used this conversion function:
int v( char d )
/* given character d, a decimal digit, return the int value */
{
    return d - '0';
}
Here, we used the fact that the character codes for successive decimal digits are sequential in the ASCII character set. This is true in the vast majority of character sets, so we can get away with it.
We also used the fact that, in C, characters are an integer type so it is legal to do arithmetic on them. This is not true in several other languages. We could have been explicit about the conversion from character to integer by writing (int)d-(int)'0', using the casting notation of C.
There is another approach to this problem. Instead of writing v(d), a function call, we could have used v[d], array indexing. Some programmers get nervous when they see things like v['b'], which means the element of the array v selected by the character 'b'. In C, we can declare the array v as follows:
static const int v[128] = {
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
     0, 1, 2, 3, 4, 5, 6, 7, 8, 9, 0, 0, 0, 0, 0, 0,
     0,10,11,12,13,14,15, 0, 0, 0, 0, 0, 0, 0, 0, 0,
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0,
     0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0, 0
};
We declared v to be static. In C, this means that the declaration of v is visible only in this source file. By default, global variables are visible in all source files that are linked into the same final program. We declared v to be a constant, in order to prevent accidental changes. This tells the C compiler that it can give the linker permission to put the array in read-only memory. The initial value of the array is enclosed in braces.
By replacing a function call with an array lookup, we have actually made our program run a bit faster, and we have also created a lookup table that can handle digits that are not consecutive. The above table, for example, maps the character 'A' to the value 10, 'B' to 11, and so on, up to 'F', which is 15. This allows us to handle numbers up to base 16, the highest number base in common use today.
The difficulty with the array constant v is that you need to know the character encoding in order to fill it out. Each row in the array initialization given above corresponds to one column of the table of ASCII character codes given earlier in this chapter. This table only holds 128 entries, so it works for ASCII, but such tables become unwieldy and very difficult to construct for larger character sets. For example, extending it to support the subset of Unicode used by western alphabets (Roman, Greek, Cyrillic, Armenian, Hebrew and Arabic) would require an unmanageable table with 1920 entries.
q) Fix value.c to use the v array instead of the function used in the original.
r) Fix v so it also accepts lower-case letters as equivalent to their upper-case counterparts.
s) Fix v so it uses an illegal value such as 99 for all characters that are not legal digits.
Any positive integer radix r may serve as the base of a number system. The formula used above generalizes to:
              digits(S)-1
  value(S) =       Σ       v(S[i]) × r^i
                  i = 0
Note that, for any radix r, the string "10" in that radix has the value r. Thus, in binary, 10 means two; in decimal, the string 10 means ten; and in octal (base 8), the string 10 means eight. To avoid confusion, the base is sometimes given in printed work as a numerical subscript; the subscript itself is always in base 10. So, for example,
10110₂ = 26₈ = 22₁₀ = 211₃
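If you want to check such identities mechanically, the strtol function from the standard C library converts a string to a long integer in any base from 2 to 36. This little test program, our own, verifies the line above:

/* radix.c -- checking the identity above with the C library's strtol */
#include <stdio.h>
#include <stdlib.h>

int main( void )
{
    /* strtol( string, NULL, base ) accepts any base from 2 to 36 */
    printf( "%ld ", strtol( "10110", NULL, 2 ) );   /* prints 22 */
    printf( "%ld ", strtol( "26",    NULL, 8 ) );   /* prints 22 */
    printf( "%ld ", strtol( "22",    NULL, 10 ) );  /* prints 22 */
    printf( "%ld\n", strtol( "211",  NULL, 3 ) );   /* prints 22 */
    return 0;
}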
As long as the radix r is ten or less, the conventional digits 0 to 9 can be used. For r greater than ten, additional digit symbols are needed. Historically, a variety of symbols have been used. For example, in base 12, also called duodecimal, the digits T for ten and E for eleven have been used. On ILLIAC I, base 16 was called sexadecimal, and they used the following digits:
0123456789KSNJFL
The most common convention for bases greater than ten is to use A for ten, B for eleven, C for twelve, and so on, up to Z. This allows bases as high as 36, although base 16, hexadecimal, is the highest common base. That is what we did in our v table defined above. The standard hexadecimal digits are:
0123456789ABCDEF
It is common to use bases 8 and 16 when working with binary numbers. This is the result of grouping binary digits in groups of three or four. We routinely group digits of long numbers in decimal as well. Usually, we group them into threes, so we write 1,234,567 instead of 1234567. In effect, this grouping of decimal digits shifts us to base 1000 with three character names for each digit and commas between the digits. So 1,234,567 has 1 in the million's place, 234 in the thousand's place, and 567 in the one's place.
Consider the representation of one thousand in binary, a fairly long number:
1000 = 512 + 256 + 128 + 64 + 32 + 8
1000₁₀ = 1111101000₂
We can group the digits of this number in order to improve readability. If we group the bits in groups of 3, we can treat each triplet of bits as an octal (base 8) digit, and if we group the bits in groups of 4, we can treat each group of bits as a hexadecimal digit:
1000₁₀ = 1,111,101,000₂ = 1750₈
1000₁₀ = 11,1110,1000₂ = 3E8₁₆
For integers, regardless of the number base, we always group digits from the least significant end of the number, leaving the odd leftovers for the most significant bits.
The pencil and paper conversion algorithms given for binary actually work for any place-value number system. To convert to decimal from base r, write successive powers of r over successive digits, starting with 1 over the least significant digit, then multiply each digit by the power of r above it and sum the products. To convert a decimal number to base r, divide repeatedly by r, setting aside the remainders, until you get zero, and then write the remainders in order, last to first, as digits of the number.
Some programmers get so used to using the octal or hexadecimal number systems as abbreviations for binary numbers that they will refer to numbers in those bases as being binary numbers. This kind of carelessness is fairly natural and common, but technically, it is wrong.
In C, decimal is the default number base, but octal and hexadecimal are also supported. The C notations for these number bases have influenced the notations used in C++, Java and Python.
In C, numeric constants that begin with zero are interpreted as octal, so 010 has the value eight, 01750 has the value one thousand and 09 is illegal. The choice of this notation was almost certainly a mistake on the part of the designers of C, since leading zeros are perfectly legal in decimal, but we are stuck with it.
Hexadecimal numbers in C have the prefix 0x, so 0x10 means sixteen and 0x3E8 means one thousand. The use of the 0x prefix in the original version of C has led some later languages to support 0o to mean octal.
In a printf format string, %d means decimal, %o means octal and %x means hexadecimal.
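A tiny test program of our own ties the constant notations and the format conversions together; all three conversions below print the same value, one thousand, in different bases:

/* bases.c -- one value written and printed in three different bases */
#include <stdio.h>

int main( void )
{
    int n = 1000;   /* the same constant could be written 01750 or 0x3E8 */
    printf( "decimal %d, octal %o, hex %x\n", n, n, n );
    /* prints: decimal 1000, octal 1750, hex 3e8 */
    return 0;
}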
As we have already seen, our assembly language, SMAL, supports binary representations of data to be assembled into memory, along with the use of the ASCII code for text. In fact, SMAL offers support for all of the common number bases and many uncommon ones. For example, the letter A, which is represented as 1000001₂, can be represented in SMAL in the following different forms, among many others:
B "A" ; ASCII B 2#1000001 ; binary B 3#2102 ; base 3 B 4#1001 ; base 4 B 8#101 ; base 8, octal B 10#65 ; base 10, decimal B 65 ; also decimal B 16#41 ; base 16, hexadecimal B #41 ; also hexadecimal |
In the above, note that the ASCII, decimal and hexadecimal representations have special privileges. For all other systems of representation, the number base must be given explicitly (in decimal), followed by a number sign, followed by the number in that base. Decimal numbers are given directly, with no prefix, while a number sign without a base prefix indicates hexadecimal.
For those who are curious, the SMAL assembler includes a general conversion
routine to handle multiple number bases.
This operates in any number
base up to 36. The limit of base 36 is set by the fact that there are 26
letters in the alphabet plus 10 decimal digits available for use as digits
in higher radix numbers.
t) Given the numerals 1, 10, 100 and 1000 written in an unknown number base b, what are their decimal equivalents assuming that b is 2? Assuming that it is 8? Assuming that it is 10?
u) The letters A through F are sufficient to spell many words, and we can augment them with misreadings of the digits 0 and 1 as the capital letters O and I to spell even more words. So, what decimal value, when converted to hexadecimal, spells the words BEEF, FEED, FOOD, and CODE? What are the corresponding binary values? What are the corresponding octal values?
v) Flip a coin 8 times, in sequence, and write down 1 for each time it comes up heads, and 0 for each time it comes up tails, in order to make an 8-bit binary number. Convert that to decimal. Convert the result to octal, and then replace each digit by its 3-bit binary representation to get the original value; then convert the result to hexadecimal and replace each digit by its 4-bit binary representation to get the original value again.
It is common, in the modern world, to hear that some nuclear weapon has an explosive yield equivalent to so many megatons of TNT, or that a disk has a capacity of so many gigabytes, or that a computer has so many megabytes of main memory. Units of measurement like the kilometer and the millimeter are also widely used. It is important to remember that the multipliers used for such units, multipliers like kilo- and milli-, have precise meanings that are defined by international standards as part of the SI system of measurement, otherwise known as the metric system:
1T  =  1 tera   =  1,000,000,000,000
1G  =  1 giga   =  1,000,000,000
1M  =  1 mega   =  1,000,000
1K  =  1 kilo   =  1,000
1   =  1        =  1
1m  =  1 milli  =  .001
1µ  =  1 micro  =  .000,001
1n  =  1 nano   =  .000,000,001
1p  =  1 pico   =  .000,000,000,001
In the world of computers, we have on-line disk subsystems with capacities measured in terabytes, or Tbytes, and we have CPU clock speeds measured in gigahertz or GHz. Small microcontrollers have memory capacities measured in kilobytes or Kbytes, and data communications lines may have lengths measured in kilometers or km. We also have memory speeds measured in nanoseconds or ns, chip sizes measured in square millimeters or mm², and we have disk access times measured in milliseconds, ms. Note that capitalization is generally used to indicate whether the multiplier is greater than or less than one (although the official SI symbol for kilo is a lower-case k), and note the use of the Greek letter mu, µ, standing for micro. For example, microseconds are frequently written as µseconds. It is handy to remember that the speed of light is remarkably close to one foot per nanosecond.
When someone tells you that their memory has a capacity of 100Kb, unfortunately, you are in trouble. Chip designers will tend to use this to refer to kilobits or Kbits, while computer system designers will expect it to stand for kilobytes or Kbytes. As a rule, spell it out if there is any doubt!
In the world of computer science, it is common to use one K or one kilo to refer to a multiplier of 1024, but in fact, this usage is only approximate. In the world of disks, it was once common to use strange mixed approximations, where mega meant 1000×1024. To distinguish multipliers that are powers of 2 from multipliers that are powers of 10, it has been suggested that we should use the subscript 2 to indicate the common computer science approximation, while reserving the unsubscripted form for the exact SI prefix.
Of course, this leaves us with no convenient notation for the disk salesman's definition of a megabyte as 1000 times 1024. Fortunately, this absurd usage is in decline.
The International Electrotechnical Commission recommends the term kibibyte for 1024 bytes, mebibyte for 1024^2 bytes and gibibyte for 1024^3 bytes, with the abbreviations KiB, MiB and GiB, respectively. These terms are an official standard, but there are other standards. As Andrew Tanenbaum once said, "the great thing about standards is that there are so many to choose from."
w) How long is a microfortnight, in seconds? How long is a kilorod, in meters? (Fortnights and rods are archaic measures of time and distance.)
On computers, we generally are forced to work in terms of a fixed word size, so our numbers are composed of a fixed number of bits. Leading zeros are used to pad a number out to the desired width. Thus, on a machine with a 16-bit word, one thousand would be written as 0000001111101000₂.
On a 16-bit machine, the maximum representable unsigned number is 1111111111111111₂, that is, 65,535₁₀.
When deciding between octal and hexadecimal for compact representation of binary numbers, it is common to select the base whose digits evenly divide the machine's words and bytes, so for machines based on an 8-bit byte, hexadecimal is the natural choice, while on older machines with a 6-bit byte, octal made good sense.
There are exceptions to this. The Motorola 68000 and the DEC PDP-11
were 16-bit machines with 8-bit bytes, but on these machines,
the instruction encoding used a 4-bit field followed by four 3-bit fields
in each word.
The Intel 8080 was an 8-bit machine with a 2-bit field followed by two
3-bit fields in each instruction. On these machines, it was convenient
to print the contents of memory in octal so that the 3-bit fields stood out.
The Unix system and the C programming language were first implemented on the
PDP-11, explaining why the C notation for octal is more convenient than its
notation for hexadecimal.
To add two decimal numbers, for example, to add 132 to 39, we arrange the numbers as follows:
1 | 3 | 2 | augend | ||
+ | 3 | 9 | addend | ||
|
And then do the addition one pair of digits at a time, starting with 2+9. To add one pair of digits, we use an addition table that most of us should have memorized in elementary school. This same table would have been loaded in the memory of the IBM 1620 Cadet, which did arithmetic using a table-lookup instead of an adder built from digital logic:
  +  |  0   1   2   3   4   5   6   7   8   9
 ----+----------------------------------------
  0  |  0   1   2   3   4   5   6   7   8   9
  1  |  1   2   3   4   5   6   7   8   9  10
  2  |  2   3   4   5   6   7   8   9  10  11
  3  |  3   4   5   6   7   8   9  10  11  12
  4  |  4   5   6   7   8   9  10  11  12  13
  5  |  5   6   7   8   9  10  11  12  13  14
  6  |  6   7   8   9  10  11  12  13  14  15
  7  |  7   8   9  10  11  12  13  14  15  16
  8  |  8   9  10  11  12  13  14  15  16  17
  9  |  9  10  11  12  13  14  15  16  17  18
     | 10  11  12  13  14  15  16  17  18  19
For the example, looking up 2+9 in the table gives a sum of 11; we split this sum into two pieces, the sum digit and the carry digit, writing the result as follows:
        1       carry
    1   3   2   augend
  +     3   9   addend
  -----------
            1   sum
The next column creates a problem: Our addition table only allows adding two numbers, but there are three in that column. One solution is to use the same addition table, but to step down a row when we have to add in a carry. This is why an extra row was given in the above table. Using this trick, we get:
        1       carry
    1   3   2   augend
  +     3   9   addend
  -----------
        7   1   sum
Finally, we add up the column in the hundred's place. For this example, this is trivial:
        1       carry
    1   3   2   augend
  +     3   9   addend
  -----------
    1   7   1   sum
Recall that the value of a simple binary number, represented as a string S of digit symbols, is defined as:
              digits(S)-1
  value(S) =       Σ       v(S[i]) × 2^i
                  i = 0
This is the same place-value scheme we use for decimal numbers, except that the 1's, 10's and 100's places are now the 1's, 2's and 4's places. As a result, binary arithmetic strongly resembles decimal arithmetic. Binary addition can be done using exactly the approach described above, but with a much smaller addition table.
| + | 0  | 1  |
|---|----|----|
| 0 | 0  | 1  |
| 1 | 1  | 10 |
|   | 10 | 11 |
This may seem too simple, but that is because the rules are the same as for decimal arithmetic or any other place-value number system. The only change is in the number of values each digit can take. This means that remembering things like the addition and multiplication tables is far easier in binary than in decimal. We pay a penalty, of course, because there are more binary digits in a particular number than there are decimal digits. It takes 4 binary digits to represent a quantity that would be comfortably represented in 1 decimal digit, and it takes ten bits to represent a 3 digit decimal number. Converting decimal to binary, adding the binary numbers using the above rules, and converting the binary sum back to decimal gives:
In decimal:

|   |   |   |   |        |
|---|---|---|---|--------|
|   |   | 1 |   | carry  |
|   | 1 | 3 | 2 | augend |
| + |   | 3 | 9 | addend |
|   | 1 | 7 | 1 | sum    |

In binary:

|   |   |   |   |   |   |   |   |   |        |
|---|---|---|---|---|---|---|---|---|--------|
|   |   |   |   |   | 1 |   |   |   | carry  |
|   | 1 | 0 | 0 | 0 | 0 | 1 | 0 | 0 | augend |
| + |   |   | 1 | 0 | 0 | 1 | 1 | 1 | addend |
|   | 1 | 0 | 1 | 0 | 1 | 0 | 1 | 1 | sum    |
If there is a limit on the number of bits allowed, as on a computer with a fixed word size, we say that an overflow has occurred if the carry out of the topmost bit of our fixed size word is nonzero. Note that this applies to words used to hold unsigned data. The conventions for detecting overflow are more complex if we are also interested in representing negative numbers.
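In C, where unsigned arithmetic wraps around rather than trapping, the lost carry can be detected after the fact. A minimal sketch for 16-bit words, using an invented function name:

```c
#include <stdbool.h>
#include <stdint.h>

/* the carry out of bit 15 was nonzero exactly when the wrapped
   sum comes out smaller than either operand */
bool uadd16_overflows(uint16_t a, uint16_t b) {
    uint16_t sum = (uint16_t)(a + b);   /* wraps modulo 2^16 */
    return sum < a;
}
```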
x) Flip a coin 16 times to produce two 8-bit binary numbers. Add them in binary; then convert the original numbers to decimal, add them, and convert the result back to binary to check your work.
Subtraction in binary is as easy as subtraction in decimal. The rules are the same as in decimal, but with more digits and simpler tables for each digit. There is a major difference: Subtraction forces us to deal with negative numbers.
If we follow our habits from the world of pencil and paper, we can reserve the leftmost bit of each number to hold the sign. This is called the signed magnitude number system. In this system, we usually use 0 to represent the + sign and 1 to represent the – sign. This means that, for positive values, the signed magnitude representation is the same as the unsigned representation. In 4 bits, this allows the following 16 values:
| binary | value | binary | value |
|--------|-------|--------|-------|
| 0000   | +0    | 1000   | –0    |
| 0001   | +1    | 1001   | –1    |
| 0010   | +2    | 1010   | –2    |
| 0011   | +3    | 1011   | –3    |
| 0100   | +4    | 1100   | –4    |
| 0101   | +5    | 1101   | –5    |
| 0110   | +6    | 1110   | –6    |
| 0111   | +7    | 1111   | –7    |
The signed magnitude system works, but unfortunately, it is harder to build hardware to add numbers in this system than in alternative systems. As a result, it is not commonly used for integer arithmetic today. This was not obvious to computer developers in the 1940s and 1950s; many early computers used the signed magnitude system. Today, the signed magnitude system is still used for floating point numbers.
An annoying feature of signed magnitude numbers is that positive zero and negative zero have different representations, making comparison for equality difficult.
To negate a signed magnitude number, you just flip the sign bit from one to zero or from zero to one. Adding is harder: You must either add or subtract the magnitudes, depending on whether the signs are the same or not, and then set the sign of the result according to a rather complex set of rules.
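C has no signed magnitude integer type, but the negation rule is easy to sketch on a 16-bit pattern, treating bit 15 as the sign; the function name here is invented for illustration:

```c
#include <stdint.h>

/* negate a 16-bit signed magnitude pattern: flip only the sign bit */
uint16_t sm_negate(uint16_t x) {
    return x ^ 0x8000;   /* exclusive-or inverts bit 15, leaving the magnitude alone */
}
```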
An alternative way to make negative numbers is to arrange things so that the negative of a number is obtained by reversing all the bits of the corresponding positive number. This leads to the one's complement number system, illustrated with the following table of 4-bit signed numbers:
| binary | value | binary | value |
|--------|-------|--------|-------|
| 0000   | +0    | 1000   | –7    |
| 0001   | +1    | 1001   | –6    |
| 0010   | +2    | 1010   | –5    |
| 0011   | +3    | 1011   | –4    |
| 0100   | +4    | 1100   | –3    |
| 0101   | +5    | 1101   | –2    |
| 0110   | +6    | 1110   | –1    |
| 0111   | +7    | 1111   | –0    |
The operation of negating a one's complement number is called taking the one's complement of a number or one's complementing the number. So, the one's complement of 0000 is 1111 and the one's complement of 1111 is 0000. Note that negating a negative number produces a positive number, and vice versa. Also note that the one's complement operator is well defined even when other number systems are being used. The operator is defined in terms of its effect on the binary representation, not in terms of the interpretation of that representation.
Because C was designed for binary computers, it has a one's complement operator, the tilde sign ~. So, for 32-bit int values, ~0x0F3C5A1E is the same as 0xF0C3A5E1.
Warning: Many students get confused by the use of the term one's complement to mean both the name of the number system and the name of the operator used to negate numbers in that system! If someone says: "Take the one's complement of x," or "one's complement x" that means to apply the operator, reversing each bit in x; the number of bits in the result will be the same as the number of bits in x. If someone says: "write x as a 4-bit one's complement number," that means to use the number system!
As was noted above, adding two signed magnitude numbers is fairly difficult. In contrast, the rules for adding two one's complement numbers are simple, but far from obvious: Treat the most significant bit exactly like all the other bits, and compute the unsigned binary sum of the two numbers. Then, if this addition produces a carry out of the topmost bit, add one to the result. This is called an end around carry.
This rule for addition is so simple that, when building computer hardware to do one's complement arithmetic, subtraction is always done by adding the one's complement of the subtrahend to the minuend. One's complement has been used as the primary representation for integers on many computers, almost all designed prior to 1965.
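Here is a minimal C sketch of such an adder for 16-bit words; the function names are invented. Incidentally, this end-around-carry sum survives today in the Internet checksum used by IP:

```c
#include <stdint.h>

/* 16-bit one's complement addition with end around carry */
uint16_t oc_add(uint16_t a, uint16_t b) {
    uint32_t sum = (uint32_t)a + b;   /* plain unsigned binary sum */
    if (sum > 0xFFFF)                 /* carry out of the topmost bit? */
        sum = (sum & 0xFFFF) + 1;     /* add it back in: end around carry */
    return (uint16_t)sum;
}

/* subtract by adding the one's complement of the subtrahend */
uint16_t oc_sub(uint16_t minuend, uint16_t subtrahend) {
    return oc_add(minuend, (uint16_t)~subtrahend);
}
```

For example, oc_add(5, 0xFFFC) adds +5 and –3 (0xFFFC is the one's complement of 3): the raw sum 0x10001 produces a carry, the end around carry turns it into 0x0002, and the result is +2.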
Note that one's complement numbers and signed magnitude numbers share the problem of having two representations of zero, +0 and –0. This is eliminated when two's complement numbers are used. In the two's complement system, numbers are negated by taking the one's complement and then adding one. This idea is far from obvious, but the result is the dominant number system used on today's computers.
| binary | value | binary | value |
|--------|-------|--------|-------|
| 0000   | 0     | 1000   | –8    |
| 0001   | +1    | 1001   | –7    |
| 0010   | +2    | 1010   | –6    |
| 0011   | +3    | 1011   | –5    |
| 0100   | +4    | 1100   | –4    |
| 0101   | +5    | 1101   | –3    |
| 0110   | +6    | 1110   | –2    |
| 0111   | +7    | 1111   | –1    |
The two's complement operator is defined as take the one's complement and then add one. It is important to note that applying this operator twice to an n-bit number always produces the original value. So, for example, the two's complement of 0101 is 1011 and the two's complement of 1011 is 0101.
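A quick check of the double-application property in C, working on 4-bit patterns held in a byte:

```c
#include <assert.h>
#include <stdint.h>

int main(void) {
    uint8_t x    = 0xB;                       /* the pattern 1011 */
    uint8_t negx = (uint8_t)(~x + 1) & 0xF;   /* two's complement, kept to 4 bits */
    assert(negx == 0x5);                      /* 0101, as in the text */
    assert((((uint8_t)(~negx + 1)) & 0xF) == x);  /* negating twice restores 1011 */
    return 0;
}
```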
The two's complement number system has an annoying feature: It cannot represent the negation of the most negative number. In 4 bits, for example, the value –8 is represented as 1000, but there is no 4-bit two's complement representation of +8. If you apply the two's complement operator to 1000, you get 1000 back.
Despite this, two's complement arithmetic is the dominant system used on all modern computers. Among the advantages of this system are that there are no problems with multiple representations of zero. Applying the two's complement operator to 0000 gives 0000 after we discard the carry out of the high bit. Another advantage is that odd numbers always have a one in their least significant bit, whether they are positive or negative. Similarly, all two's complement numbers that are evenly divisible by 4 end in 00, and all that are evenly divisible by 8 end in 000, regardless of the sign.
The fundamental reason for the popularity of two's complement numbers is that arithmetic is trivial. To add two two's complement numbers, just treat all bits as magnitude bits, even the most significant bit that you might think of as a sign bit.
To subtract one two's complement number from another, take the one's complement of the number to be subtracted (the subtrahend), then add it to the other number (the minuend) plus one. Using C notation (which is also C++ and Java notation), where - is the two's complement operator and ~ is the one's complement operator, the following expressions are all equivalent:
```
A - B
A + -B
A + (~B + 1)
(A + ~B) + 1
```
Note that with two's complement subtraction, a carry of one into any bit
position means, in essence, don't borrow from this position. This may seem
an odd way of handling borrows, but it works.
Warning: As was the case with one's complement arithmetic, many students get confused by the use of the term two's complement to mean both the name of the number system and the name of the operator used to negate numbers in that system! If someone says: "Take the two's complement of x," or "two's complement x" that means to apply the operator; the number of bits in the result will be the same as the number of bits in x. If someone says: "write x as a 4-bit two's complement number," that means to use the number system, not negate the number.
The two's complement number system turns out to be a strange place-value
system where the "sign bit" has a value that is negative. Thus, in
addition to the obvious ways of converting two's complement numbers to
decimal (if negative, write down the minus sign, two's complement the number
so it is positive, then convert to decimal), we can do the
conversion using the sum of the place values. Here is an 8-bit example
showing both methods:
| –128 | 64 | 32 | 16 | 8 | 4 | 2 | 1 |
|------|----|----|----|---|---|---|---|
| 1    | 0  | 1  | 0  | 1 | 0 | 1 | 1 |

10101011₂ = –128 + 32 + 8 + 2 + 1 = –85
10101011₂ = –(01010101₂) = –(64 + 16 + 4 + 1) = –85
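Both conversion methods are easy to mechanize. Here is a small C sketch for the 8-bit example above; the variable names are invented:

```c
#include <stdint.h>
#include <stdio.h>

int main(void) {
    uint8_t x = 0xAB;    /* the pattern 10101011 */

    /* method 1: sum the place values, with bit 7 worth -128 */
    int v = -128 * ((x >> 7) & 1);
    for (int i = 0; i <= 6; i++)
        v += ((x >> i) & 1) << i;

    /* method 2: if the sign bit is set, two's complement first,
       then read off the positive value and attach the minus sign */
    int w = (x & 0x80) ? -(int)(uint8_t)(~x + 1) : x;

    printf("%d %d\n", v, w);    /* prints: -85 -85 */
    return 0;
}
```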
When adding two two's complement numbers, overflow detection is not as simple
as looking for a carry out of the top bit. If the numbers being
added were of opposite sign, the sum must lie between them, so overflow is
impossible. If the numbers were of the same sign, overflow is
possible, and when it occurs, the sign of the result will differ from the sign
of the two numbers. Alternately, this turns out to be equivalent
to asking if the carry into the sign position is the same as the carry out of
the sign position. When adding negative numbers,
there should always be a carry into and out of the sign,
and when adding positive numbers, there should
never be a carry in or carry out of the sign.
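The sign-based rule translates directly into C. This sketch for 32-bit two's complement addition uses a standard bit-manipulation idiom, doing the sum in unsigned arithmetic so the wraparound itself is well defined:

```c
#include <stdbool.h>
#include <stdint.h>

/* overflow iff the operands have the same sign and the sum's sign differs */
bool add32_overflows(int32_t a, int32_t b) {
    uint32_t ua = (uint32_t)a, ub = (uint32_t)b;
    uint32_t sum = ua + ub;                    /* wraps modulo 2^32 */
    return ((~(ua ^ ub)) & (ua ^ sum)) >> 31;  /* test the sign position */
}
```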
As an aside, two's complement numbers have a strong relationship to what is called the ten's complement number system in decimal, sometimes known as the odometer number system. To compute the ten's complement of a number, first take the nine's complement by subtracting every digit from 9, then add 1 to the result. So, –1, as a 6-digit ten's complement number, is represented by 999999, and –2 is 999998.
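The same trick can be sketched in C, representing signed values as 6-digit ten's complement numbers; the names here are invented for illustration:

```c
#include <stdio.h>

#define MOD 1000000    /* six decimal digits, like a car odometer */

/* encode a small signed value as a 6-digit ten's complement number */
unsigned tc6_encode(int v) {
    return (unsigned)((v % MOD + MOD) % MOD);
}

int main(void) {
    printf("%06u %06u\n", tc6_encode(-1), tc6_encode(-2));
    /* prints: 999999 999998, matching the text */
    return 0;
}
```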
There are other number systems. For example, there are biased numbers, in which a fixed bias is added to the abstract value to guarantee a non-negative value before the value is represented in binary. The most natural bias for n-bit numbers is 2^(n–1). Here is an example, in 4 bits:
| binary | value | binary | value |
|--------|-------|--------|-------|
| 0000   | –8    | 1000   | 0     |
| 0001   | –7    | 1001   | +1    |
| 0010   | –6    | 1010   | +2    |
| 0011   | –5    | 1011   | +3    |
| 0100   | –4    | 1100   | +4    |
| 0101   | –3    | 1101   | +5    |
| 0110   | –2    | 1110   | +6    |
| 0111   | –1    | 1111   | +7    |
Biased number systems seem odd, but they are commonly used in representing the exponent field for floating point numbers. Biased numbers have the advantage that sorting the representations into simple binary order puts the abstract values in numerical order. This is not true for any of the other systems we discussed. As with two's complement numbers, biased numbers always have at least one value that cannot be negated.
Curiously, when the bias used with an n-bit number is 2^(n–1), the biased representation of a value v is the same as the two's complement representation of v with the most significant bit inverted. So, for example, –8 as a 4-bit two's complement number is 1000, and as a 4-bit number biased by 8, it is 0000.
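Both views are simple to code. A 4-bit sketch in C, with invented function names:

```c
#include <assert.h>
#include <stdint.h>

/* encode v (in -8..+7) as a 4-bit number biased by 8 */
uint8_t bias4_encode(int v) {
    return (uint8_t)(v + 8);
}

/* the same code, obtained by inverting the top bit of the
   4-bit two's complement pattern */
uint8_t bias4_from_twos(uint8_t t) {
    return (t ^ 0x8) & 0xF;
}

int main(void) {
    assert(bias4_encode(-8) == 0x0);       /* -8 is 0000 */
    assert(bias4_from_twos(0x8) == 0x0);   /* two's complement 1000, top bit flipped */
    assert(bias4_encode(7) == 0xF);        /* +7 is 1111 */
    return 0;
}
```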
y) Flip a coin 8 times to produce an 8-bit binary number. What abstract value does this number represent in the signed magnitude system, the one's complement system, the two's complement system, and the biased system with the natural bias of 128? Check your result by converting back to binary in each system.