## BCD Arithmetic, a tutorial
Part of
the Arithmetic Tutorial Collection
Copyright © 1999, Douglas. W. Jones, with major revisions made in 2002. This work may be transmitted or stored in electronic form on any computer attached to the Internet or World Wide Web so long as this notice is included in the copy. Individuals may make single copies for their own use. All other rights are reserved. |

- Introduction
- BCD Representations
- Packed BCD Arithmetic
- 6-bit BCD Arithmetic
- ASCII Arithmetic
- Hardware Support
- Other Number Bases

Rated as a top 50 site by Programmer's Heaven as of Feb 2002. |

Binary coded decimal arithmetic has fallen out of favor among modern hardware designers, and it is poorly supported in most modern programming languages, yet it is still occasionally appropriate. For example, when a numeric field of a record in a text file must be incremented, the methods presented here will be significantly faster than converting the textual value to a binary integer, incrementing, and then converting back to text.

For programmers with a devious sense of humor, these methods can be a very effective way to obfuscate code in languages such as C. These techniques are also historically significant. The DEC (later Compaq, then HP) Alpha architecture and the more recent Intel MMX extensions to the 80x86 architecture both supporting manipulation of vectors of small objects such as pixels packed into 64-bit words; the motive for this can be found in the much older ideas described here.

In the following sections, all arithmetic is assumed to be done in 32 bit registers (the algorithms are easily adapted to other word lengths), and neither carry out nor overflow are deemed significant. The code written for clarity, using C notation for operators and constants, with named temporary variables whenever a subexpression in the computation is cited in the text. Practical implementations will very likely use far fewer temporary variables.

All of the number representations discussed here are unsigned, although the subtraction operation rests on the use of ten's complement arithmetic, and this, in turn, suggests the use of the ten's complement operation for negation. In that case, number with a most significant digit greater than 5 can be interpreted as negative numbers.

For unsigned decimal numbers (or for 10's complement numbers with the same
sign), the conventional unsigned binary comparison operators all remain
applicable with these decimal represenations. Thus, if `A` and
`B` are unsigned decimal numbers, and if `>` is the
conventional unsigned greater-than operator for binary numbers,
`A>B` will yield the correct results for comparing these numbers.

Those interested in the origin of the methods describe here should read the very brief note and example showing how to add BCD numbers on a binary computer found on page 66 of the IBM 7040 and IBM 7044 Data Processing System Student Text, published by IBM in 1963. The 7040 family of computers offered no hardware support for BCD arithmetic and the material presented here on doing 6-bit BCD arithmetic is essentially the same as that presented in 1963.

The programmer interested in efficient BCD to Binary conversion should also
check out the code in appendix I.5 of the paper
"Superoptimizer
-- A Look at the Smallest Program"
by Henry Massalin,
*Proc. Second Int. Conf. on Arch. Support for Prog. Lang. and Op. Sys.
(ASPLOS II)*
published as *Computer Architecture News 15*, 5 (Oct. 1987).

Binary coded decimal numbers may be represented in a number of ways. The
most obvious is *packed BCD,* where each decimal digit is represented by a
4 bit field, and the digits are packed consecutively into a computer
word. This scheme allows 4 digits to be packed into a 16 bit word, or
8 digits into a 32 bit word, as shown below.

B31 B15 B0 |_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _|_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _| |_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_| | D7 | D6 | D5 | D4 | D3 | D2 | D1 | D0 |In addition to its efficiency, this format is useful because it is compatable with the use of hexidecimal I/O routines. Traditionally, packed BCD was viewed as requiring the use of special hardware to do BCD arithmetic, but in fact, it is possible to add two packed BCD numbers using a short sequence of conventional binary and logical operators.

Many early computers used *6-bit BCD* codes, where each BCD digit was
padded to 6 bits. This was compatable with the 36, 48 and 60 bit words used
by many computers in the 1950's and 1960's, and on both the 36-bit IBM 704-709
series of machines (up through the 7044 and 7094) and 60-bit CDC 6600 series,
this format was used to do BCD arithmetic without any special hardware support.
On a modern 16 or 32 bit machine, this allows 3 or a bit more than 5 decimal
digits per word, respectively.

B31 B15 B0 |_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _|_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _| |_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_| | D5|0 0| D4 |0 0| D3 |0 0| D2 |0 0| D1 |0 0| D0 |In the 32 bit format, productive use can be made of the fractional digit D5 for carry detection and propagation in extended precision arithmetic operations.

It should be noted that in most historically important 6-bit BCD collating sequences, all numerals other than zero were represented as shown above, while zero was sometimes represented by the binary code 1010, allowing the 6-bit code 000000 to be used for space. Ones in the two most significant bits of each 6-bit byte were used in the codes for letters and punctuation marks. See material on punched card codes for more detail on this subject.

The methods discussed here can be extended to other padding systems in obvious ways. For example, 6 5-bit fields can be packed into a 32-bit word, with one bit of padding between BCD digits. Alternately, 3 bits of padding can be added between digits, so that a 32-bit word holds 4 complete 7-bit fields, plus 4 bits that can be used for a 5th BCD digit.

Padding each BCD digit to fill an 8 bit byte is quite practical, but this
inefficient representation is not particularly interesting except in the
special case where each digit is directly represented by its
*ASCII code*, as follows:

B31 B15 B0 |_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _|_ _ _ _ _ _ _ _ _ _ _ _ _ _ _ _| |_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_|_| |0 0 1 1| D3 |0 0 1 1| D2 |0 0 1 1| D1 |0 0 1 1| D0 |Of course, the same approach can be used for EBCDIC, using

The least significant 7 digits of a packed BCD number may be tested for validity as follows:

valid(a) t1 = a + 0x06666666 t2 = a ^ 0x06666666 t3 = t1 ^ t2 t4 = t3 & 0x11111110 if t4 nonzero, a is invalidThis code deliberately ignores D7, the most significant digit, because the arithmetic operators presented here allow this digit to hold values up to 15. To understand this code, note that binary addition of 6 to each digit, to produce

`t2` is similar to `t1`, in the sense that exclusive-oring is
similar to adding. The two will be equal in bits where there was no carry
in to that position, but they will differ in bits where there was a carry in.
Therefore, `t3` holds a one wherever there was a carry into a bit of
`t1`, and `t4` holds a 1 wherever there was a carry into a BCD
digit of `t1`. Since the addition used to produce `t1` should
not produce such carries for a valid BCD number, `t4` should be zero!

valid(a) t1 = a + 0x06666666 t2 = t1 ^ a t3 = t2 & 0x11111110 if t3 nonzero, a is invalid

The above improvement was suggested by John Mertus at Brown University;
Here, note that adding 6 to a 4-bit nibble of `a` produces a carry
out of that nibble if the nibble is invalid, but also, adding 6 does not
change the least significant bit. By exclusive oring the result in `t1`
with `a` to make `t2`, the least significant bit of each nibble
is compared with the original. Finally, we ignore all the other bits and
demand that all the least significant bits be zero, indicating no carry from
the previous bit positions.

Antoine Schweitzer-Chaput suggested an alternative validity check to
me. Note that all BCD digits of the form 0xxx are valid, as are digits
of the form x00x. Therefore, if `d` is a BCD digit, we can test
it for validity with:

valid(d) if (d & 0x8) is zero or (d & 0x6) is zero

Extending this to packed BCD is a fun challenge.

The code to add two BCD numbers must solve the problem of propagating carries from one BCD digit to the next, and the logic for doing so rests on many of the mechanisms introduced above.

add(a,b) t1 = a + 0x06666666 t2 = t1 + b t3 = t1 ^ b t4 = t2 ^ t3 t5 = ~t4 & 0x11111110 t6 = (t5 >> 2) | (t5 >> 3) return t2 - t6Here, the addition used to form

In `t2`, the digits in each position that produced a carry will have
the correct value, but the digits in positions that did not produce a carry
will contain an excess 6. The problem, then, is to remove the extra 6 from
each digit that did not generate a carry.

Note that `t2` and `t3` differ wherever there was a carry,
so `t4` records all carries into positions in the sum. Therefore,
`t5` holds the inverse of each of the 7 carries between successive
decimal digits of the sum. This bit pattern is used to generate `t6`,
where each digit is either 6 where the corresponding digits of the sum did
not generate a carry or 0 where they did. Subtracting this from `t2`
produces the correct BCD sum.

Note that the most significant digit of the sum will exceed 9 if there should have been a carry out of that position. Furthermore, there is no easy way to detect this carry! Out of range values in the high digit may be of some limited use, but in general, the top BCD digit should only be used to accumulate carrys out of 7-digit BCD operands. If its use is limited in this way, the numbers will always be positive and will never produce a 2's complement overflow.

To subtract two numbers, 10's complement addition should be used. To compute the 10's complement of a number, subtract it digit-by-digit from 99999999 (decimal) and then add 1, as follows:

tencomp(a) t1 = 0xF9999999 - a return add( t1, 0x00000001 )Here,

tencomp(a) t1 = 0xF9999999 - a t2 = t1 + 0x06666666 t3 = t2 + 0x00000001 t4 = t2 ^ 0x00000001 t5 = t3 ^ t4 t6 = ~t5 & 0x11111110 t7 = (t6 >> 2) | (t6 >> 3) return t3 - t7This reduces to the following obscure bit of code, with variables renamed again:

tencomp(a) t1 = 0xFFFFFFFF - a t2 = - a t3 = t1 ^ 0x00000001 t4 = t2 ^ t3 t5 = ~t4 & 0x11111110 t6 = (t5 >> 2) | (t5 >> 3) return t2 - t6

All 5 full digits of a 6-bit BCD number may be tested for validity as follows:

valid(a) t1 = a + 006666666666 t2 = t1 & 006060606060 if t2 = 006060606060, a is validTo understand this code, note that binary addition of 066 to each digit, to produce

`t2` holds just the pad bits in the result from `t1`. Where no
carry was propagated in the computation of `t1`, these will be set,
while where a carry was propagated through the padding, they will be reset.
A valid 6-bit BCD number should not have generated any carry, so all of the pad
bits should be set!

The code to add two 5-digit 6-bit BCD numbers must solve the problem of propagating carries from one BCD digit to the next, and the logic for doing so rests on the mechanisms just introduced:

add(a,b) t1 = a + b + 006666666666 t2 = t1 & 006060606060 return t1 - (t2 >> 3)Here, the addition used to form

In `t1`, the digits in each position that produced a carry will have
the correct value, but the digits in positions that did not produce a carry
will contain an excess 6. The problem, is to remove this 6 from
each digit that did not generate a carry.

`t2` contains just the pad bits from the sum. For those digits that
did not produce a carry, the pad bits will still be set to one. For those
digits that produced a carry, the pad bits will be set to zero by the
propagagion of the carry.

Shifting `t2` 3 places right produces a 6 in each digit position that
did not generate a carry and zero elsewhere. Subtracting this from `t1`
produces the required correction!

A carry out from the 5-digit BCD sum will appear in partial digit, D5, that occupies the topmost 2 bits of the 32 bit number.

To subtract two numbers, 10's complement addition should be used. To compute the 10's complement of 5-digit 6-bit BCD number, subtract it digit-by-digit from 99999 (decimal) and then add 1, as follows:

tencomp(a) t1 = 001111111111 - a return add( t1, 000000000001 )Here,

tencomp(a) t1 = 001111111111 - a t2 = t1 + 000000000001 + 006666666666 t3 = t2 & 006060606060 return t2 - (t3 >> 3)Rearranging terms and renaming temporaries gives:

tencomp(a) t1 = (001111111111 - a) + 000000000001 + 006666666666 t2 = t1 & 006060606060 return t1 - (t2 >> 3)Which simplifies to:

tencomp(a) t1 = 010000000000 - a t2 = t1 & 006060606060 return t1 - (t2 >> 3)

To verify that a 4-digit ASCII string contains only valid BCD codes, techniques similar to those used for packed BCD and 6-bit BCD can be used:

valid(a) t1 = a + 0xC6C6C6C6 t2 = t1 & 0xF0F0F0F0 if t2 = 0xF0F0F0F0, a is validTo understand this code, note that binary addition of

`t2` holds just the pad bits from `t1`. Where no carry
was propagated in the computation of `t1`, these will be set, while
where a carry was propagated, they will be reset. A valid 6-bit BCD number
should not have generated any carry, so all pad bits should be set!

Validity checking is more complex if blank and zero are to be interchangable,
and more complex still if only leading blanks are allowed. On the other
hand, oring `0x10101010` with a potential number converts all blanks
to zeros, while also converting certain punctuation marks to other numerals.
In certain applications, this may be a reasonable way to deal with blanks.

The code to add two 4-digit ASCII numbers must solve the problem of propagating carries from one BCD digit to the next, and the logic for doing so rests on the mechanisms just introduced:

add(a,b) t1 = a + b + 0x96969696 t2 = t1 & 0x30303030 t3 = t1 - (t2 >> 3) return (t3 & 0x0F0F0F0F) | 0x30303030Here, the addition used to form

In `t1`, the digits in each position that produced a carry will have
the correct value, but the digits in positions that did not produce a carry
will contain an excess 6.

`t2` contains the two least significant bits of the padding from the
sum; these bits will be 1 where no carry was propagated, so they can be shifted
to subtract the excess 6 from the adjacent digit of the sum. This corrected
sum, in `t3` still contains the incorrect values in the pad fields.
No matter what we do, it seems to take two extra binary operations to restore
the pad fields to their proper values.

The carry out from the 4-digit ASCII sum can be extracted from the pad bits of any of the temporary variables used above, so no digits are wasted in extended precision ASCII arithmetic.

By all means, if your purpose is to write obfuscated code, and if your programming language allows it, substitute ASCII literals for the hex constants wherever possible, as follows:

add(a,b) t1 = a + b + 0x96969696 t2 = t1 & '0000' t3 = t1 - (t2 >> 3) return (t3 & 0x0F0F0F0F) | '0000'

To subtract two numbers, 10's complement addition should be used. To compute the 10's complement of a number, subtract it digit-by-digit from 9999 (decimal) and then add 1, as follows:

tencomp(a) t1 = 0x69696969 - a return add( t1, 0x30303031 )Here,

tencomp(a) t1 = 0x69696969 - a t2 = t1 + 0x30303031 + 0x96969696 t3 = t2 & 0x30303030 t4 = t2 - (t3 >> 3) return (t4 & 0x0F0F0F0F) | 0x30303030Combining terms, renaming, and simplifing the result gives the following strange bit of code:

tencomp(a) t1 = 0x30303030 - a t2 = t1 & 0x30303030 t3 = t1 - (t2 >> 3) return (t3 & 0x0F0F0F0F) | 0x30303030Of course, if your purpose is to write obfuscated code, by all means substitute ASCII literals for the hex constants wherever possible, creating:

tencomp(a) t1 = '0000' - a t2 = t1 & '0000' t3 = t1 - (t2 >> 3) return (t3 & 0x0F0F0F0F) | '0000'

Much of the complexity of the above code is imposed by computations that determine the carry out of each digit of the BCD sum. Hardware that allows the carry propagation from bit to bit of the adder to be interrupted at selected points can greatly speed these computations. Both the DEC (later Compaq) Alpha and the Intel MMX extensions to the 80x86 family provide for this, allowing 64 bit words to be divided into many subfields, with carry propagation from field to field interrupted, and with the carries captured in a form allowing their manipulation. These facilities are not exposed to the high level language programmer, but they are used intensively in carefully handcrafted assembly-language routines for common vector operations such as those involved in pixel manipulation of graphics images.

These techniques generalize to other number bases. For example, consider base 3 (ternary) numbers, represented in packed binary-coded ternary, where each trit (base 3 digit) is represented with 2 bits and we pack 15 trits into a 32-bit number. The algorithm for addition in this representation can be given as:

add(a,b) t1 = a + 0x15555555 t2 = t1 + b t3 = t1 ^ b t4 = t2 ^ t3 t5 = ~t4 & 0x55555554 t6 = (t5 >> 2) return t2 - t6

Unfortunately, as with BCD, we cannot use the highest bits to hold yet another digit because high-level-language arithmetic models do not allow us to capture the carry out of those bits.

This idea has been more fully developed here, where a fairly complete C library is given supporting both unsigned and balanced ternary arithmetic.

Thanks to Sergey Fik, who, on November 25, 2015 pointed out an error in the optimized version of the BCD ASCII ten's complement algorithm.