Operating Systems, Lecture 27

In the previous sections, there have been a number of references to storage allocation and to the sharing of storage resources between various uses; for example, free buffers are frequently shared by many device drivers in a queued input-output system, and storage for user programs is frequently dynamically allocated by the loader or job control system.

Users of Pascal should be familiar with storage management services such as are offered by the new and dispose standard library routines. C programmers should recognize corresponding library routines known as malloc and free. These straightforward storage management functions are typically hidden in object oriented languages, but in these languages, when the initializer for a class is called, it must obtain memory for the new class instance from somewhere, and when the destructor is called, the memory occupied by the instance being destroyed must go somewhere.

However the storage management routines of a programming environment are presented to the user, they are crucial tools in programs dealing with complex data structures such as lists or trees. Inside most operating systems, similar routines are used for such applications as allocating primary memory to hold segments in segmented virtual memory systems, allocating space for the code and variables of user programs, and allocating the system data structures themselves. The routines used to manage other memory resources, for example, disk space in file systems, may be quite similar, although latency considerations generally lead to special algorithms in those contexts.

The following bit of Java code creates a new object of class Widget on the heap, where the variable w is a reference to (or object handle for) that widget:

Widget w = new Widget( params );

The following bit of C code allocates a new structure of type Widget on the heap, where the variable w is a pointer to it.

struct Widget * w = (Struct widget *)malloc( sizeof( struct Widget ) );
WidgetInitialize( w, params );

Notice that this C code and the Java code do exactly the same thing, except that allocation and initializaiton are distinct from each other in C, and the C syntax is more cumbersome.

The storage region from which memory is dynamically allocated in response to arbitrary user demands is usually referred to as the heap. The term heap clearly indicates the relative disorder of the allocation and deallocation requests, as opposed, for example, to the orderly pattern of allocation and deallocation imposed by the use of a stack. This use of the term heap has no relationship to the use of the same term in the context of sorting or priority queues! The heapsort sorting algorithm and the skew heap queue implementation have absolutely nothing to do with the subject of this chapter!

The routines for allocation and deallocation from the heap are frequently referred to as the heap manager component of the system or language which they support. In the following sections, a number of heap manager implementations will be discussed; all will be based on the interface described in Figure 1.

p = allocate(s): returns p, a pointer to s bytes of storage or, in the event this cannot be done, returns NULL. Generally, a NULL return means that there is no free block of memory in the heap large enough to hold s bytes of data.
deallocate(p,s): given p, a pointer previously returned by allocate(s), returns the block of memory pointed to by p to the heap manager for future allocation; the value of s used in the call to deallocate must be the same as the value used in the call to allocate and it is an error to deallocate a particular block of storage more than one time or to try deallocate using a value of p that was not obtained by a call to allocate.

Figure 1. The user interface to a storage manager.

Note that both the allocate and deallocate services require that the size of the region being dealt with be passed as a parameter. In high level languages, such as C++, Pascal or Java, the compiler can infer the size of the memory region to be allocated or deallocated, so in these languages, the user never explicitly mentions the region size. Pascal implementations learn this from the type of the pointer variable, while C++ and Java implementations learn this from the class being instantiated.

Some implementations of deallocate can infer the size of the region being deallocated from the block itself, but this parameter will always be included in this discussion since not all implementations can find this information from the data structures of the heap itself. In C programs, the usual allocate call has the form (t*)malloc(sizeof(t)) where t is the type of the object to be allocated and the pseudo function sizeof(t) gives the size of an object of type t, measured in bytes. The call to malloc() returns an untyped pointer (in C, this of type (void*)), so the cast (t*) applied to the result of the funciton call converts the result to a pointer to an object of type t, as expected.

Actually, the casting is unnecessary in this context. In C, a value of type void* may be assigned to any pointer variable, and any pointer value may be assigned to a variable of type void*. Writing the cast makes the type conversion explicit instead of implicit. By using a void* variable as an intermediary, any pointer value can be assigned to any pointer variable without casting, but because casting is available, C type rules can always be violated, unlike the type rules of strongly typed languages such as Java.

It should be noted that, although it is an error to deallocate the same block of storage more than once or to deallocate a block which was never allocated, these errors are very hard to detect. No code is included in the examples presented here to detect these errors, and most real systems are not much better. Typically, these errors are not detected immediately; instead, the errors corrupt the internal data structures of the heap, and only later does this corruption lead to a detectable error. This, in turn, provides ample justification for keeping the application program's data structures in a separate heap from the operating system, if this were not done, it would be too easy for an error in an application to corrupt the system's data structures. If each the has its own heap, we can reinitialize that heap whenever we start a new application, thus preventing the corruption of the heap from spreading from one application to another.

The bulk of this chapter is structured as a survey of heap management schemes. All of them are general purpose, in that they do not impose any ordering constraints on allocations and deallocations. For example, we do not impose a last-in first-out or LIFO constraint the way stack allocation does, nor do we constrain a first-in first-out or FIFO constraint the way a bounded buffer does.

Fixed Size Blocks

The simplest general purpose dynamic allocation mechanism of all requires that all of the available storage be divided into equal sized blocks. When this is done, the size field on the allocate and deallocate requests is not very meaningful, since only one size is available. If the user requests more than this amount, the request will fail; if the user requests less and there are any free blocks, the request will be successful and an entire block will be allocated no matter how little memory the user actually requests.

The heap manager must maintain enough information to determine which blocks in the heap are free at any time. Typically, this information will be stored in the form of a linked list, where the first word of each free block holds the address of the next free block. As with chained resolution of forward references (see Figure 4.10 through Figure 4.14), there is a problem with representing the end of the free list. Throughout this section, this problem will be ignored, and the symbol NULL, standing for an illegal or reserved pointer value, will be used; it should be remembered that there are occasional contexts where there are no reserved addresses, requiring a more complex solution to the problem of marking the end of a list.

When writing a heap manager in assembly language or other low-level languages, where there is no difference between pointers and integers, it is natural to think of memory as one big array in which it is perfectly appropriate to do arithmetic on array indices. In higher level languages, where the type system discourages or even prevents this, we could take two approaches to explaining such algorithms. One approach is to explain the algorithm in terms of a large array of integers, M, with integer memory addresses. Another is to use pointers with carefully documented and very unsafe conversions between integer and pointer types.

In C, for example, *(int*)a is a pointer to the integer at the memory address a, while *(char*)a refers to the character stored at the same memory address. The variable a used in these examples could have been an integer or a pointer; in either casem, it holds the same memory address. In most C implementations, the integer occupies 4 successive bytes of memory starting at this address, while the character occupies only one byte at this address.

Using these conventions, C code to allocate from a free list on the heap can be written as shown in Figure 2.

typedef void * pointer; /* used for untyped pointers */

pointer freelist; /* pointer to the first free block in memory */

pointer allocate( int size )
{
     if (size > blocksize) {
          error( "size too big" );
          return NULL;
     } else if (freelist == NULL) {
          error( "no space available" );
          return NULL;
     } else {
          pointer block;
          block = freelist;
          freelist = *(pointer *)freelist;
          return block;
     }
}

void deallocate( pointer block, int size );
{
     if (block != NULL) {
          *(pointer *)block = freelist;
          freelist = block;
     }
}

Figure 2. Fixed block size allocation.

Although fixed size block allocation is presented here primarily to set the stage for more interesting allocation algorithms, there are many places where it is the most appropriate approach. When memory is divided into fixed length pages (or sectors on disk), and a non-contiguous allocation scheme is being used, fixed size blocks are clearly appropriate. Fixed size blocks are also appropriate for those applications where all requests will be for the same size block. This latter situation shows up on many systems where the only dynamically allocated storage is for input-output buffering, and all sectors are of the same size. More commonly, there will be two or three common request sizes; perhaps a small request for device control blocks, and a large one for input-output buffers. When this is the case, fixed block size allocation can still be used, but only by maintaining a separate heap for each size.

One disadvantage of a fixed block size is that, if a program needs to create a data structure larger than the block size, the program must use multiple blocks for this structure. Programmers who remember the early days of Microsoft's DOS and Apple's MacOS are likely to remember a time when the largest memory incrment the operating system would allocate to an application was 64K bytes. In those settings, many programmers were forced to write awkwardly structured programs to manipulate such things as databases, images or other large objects.

Another disadvantage of a fixed block size is that, if a program needs blocks smaller than the fixed size, the standard size will be allocated anyway. As a result, each user object will be stored in a container larger than necessary, wasting space. In each block of memory allocated to a user, we refer to the unused space as a fragment of what ought to be free space. Because this fragment is inside an allocated block, we refer to this problem as internal fragmentation of the available free space. In general, any storage allocation mechanism which will, on occasion, allocate blocks larger than the requested size is said to cause some degree of internal fragmentation.

An effective measure of the extent to which a storage allocation algorithm leads to internal fragmentation is the percent of allocated space which goes unused because it is contained in internal fragments. If the block sizes requested by users are uniformly distributed between one and the allocated block size, the fixed size block mechanism described in this section would result in about 50% waste due to internal fragmentation. In fact, a uniform distribution of block sizes is quite uncommon! Most real problems only require a few well defined block sizes.

The Buddy System

One way of dealing with internal fragmentation is to allow a variety of block sizes. Blocks of each size can be allocated and deallocated by the use of a fixed size block allocate and deallocate mechanism such as that illustrated in Figure 2, and if a block of one size is not available, a larger block can be allocated and a block of the desired split off of it. When this is done, all blocks resulting from splitting a particular block are called buddies, and the block from which they were split is called their parent. The resulting storage allocation mechanism is said to use a buddy system. All buddy systems maintain an array of lists of free blocks, where all blocks in each list are the same size, and the array is indexed by a value computed from the size.

The oldest buddy system, the binary buddy system has block sizes that are powers of two. Therefore, when a block is split, it is split exactly in half, and when blocks are combined, two
equal size blocks are combined to make a block twice as big. With the binary buddy system, we arrange things so that blocks of size 2ⁿ always begin at memory addresses where the n least significant bits are zero. Thus, blocks of size 1 (2⁰) may begin at any address, but blocks of size 2 (2¹) may only begin at even addresses, and blocks of size 4 (2²) only begin at addresses with the least significant 2 bits equal to zero.

The constraints on the block addresses in the binary buddy system have an interesting consequence. When a block of size 2ⁿ⁺¹ is split into two blocks of size 2ⁿ, the addresses of these two blocks will differ in exactly one bit, bit n, using the counting scheme that numbers bits starting with 0 at the least significant end. Thus, given a block of size 2ⁿ at address a, we can compute the address of its buddy, the other half of the block from which it was split, by exclusive-oring a with 2ⁿ. This leads to the allocate routine shown in Figure 3.

typedef void * pointer; /* used for untyped pointers */

/* pointers to the free space lists */
pointer freelists[sizes];

/* blocks in freelists[i] are of size 2**i. */
#define BLOCKSIZE(i) (1 << (i))

/* the address of the buddy of a block from freelists[i]. */
#define BUDDYOF(b,i) ((pointer)( ((int)b) ^ (1 << (i)) ))

pointer allocate( int size )
{
     int i;

     /* compute i as the least integer such that i >= log2(size) */
     for (i = 0; BLOCKSIZE( i ) < size; i++);

     if (i >= sizes) {
          error( "no space available" );
          return NULL;
     } else if (freelists[i] != NULL) {

          /* we already have the right size block on hand */
          pointer block;
          block = freelists[i];
          freelists[i] = *(pointer *)freelists[i];
          return block;

     } else {

          /* we need to split a bigger block */
          pointer block, buddy;
          block = allocate( BLOCKSIZE( i + 1 ) );

          if (block != NULL) {
              /* split and put extra on a free list */
              buddy = BUDDYOF( block, i );
              *(pointer *)buddy = freelists[i];
              freeliests[i] = buddy;
          }

          return block;
     }
}

Figure 3. The binary buddy system.

The code for allocate shown in Figure 3 is recursive! There are two terminating conditions. As shown, recursion terminates when a sufficiently large block of free space is available; in this case, either that block is returned directly to the user, or it is split, perhaps repeatedly, and some piece of it is returned. If there is no sufficiently large block available, the algorithm terminates with an error. The same error message results when there is not enough free space as when the user requests a block larger than the largest allowed, because the recursion keeps asking for larger and larger blocks!

The code shown in Figure 3 uses purely integer arithmetic to take the log of the requested block size. This is almost always faster than converting the block size to floating point, taking the log, and then rounding up to the nearest integer. Note that, in C, 1<<i is a very fast way to compute 2ⁱ. Also, note that the exclusive-or of variables a and b in C is a^b.

The deallocate routine for the binary buddy system is also recursive, as shown in Figure 4.

void deallocate( pointer block, int size )
{
     int i;
     pointer * p;
     pointer buddy;

     /* compute i as the least integer such that i >= log2(size) */
     for (i = 0; BLOCKSIZE( i ) < size; i++);

     /* see if this block's buddy is free */
     buddy = BUDDYOF( block, i );
     p = &freelists[i];
     while ((*p != NULL) && (*p != buddy)) p = (pointer *)*p;

     if (*p != buddy) {

          /* buddy not free, put block on its free list */
          *(pointer *)block = freelists[i];
          freeliest[i] = block;

     } else {

          /* buddy found, remove it from its free list */
          *p = *(pointer *)buddy;

          /* deallocate the block and its buddy as one block */
          if (block > buddy) {
              deallocate( buddy, BLOCKSIZE( i + 1 ));
          } else {
              deallocate( block, BLOCKSIZE( i + 1 ));
          }
     }
}

Figure 4. The binary buddy system, continued.

The code in Figure 4 will either find the block's buddy directly above or below it in the heap, so there are two possible recursive calls to deallocate, one dealing with the former case and one with the latter. The recursion will continue so long as we keep finding buddies to combine with the successively larger blocks. Eventually, we must reach the point where the block's buddy is not found in a free list, and at that point, the block goes in that free list with no further recursion.

In the binary buddy system, each block has exactly one buddy, the identity of which can be determined directly from the address and size of that block. At the start, all of memory, or at least, all of the memory used for the heap, may be a single block, which is then subdivided as

required. Only when the user deallocates the very last block would the buddy system be able to reconstruct this block. Figure 5 illustrates this for a heap of 128 bytes starting at address 0.

 _______________________________________________________________
|_______________________________________________________________|
 0                                                             128
                                        p1 = allocate(24);
 _______________________________________________________________
|\\\\\\\\\\\|///|_______________|_______________________________|
 0              32              64                             128
 p1
                                        p2 = allocate(12);
 _______________________________________________________________
|\\\\\\\\\\\|///|\\\\\|/|_______|_______________________________|
 0              32      48      64                             128
 p1             p2
                                        p3 = allocate(24); deallocate(p1,24)
 _______________________________________________________________
|_______________|\\\\\|/|_______|\\\\\\\\\\\|///|_______________|
 0              32      48      64              96             128
                p2              p3
     ______________________________________
Key |______|\\\\\\\\\\\\\|/////////////////|
     free   allocated     internal fragment

Figure 5. An example of the binary buddy system in use.

This example illustrates three things: First, the successive subdivision of blocks as requests are made, second, the internal fragmentation resulting from allocating more than was requested, and third, the external fragmentation resulting from the breaking up of the free space into multiple fragments. By the end of the example, there are two free blocks of size 32, but there is no way to combine these blocks to get a block of size 64 should one be needed.

The cost of external fragmentation is not as easy to measure as that of internal fragmentation, but it can be partially characterized in terms of such statistics as the average number of free blocks and the distribution of free block sizes.

The two blocks of size 16 starting at addresses 32 and 48 in Figure 5 are buddies. Should the block at address 32 be deallocated, these two must be combined into a new block of size 32 starting at address 32. This new block is the buddy of the block at address 0, and since that block is also free, they must be combined to make a block of size 64. The two blocks of size 32 starting at addresses 64 and 96 are also buddies.

What if the initial heap does not begin at zero? What if the size of the initial heap is not a power of 2? The answer used with the binary buddy system is fairly straightforward. The rules already given define what addresses are legal for each block, so if we start at a random address, we must follow those rules, breaking the initial heap into whatever size blocks work. Consider a heap starting at address 100 and running to address 274, as illustrated in Figure 6.

 _________________________________  ___________________
|___|_______|_______________|___   __|_______________|_|
100 104     112             128      256             272
  4     8           16          128          16       2

Figure 6. The initial division of an odd heap into free blocks.

The heap shown in Figure 6 begins with one free block of size 2 (at address 272), one of size 4 (at address 100), two of size 16, and one of size 128. These blocks have no buddies in the heap, so they can never be combined with anything else, but they may be allocated to users.

Some aspects of the performance of the binary buddy system are easy to determine. For example, if over a long period of time, requests for all possible block sizes are evenly distributed, the average allocated block will be about one third larger than required (one quarter of each block will typically be wasted, three quarters may be in use). This is because blocks of size s will only be allocated to satisfy requests for between s/2 and s words of memory. This waste, which represents internal fragmentation, could clearly be reduced if there were a larger assortment of available block sizes. The nature of the external fragmentation is harder to determine without an experimental measurement, and in any case, the assumption of uniformly distributed requests for different block sizes is very unrealistic. In fact, programmers frequently prefer powers of two and powers of 10 when deciding on the sizes of different data structures!

The Fibonacci series provides the basis for a buddy system variant that reduces the degree of internal fragmentation by providing a wider variety of free block sizes. Each member of the Fibonacci series is the sum of the two previous elements, as shown in Figure 7.

    i : 0   1   2   3   4   5   6   7   8   9  10  11 ...
fib(i): 0   1   1   2   3   5   8  13  21  34  55  89 ...

i < 2: fib(i) = i

i > 2: fib(i) = fib(i-1) + fib(i-2)

Figure 7. The Fibonacci Series

Of course, we are not interested in blocks too small to hold a pointer, so if memory is byte addressed, and if pointers are 32 bits each, the smallest block size that we will use with this series is 5 bytes. In the limit, the ratio of successive numbers in the Fibonacci series approaches the golden ratio (approximately 1.61803).

When a block is split using the Fibonacci buddy system, the fragments which result are not the same size; thus, for example, when a block of size 55 is split, the results are of size 21 and 34. Figure 8 illustrates a sequence of operations using the Fibonacci buddy system.

  ______________________________________________________
 |______________________________________________________|
 0                                                      55
                                    p1 = allocate(15);
  ______________________________________________________
 |\\\\\\\\\\\\\\|/////|_________________________________|
 0                    21                                55
 p1                                 p2 = allocate(10);
  ______________________________________________________
 |\\\\\\\\\\\\\\|/////|\\\\\\\\\|//|____________________|
 0                    21           34                   55
 p1                   p2
                                    deallocate(p1,15); p3 = allocate(6);
  ______________________________________________________
 |\\\\\|/|____________|\\\\\\\\\|//|____________________|
 0       8            21           34                   55
 p3                   p2
     ______________________________________
Key |______|\\\\\\\\\\\\\|/////////////////|
     free   allocated     internal fragment

Figure 8. An example of the Fibonacci buddy system in use.

With the Fibonacci buddy system, there is no simple formula for deriving the buddy of a block from information about its size and location. In Figure 8, there are two blocks of size 13; one of these has a buddy of size 8 below it, while the other has a buddy of size 21 above it. Thus, the code for the Fibonacci buddy system will necessarily be more complex than that for the binary buddy system. The additional cost of the Fibonacci buddy system is offset by the reduction in internal fragmentation resulting from having a more diverse assortment of block sizes than the binary buddy system.

Boundary Tags

Neither of the two buddy systems presented above completely eliminates internal fragmentation, and both have external fragmentation problems which were not present when fixed size blocks were used. The alternative is to try to allocate exactly the amount of storage requested, thus eliminating internal fragmentation. As with buddy systems, this will clearly involve splitting free blocks to make blocks of the desired size on allocation, and it will involve merging deallocated blocks with free neighbors to make larger blocks. Unlike the buddy system, however, there are a potentially unlimited number of different block sizes, so it is impractical to use different freelists to store each different size of free block. Furthermore, there is no restriction on the sizes of two neighboring blocks when it comes time to merge them. All that matters is that they are both free, but there is no way to compute the address of the neighboring block from the address and size of the block itself.

As a result of these limitations, new data structures must be introduced to allow allocation of arbitrary sized blocks. These structures must include a way to identify the neighbors of any block given only a pointer to that block, and they must include enough information to determine the size and status of any block given a pointer to it. Two common ways of conveying this information are shown in Figure 9.

a) Tags at each end

 ___________________________________________________________________
|______|________|___________________________________|______|________|
| size | status | data                              | size | status |
                 ^
                 | user's pointer to block

b) Tag at one end

 
 <---            ----->         | user's pointer to block
 ____|__________|_______________v___________________________________
|____o_____|____o_____|________|____________________________________|
| previous | next     | status | data                               |

Figure 9. Descriptions of a block of memory.

In approach a illustrated in Figure 9, each block is bracketed with size and status of that block, thus allowing one end of any block to be found from the other, and allowing the status of a block to be inspected from either end. In approach b, each block is prefixed with its status and with the address of each of its neighbors.

Both block description schemes given in Figure 9 are really equivalent, since a field holding the size of a block may be thought of as a relative pointer. In effect, both schemes place two pointers and status information in between each pair of consecutive blocks on the heap. In both cases, the effect is to establish a doubly linked list of blocks, with status information associated with each block. In general, descriptive information associated with a block of data is called a tag, and because these tag are stored in the boundaries between adjacent blocks in the heap, heap management algorithms based on these approaches are called boundary tag algorithms.

When boundary tags are used, a separate free space list is not needed, since a search of the list of all blocks in the heap can always find all free blocks. However, when the storage utilization is high, only a small fraction of the blocks will be free, so using a separate free space list can considerably speed up allocation. In boundary tag systems, the structure of the free space list may be complicated by the need to delete from the middle of the list, for example when a block is merged with its free neighbor. In the buddy system, the free neighbor was always found by searching the freelist, so this deletion was not difficult. With boundary tags, though, neighbors are found directly from the main list, so a doubly linked free list may be needed to allow deletion from mid list!

There are two basic approaches to allocation using a boundary tag system, first fit and best fit. The buddy system used best fit; that is, the smallest free block large enough to satisfy a request was always the one allocated. Multiple free lists sorted by block size made this easy. With only one free list, it is more natural to use the first block found that is large enough to satisfy the user, whether or not some other block would have given a better fit. Best-fit allocation may sometimes be better than first-fit, but it is significantly harder to implement.

The choice between first-fit and best-fit allocation depends on the distribution of request sizes, and this distribution depends on the application, so a rational choice between the two approaches is not always possible until after an application is fully developed. In general, best fit allocation is ideal when an exact fit is common; and it is particularly bad if exact fits are uncommon because it tends to minimize the size of the free fragments produced, and this, in turn, minimizes the likelihood that these fragments will be of any future use. Thus, when requests are distributed over a broad range of different block sizes, first fit is the method of choice.

The typical boundary tag system discussed in the remainder of this section uses first fit allocation with a free space list stored in the data part of deallocated blocks. The abstract data structure of our example boundary tags is described in Pascal and C in Figure 10.

Pascal:

type blockref = ^tag;
     tag = record
               back, next: blockref;
               case status: (free, used) of
                 used: (
                         -- details left to the user --
                       )
                 free: (
                         freeback, freenext: blockref;
                       )
           end;

typedef struct tag * blockref;

struct tag {
    blockref back, next;
    enum { free, used } status;
}

struct freetag {
    blockref back, next;
    enum { free, used } status; /* must be free */
    blockref freenext, freeback;
}

Figure 10. Declarations for a boundary tag.

Packing blocks with the format given in Figure 10 in memory would result in a data structure such as is shown in Figure 11.

              |       |
              |-------|  |
        back ||   o---+--   <--
             ||-------|        |
tag     next ||   o---+-----   |
             ||-------|     |  |
      status ||  used |     |  |
              |-------|     |  |
              |       |     |  |
              | user  |     |  |
              |  data |     |  |
              |       |     |  |
              |       |     |  |
              |-------|  <--   |
        back ||   o---+--------
free         ||-------|      
tag     next ||   o---+--    
             ||-------|  |   
      status ||  free |  |   
             ||-------|  |   
    freeback ||   o---+--+--->
             ||-------|  |
    freenext ||   o---+--+--->
              |-------|  |
              | free  |  |
              |  space|
              |       |

Figure 11. Blocks packed in memory.

As illustrated in Figure 11, pointers in most languages point to the first byte of the referenced items. In the heap management context, however, we would like the user's pointers to an allocated block to point to the user's data and not the tag, and it is common to generalize on this by having all system pointers also point to the that location, just beyond the tag.

The size of blocks built as shown in Figures 27.10 and 27.11 can be computed using arithmetic on the pointers to a block and its neighbor. Figure 12 illustrates this and gives the conversion functions between pointers to blocks and pointers to their tags.

#define TAGSIZE       sizeof( struct tag )

#define TAGREF(p)     ((struct tag *)((int)(p) - TAGSIZE))
#define FREETAGREF(p) ((struct freetag *)((int)(p) - TAGSIZE))

#define BLOCKSIZE(p)  (((int)(TAGREF(p)->next) - (int)(p)) - TAGSIZE)

Figure 12. Supporting functions for boundary tag code.

The code for BLOCKSIZE given in Figure 12 assumes that the next field of a tag will never be NULL; we will assure this by marking the upper end of the heap with an endless dummy block that is allocated to the system. The important thing to note is that none of our code will ever request the size of blocks which are not marked as free, and the dummy block is allocated so its next field is never used.

We can eliminate similar boundary problems at the ends of the free space list by making it circular, eliminating its ends. This still requires a special case for an empty free list; in that case, the head pointer will be NULL. Circular linkage mildly complicates terminating unsuccessful searches because there is no null pointer at the end. Circular free lists introduce the interesting possibility of rotating the free list to even out the distribution of free space of different sizes. If the free lists is not rotated, small blocks will tend to accumulate near the head; for some distributions of request sizes, this may harm performance.

As with the buddy system, the basic steps in allocating a block are to find a large enough block, remove it from the free list, split it if it is too big, return any excess to the free list, and give the user the remainder. Thus, allocation both removes from and adds to the free list. Figure 13 shows the basic operations on the free list:

blockref freelist;

blockref removefree( blockref b )
/* removes block b from the free space list */
{
     blockref back, next; /* pointers to neighbors of b in free list */

     back = FREETAGREF(b)->freeback;
     next = FREETAGREF(b)->freenext;
     if (back == b) {
          /* b is the only free block */
          freelist = NULL;
     } else {
          FREETAGREF(back)->freenext = next;
          FREETAGREF(next)->freeback = back;

          /* the following line rotates the freelist */
          freelist = next;
     }
 
     FREETAGREF(b)->status = used;
}

void returnfree( blockref b )
/* returns block b to the free space list */
{
     blockref next;

     FREETAGREF(b)->status = free;
     if (freelist == NULL) {
          /* trivial case */
          freelist = b;
          FREETAGREF(b)->freenext = b;
          FREETAGREF(b)->freeback = b;

     } else {
          /* add block to the free list */
          next = FREETAGREF(freelist)->freenext;
          FREETAGREF(b)->freenext = next;
          FREETAGREF(b)->freeback = freelist;
          FREETAGREF(freelist)->freenext = b;
          FREETAGREF(next)->freeback = b;
     }
}

Figure 13. Free space list management for boundary tag allocation.

Note that, in the code given in Figure 13, rotating the free list is very easy; not doing so would require a special case to take care of the possibility that the block being removed from the free list is also the one pointed to by the head pointer for the free list.

When possible, blocks which are larger than requested should be split before being returned to the user. This can only be done if the split off part of the block is large enough to hold a new boundary tag as a prefix on a block holding a free space list entry. For the block structure used in this example, the boundary tag takes 2 pointers and one status word or byte, and a free block requires an additional 2 pointers. On most machines, therefore, it will not be possible to split a block unless the excess is at least large enough for 4 pointers plus a status word or byte.

The fact that we cannot split blocks if the fragment being split off is smaller than some minimum implies that we have failed to completely eliminate internal fragmentation! We have merely succeeded in limiting the maximum size of an internal fragment. This maximum is small enough to be irrelevant to applications where the average size of blocks requested by the user is very large, but for applications such as list processing, where user requests are typically for huge numbers of very small blocks, this can be a serious problem.

The 2 pointers and a status byte or word making up the boundary tag itself can also be considered as being equivalent to an internal fragment of free space, since this space is allocated to the user but it must not be modified by the user! Thus, systems such as the buddy system will always be better than the boundary tag approach if their average internal fragment is smaller than the tag size itself. Since the binary buddy system is expected to waste 25 percent of the allocated memory to internal fragmentation, this suggests that the buddy system is the preferred method for allocation of blocks smaller than about 12 words.

#define MINSPLIT sizeof( struct freetag )

void split( blockref b, int s )
/* split a free block from b so that what remains is at least size s */
{
     blockref new, next;

     /* assume that BLOCKSIZE(b) >= s */
     if (BLOCKSIZE( b ) >= (s + MINSPLIT) {
          new = (blockref)((int)b + s + TAGSIZE);
          next = TAGREF(b)->next;
          TAGREF(new)->next = next;
          TAGREF(new)->back = b;
          TAGREF(b)->next = new;
          TAGREF(next)->back = new;
          returnfree(new);
     }
}

Figure 14. Code to split oversize blocks down to size.

Most of this code simply creates a new boundary tag and links it into the list of tags. The most complex aspect of this code involves the test to see whether a split is possible or whether the excess part of the block should be retained as an internal fragment. Note that the part of the block which has been split off is returned immediately to the free space list, and that the returnfree procedure is responsible for marking it as a free block. This provides the foundation needed to write the actual code for boundary tag allocation, as shown in Figure 15.

void * allocate( int s )
{
     blockref b;

     if (freelist == NULL) {
          error( "free space exhausted" );
          return NULL;
     } else {
          b = freelist;
          for (;;) {
               if (BLOCKSIZE( b ) >= s) {
                    removefree( b );
                    split( b, s );
                    return (void *) b;
               }
               b = FREETAGREF(b)->freenext;
               if (b == freelist) {
                    error( "size too big" );
                    return NULL;
               }
          }
     }
}

Figure 15. Code for allocation using boundary tags.

Deallocation is easily described in terms of a merge routine, as shown in Figure 16.

void merge( blockref b )
/* merge b and its high neighbor if both are free */
{
     blockref next;

     if (TAGREF(b)->status != free) return;
     next = TAGREF(b)->next;
     if (TAGREF(next)->status != free) return;

     /* b and next are free, so we can merge them! */
     removefree(next);
     next = TAGREF(high)->next;
     TAGREF(high)->back = b;
     TAGREF(b)->next = high;
}

void deallocate( blockref b, int s );
{
     returnfree(b);
     merge(b);
     merge(TAGREF(b)->back);
}

Figure 16. Code to deallocate and merge adjacent.

Note that this version of deallocate ignores the size of the deallocated block passed by the user. There are variant boundary tag methods where this is not the case. These variants save some storage, but they rely on the user to correctly remember the size. Since the size allocated may differ from the size requested, it may be an unfair burden on users to require that they know the size actually allocated.

The version of deallocate given here has one hidden trick in it: It would seem that the two merge operations in the code could be interchanged, but this is not the case! The reason is that after the block is merged with its low neighbor, all pointers to the block itself cease to be valid because the merger has eliminated the boundary tag to which those pointers refer. On the other hand, when the block is merged with its high neighbor, the tag eliminated is at the far end of the block, so the pointer remains valid. Therefore, the successor must be merged with the block before the block is merged with it s predecessor.

In fact, the code given here does not wipe out the boundary tag, so exchanging the order of the two calls to the merge routine would not cause damage, but it would be perfectly reasonable to code both the deallocate and merge routines so that the bodies of all free blocks were zeroed as soon as possible; we allow this fairly trivially with the order of the merges given, but it would be difficult if the order of the merges was reversed. Promptly zeroing deallocated memory is particularly important in memory management software intended for use in secure systems, since it can guarantee that there will be no security leaks resulting from deallocating the memory holding sensitive information and then letting someone else find that information in a newly allocated but not yet initialized block of memory.

Alternatives

All of the code presented above is based on the assumption that adjacent free blocks should be combined as soon as possible when they are deallocated. This results in free space lists which contain the largest possible blocks, with the added result that most allocation operations would be expected to involve block splitting. If some small block size is particularly common, it would be preferable to allow blocks of that size to accumulate in the free list so that they could be deallocated and reallocated with minimum effort. This leads naturally to the idea of eliminating the merge operation entirely from the code for deallocate and making this the responsibility of allocate, with the rule that the allocate routine only attempts to merge blocks that are smaller than the size requested. When we do this, we have created a lazy heap manager.

Lazy heap managers are frequently characterized by the following cycle: Initially there is one large free block. As system operation progresses, this becomes more and more fragmented by allocation and deallocation operations, during which adjacent free blocks are never combined. Eventually, an allocation request is made which cannot be satisfied by any single free block, so a scan of the heap is made, combining adjacent free blocks, after which the requested item is allocated (with luck) and the cycle repeats.

Systems that exhibit the cyclic behavior described above typically perform fewer computations per heap management operation, on the average, than those given in the previous sections. On the other hand, they are subject to occasional expensive searches of the entire heap during which adjacent free blocks are combined; as a result, some calls to allocate will take quite a long time. Because of this, prompt heap managers (the opposite of lazy ones, but not a standard term) should be used when the worst-case response time is of primary concern, while lazy managers can be used when the average response time is paramount.

Lazy boundary tag systems can frequently make use of a much simpler free space list organization than was given here because batches of block mergers can be done by a direct scan of the main boundary tag list, building a completely new free list in the process. This eliminates the need for a doubly linked list.

When a particular block size is heavily used, it is common to maintain a special free list of blocks of that size. Programmers in languages such as Pascal or C frequently add special free space lists to their programs for each heavily used dynamically allocated data type. When this is done, the resulting storage manager can be described as employing the a best-fit allocation strategy for blocks of these commonly used sizes, while using something else for uncommon sizes. If there are any requests for blocks of uncommon sizes, there will usually be few requests for any particular size, so a first fit allocator will probably be the best choice for those uncommon sizes.

There is a possibly disadvantage that follows when programmers maintain their own free-spase lists, and it may be serious: Doing so prevents the heap manager from using space in those lists to meet demands for other sizes of blocks. As a result, some sophisticated heap managers include special free space lists for block sizes which are known to be common; when other free space resources are exhausted, the space in the special lists reclaimed by the manager. For example, in a heap manager which defers the merger of free blocks, when a pass is made over the heap to merge free blocks, any blocks in the special lists can also be merged; as a result, after each merger pass, the special free lists are all empty.

Another possibility is to mix basic storage management mechanisms, using one mechanism, for example, the buddy system, to allocate small blocks, while using another, for example, a boundary tag system, to allocate large blocks. If this is done, the storage pool managed by the buddy system would consist of the bodies of some number of large blocks allocated by the boundary tag system. Although the is somewhat complex, it can improve storage utilization at a small run-time cost.

Storage Compaction and Garbage Collection

One approach to reducing external fragmentation is to move blocks in memory so that fragments can be combined; this is called compacting the heap. This is much easier to say than it is to do! To do it, we need a way to find and update all pointers to a block when we move the block. In segmented virtual memories, there is usually a master segment table containing descriptions of all of the segments in the system. If all pointers to a segment are indirect through an entry in this table, segments may easily be moved to compact storage by updating this entry, in general, though, finding all of the pointers to a block is computationally difficult.

Consider, for example, the problem of compacting the storage used by a Pascal, C or C++ program which has built extensive data structures in the heap. Each item allocated by that program in the heap may contain any number of pointers to other items in the heap, and no item may be moved without updating all of the pointers that refer to it. This is possible only when there is a description of each block in the heap which tells the system which words in that block contain pointers. Some machines have what are called tagged architectures, where each word has a small field describing the type of the contents of that word, but the majority of machines provide no help in solving this problem.

A Pascal or Java compiler supporting heap compaction must explicitly include this information in each item stored on the heap, typically by including a special pointer to a type or class description record as the first word of each object. Doing this in Pascal and Java is feasible because these are strongly typed languages. Doing it in C or C++ is harder because the compiler has less licence to include auxiliary information in a C or C++ structure. In those languages, programmers usually assume that the pointer to a structure is also a pointer to the first explicitly declared element of the structure, so insertions by the compiler pose problems.

The ability of the system to find all pointers to a block allows for more than just heap compaction, it allows automatic reclamation of those memory blocks in the heap which are no longer referenced by any of the user's pointers. This is valuable enough that it has a name, garbage collection.

The first programming language to support garbage collection was LISP, a list processing language developed at MIT in the early 1960's. For 30 years, this was the only widely used language where programmers assumed that garbage collection was supported. In the late 1960's, the Simula '67 programming language came into use; all implementations of Simula '67 support garbage collection, but this language never achieved any depth of market penetration.

The object oriented features of C++ are based on those of Simula '67, and most Simula '67 programmers treat C++ as an inferior adaptation of the same ideas. One reason is that few C++ implementations support garbage collection. Java retains most of the syntactic

faults of C while attempting to recover the strong points of Simula '67, including garbage collection; nonetheless, Simula '67 still seems to be a cleaner language. Despite this, the deep penetration of Java into the marketplace is a great step forward because it brings garbage collection into the mainstream in a way that has never occurred before in the history of computer science.

One approach to automatic storage reclamation uses a reference count associated with each block. Many programmers consider reference counting to be a simple form of garbage collection, but it is more properly considered as an inferior alternative to garbage collection. The reference count attached to a block is initially zero when the block is allocated, but it is immediately incremented when the pointer to that block is assigned to any pointer variable. The reference count of a block is incremented each time the pointer to that block is assigned to a pointer variable, and it is decremented each time the pointer to that block, stored in some pointer variable, is overwritten with a pointer to something else or when a pointer variable holding a pointer to that block is deallocated. The block is deallocated whenever the block's reference count falls to zero.

The problem with reference counts is that they cannot detect the fact that a circularly linked structures has been cut loose from the remainder of the structure, and thus, they tend to slowly loose storage in environments where such structures occur. Reference counting is widely used in modern file systems, but it is not generally used in heap managers.

Real garbage collection, in contrast, does not suffer from this problem. With garbage collection, a special routine, the garbage collector, periodically traverses all data structures accessible to the user, marking each item on the heap which is encountered; this is called the marking phase of the garbage collector. After doing so, the garbage collector claims all unmarked blocks as garbage and sweeps them into the free space list; this is called the sweep phase of the garbage collector. The names for these two phases lead to the name for the entire algorithm, the mark sweep garbage collection algorithm.

Generally, the mark-sweep algorithm requires heap data structure that includes a mark bit in each item on the heap. We might store this in the status field of the boundary tag on each item. Given this, the sweep phase becomes a simple scan of the boundary tag list of the entire heap, gathering the unmarked blocks, marking them as free, merging them if they are adjacent, and stringing the results onto the free list. In the process, the mark bits on all blocks are cleared to prepare for the next marking phase.

The marking phase is more interesting. In its classical form, this assumes that the entire heap is reachable from some root item. The marking phase operates by recursively traversing the items reachable from the root. If, during this recursion, an unmarked item is found, the item is immediately marked and then the recursion continues. If an already marked item is found, it is ignored since the presence of a mark indicates that it either has already been dealt with or that it is on the stack of items being dealt with. The recursion itself, of course, only follows the pointers stored in each item, completely ignoring integers, characters or other non-pointer content.

Generalization to multiple roots is easy, but the problem in a language such as Java or Simula '67 is to find the roots, since every pointer (or object handle) in the stack of function activation records is a root, as are all global pointer variables. The solution to this is to scan the stack and the global variables for all pointers, but to do this, we need to have complete descriptions of all activation record formats, telling us where the pointers are in each, along with a complete description of the global variables.

Heap Management and Virtual Memory

In actual practice, most systems use various combinations of different storage allocation methods. This is because each method performs better under a different class of demands. Most Pascal, C, and C++ implementations, for example, rely on a stack for local variable allocation and a heap, typically implemented with a boundary tag method, for dynamic or pointer variable allocation. On machines which use virtual memory, both the stack and heap areas share a single virtual address space which is constructed by a combination of hardware and software from a number of pages. Thus, fixed sized blocks (pages) are allocated to create the memory region from which the variable size blocks on the heap are later allocated!

Internal data structures in many operating systems are frequently allocated in terms of a standard block size from a pool of such blocks which is quite independent of the storage allocation mechanism used for allocation of page frames for user virtual address spaces. For example, virtual address spaces might be composed of 4096 byte blocks, a size which is also appropriate for disk buffers, but inappropriate for the records in service request queues; the latter might be allocated from a separate pool of 16 or 32 byte blocks.

When dynamic data structures are maintained in virtual memory, poor locality of reference can result. It is highly likely that a linked list, for example, will end up with each list element stored in a separate page. Some heap managers have been constructed which attempt to improve the locality of heap data structures, for example, by allocating successive elements of a list in the same page when possible. This requires that the free space data structures be organized, when possible, by page. It also requires that the allocate routine be informed of the desired location of the allocated storage. The Pascal new procedure has access to the address of the pointer to the newly allocated storage because it returns its result in a parameter passed by reference, while the C malloc function is not generally aware of this. As a result, an implementation of new could attempt to allocate the new data structure in the same page as the pointer to the new data structure, while this would be more difficult with malloc. Of course, even if new is coded to allow this, it will only be useful if users are careful to call new(element^.next) to extend a list, instead of using a temporary variable and then assigning that temporary to the appropriate field of the data structure.

The first heap managers that tried to keep linked lists together in one page were developed for the LISP programming language. All data structures in LISP are based on elements that take one of two forms, either atoms (terminal symbols such as numbers or identifiers), or list elements. All LISP list elements have two pointer fields, the car and the cdr fields. (These names are archaic -- they refer to macros on the IBM 709 computer for "copy address field" and "copy decrement field".) By convention, lists in LISP were linked through the cdr fields, while the car fields were used to point to individual list elements. Atoms were also stored as data objects of the same size as a car-cdr pair, so the original LISP could use fixed size blocks for everything, where each block held two pointers plus a tag bit.

With the migration to virtual memory systems, LISP garbage collectors were written that operated by copying all the reachable data structures from one memory region to another (and back again). This is analogous to doing garbage collection by making a back of a file system and then restoring from that backup. This garbage collector always copied consecutive list elements into consecutive memory locations, thus greatly improving the locatlity of reference of list-processing code.

Once it was recognized that consecutive list elements are usually stored consecutively in memory, it seems wasteful to use half of the available memory to store pointers to the immediately following memory word, so a new list representation was introduced -- totally hidden from the user, who still acted as if each list element had separate car and cdr fields. The new representation stored all the car fields of the list in consecutive words of a contiguous block of memory, with one tag bit per pointer indicating that it was a car pointer. The final pointer in each block had the alternative tag value, indicating that it was a cdr pointer. This compacted list representation was called a cdr coded list.

Terminology

The reader should be familiar with the following general terms after reading this section:

dynamic storage allocation      buddy systems
heap                            the buddies of a block
allocate                        binary buddy system
deallocate                      Fibonacci buddy system
fixed size blocks               best fit allocation
free space list                 first fit allocation
internal fragmentation          boundary tag systems
external fragmentation          deferred block merger
reference counts                garbage collection
storage compaction              automatic reclamation

Lecture 27, Dynamic Storage Allocation

Heap management for linked data structures

Storage Allocation Services