CS:3620, Homework 10 Solutions, Spring 2018

On all assignments, your name must be legible as it appears on your University ID card! Assignments are due at the start of class on the day indicated (usually Friday). Exceptions will be by advance arrangement unless there is what lawyers call "an act of God" (something outside your control). Homework must be turned in on paper, either in class or in the teaching assistant's mailbox. Never push late work under someone's door!

Background: The code distributed for MP5 includes this bit of code, used to determine if a newly allocated block is big enough to split:

/* we found a big enough block */
if (GETSIZE( p ) > (size + TAGSIZE)) {
        /* the block is big enough to split */
        void * new = (void *)( (intptr_t)p + size + TAGSIZE );

Here, p points to the block we found that is big enough to allocate, and if p is large enough to split, new points to the block that will be split off from it.

a) Does this code ever produce any internal fragmentation? If so, when, and what can you say about the range of internal fragment sizes? (0.4 points)

Yes. It only splits a block when the size of the block allows space for a boundary tag plus a nonzero-sized block to be split off. Thus, the internal fragment sizes will range from one word (that is sizeof(void*) bytes) to TAGSIZE bytes.

There is a second source of internal fragmentation that is not related to the above code: The required block size is always rounded up to the nearest word. This rounding contributes an additional 1 to sizeof(void*)-1 bytes to each fragment.

b) Why is TAGSIZE part of the consideration when a decision is made to split a block? (0.3 points)

To be split off as a new block, a fragment of free space requires space for a tag.

c) What is the minimum size of the block pointed to by new? (0.3 points)

Because the if statement uses >, the new block will include a tag plus at least one word of memory.

Note: Had the code used >=, the new block could have consisted of just a tag with zero words of memory.

Background: Look at the User-level thread manager. Specifically, look at Part II of threads.c.

A question: Why can't thread_exit() simply call free(current) instead of doing elaborate things involving mem_to_free and longjmp in order to transfer control to the code go_free_it is linked to. (1 point)

If thread_exit() directly callled free(current), the stack holding the local variables and return address of thread_exit() would be returned to the heap before thread_exit() was done with them. The result would be precarious, since the moment memory is returned to the heap, the heap manager is free to reuse or alter the contents of that memory.

Background: In the discussion of the mark-sweep algorithm on Mar. 30, the discussion used historical LISP object structure. Each lisp object had the following form, conceptually:

Each item has mark bit used only for the mark-sweep algorithm.
Each item has a tag field describing the rest of the object:
- tag = pointers: The object contains two pointers to other objects named CAR and CDR.
- tag = atom: The object contains regular data. There may be multiple atomic types, such as character, integer and float, each with its ownn atom value.

All objects were exactly the same size and the heap manager could address the heap as an array of objects as easily as it could build linked lists of objects (typically linked through their CDR fields). If you want to build, in LISP, the equivalent of an object in a Java program or a struct in a C program, you end up having to build it as a linked list of fields, where each list element has its CAR point to the field value and its CDR point to the next element.

Now consider the problem of building a garbage collector for a strongly typed language like Java, Simula-67, Pascal or Ada, where objects are represented by variable-sized records where each record contains a variety of fields, some of which are pointers. In all of these languages, you can assume that all objects of any particular type (or class) identically the same structure.

a) What additional data does the garbage collector need to know about each object and about each word of each object? (0.4 points)

It needs to know which fields of each object are pointers to (handles for) other objects and which are not.

b) Suggest an appropriate representation for this data in terms of the actual representation of objects in memory at run-time. (0.3 points)

Each object could begin with a description of the rest of the object, for example, a bit vector with one bit per word of the object indicating whether that word contains a pointer or just data.

Note: The size of the object, which implies the number of bits needed, would typically be encoded in the boundary tag for the object or it could be inferred from the class of the object. Also, since all objects of the same class typically have the same structure, the bit vector could be stored in the class description for the object, with the first word of the object holding a pointer to the class description.

c) Now, what about garbage collectors for C. What is it about C that makes the problem more difficult? (You'll find ample examples of this problem hidden in the code for the heap manager you're working on for MP5.) (0.3 points)

The C compiler cannot insert useful type information into structures because the weak typing of the language does not guarantee that any fields of an object will be used the way they are declared.

A more subtle problem: C lets you create pointers to any variable, so not all pointers to an object in the heap will point to the start of that object. Some may point to any byte in the object. How do you find the start of an object when given an address in the middle of that object?

Note: In fact, garbage collectors for C and C++ have been written. They work by assuming that every word of every global variable, every word of every stack frame, and every word of every allocated block might be a pointer. If any of those words point to an object in the heap, that object is assumed to be reachable even if the word actually contains text or a floating point number that just coincidentally could be interpreted as a pointer into the heap.

Assignment 10, due Apr. 6

Solutions