Machine Problem 1, due at the end of Feb 5

Part of the homework for CS:2630, Spring 2024
by Douglas W. Jones
THE UNIVERSITY OF IOWA Department of Computer Science

Framework: Your program must begin and end as follows:

/* mp1.c by YOUR NAME HERE */
#include <stdio.h>
#include "/mnt/nfs/clasnetappvm/fs3/dwjones/mp1data.h"

... your code here ...

int main() {
    printit( mp1data );
    return 0;
}

Obviously, where it says YOUR NAME HERE put your name, and where it says ... your code here ... put C code to solve the problem. Obviously, your C code must be in the form of a void function (a subroutine) named printit.

Details: The second header file above, ...mp1.h defines and initializes a variable:

This variable points to the first character of the message the program should output. This message is potentially the root message of a tree of messages.

Each message is a null-terminated string of 7-bit ASCII characters (Unicode positions 0 to 127), but it contains occasional characters in the range 128 to 255. These characters are interpreted as 7-bit relative pointers to sub messages as follows:

If p is a pointer to a character in a message,
*p is the character p points to.

If *p == '\0', or equivalently,
if *p == 0 it marks the end of string.

If *p & 0x80 is zero, or equivalently,
if *p < 128 it is just an ASCII character.

If *p & 0x80 is not zero, or equivalently,
if *p > 127 it is a relative pointer to a sub message.
The sub message begins at address p - (*p & 0x7F),
or equivalently, at address p - (*p - 128).

Top Level Specification: Write a C program that outputs the message described by the header file, including all of the included sub-messages. The program should output nothing else. If you get it right, the output should be in well-formatted coherent English.

The definition of the message and submessage format implies that printit is recursive, and it implies that printit must examine each character in the message, potentially making a recursive call at each character.

To output just one character to standard output, any of the following will work, where 'c' is any expression with a value that is a character:

putchar( 'c' );
putc( 'c', stdout );
printf( "%c" ,'c' );

Grading: 5 points. Correct output is worth 2.5 points. The remaining credit will be offered only for those who have correct output.

Code that does not compile will not earn any credit.

Stylistically clean code is important; that is why half the credit is reserved for style, but only if the code works. Awkward indenting will be penalized up to 1 point from the 2.5 style points. Excessive or inadequate white space within and between lines will be penalized, again, up to 1 point. Excessive or inadequate comments will be judged similarly.

Note: Comments should not repeat the assignment. Assume that the reader of your code has read the assignment. Assume that the reader knows how to program, so commenting i++ with /*increment i*/ would be stupid. Comments should help the reader when the code is hard to follow.

Note: As a general rule, the first lines of every file should identify that file. The header comment in the code framework above does this. Please avoid using lines longer than 80 characters -- this is a common rule in the style guidelines of many employers. As a consequence, it is almost always a bad idea to maximize your editing window, since the default width of the window is 80 characters. Better to use the remainder of your screen realestate to open other windows for reference.

You can use the following tool flag many common file format problems such as overly long lines or invisible characters:

   [HawkID@fastx?? ~]$ ~dwjones/format mp1.c

This format checker does not know anything about C or SMAL, it is looking for things that cause trouble with files imported from Windows systems, over-length lines, and non-printable characters.

Submission: To submit your work, it must be in a file named mp1.c in your current directory on the CLAS Linux system, and you must know your section number. In the following, what you type is shown in bold face. Begin at the linux command-line prompt:

   [HawkID@fastx?? ~]$ ~dwjones/submit 0A0X mp1.c

Note: The ~dwjones must be verbatim, do not substitute another name. Also, use your section number (one of 0A01 to 0A05).

The submission script will copy your code and, after the copy is made, it will output a message saying "successful submission." You may resubmit as many times as you want; each time you resubmit, your previous submission of that file will be replaced, so we will only see your final submission. The completed dialogue on your screen will look like this when you are done:

In the event of insurmountable trouble, do not delete or edit files you have submitted until the graded work is returned. The operating system record of the last edit time and date and the backups maintained by the operating system are proof of what you did and when, and they allow us to investigate what happened. until your graded work is returned. This way, the time stamp marking the time of last edit is a trustworthy indicator of whether you did the work on time.

Historically, students unable to successfully complete MP1 for this course have an extraordinarily low chance of success in the course. If this assignment proves daunting, seek help early.

Suggested development methodology: Please work incrementally:

In developing this assignment, we found it useful to use a this (boring) message instead of the one in the include file:

char * rawdata = " \x00hello\x87\x00\x87world\n";
#define mp1data (&rawdata[9])

This makes the program output "hello world" as a test. If, for debugging, you outputa begin paren before each recursive call and an end paren after each call, the output would be "(hello( ))world."


Discussion

Note added Jan 31

It seems that p - (*p & 0x7F) is not the same as p - (*p - 128). What's up?

It turns out that the C language has an annoying weakness. Type char can be either signed char or unsigned char at the compiler-writer's discretion.

p - (*p & 0x7F) and p - (*p - 128) are equivalet if type char is unsigned char.

What's the difference?

The range of ASCII characters is from 0 to 127, values that have exactly the same representation in both character systems, and it seems that with our C compiler, type char is signed char.

The special characters for our machine problem are negative in the signed domain and positive in the unsigned domain. Ugh.

What are the practical consequences? Either use logical formulations like ch & 0x80, because those are the same in both signed and unsigned meanings, or make sure to declare your character variables as type unsigned char to override the default, or change your tests to account the fact that your characters are signed.

This suggests two fun exercises:

a) The test for *p < 128 detects plain ASCII characters when *p is an unsigned character. What is the equivalent test that works when *p is a signed character?

b) The pointer modification p - (*p - 128) works when *p is an unsigned character. What is the equivalent modification that works when *p is a signed character?

Note added Feb 3

I've seen two solutions to MP1 that used recursion both to traverse each message and to handle subsidiary messages.

One student wrote that their solution gave the output:

world
hello
instead of
hello world
when given the small test data suggested above. Why?

The students code used double recursion, and they put the recursive call to print the remainder of this message before the call to print the subsidiary message. This pushed all subsidiary messages to the end instead of printing them in their places.

Recall that there are 3 basic ways to traverse a binary tree:

These variations generalize to n-ary trees. In effect, the programming error that student made converted an in-order traversal to a preorder traversal.

It is almost certain that the error would not have occurred if the student had written the code to print each message using a simple while loop, using recursion only to follow pointers to subsidiary messages.

Note added Feb 5

Two students asked: How do I see the output from my C program?

Consider the following example:

[dwjones@fastx05 ~]$ head hello.c
/* hello.c -- the classic hello world program */
#include 

int main( void ) {
    printf( "Hello World\n" );
}
[dwjones@fastx05 ~]$ cc hello.c
[dwjones@fastx05 ~]$ ./a.out
Hello World
[dwjones@fastx05 ~]$ 

In the above, I first used the head command to verify that hello.c contained what I wanted. The head command only shows the first few lines of a file, but the Hello-world program is so short it showed the whole file.

The, I use the C compiler with the cc command. That produced no output because there were no errors, but it did create a new file, a.out. The new file contains executable object code, so I ran it with the command ./a.out. The output followed immediately.

Another student asked about the spaces in the output.

The correct output is a limerick. Therefore, it is formatted the way limericks are traditionally presented. See the Wikipedia entry for Limerick (poetry) for many examples of limericks formatted this way.