Except for binary machine code, all computer code is intended to be read by humans. Well-written code makes this easy, and coding standards or guidelines help you create well-written code. Here are some guidelines to follow in code written for this course.
[Adapted from coding standards for Roger Peng’s Biostat 776 at Johns Hopkins University.]
- Program files should always be ASCII text files that can be sourced directly into R or, in the case of C programs, read directly by a C compiler. If you cannot source your file directly into R, then the file format is not acceptable. Word processing programs like Microsoft Word do not, by default, save files as plain text.
- Always use a monospace font to write or display code. Proportional (variable-width) fonts like Times New Roman are not appropriate and can alter the apparent structure of a program (and hence its readability).
- Always indent your code. Editors such as GNU Emacs support automatic indentation of code. I prefer 4-space indentation, as recommended in the R coding standards. Comments should be indented to the same level as the code they describe. Comments can also appear at the end of a code line, if space permits (but see below).
- Put spaces around operators and after punctuation marks like commas and semicolons. This makes the code easier to read.
- Your code should not extend past 80 columns. Standard Unix terminal windows are 80 columns wide by default, and code that wraps past the end of the line becomes very difficult to read, so break long lines when necessary. Make exceptions only for hard-coded constants (such as path names or URLs) that cannot easily be wrapped or shortened.
- As a rule, no function or subroutine should be longer than about 30 lines. In particular, it should be fully visible, without scrolling, in an editor using a reasonable font size. Seeing the full code at once helps in understanding its logic and limits the complexity of individual functions. With lower-level languages like C this rule occasionally needs to be broken, but exceptions should be thought through very carefully.
- Don’t repeat yourself. In particular, don’t cut and paste. If you find yourself writing the same bit of code, or very similar bits of code, multiple times, then it is time to think about abstracting the core idea out into a function of its own.
- Use a consistent scheme for naming variables. I happen to prefer so-called camel case, as in fileLength, to file_length (called snake case), but either is fine as long as you are consistent.
- Ideally, code should be sufficiently well factored into functions and subroutines, with well-chosen function and variable names, that it is easy to read and understand without comments. Comments should be used only to explain non-obvious steps in tricky computations, or to provide background or attribution.
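As an illustration, here is a minimal C sketch that tries to follow the guidelines above: 4-space indentation, spaces around operators and after commas, consistent camel-case names, lines well under 80 columns, short functions, no cut-and-paste, and comments only where a step is not obvious. The function names sampleMean and sampleVariance are hypothetical, chosen for this example, and the brace placement is one common convention that the guidelines above do not prescribe.

```c
#include <stddef.h>

/* Hypothetical example: arithmetic mean of the first n values in x.
 * Assumes n > 0. */
double sampleMean(const double *x, size_t n)
{
    double total = 0.0;

    for (size_t i = 0; i < n; i++) {
        total += x[i];
    }
    return total / (double) n;
}

/* Hypothetical example: sample variance of the first n values in x.
 * Assumes n > 1. */
double sampleVariance(const double *x, size_t n)
{
    /* Reuse sampleMean() instead of pasting a second summation loop. */
    double center = sampleMean(x, n);
    double sumSquares = 0.0;

    for (size_t i = 0; i < n; i++) {
        double deviation = x[i] - center;
        sumSquares += deviation * deviation;
    }
    /* Divide by n - 1, not n, to get the unbiased variance estimate. */
    return sumSquares / (double) (n - 1);
}
```

For example, calling sampleVariance on the four values 1, 2, 3, 4 returns 5/3 (about 1.667). Because the mean is computed in exactly one place, a fix to sampleMean automatically carries over to every function that calls it.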
Good programming editors will help immensely in following good programming practices.
Some other coding style guides:
Some useful tools:
- A good programming editor that is aware of R and C syntax.
- The indent command for formatting C code.
- The formatR package for formatting R code.
- The styler package.