Thursday, February 2, 2012

Commenting For the Future

What, me comment?


You are in the middle of a large coding project.  You've got the entire design in your head, and you are banging out lines and lines of code.  You've done your best to name your functions and your variables clearly.  It all seems so obvious, why would you need to add comments too?

The simple answer is: "What you know now is not what you'll know later.".  Another simple answer is: "You aren't the only one maintaining the code.".  No matter how obvious something seems to you now, it isn't going to be obvious to the next person who comes along, even if that person is you!

Comments have a definite purpose.  They are a road map for future usage.  If done properly, comments can help you restructure the code, and keep your functionality tidy.

What To Comment
Modules: Every source code module needs a paragraph describing the content of the module.  This allows people to find the proper module to look for a function, and directs them to the proper place to put a new function.

The comment should describe the general usage of the module, and what types of functionality should be found in this module.  If applicable, it can give a general idea of where in an architecture the module belongs.

Functions: Every function in a source module needs a paragraph describing when to call the function, what the operands are, what input and output are expected.  This is also a good place to describe interactions with other parts of the code.  If this function can't be called until some part of the architecture is initialized, then put that in the paragraph.  If this function relies on code sequences elsewhere in the product, add that too.  No one should have to read the code for an entire product to figure out what is happening.

Code: Code should have comments when there is a reason to have comments.  If the code is tricky to understand, or it has some non-obvious code interactions, write that down—don't make other people figure it out the hard way.

If you've had to break up logic, then comment it so the flow is obvious.  You can also use code comments to describe what conditions would lead to the code, and remind you of issues that exist in the code.

You should definitely leave comments behind about suspicious code sequences, code that you recognize is error-prone or overly delicate.  Remind yourself of code that needs rewritting or moved.  You can't always take care of all problems immediately when you are looking at code, but if you leave a note you can always get back to it.

Avoid stupidly obvious comments like:

rc = 0;    /* no error yet */
myVar = value;   /* copy because we need this later for <some reason> */

That's just clutter.  You want your comments to be meaningful and worthwhile to read.

Comments are necessary.  There isn't a programming language in existence that makes the "why and how" of code obvious enough to avoid comments.

Don't fix the bug...yet!

Why fixing that bug should be delayed


You've just finished coding a project and the code is being tested.  You find an erroneous error message, or an error is flagged that you didn't expect.  You whip into Bug Fixing mode, hunt down the cause of the issue, and slap in some code to fix that problem.  Tadah!  All is well again!

But you've missed an opportunity: unless you have diligently forced your code down every error path, you've got code paths that you have never explored.  Before you fix that bug, you have a unique opportunity to make sure that code path does exactly what you want it do.

The fact is, bugs are in your code—there is no getting around them.  Unexpected system interactions, unexpected user input, and sun spots all affect the code in ways you didn't imagine when you wrote it.  Given that you can't fix all the bugs, what you want is a reliable way to handle them.  You want an error reporting system that tells you exactly what went wrong, where it went wrong, and gives you a clear idea how to fix the problem without having to recreate the issue or spend time in a lengthy exploratory debugging session.

So, don't fix the bug right away.  Keep in mind how you want to fix the bug, and give yourself a bit of time to explore the error .

Is the text of the error message clear?
You have the error code and/or error message.  Take a moment to look at that message: is it clear to the user what action they should perform?  "Unexpected Error" conveys no information.  Rid yourselves of useless error messages.  Every time you find one of them, take the opportunity to convert it into an error message that tells you where the failure is and what the likely causes of the failure are.

Make sure the spelling is correct, that the message identifier is correct and documented, and that the syntax of the message is proper.  Error messages that are phrased "Unexpected Error, unknown, failure of" might have made sense when you wrote them but really, they just garble your intent.  Fix the content, then language, then the spelling.

Is that the error message you want?
You've caught the code in an unexpected position.  Is that the message you want to write when it gets there?  You might have a message that says "Unexpected Error, this should never happen", or "Unexpected Error call Bob at (800)123-4567", but this isn't what you really want to say when the error happens.  Now that you know what causes the failure and understand the code so that you can project what else might cause execution to reach this spot, make an error message that says what happened, why, and how to fix it.

Is that where you want the error to be detected?
You now understand the problem and you know what had to happen to get here.  Think about it for a second and see if maybe you couldn't have caught this issue earlier.  Is there a better way to head off this issue?  Was this the result of a subtly bad user input?  Could you screen the user input better and flag this earlier?

If you do redirect the error handling, make sure that works too.  If you've got an error stack (you do have one, don't you?), make sure all the messages in the stack make sense, are written correctly, and convey information too.

Leave some comments behind.
Oh, and comment what you've found.  You spent all that time tracking down the error and what happens when.  Leave yourself a note about how things flow so you don't have to rethink all this stuff again.


Then fix the problem
You've got the error handling worked out: the code is following the correct paths and detecting the issue as early as is feasible, don't forget to fix the problem.

It gets better
Diligently applying this method will make your code better and more readable. It will make your error messages more useful.  It has one added bonus for code that has loads of bad error messages: .it concentrates your efforts on errors that actually happen.

Handling Codepage and ASCII/EBCDIC issues

We Are Not All The Same

Dealing with character sets is never easy.  We all wish the industry would just pick one codepage and go with it, but the realities of multi-language and multi-platform support make that an impossible dream.

Even if you know you will never deal with an EBCDIC-based platform you can't escape the need for multi-language support.  The fact remains that not all ASCII codepages are the same.

It's easy to write code that ignores this problem.  It's hard to retrofit a solution into code that absolutely ignored the existence of the problem.

I'll Worry About That Later
Suppose you write your complex application, totally ignoring the codepage issues.  How bad can it be to fix it later when a customer demands NLS support, or you need to port it to another platform?  After all, that's an issue for the porting engineers isn't it?

Remember we are all in this together.  The goal is to produce a product as quickly as possible so the company makes money and pays our salary.  Unless you want your porting team to take up the head count budget, it pays you and the company to plan ahead.

Difficulty In Retrofitting
You've spent months writing a complex application with multiple I/O points.  The application displays web pages and sends socket requests to the four corners of the world.  It talks to databases and reads files and extracts information from all manner of data sources.  You've completely ignored the codepage issue and now you want to deal with it.  Where do you start?

Chances are the web server is UTF-8.  Everything else is platform-dependent.

You read data from the database: is the database on the mainframe? on AS/400? on Windows?  What codepage did you use to ask for the data?  Did you even specify what codepage you wanted it in, or just take the default at the database?

You read data from a file: is the file a PDS on the mainframe with naming convention issues that look nothing like any other platform?  Is it UNIX-based, where capitalization is an issue and the slashes go in one particular direction?  Is it Windows, where the capitalization might not be an issue, spaces are allowed to be embedded, and the slashes go the other direction?

You want to display data on a web page or end-user application.  Where did the data come from?  What codepage was it produced with, and what codepage do you want to display it with?

Thinking about this after the fact is painful and error-prone.  Every hole you patch shows six other holes, and further testing shows more.  If you manage to ship the code, how do you explain to the user that the error message '   ?>    /    _>' really means "Invalid Input"?

Going on a Wumpus Hunt
You've shipped the code, or testing has shown that somewhere, buried in the code there's a message that isn't getting translated properly.  Either it was never translated or, even worse, it's been translated twice.  I usually don't mind the never-translated.  On some systems you can see the hex values for the message and do the translation by hand and hunt down the message.  But doubly translated?  It's incomprehensible gobbledy-gook.  You can't grep for '  ?>  /  _>'.   How do you hunt down a message when you don't even know what the content was?

Unintentional Lack Of Portability
Suppose your code was written with some level of portability.  You have spots in your code where translation takes place.  Most things are working, but every time you add a test case or make minor revisions, you could be introducing unintentional lack of portability.

Suppose you are reading a line from a file and you want to check for end-of-line.  You add something like this:

if (myString[i] == '\x0A')

or you want to check for a blank and you accidentally slip in:

if (myString[i] == '0x20')

Or you want to make sure the characters you are about to display are readable and you do this:

if (myString[i] >= '0x30' &&
    myString[i] <= '0x7E')

All of these examples are going to fail a portability test, as they tie your code to a specific codepage.

Any one attempting to port this code won't find this unless they look through every line of code that you have written.

Magic Numbers and Phrases
Does your code look like this?

rc = 100;              /* assume all is well */
strcpy(status, "All is well");
rc = doFunction(parm1, parm2, "MakePumpkins", 121);
if (rc != 100)
{
    printf("error %d");
}

This code is full of magic numbers and phrases.  This makes porting things difficult and maintenance of the code costly.

Suppose you decide that the "good" error code should no longer be 100.  If you've hard-coded it every place, then every location needs to be updated to the new "good" error code.

The function call has magic phrases and magic numbers.  Finding and updating these phrases is next to impossible.

Magic phrases add "another layer of impossible" when you have codepage issues.  That magic phrase might have to be in a different codepage, depending on when and where this code is invoked and handled.  In a system with both ASCII and EBCDIC modules, this function call might cross the boundary between an EBCDIC front-end module and an ASCII compiled internal function.

Save yourself a nightmare and put all these magic numbers into a header file as #defines or constants.  You will find maintenance much easier with only one spot to update the values, and no grepping around to find them all.  Your porting team will be able to focus on what values might cause issues and find ways to handle them, while being able to grep for the specific usages of the values.

Configuration Information
In your test libraries you likely have tons of these magic phrases and numbers.  If you are using this test library on multiple platforms, it's quite likely that these values will not be the same for all environments.  Putting them in a header file is nice but there's a better solution.  Externalize them in a file and write a simple application to suck the values into the test programs.

If you go the header route, then you have to compile the code, test it, change the value to something appropriate for the platform, recompile, retest.  Continue until every value in the header file is checked against the test cases.  Then, for every platform you support, add an #ifdef until the file is unreadable. Instead of wasting time with recompiling, putting these values into an external file means all you have to do is update the file and rerun, with no intervening compilation steps.

Steps For More Portable Coding
What can you do to produce code that is more portable and protects itself from codepage issues?

1. Stop, Think, then Code
Don't spend months agonizing over all the places things can go wrong,  but before every coding project give some thought about where a portability issue might come into place.  It's much easier to structure the code to isolation the porting problems when you first write it than afterwards.

Regardless of the rush you are in to finish the code, a few minutes thinking and planning ahead of time will always make the code better.

2. Watch for Magic Phrases, Magic Numbers and Hex Values
Isolate these stumbling blocks into header files or external files.  While coding both the product and the test cases, make these things obvious and easy to find.

3. Comments
Every module, function, and structure should have a comment block describing its usage already.  While you are writing or adding to these comment blocks, include a line or two about what codepage it should be dealing with.  Functions that interface with outside data will most likely be in the external codepage.  Internal functions will most likely be in the product's codepage.  It may seem obvious to you while you are writing the function, but it won't be obvious three years later or to any other programmer.

4. Conversion Functions
Even if you don't need the codepage conversions now while you are writing the bulk of your code, you know where they will be needed.  Add a dummy function that doesn't do anything yet.  doConvertExternalToInternalCodePage() and doConvertInternalToExternalCodePage() will make it clear where the conversions are supposed to take place.  These functions will likely have different implementations for different platforms, but now it will be easy to port the code and spend the company's man-hours writing new products instead of porting.