off by one

Categories: pedro | technical:coding

SESE -- Single Entry, Single Exit -- Considered Misunderstood:

I was just wondering about this the other day...

Principle of Least Responsibility:

A lot of the code I write these days is for running experiments (or analyzing data from experiments). And I'm always trying to automate my testing more, in order to keep the system from being idle when there are more cases to test. So, given some experimental program E, and a set of inputs to test {1,2,3}, it's tempting to give E the power (through the magic of computer programming) to take in a list of tasks to work on -- so you can tell it "do {1,2,3}" instead of telling it "do 1", then telling it "do 2", and then "do 3".

But there's a problem here, especially if your tests take a long time to complete. If your code crashes (or a test fails) in the middle of "do {1,2,3}" it may not be obvious where in the test sequence the code failed. Or, even if it is obvious, it may not be easy to pick up the tasks from where things crashed.

Instead, it's more robust to write a simple wrapper W that, when given the inputs {1,2,3}, picks them off one at a time, calling E with the individual tasks. It may take a little extra work to do things this way, but it will pay dividends in the long run.
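
In bash, the whole of W can be a few lines. This is just a sketch -- the executable name ./E and the task arguments are placeholders for whatever your experiment actually looks like:

#!/bin/bash
# W: run the experiment program E once per task, so each run
# stands on its own.  Usage: ./W 1 2 3
for task in "$@"; do
    echo "=== task $task ===" >&2
    ./E "$task" || echo "task $task failed" >&2
done

If a run blows up, you know exactly which task it was working on, and you can restart with just the tasks that are left.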

For example, another benefit is intelligent management of repetitions of experiments. Suppose you have tasks {1,2,3}, but you want to run each of them 3 times. A naive approach will run {1,1,1,2,2,2,3,3,3}, but then you have potential ordering bias. And if (say) all three runs of test 1 succeed, but the second run of test 2 fails, what happens to your results?

Using a wrapper W, you can generate the list of experiments, randomize them, and write them to a file. W can track the success or failure of each test, keeping your experiments running but also separating the wheat from the chaff. This approach can also give you a good way to track progress through a set of experiments.
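
As a sketch of that step (assuming GNU shuf is available; sort -R would do in a pinch): emit each task however many times you want it repeated, shuffle the whole list, and write it to a task log.

# Three repetitions of each task, in random order, one per line.
for task in 1 2 3; do
    for rep in 1 2 3; do
        echo "$task"
    done
done | shuf > tasklog.txt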

My "task log" usually looks like a shuffled file with one line for each task, or a directory of files with each file representing one task; this also allows you to modify the task set on the fly -- adding tests to the end of the task log. W doesn't remove a task until it's been completed successfully, but it moves on to subsequent tasks so it doesn't get stuck in a pathological case. (You could have it put failed tests in a separate list.) Anyway, all this allows for better recovery and progress tracking.

Anyway, I know there are lots of names for this general principle of abstraction or compartmentalization, but I'm calling it the "Principle of Least Responsibility" -- with the first working definition being "give a piece of code the least amount of responsibility that is practical". I say "practical", not "possible", because taking it to the extreme could be burdensome.

This is pretty similar to the UNIX philosophy of "Do one thing and do it well." In my case, it specifically means "don't make the code hold the state of a sequence of experiments because all it needs to be responsible for is the single experiment it is currently performing!"

In the moment, it may seem like a great idea to let your code do all the heavy lifting for you, but a little abstraction/functional decomposition can save you a lot of grief in the long run.

Codin' Chip:

In the Boy Scouts, there is a thing called a "Totin' Chip". It is "both an award and contract in Boy Scouts of America that shows Scouts understand and agree to certain principles of using different tools with blades" (WP). To get the Totin' Chip, which is a paper card (like a library card), scouts must demonstrate a certain amount of knowledge and responsibility. The Wikipedia page has more on it, of course. The main thing (besides the rules) is that violations of the Totin' Chip code result in one or more corners of the card being removed; when all the corners are gone, you lose your right to tote a blade.

Anyway, I think there should be a "Codin' Chip" -- maybe it's a card, maybe it's an actual chip. If it's a card, you lose corners; if it's a chip, you lose pins. Either way, when you lose 'em all, you're done.

Violations can be large or small; not commenting code meant for others to read counts, as does inappropriately testing floating point numbers for equality. Using strcpy and the like is definitely in there.

What else should cause you to forfeit a pin off your Codin' Chip?

libconfig problems in Debian and Ubuntu:

Are you still using Hardy Heron? Have you had issues using libconfig -- a configuration parsing library by Mark Lindner -- in Debian or Ubuntu? Here's why. Turns out this has been solved in later releases.

getopts -- another one for posterity:

Just a little addition to posterity about getopts. 'getopts' is a handy bash builtin that parses command-line options. It exists so options can be processed in a consistent way, and so you don't have to reinvent the wheel every time you want to use command line switches. It's part of POSIX, is really pretty easy to use, and many languages have libraries that work in a similar way. Anyway, there are many tutorials online, but the one at bash-hackers.org is the best I've seen for bash.

I was inspired to write this post because I've been wrestling with a problem. I used getopts inside a function that I was using for logging. It just wasn't working right and the behavior was strange. The answer, which perhaps should have been obvious in retrospect, is that getopts uses a variable, OPTIND, to index into the parameter list. You can use OPTIND to "shift off" the options you found with getopts, leaving the remainder.
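
For reference, the usual pattern looks something like this -- a sketch of a script that takes a -v flag and a -o <file> option:

#!/bin/bash
verbose=0
outfile=""
while getopts "vo:" opt; do
    case "$opt" in
        v) verbose=1 ;;
        o) outfile=$OPTARG ;;
        *) echo "usage: $0 [-v] [-o file] args..." >&2; exit 1 ;;
    esac
done
shift $((OPTIND - 1))    # shift off the parsed options, leaving the rest
echo "verbose=$verbose outfile=$outfile remaining: $*"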

This works great if you run getopts once, at the beginning of a script. But because of bash's scoping, if you use getopts inside a function, the value of OPTIND remains where it was left at the end of the previous getopts execution. So, if you're going to use getopts inside a function, you should reset OPTIND when finished or, to be safe, at the beginning of the function before running getopts.

I'm so glad I spent a few hours chasing that one down.

Update: This also means you should be mighty careful if you plan to use getopts in more than one place in the same execution of a program!
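
For posterity, here's roughly what the fix looks like. The function name and its -l option are made up for the sake of the example; the important part is the local OPTIND=1 before the getopts loop, which both resets the index and keeps it from clobbering the caller's OPTIND:

log_msg() {
    local OPTIND=1          # reset (and localize) the index for this call
    local level="INFO"
    local opt
    while getopts "l:" opt; do
        case "$opt" in
            l) level=$OPTARG ;;
        esac
    done
    shift $((OPTIND - 1))
    echo "[$level] $*"
}

log_msg -l WARN "disk is almost full"
log_msg "this one gets the default level"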

ah, time for my yearly smattering of blog posts:

Annnnnnnnnnd we're back.

Today's hot tip:

@filearray = <FILE>;

...is a bad idea if <FILE> is rilly large!

Thank you good night.

(No Perl jokes, please.)

Update:

Also, don't do:

foreach (<FILE>)

... it's basically just as bad.

Instead, you are WAY better off doing:

while (my $line = <FILE>)

... which only reads one line at a time. You get the same functionality at a fraction of the memory usage.

IT'S NOT A RACE CONDITION:

Here's another in an embarrassing series of stupid programming mistakes:

I don't know about you, but I have a bad habit: when I encounter a strange bug in my code and I'm not sure how it could have happened, or it involves some event that happened (or didn't happen) when I thought it was supposed to, and especially if it involves data corruption, I start thinking it's a race condition. Which, of course, reminds me of a certain House, M.D. meme.

It COULD be a race condition. It's possible. Especially if I'm testing on a multi-core machine. (And these days, aren't they all?) It's also more likely to be a race if it locks the machine, or if it's unpredictable. But every time -- every single time -- I have thought that an elusive bug was a race condition, it wasn't. It was an ordinary, mundane, bone-headed move on my part.

Allow me to digress for a moment. In religious circles, you hear people talk about "sins of commission" versus "sins of omission." In the former, it's something you've done, such as stealing, murdering, or coveting your neighbor's gorgeous donkey. Sins of omission are the things you haven't done, but should have, such as failing to honor your father and mother or not loving your neighbor as much as you love yourself. It's easier to recognize the wrong things we do than to recognize the right things we fail to do. It's similar in programming. Most mundane bugs are "wrong things we do" -- erroneous code we wrote into the program. Most race conditions (in my experience) are the result of necessary things we failed to put into the program (locking and/or synchronization).

It's fundamentally hard to pore over your own work to find mistakes. You kind of have to pretend you didn't write it -- or assume that it's wrong -- because if you'd known it was broken, you wouldn't have written it that way to begin with. It can also be hard to swallow your pride and admit that your work is also the most likely source of error. Subconsciously choosing to assume that the problem is a race somehow saves face (even while it dooms you to hours of adding specious locks and fruitless poking around).

In my experience at least, you're only fooling yourself. Next time, unless you're absolutely positive it's a race -- assume that it's not, and start looking for assumptions you made in your own code. Maybe you calculated that pointer incorrectly. Or maybe your loop is exiting earlier than it should. That innocuous helper function that you skimmed may be stabbing you in the back.

Remedial Coding: Never. Assume. Anything.:

I have friends who are great programmers, technologists, and scientists because they have a really remarkable clarity of thought and methodicity that translates well into programming. Programming is easy for them because they naturally structure problems the way programs are written. I'm not one of those people. My talents are more intuitive than analytical. I tend to think out loud. But that kind of approach can get you into trouble when you're doing a kernel project with a 20-minute compile-test cycle.

Writing code is like playing Operation. Anyone can do it. (Ok, 4 and up due to the choking hazard.) The question is, how many times are you going to get electrocuted in the process? Core dumps, compilers, stack traces, debuggers and print statements will shock you every time you make a mistake -- just like that buzzer and red light. But eventually, you'll get all the little bones out of the patient -- it just might be ugly along the way.

Anyway, in the past, my "fly by the seat of your pants" approach has made my programming a little like playing Operation while riding in the back of a truck. The process is painful, the end result can really be a mess, and it takes 10 times longer than it should. I try to be halfway methodical, but half-measures just end up wasting time. So, I'm trying to replace these sloppy habits with useful, meaningful structure.

My first hard-earned lesson is this: don't assume anything. You know how they say, "Don't assume, it makes an ass out of you and me?" Well, when you're programming, and you assume things, it only makes an ass out of you. I do this all the time, though. I'll be writing some code, and I'll see a function named "does_stuff()" that gets called, and I'll think to myself "Oh, I know what that does." And then I spend an hour debugging my code before I think, "I guess I better figure out what that little function does..." I call these things "grey boxes". They're like black boxes, except less scary: you think you understand them, which gives you a comforting but false sense of security.

I know what you're thinking. You tell me, "But does_stuff() could be doing anything! How can I know what it actually does?" Well, I'd like to remind you that it is a PROGRAM executing on a COMPUTER. Chances are, its behavior is totally deterministic. You could take half a day peppering your code with print statements until it narrates its behavior to you in the Queen's English, or you could take 5 minutes to actually look at the function. You'll probably learn something useful. Yes, sometimes it's not worth going down every rabbit hole before you understand the context surrounding it... but at some point, you will need to know what it's doing. Otherwise, you're betting the correctness of your program on something you assumed to be true. And the more code you write around grey boxes, the harder they will be to untangle later. You might even forget that you assumed things to begin with!

Assuming things is more pernicious than simply guessing about what does_stuff() does. It's not just functions that abstract away behavior. Function pointers, the meaning of variables, macros, goto labels, structures, methods -- in short anything that is not explicit or obvious -- could be playing games with your head. Functions snicker at programmers who naively assume things about their behavior. So do yourself a favor. Any time you hear the voice in your head say "I think that does/means X..." take 5 minutes and figure out what it ACTUALLY does. You'll thank yourself later.

Unless otherwise noted, all content licensed by Peter A. H. Peterson
under a Creative Commons License.