off by one

Categories: pedro | technical:kernel

access to random information is important!:

Ever thought about what happens when an SSL implementation can't get access to random data? Or how fork() complicates the behavior of a pseudo-random number generator (PRNG)? You should. Now there's a proposal for a getrandom() system call in Linux.
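To make the fork() complication concrete, here is a minimal sketch in Python (used purely for illustration, assuming a Unix system where os.fork() is available). The helper name next_values_after_fork is hypothetical. A userspace generator's state is copied into the child along with the rest of the process, so parent and child hand out the same "random" value unless the child reseeds from the kernel. The reseed uses os.urandom(), which CPython documents as using the getrandom() system call on Linux where it is available (the call has since been merged, in Linux 3.17). A private random.Random instance is used because recent CPython versions reseed the module-level generator in the child on their own.

```python
import os
import random


def next_values_after_fork(rng, reseed_child=False):
    """Fork, then draw one 64-bit value from rng in both processes.

    Returns (parent_value, child_value). Hypothetical helper for illustration.
    """
    read_fd, write_fd = os.pipe()
    pid = os.fork()
    if pid == 0:
        # Child: rng is a byte-for-byte copy of the parent's generator state.
        try:
            os.close(read_fd)
            if reseed_child:
                # Pull fresh entropy from the kernel before drawing anything.
                rng.seed(os.urandom(16))
            os.write(write_fd, rng.getrandbits(64).to_bytes(8, "big"))
        finally:
            # Never fall back into the caller's code from the child.
            os._exit(0)
    # Parent: read the child's draw, reap the child, then make our own draw.
    os.close(write_fd)
    with os.fdopen(read_fd, "rb") as pipe:
        child_value = int.from_bytes(pipe.read(8), "big")
    os.waitpid(pid, 0)
    return rng.getrandbits(64), child_value
```

Calling next_values_after_fork(random.Random()) should return two identical numbers, while passing reseed_child=True should return two different ones. For anything security-sensitive, drawing directly from the kernel on every request avoids the shared-state problem altogether.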

a moment of silence:

Linux no longer supports 386.

disabling the caches in Linux:

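Sometimes you'd like to disable the file cache (and other caches) in Linux for performance testing. You can't switch them off outright, but you can tell the kernel to drop what it has cached. Simply:

To drop the pagecache only, enter: echo 1 > /proc/sys/vm/drop_caches

Writing 2 drops dentries and inodes only, and 3 drops pagecache, dentries, and inodes. Only clean cache entries can be dropped, so run sync first if you want dirty pages written out and dropped as well. You can easily whip together a little script to drop caches regularly enough to simulate running with no caches.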

(This information is also under "drop_caches" in linux/Documentation/filesystems/proc.txt.)
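Here is one way such a "little script" might look, as a rough Python sketch rather than a polished tool. The function names, the default interval handling, and the sync_first flag are all my own assumptions. The only interface it relies on is the /proc/sys/vm/drop_caches file and the values 1, 2, and 3 described above. Writing to that file requires root.

```python
import os
import time

DROP_CACHES_PATH = "/proc/sys/vm/drop_caches"

# Meanings of the values, as described in the text above.
MODES = {
    1: "pagecache",
    2: "dentries and inodes",
    3: "pagecache, dentries, and inodes",
}


def drop_caches(mode=3, path=DROP_CACHES_PATH, sync_first=True):
    """Ask the kernel to drop caches once (hypothetical helper)."""
    if mode not in MODES:
        raise ValueError("mode must be 1, 2, or 3")
    if sync_first:
        # Flush dirty pages so they become clean and can be dropped.
        os.sync()
    with open(path, "w") as f:
        f.write(f"{mode}\n")


def drop_caches_periodically(interval_seconds, rounds, mode=3,
                             path=DROP_CACHES_PATH, sync_first=True):
    """Drop caches `rounds` times, sleeping `interval_seconds` after each."""
    for _ in range(rounds):
        drop_caches(mode, path=path, sync_first=sync_first)
        time.sleep(interval_seconds)
```

For example, drop_caches_periodically(1.0, 60) run as root would drop everything once a second for a minute while your benchmark runs in another terminal. The path parameter exists only so the function can be pointed at an ordinary file when trying it out without root.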

IT'S NOT A RACE CONDITION:

Here's another in an embarrassing series of stupid programming mistakes:

I don't know about you, but I have a bad habit: when I encounter a strange bug in my code and I'm not sure how it could have happened, or it involves some kind of event that happened or didn't happen when I thought it was supposed to, and especially if it involves data corruption, I start thinking it's a race condition. Which, of course, reminds me of a certain House, M.D. meme.

It COULD be a race condition. It's possible. Especially if I'm testing on a multi-core machine. (And these days, aren't they all?) It's also more likely to be a race if it locks the machine, or if it's unpredictable. But every time -- every single time -- I have thought that an elusive bug was a race condition, it wasn't. It was an ordinary, mundane, bone-headed move on my part.

Allow me to digress for a moment. In religious circles, you hear people talk about "sins of commission" versus "sins of omission." In the former, it's something you've done, such as stealing, murdering, or coveting your neighbor's gorgeous donkey. Sins of omission are the things you haven't done, but should have, such as failing to honor your father and mother or not loving your neighbor as much as you love yourself. It's easier to recognize the wrong things we do than to recognize the right things we fail to do. It's similar in programming. Most mundane bugs are "wrong things we do" -- erroneous code we wrote into the program. Most race conditions (in my experience) are the result of necessary things we failed to put into the program (locking and/or synchronization).

It's fundamentally hard to pore over your own work to find mistakes. You kind of have to pretend you didn't write it -- or assume that it's wrong -- because if you'd known it was broken, you wouldn't have written it that way to begin with. It can also be hard to swallow your pride and admit that your work is also the most likely source of error. Subconsciously choosing to assume that the problem is a race somehow saves face (even while it dooms you to hours of adding specious locks and fruitless poking around).

In my experience at least, you're only fooling yourself. Next time, unless you're absolutely positive it's a race, assume that it's not, and start looking for assumptions you made in your own code. Maybe you calculated that pointer incorrectly. Or maybe your loop is exiting earlier than it should. That innocuous helper function that you skimmed may be stabbing you in the back.

T get_pageblock_migrationtype:

Are you a kernel newbie like me? Have you been perusing kernel code and wondering what "migration type" is or whether it's important for you to understand? Here's a link.

Page migration is about moving pages around to alleviate differences in RAM access times. How can access times be different? Well, in traditional systems, they're not, because the system has only one bank of RAM, and the time to access it is always the same.

However, some newer systems are NUMA systems. NUMA stands for Non-Uniform Memory Access and describes a system where memory access times are not uniform from processor to processor. For example, my dual Opteron board has two banks of RAM, one for each processor. The bank for CPU0 can hold 4G, but the bank for CPU1 can only hold 2G. As you might imagine, there are times when CPU1 needs more than 2G of RAM, so it can "borrow" from CPU0 -- but of course, memory access to the other bank will take longer than memory access to its own, local bank, so sometimes we'd like to "migrate" the data from one memory bank to another.

You can imagine how this kind of thing could get very complicated in a large multi-system cluster or in future "1000s of cores" designs.
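If you want to see how memory is split across banks on your own machine, the following Python sketch reads the per-node meminfo files that Linux exposes under /sys/devices/system/node. It assumes that sysfs layout and the "Node N Key: value kB" line format. The function names are hypothetical, and this only reports totals; it does nothing with migration itself.

```python
import glob
import os
import re

NODE_GLOB = "/sys/devices/system/node/node[0-9]*"


def parse_node_meminfo(text):
    """Parse 'Node N Key: value [kB]' lines into a {key: value} dict."""
    fields = {}
    for line in text.splitlines():
        match = re.match(r"Node\s+\d+\s+(\S+):\s+(\d+)", line)
        if match:
            fields[match.group(1)] = int(match.group(2))
    return fields


def numa_memory_totals(node_glob=NODE_GLOB):
    """Return {node_id: MemTotal in kB} for every NUMA node found."""
    totals = {}
    for node_dir in sorted(glob.glob(node_glob)):
        id_match = re.search(r"node(\d+)$", node_dir)
        if id_match is None:
            continue  # not a nodeN directory; skip it
        with open(os.path.join(node_dir, "meminfo")) as f:
            info = parse_node_meminfo(f.read())
        totals[int(id_match.group(1))] = info.get("MemTotal")
    return totals
```

On a board like the dual Opteron described above, and assuming each bank shows up as its own node, numa_memory_totals() would be expected to report two nodes, one with roughly twice the memory of the other. A single-bank machine should report just node 0.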

Unless otherwise noted, all content licensed by Peter A. H. Peterson
under a Creative Commons License.