off by one

Categories: pedro | technical:research

Wed Mar 20 09:57:47 PST when will I ever learn?:

I do a lot of performance testing in the course of my research, and so, like a lot of other people I imagine, I wrote a little test harness to manage running jobs. That harness treats each test as a discrete part of a larger test -- and runs through them individually until complete. The benefit of this is that you can use the model to make sure that your conditions are the same for each test, you can see the progress of the overall tests by looking at the job queue, and you can pick up where you left off if something goes wrong. I've spent a lot of time on that script.

But then, I always forget about it and start a long string of jobs using a nested for loop... and then I lose all those benefits. My tests aren't particularly sensitive to conditions, but since I didn't use a job queue of any kind, I have no idea when the overall tests will end, and if I kill the task, I'll lose all my progress. Gar!
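For what it's worth, the core of that idea fits in a few lines of bash. This is only a minimal sketch, not my actual harness: the names `run_queue`, `queue.txt`, and `done.txt` are made up for illustration, and it assumes each job is a single shell command on its own line. The nested for loop only writes the queue; a separate function runs it and records each finished job, so a rerun skips whatever already completed.

```shell
#!/bin/bash
# Minimal resumable job queue (a sketch; all names here are hypothetical).
# The queue file holds one command per line; the done log records the
# line numbers of jobs that finished successfully.

run_queue() {
    local queue=$1 done_log=$2
    local n=0 total
    total=$(wc -l < "$queue")
    touch "$done_log"
    while IFS= read -r job; do
        n=$((n + 1))
        # Skip jobs already recorded as finished by an earlier run.
        if grep -qx "$n" "$done_log"; then
            continue
        fi
        echo "[$n/$total] $job"
        # Redirect stdin so the job cannot swallow the rest of the queue.
        if bash -c "$job" < /dev/null; then
            echo "$n" >> "$done_log"
        else
            echo "job $n failed; fix it and rerun to resume" >&2
            return 1
        fi
    done < "$queue"
}

# Demo: the nested for loop generates the queue instead of running the jobs.
workdir=$(mktemp -d)
for size in 1 2; do
    for trial in a b; do
        echo "echo size=$size trial=$trial >> '$workdir/results.txt'"
    done
done > "$workdir/queue.txt"

run_queue "$workdir/queue.txt" "$workdir/done.txt"
```

Comparing the number of lines in the done log with the number in the queue tells you how far along the run is, and calling `run_queue` again after an interruption picks up at the first unfinished job.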

Thu Jan 24 14:57:33 PST looking for volunteers to donate their web data to SCIENCE!:


Do you know what network sniffers like Ethereal or Wireshark are? Do you like SCIENCE? Would you be willing to surf the web UNENCRYPTED and UNCOMPRESSED for about an hour (or so) and send me, a trustworthy individual, captures of the data you downloaded? If so, please read on.

If you don't know what Wireshark is or how to get it, aren't really sure what I'm talking about or how it would work technically, or DO understand very well thankyouverymuch and are horrified or unwilling because of security or privacy concerns, thank you, but you can stop reading now.

Still reading? Cool!

I'm working on a research project involving the properties of web data in the real world. It would be super convenient if I could just browse the web all day and use my own data for my experiments... unfortunately, I can't assume that the way I use the web is typical, so I need other people out in the world to surf the web and send me the logs of the data they viewed. I will perform analysis on the data, which *probably* won't require me to look at the data. (I can tell you more about my research if you are interested, but I don't want to post about it here.)

Does this sound like a privacy risk? It is. But, it's not as bad as it sounds. (If I thought I was putting you at risk, I wouldn't ask.) I need the full contents of the data you view through the browser, but I don't need or want you to do anything sensitive like online banking, email, Facebook, or researching that mole on your arm on WebMD.

Chances are, there's a lot of other stuff you do online that isn't normally encrypted and doesn't fall into the category of "highly personal" (unless you think everything you do is highly personal). That's exactly the kind of data I want. Still, there is a risk in saving your web data and sending it to me, since there could always be some unexpectedly private information in there. This SHOULD make you think twice... so no hard feelings if you just don't want to do it.

If you're still willing, I promise you that I will make every effort to keep the contents of your data private. In *most* cases I should not even need to examine the contents of the data, no one else will be given copies of it, and I will destroy it after my experiments. My analysis is mostly statistical and numerical in nature -- sizes of files, how long they would take to transfer over a network, etc. NO names of websites, IP addresses, content or anything like that will be revealed in my papers or graphs. There should be no way to personally identify that you contributed to my research.

Still willing to help?

Here's the process I would need you to go through. These instructions are for Firefox only:

1. Close all open Firefox windows, then restart Firefox. Open your preferences. Under Advanced | Network, clear your "web content cache".

2. Next, we need to disable encryption, because I can't analyze the properties of data if it is encrypted. (Don't worry, we'll arrange a way for you to send me the data over an encrypted channel.)

In Preferences | Advanced, under Encryption, *uncheck* "Use SSL 3.0" and "Use TLS 1.0". Yes, again, I am asking you to DISABLE YOUR ENCRYPTION. This is a necessary step, but I will personally remind you to re-enable encryption when you send me your data.

3. Next, I need you to disable compression. In the "address bar" of Firefox, enter "about:config" and hit enter. It will (probably) print a warning about how this could ruin your browser, but what we are doing is safe and reversible. In the search window, type "encoding", which will limit the configuration lines to those that match.

Right-click on the line for the preference "network.http.accept-encoding". Look for an option called "Reset", which should be grayed out if you have not changed the setting. If it is NOT grayed out, then you have changed the setting. If it does not exist, you are using an old version of Firefox. Either way, write down the setting for that preference (the default is probably: "gzip, deflate").

Double-click on the entry, erase the text, and confirm the change.

4. Now, start Wireshark (or Ethereal if you are using that). In the upper left corner of the window (probably under the "Edit" menu) there is an icon of a network card with a wrench on it. Click it. This opens the capture settings. We need to do four things: a) choose the interface, b) limit the capture to web traffic, c) limit the capture size, and d) start the capture. Here's how to do that:

a) Select the network card you are using -- probably eth0 if you are using a wired connection and wlan0 if you are using wireless.

b) Then, in the Capture Filter field, type "tcp port 80". This will limit the captured traffic to web data only.

c) Next, on the bottom left, check the box to "Stop capture ... after" a certain number of megabytes. I would prefer 25 or 50 megabytes of data.

d) Finally, click Start. Wireshark will begin logging web traffic.

5. Start surfing! Don't do anything sensitive like online banking or anything personal, but otherwise please just use the web as you normally do. You may notice that you can't access certain sites like Gmail or Facebook with SSL disabled. That's OK. As Han said to Chewie, "Fly Casual".

6. Wireshark will stop capturing data automatically, but you may want to periodically look at it to see if it is still capturing data.

Once the data has hit the limit and stopped (you can see that the packet count at the bottom of the application will stop growing when you load new pages), save the data.

To do this, click File | Save As, and save the file as "whatever_you_want.pcap". Then, contact me at and I will give you instructions for delivering me the data via SSH or your delivery method of choice.

7. FINALLY, remember to turn encryption and compression back on:

a) Preferences | Advanced | Encryption -- enable SSL and TLS

b) Enter "about:config" in the address bar, search for "encoding", then right-click on "network.http.accept-encoding" and select "Reset" -- or double-click and enter whatever you wrote down previously.

Thank you! Your help will hopefully make computers more efficient someday, and it will definitely make a tangible contribution towards me graduating!

Update: If you want to use tcpdump instead of Wireshark, here's a script to automatically stop capturing after a certain size:


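#!/bin/bash
# Example values only (the original settings are not shown here); adjust them.
FILE=capture.pcap   # output file
ETH=eth0            # interface to capture on
MB=25               # stop after this many megabytes

rm -f "$FILE"
touch "$FILE"

sudo tcpdump -i "$ETH" -s0 -w "$FILE" -Z "$USER" tcp port 80 &
PID=$!

trap '{ echo "Quitting with ctrl-C"; kill $PID; exit 1; }' INT

# Check the file size every 5 seconds; stop the capture once it passes the limit.
while ps -p $PID &> /dev/null; do
    if (( $(stat -c%s "$FILE") > ( MB * 1048576 ) ))
        then kill $PID
    fi
    sleep 5
done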
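
If you have Wireshark installed, its command-line companion tshark can apply the same settings as step 4 without any polling. The following is a rough sketch rather than something I have tested on every version: the interface name, size limit, and output file name are placeholder assumptions, and tshark takes its "filesize" stop condition in kilobytes, so the value is converted from megabytes. It prints the command by default so you can check it before running.

```shell
#!/bin/bash
# Sketch: the step 4 capture settings expressed as tshark options.
# All three values below are assumptions; change them to suit your machine.
IFACE=eth0          # a) interface to capture on
LIMIT_MB=25         # c) stop after roughly this many megabytes
OUT=capture.pcap    # hypothetical output file name

# b) the capture filter limits traffic to port 80;
# -a filesize:N stops the capture after about N kilobytes;
# -F pcap asks for the classic pcap format.
cmd=(tshark -i "$IFACE" -f "tcp port 80" -a "filesize:$((LIMIT_MB * 1000))" -F pcap -w "$OUT")

# d) print the command by default; set RUN=1 to actually start capturing.
if [ "${RUN:-0}" = 1 ]; then
    "${cmd[@]}"
else
    printf '%s\n' "${cmd[*]}"
fi
```

The printed line drops the quotes around the capture filter, so put them back if you copy it by hand. Capturing usually requires root or membership in whatever group your system uses for packet capture.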


Unless otherwise noted, all content licensed by Peter A. H. Peterson
under a Creative Commons License.