Tuesday, June 26, 2007

The Day from Hell

I know it's a cliché, but Mondays can really be hell sometimes. On Sunday, I received an email from the mail/web server at work notifying me that one of the hard drives in its RAID-1 (mirrored) array had failed. So I readied myself for some server work on Monday.

I pulled out the faulty drive, replaced it with a backup drive, and started up the server. I watched the handy boot messages go by until the system froze at the partition check. Oh... no... I tried a few more ideas, and still no go. Both hard drives died at the same time? Come on!

Time for the backup plan. I had prepared a Debian Etch server just for this event, and it was time to press it into service. I got it up and running, accepting email and serving web pages. Unfortunately, the server was not doing its job as a gateway for the regular desktop systems behind it. Frak! Something was (not) going on with the IP masquerading. I installed the ipmasq package, but the server was dropping all packets, so no outgoing connections were possible. The server could reach both subnets, but no joy for the desktops.

Damn, damn, damn! Now what? Six hours later, I am seriously screwed sideways and upside down. I am sweating profusely and almost ready to give up. I call a local "Debian" consultant, but it turns out he's not really familiar with iptables/NAT. What to do... what to do?

I finally got the idea to try the hard drive from the additional server in another box, since I kept coming back to the fact that two hard drives failing simultaneously seemed a bit suspect. I powered up the box, and voilà! She booted, and everything looked good! I moved the NICs over from the old box, placed the spare hard drive in her, and fired it up. After partitioning the new drive and setting its partition type to Linux RAID autodetect, I was able to add the new partitions to the existing RAID array, and everything seems good so far.

Needless to say, I learned a few lessons:

  1. My backups of the mail from the mail server worked well. However, there was additional information that should have been backed up but wasn't. This has been remedied.
  2. Even if you go to the trouble of making a backup server, you need to test it under exactly the same conditions in which it will have to function. I tested the services and such, but not the iptables/NAT requirement (thinking it would be a breeze - ha!).
  3. I need to find a way to move my OS from one set of hardware to another. I'd rather avoid the reinstall and subsequent system build up if at all possible, but this might be too much to ask. If you have any ideas, lemme know!
  4. I *really* need to buy some new server hardware.

So, things are back to normal now. I figure I lost at least a year of my life in those eight hours, and it wasn't until T+24 hours that I actually started feeling normal again. Maybe I should look into having Google host our email...

Friday, June 08, 2007

An Amazing Overview of the WWII Eastern Front

If you are interested in the Eastern Front of World War II, or you are a fan of military history, then do yourself a favor and visit POBEDITELI - Soldiers of the Great War. It is a Flash application that takes the viewer from 1941 through the end of the war in Europe in 1945, providing information on battles, historical notes, video clips, and audio (subtitled in English) from those who lived through that time period.

It is very thorough, and extremely interesting. Make sure to click on the boxes with red/green arrows for further detail. You'll need a bit of time to explore it thoroughly, so plan accordingly!

Wednesday, June 06, 2007

Odd GCC Optimization Issue

I am working on a modification to some C code for a customer, and stumbled upon an odd problem with the GNU C Compiler (gcc version 4.0.3-1ubuntu5). The code computes payments and amortization schedules for requested loans, and returns the information in data structures.

I noticed the problem when writing a payment stream generator (e.g. "x payments of $a", "y payments of $b", etc.). With optimizations turned off, the payment stream was output correctly for the loan in question (2 payments of $47.26 followed by 4 payments of $2530.39).
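A payment stream generator of this kind amounts to run-length grouping: walk the payment schedule and collapse each run of equal consecutive payments into one "n payments of $x" entry. The following is a minimal sketch, not the customer's code; the function name `format_payment_stream`, its signature, and the decision to compare payments after rounding to whole cents (rather than comparing the raw `double` values) are my own assumptions, and it assumes non-negative amounts.

```c
#include <stddef.h>
#include <stdio.h>

/* Round a dollar amount to whole cents.
 * Assumption: amounts are non-negative and small enough for long long. */
static long long to_cents(double amount)
{
    return (long long)(amount * 100.0 + 0.5);
}

/* Hypothetical helper: write a stream such as
 * "2 payments of $47.26, 4 payments of $2530.39" into out.
 * Consecutive payments are grouped when they agree to the cent.
 * Returns 0 on success, or -1 if out is too small. */
int format_payment_stream(const double *payments, size_t n,
                          char *out, size_t outlen)
{
    size_t used = 0;
    size_t i = 0;

    if (outlen == 0)
        return -1;
    out[0] = '\0';

    while (i < n) {
        long long cents = to_cents(payments[i]);
        size_t run = 1;

        /* Extend the run while the next payment rounds to the same cents. */
        while (i + run < n && to_cents(payments[i + run]) == cents)
            run++;

        int written = snprintf(out + used, outlen - used,
                               "%s%zu payment%s of $%lld.%02lld",
                               used > 0 ? ", " : "",
                               run, run == 1 ? "" : "s",
                               cents / 100, cents % 100);
        if (written < 0 || (size_t)written >= outlen - used)
            return -1; /* output would be truncated */
        used += (size_t)written;
        i += run;
    }
    return 0;
}
```

Called with the six payments of the loan above (two of $47.26 and four of $2530.39), it writes `2 payments of $47.26, 4 payments of $2530.39`. Comparing in cents is a deliberate choice: two amounts that differ only far past the second decimal place still land in the same run.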

However, once I turned on optimization (-O2), the payment stream gave me a hiccup and reported: 1 payment of $47.26, 1 payment of $47.26, 4 payments of $2530.39.

Hmmm - odd. Printing out the first two payments to 15 decimal places revealed an extremely slight difference (47.259999999999998 vs. 47.260000000000005), even though the two values are computed using the same factors, rounded in the same manner, etc.
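To see just how slight that difference is, the two printed values can be compared bit for bit. The sketch below (the helper name `ulp_distance` is hypothetical) copies each `double` into a 64-bit integer; for two positive, finite IEEE 754 doubles, the difference of those integers is the number of representable steps between them. For 47.259999999999998 and 47.260000000000005 it is 1, i.e. the two payments are adjacent doubles, so any exact `==` comparison treats them as different even though both print as 47.26 with `%.2f`. One common cause of differences that appear only at -O2 on 32-bit x86 is that the x87 FPU keeps intermediates in 80-bit registers, so a value that stays in a register can round differently from one stored to memory. That is an assumption here, not something I have confirmed for this code; gcc's `-ffloat-store` option, or `-mfpmath=sse` on SSE2-capable targets, exists for that situation.

```c
#include <stdint.h>
#include <string.h>

/* Hypothetical helper: number of representable doubles between a and b.
 * Assumes a and b are finite and positive (their IEEE 754 bit patterns
 * then increase with the value) and that double is 64 bits wide. */
int64_t ulp_distance(double a, double b)
{
    int64_t ia, ib;

    /* memcpy is the portable way to reinterpret the bits of a double. */
    memcpy(&ia, &a, sizeof ia);
    memcpy(&ib, &b, sizeof ib);
    return ib > ia ? ib - ia : ia - ib;
}
```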

I wish I could reduce the code to something simple for demonstration purposes, but the code is extremely complex and has many dependencies on other units, so that would not be a trivial matter, to say the least.

For now, I have replaced the -O2 optimization flag with the following: "-finline-functions -floop-optimize -falign-functions -falign-loops -funroll-loops". With those optimization options turned on, the results are as expected.

I just checked whether the same issue occurs with an older version of gcc (version 3.3.5, Debian Sarge), and the same problem does appear. Odd.