Came into work to find that about half my machines were down. Mostly just switched off, but some in indeterminate stages of “not working right”. Powered on all the machines that were obviously down and rebooted the others (I wouldn’t normally do that, but with so many machines down I had to remove the easy targets, not to mention the fact that console management in the machine room is woefully inadequate for anything more than brute-force diagnostics).
So at this point it was becoming fairly clear that we had had a little incident with the power supply. While waiting to see what would come back up I tried to log into my mail and found that down also. Clearly whatever interrupted our supply hit the whole building. A look through the logs of those machines that were up shows that everything re-booted on Saturday morning. Definitely a power problem.
Further prodding of those services that refuse to come up reveals two culprits: bouscat (the server that world+dog get dumped on) has let the magic smoke out, though it can probably be frankensteined. More seriously, the two Sun V880 servers that run one of our main data-stores are refusing to boot. This elicits much running about in search of the correct serial cable (see previous post in which I clearly caught the attention of the fuckup fairy).
After finding a cable that would give me console access I discover that the root filesystem is failing fsck and mounting ro. This apparently is enough to completely hose a Solaris 8 box. Oh Joy. After consulting with #solaris I can poke OpenBoot in the requisite manner to boot in single-user mode. This gets me precisely nowhere. Even in single-user mode it refuses to get me to a shell where I could do anything useful. Tonight I shall mostly be reading the Solaris documentation to try to work out how to get the box to boot from DVD so that I have some hope of fixing it from the rescue system.
Current Mood: Maybe London Wasn’t So Bad