Ranger

http://blogs.sun.com/jonasdias/entry/tacc_ranger_tour

A nice video of the new Ranger cluster at the Texas Advanced Computing Center (TACC). It uses the same APC cooling system as the new Cardiff machine. Hopefully in the not too distant future I’ll be able to post some pretty pics of our new cluster.

Incidentally, Bristol just announced start of service on their new machine. It also uses the APC cooling system. Can you see a pattern developing?

Moving ZFS filesystems between pools

When I originally set up ZFS on my development v880 I added the internal disks as a raidz together with two volumes off the external fibre-channel array. As is the way with these things, the development box has gradually become a production box. I now realise that if the server goes pop I can’t just move the fibre-channel array to another server, because the ZFS pool also contains that set of internal SCSI disks.

To my horror I now discover that you can’t remove a top-level device (a vdev in ZFS parlance) from a pool. Fortunately I have two spare volumes on the array, so I can create a new pool and transfer the existing ZFS filesystems to it. Here is a quick recipe for transferring ZFS filesystems whilst keeping downtime to a minimum.

zfs snapshot oldpool/myfilesystem@snapshot1

zfs send oldpool/myfilesystem@snapshot1 | zfs receive newpool/myfilesystem

This will take a while, but the filesystem can stay in use while you are doing it. Once it finishes you need to shut down any services that rely on the filesystem and unmount it.

zfs unmount oldpool/myfilesystem

And take a new snapshot.

zfs snapshot oldpool/myfilesystem@snapshot2

You can now do an incremental send of the difference between the two snapshots, which should be very quick.

zfs send -i oldpool/myfilesystem@snapshot1 \
             oldpool/myfilesystem@snapshot2 | zfs receive newpool/myfilesystem

Now you can point the services at the new filesystem and repeat the process until all the filesystems on the original pool have been transferred.
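
Strung together, the recipe looks something like the sketch below. It only prints the commands rather than running them, so they can be checked and then run by hand. Stopping the services is left as a comment because that part depends on what is using the filesystem. The function name plan_migration is made up for illustration; the pool, filesystem and snapshot names are the placeholders used above.

```shell
#!/bin/sh
# Print the migration commands for one filesystem so they can be reviewed
# before being run. Nothing here touches a pool.
# Usage: plan_migration OLDPOOL NEWPOOL FILESYSTEM
plan_migration() {
    old=$1
    new=$2
    fs=$3
    cat <<EOF
zfs snapshot $old/$fs@snapshot1
zfs send $old/$fs@snapshot1 | zfs receive $new/$fs
# stop the services that use $old/$fs before continuing
zfs unmount $old/$fs
zfs snapshot $old/$fs@snapshot2
zfs send -i $old/$fs@snapshot1 $old/$fs@snapshot2 | zfs receive $new/$fs
EOF
}

plan_migration oldpool newpool myfilesystem
```

If the incremental receive complains that the destination has been modified since the last snapshot, zfs receive -F (where the installed ZFS supports it) rolls the destination back to that snapshot before applying the stream.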

Sun x4600

I’ve been running some tests at work on a shiny new Sun x4600 with 8 dual-core Opteron processors.
x4600 booting up

It’s very nicely put together.
x4600 with top removed

So far benchmarks have ranged between “That’s really quite fast” and “Is it powered by hamsters?”.

More pics here

new server

The newest addition to the racks at WeSC is a Dell 3250. Two Itanium processors, redundant power supplies and a proper lights-out management card.

Total Cost?

Less than 500 quid from ebay.

One of those days

Due to a scarcity of meetings (a rare thing these days) I thought I might actually be able to get some work done today. What I actually ended up doing was:

  1. trying to work out why my Solaris 10 V880 decided to reboot
  2. cobbling together enough spare parts to build a workstation after mine went pop.

On the plus side, Fedora Core 6 runs surprisingly well on a Pentium III with 256MB of RAM. Getting the BIOS to believe that it really did have a 120GB hard drive took a bit of work though.

Here’s hoping for a better day tomorrow.

Big Disks?

We’re probably going to need a large amount of disk space shortly. It’s basically somewhere to back things up so it doesn’t need to be terribly fast. I’ve been having a look around and I’ve come up with two possibilities.

Sun X4500

  • 24TB SATA
  • 4U
  • Software RAID (ZFS)
  • well engineered
  • Sun support
  • 20k with academic pricing

DNUK Teravault

  • 27TB SATA
  • 6U
  • Hardware RAID (Areca 110)
  • DNUK rails are usually horrid
  • 13k full price

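The x4500 is smaller and I know it will be less hassle to physically install. But the DNUK box is a lot cheaper (13k for 27TB is a little under 0.5k per TB, against a little over 0.8k per TB for the x4500, and that is comparing full price with academic pricing) and has more storage. From looking at the hardware specs I think that the x4500 is the superior product, but I’ve no reason to believe the Teravault won’t get the job done.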

If anyone has had hands-on experience of either box I would really like to hear about it.

It’s 2am: do you know where your ZFS pools are?

No, no I don’t.

wesc21-comsc# zpool list
no pools available
wesc21-comsc# df -h
Abort (core dumped)

Bollocks. I’m beginning to think this machine is cursed. The mounted ZFS filesystems are still there and appear to be functioning, so I guess I can fix this tomorrow. I don’t really want to interrupt the MySQL database that’s indexing 40GB of data.

Strangely, googling for “Where the hell did my ZFS go?” doesn’t return any useful results.

MySQL Backup Script - Updated

I had to move data from a large MySQL database (where large == 8GB tables), so I updated my trusty MySQL backup script to back up tables individually. The creation of 20+ GB files always seems like a bad idea.

The updated script can be found here.
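
For anyone who just wants the general idea, per-table dumps can be sketched roughly as follows. This is not the script linked above, only a minimal illustration: the function name backup_tables and the output file naming are made up, credentials are assumed to come from ~/.my.cnf, and table names are assumed not to contain whitespace.

```shell
#!/bin/sh
# Dump every table of one database into its own gzip-compressed file.
# Usage: backup_tables DATABASE DEST_DIR
# Assumes mysql and mysqldump are on PATH and can log in without prompting.
backup_tables() {
    db=$1
    dest=$2
    mkdir -p "$dest" || return 1
    # -N drops the column header, -B gives plain one-name-per-line output
    for table in $(mysql -N -B -e 'SHOW TABLES' "$db"); do
        out="$dest/$db.$table.sql"
        # Dump first, compress second, so a failed dump is noticed
        # instead of being hidden behind gzip's exit status.
        mysqldump "$db" "$table" > "$out" || return 1
        gzip -f "$out" || return 1
    done
}
```

Calling backup_tables mydb /backup/mydb would leave one .sql.gz file per table in /backup/mydb, so no single file has to hold the whole database.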

Solaris 10 - A Wailing and Gnashing of Teeth

I’ve been playing around with Solaris 10 on my development V880 at work and generally enjoying it. ZFS is a thing of beauty and Zones have turned out to be very helpful. It’s not unusual for a researcher to come along needing to set up gigabytes of database temporarily while they try out some things. Being able to throw a zone up and give them full access to it without worrying about them messing up your server is very handy.

Of course the patching system on Solaris leaves something to be desired. The pretty GUI tools that Sun recommends don’t work if you have zones running on your machine, so I’ve resorted to using PCA, which works but is horribly slow.

Anyway, on Thursday I get a call from Information Services telling me they have been informed by JANET that hostx is aggressively scanning the network, so they have disconnected the network port it lives on. hostx happens to be a Solaris 10 zone on the aforementioned V880. So I wander into the machine room, log into the console and halt the zone. A quick look through the filesystem reveals the traces of the Solaris Telnet Worm. Great. In six years, the only machine I’ve had hacked.

Eventually I work out what happened. When the advisory about the telnet vulnerability came out I disabled telnet on the Solaris 10 machines and voiced my displeasure that telnet was enabled by default. However for reasons that I can’t explain I didn’t actually patch the machine.

A month later I set up a new zone for one of the researchers. And it turns on telnet again. Because that’s the sensible thing to do. Zone gets rooted, I look like a moron.

I’m going to go away and put PCA into a cron job like I should have done originally.

Solaris 10 and Telnet

Yes, it’s still turned on by default, at least in Solaris 10 update 2. That’ll teach me not to portscan new *nix installs where I’m not completely familiar with the OS defaults.
Seriously though, is there a good reason for telnet to be on by default in this day and age?
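
For the record, turning it off is at least easy with SMF. A minimal sketch, assuming it is run as root in the global zone and that the service is the stock svc:/network/telnet:default; the function name is made up for illustration.

```shell
#!/bin/sh
# Disable the telnet service in the global zone and in every running zone.
# Assumes root in the global zone and the stock Solaris 10 telnet FMRI.
TELNET_FMRI=svc:/network/telnet:default

disable_telnet_all_zones() {
    rc=0
    # "zoneadm list" prints the names of the running zones, global included
    for zone in $(zoneadm list); do
        if [ "$zone" = global ]; then
            svcadm disable "$TELNET_FMRI" || rc=1
        else
            zlogin "$zone" svcadm disable "$TELNET_FMRI" || rc=1
        fi
    done
    # Non-zero if any zone could not be changed
    return $rc
}
```

Zones that are not running are skipped, and a zone created afterwards starts out with its own defaults, so disable_telnet_all_zones needs running again after every new zone is set up.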
