How to get Ubuntu Live running

This article describes how to boot Ubuntu from a live CD without installing anything on the hard drive. It is meant for the many users of my script for adjusting the fan thresholds of a Dell PowerEdge.

Step 1
Download Ubuntu and burn it to a CD. The latest 32-bit version will do fine.

Step 2
Boot from the CD and select “Try Ubuntu” (don’t install it, for obvious reasons). This lets you try Ubuntu without installing anything. You can find more detailed instructions on the download page.

Ubuntu Search Menu

Step 3
After Ubuntu has booted, click the Ubuntu button in the top left corner. A search menu, as seen on the right, should open.

Step 4
Type “terminal” into the search box. Ubuntu should give you an icon for the corresponding application. Click on it to start the terminal.

Screenshot of the terminal


Step 5
You should now have a console prompt open. Type the following into the prompt to install the dependencies required for the script:

sudo apt-get install freeipmi freeipmi-tools openipmi ipmitool python wget

This should install the software necessary to query your server using IPMI. IPMI is a protocol for server management. See this link for more information regarding IPMI on Ubuntu.

That’s it! You can continue here: How to adjust the fan thresholds of a Dell PowerEdge.

Converting a PowerEdge SCSI backplane to support SATA

This post describes how to build an 8-drive storage server for under a hundred bucks (drives not included).

Last summer I bought a Dell PowerEdge 2800, which I converted to use silent fans and SATA drives instead of SCSI drives. See the project page for this server for more information on how to hack the BMC’s firmware in order to swap the fans for silent ones.

This post is the write-up of the conversion of the server’s backplane from SCSI to SATA. My main reason for this conversion was money: servers that offer a lot of swappable SATA drive slots are quite expensive. SCSI-based servers, on the other hand, are quite cheap – nobody uses plain SCSI anymore, I guess. After searching a bit on the internet, I bought a Dell PowerEdge 2800 with 8 SCSI drive slots for 25 bucks.

I gutted the SCSI backplane and replaced the connectors with SATA connectors. Adding a 3Ware 9500S-12 PCI-X card, I ended up with a RAID system that supports 8 SATA drives (cold-swappable only, which is what I aimed for; this is obviously not a production server, just our home storage).

In the following, I illustrate the steps I took to convert the PowerEdge 2800 to SATA. It’s quite easy to do, so I hope I can inspire others. Speaking of others: there is a similar project that was featured on Hackaday around the same time I started mine.

Assumptions:

  • you do know how to use a soldering iron and which end of a screwdriver is the front
  • you have a Dell PowerEdge generation 8 (or similar; this should also work with generation 7 and 9 servers such as the 2900)
  • you have a SATA hardware RAID card (I used a 3Ware 9500S-12, but any other will do)

Required material:

  • 90 degree angled all-in-one SATA connector (I used Delock type 84307)
  • cables for connecting the drive LEDs
  • connector(s) matching the pin headers of your RAID controller
  • glue, solder, time

So, let’s start! Open up your server and remove the backplane. Obviously, you have to remove all drives before you can detach the backplane. I say obviously after I ripped off the connector of my floppy drive by brute force – I removed the hard drives but completely forgot about the other drives. D’oh!

Step

I first thought about replacing the whole backplane, but after I bought the server I realized that all the front-panel buttons, the optical drive, etc. are connected to the backplane. The backplane is therefore essential to the server, which does not even boot without it. So we have to leave the backplane in, fair enough.

The first step is to remove the old SCSI connectors from the backplane. First remove the protective plastic shield from the backplane. Keep it, we will stick it back on later! The SCSI connectors are 68-pin bastards, so desoldering them is out of the question. In the picture you can see the first connector removed – I pulled off the plastic part with some heavy pliers and stripped the pins one by one with a cutter.

Step

Next, remove all connectors as shown in the picture. After removing the first ones by hand, I ended up using a Dremel to remove the pins. It does not have to be perfect, but make sure you do not leave any short circuits behind. You do not want to disturb the underlying SCSI system that is still active in the server, or (even worse) short-circuit the power planes!

Step

Next, we are going to solder the hard disk access LEDs. This step is optional, but as everybody loves flashing LEDs this will give your project some street cred. No, seriously, the LEDs help you see activity on the hard disks and (if your controller supports it) identify a specific disk/controller port.

The PowerEdge features 2 LEDs for each drive, of which we will use only one (it does not matter which one). First, identify the traces on the back side of the backplane for each of the LEDs. Mark them with a pen according to drive number and whether it’s the anode or the cathode. Cut the traces leading to the backplane’s controller, leaving enough copper for you to solder a cable on.

In the picture you can see how I did this for one of the ports: one cable connected to the anode of the LED, and the other to the resistor, which in turn is connected to the cathode of the LED. Don’t forget to include the current-limiting resistor, otherwise you might burn out your LEDs!

Step

Here you can see how I connected all ports on one side of the backplane. Note that I soldered the cables on the back of the backplane (is that the frontplane?? ;) Remember to fix the cables in place with a little hot glue. No hacking project can call itself a proper hacking project without a little hot glue, right?

Step

Next, we prepare the connectors to the RAID controller. Most RAID controllers have pin headers for connecting the drive LEDs. You checked that yours supports this before you started soldering cables to the LEDs, right? Along the same lines, you made the cables long enough to reach all the way to the RAID controller, even with all the fans etc. in the system? Ok, good.

Step

Configure your connector according to your RAID controller’s pinout. In my case, the pinout was given in the RAID controller’s manual: here’s the relevant page. Pay attention not to swap pins.

Step

Reattach the insulating plastic shield and reconnect the power cable. Your backplane should look like the one in the picture above. Note that I cut away the two bars at the bottom of the backplane’s PCB (at the bottom of the two big cutouts, which were closed before). This makes it easier to insert the backplane with all the cables in place, but it is not really required. If you do this on another type of system, make sure that no traces run through this part of the PCB.

Step

Now we start doing the actual conversion to SATA. Mount all drives in their drive cages and stick them in your server. With the backplane still unmounted, you should be able to see the back of the drives. Here, I plugged in the angled all-in-one SATA connectors (I used Delock type 84307, which worked really well).

As the PowerEdge servers do not have many drive power connectors and we cannot reuse the power connectors of the backplane, we have to connect the power cables of the all-in-one connectors together. This also reduces the cable mess. In the picture you can see how I did it, adding the connectors one by one: plug the all-in-one connector into the drive, cut its power cable to the length needed to reach the next drive, add the next connector and solder the two cables together. Repeat until the whole column is complete. It’s hard to explain but easy enough to do, so have a look at the pictures.

Step

This is what the end result should look like. Remember to properly protect all solder joints with heat-shrink tubing or similar. Use zip ties to clean things up. In the end you’ll have a single power connector for each column of drives.

Step

Next, we need to attach the SATA connectors to the backplane. The idea is that the connectors stay on the backplane when you pull a drive out – without this you would have to dismount the backplane each time you want to swap a drive. I thought about many different ways to do this, but in the end I chose to use a kind of construction glue. There are surely cleaner ways to do this, but this serves the purpose and is quick and cheap.

The glue should have the following attributes:

  • must stick to plastic
  • must remain (at least a bit) flexible
  • must be able to bridge gaps of a few millimeters
  • must not shrink while drying (this would pull the connector back)
  • must be viscous (not too liquid)

Apply the glue to the back of the SATA connectors as shown in the picture (the white stuff). Be sure to clean any grease residue off the connectors and the backplane before doing so. Additionally, push your drives as far to the front as possible (there’s usually a little play in the cage mounting mechanism, and you’ll want to max it out).

Then, put the backplane in. Wait.

Step

This is how the interior of your server should now look. You can see the 8 SATA cables coming out of the cutouts at the bottom of the backplane, alongside the two power connectors. Additionally, you can see the cables that will connect the drive LEDs to the RAID controller (the two gray cables).

Step

After you’ve waited long enough (see the glue’s instructions), you can remove and re-insert your drives to check that everything is in order. This is how your empty drive slots should look.

Step

Plug in the power connectors and connect the SATA and LED cables to the RAID controller. Fire up your machine and check the drive status and the drive LEDs. Remember that your drives are only cold-swappable, so you have to shut down your server before changing them.

Welcome to your new SATA-based storage server!

How to adjust the fan thresholds of a Dell PowerEdge

Adjusted lower critical thresholds for the fans of a PowerEdge 2800

Note: you MUST replace your fans with slower, quieter ones to reduce the noise. The threshold adjustment discussed in this article only makes that possible – without new fans, it’s useless!

Intro

In order to swap the fans of a Dell PowerEdge for slower, quieter ones, you have to adjust the lower critical threshold (LCR). If you don’t, the server’s firmware actually lowers the fans’ speed below their own LCR, panics, spins them back up to 100%, lowers them again, etc. Very noisy, very annoying.


This behavior is controlled by the BMC (baseboard management controller), an embedded management controller. You can configure many parameters of the BMC using the IPMI protocol. Unfortunately, the BMC firmware of a Dell PowerEdge does not allow changing the thresholds mentioned above. I contacted Dell support, and they refused to change the thresholds for such an old server.

So I had no choice but to change them myself. It took me quite a while to isolate the proper setting in the BMC’s firmware, figure out the checksums, etc. But I managed, and the server now runs very quietly with adjusted thresholds.

Below, I explain how to adjust these thresholds with a Python script I wrote. Note that you’ll need at least Python 2.6 to run the script. If someone is interested, I can also write up how I did it, but that’s for another post.

Update: I created a project page for my server.

The result

First, here’s the result: my PowerEdge 2800 with swapped fans and patched fan thresholds.

This was recorded with my laptop, 10 cm (4 in) in front of the server. The system is now quieter than my desktop!

Prerequisites

I assume that you have a sufficiently recent Linux distribution up and running, with python installed and IPMI set up. If you don’t, have a look at this article that explains how to get a recent Ubuntu version running (without installing anything on your harddisk!).

Adjusting the fan thresholds

I assume that you have now FreeIPMI installed, the BMC configured and that you can query the BMC using IPMI.

  1. Query the sensors
    First, you have to query the sensors of your server using IPMI. The output should look a bit like this:

    you@server$ ipmi-sensors
    1: Temp (Temperature): NA (NA/125.00): [NA]
    2: Temp (Temperature): NA (NA/125.00): [NA]
    3: Ambient Temp (Temperature): NA (3.00/47.00): [NA]
    4: Planar Temp (Temperature): NA (3.00/72.00): [NA]
    5: Riser Temp (Temperature): NA (3.00/62.00): [NA]
    6: Temp (Temperature): NA (NA/NA): [NA]
    7: Temp (Temperature): NA (NA/NA): [NA]
    8: Temp (Temperature): 71.00 C (NA/125.00): [OK]
    9: Temp (Temperature): NA (NA/125.00): [NA]
    10: Ambient Temp (Temperature): 27.00 C (3.00/47.00): [OK]
    11: Planar Temp (Temperature): 46.00 C (3.00/72.00): [OK]
    12: Riser Temp (Temperature): 50.00 C (3.00/62.00): [OK]
    13: Temp (Temperature): NA (NA/NA): [NA]
    14: Temp (Temperature): NA (NA/NA): [NA]
    15: CMOS Battery (Voltage): NA (2.64/NA): [NA]
    16: ROMB Battery (Voltage): [NA]
    17: VCORE (Voltage): [State Deasserted]
    18: VCORE (Voltage): [NA]
    19: PROC VTT (Voltage): [State Deasserted]
    20: 1.5V PG (Voltage): [State Deasserted]
    21: 1.8V PG (Voltage): [State Deasserted]
    22: 3.3V PG (Voltage): [State Deasserted]
    23: 5V PG (Voltage): [State Deasserted]
    24: 5V Riser PG (Voltage): [State Deasserted]
    25: Riser PG (Voltage): [State Deasserted]
    26: CMOS Battery (Voltage): 3.11 V (2.64/NA): [OK]
    27: Presence  (Entity Presence): [Entity Present]
    28: Presence  (Entity Presence): [Entity Absent]
    29: Presence  (Entity Presence): [Entity Present]
    30: Presence  (Entity Presence): [Entity Absent]
    31: ROMB Presence (Entity Presence): [Entity Present]
    32: FAN 1 RPM (Fan): NA (1575.00/NA): [NA]
    33: FAN 2 RPM (Fan): NA (1575.00/NA): [NA]
    34: FAN 3 RPM (Fan): NA (1575.00/NA): [NA]
    35: FAN 4 RPM (Fan): NA (1575.00/NA): [NA]
    36: FAN 5 RPM (Fan): NA (1575.00/NA): [NA]
    37: FAN 6 RPM (Fan): NA (1575.00/NA): [NA]
    38: FAN 1 RPM (Fan): NA (2025.00/NA): [NA]
    39: FAN 2 RPM (Fan): NA (2025.00/NA): [NA]
    40: FAN 3 RPM (Fan): 4875.00 RPM (2025.00/NA): [OK]
    41: FAN 4 RPM (Fan): 4800.00 RPM (2025.00/NA): [OK]
    42: FAN 5 RPM (Fan): 1800.00 RPM (900.00/NA): [OK]
    43: FAN 6 RPM (Fan): 1950.00 RPM (900.00/NA): [OK]
    44: FAN 7 RPM (Fan): 1875.00 RPM (900.00/NA): [OK]
    45: FAN 8 RPM (Fan): 1875.00 RPM (900.00/NA): [OK]
    46: Status  (Processor): [Processor Presence detected]
    47: Status  (Processor): [NA]
    48: Status  (Power Supply): [Presence detected]
    49: Status  (Power Supply): [NA]
    50: VRM  (Power Supply): [Presence detected]
    51: VRM  (Power Supply): [Presence detected]
    52: OS Watchdog (Watchdog 2): [OK]
    53: SEL (Event Logging Disabled): [Unknown]
    54: Intrusion (Physical Security): [OK]
    55: PS Redundancy (Power Supply): [NA]
    56: Fan Redundancy (Fan): [Fully Redundant]
    73: SCSI Connector A (Cable/Interconnect): [NA]
    74: SCSI Connector B (Cable/Interconnect): [NA]
    75: SCSI Connector A (Cable/Interconnect): [NA]
    76: Drive (Slot/Connector): [NA]
    77: Drive (Slot/Connector): [NA]
    78: 1x2 Drive (Slot/Connector): [NA]
    79: Secondary (Module/Board): [NA]
    80: ECC Corr Err (Memory): [Unknown]
    81: ECC Uncorr Err (Memory): [Unknown]
    82: I/O Channel Chk (Critical Interrupt): [Unknown]
    83: PCI Parity Err (Critical Interrupt): [Unknown]
    84: PCI System Err (Critical Interrupt): [Unknown]
    85: SBE Log Disabled (Event Logging Disabled): [Unknown]
    86: Logging Disabled (Event Logging Disabled): [Unknown]
    87: Unknown (System Event): [Unknown]
    88: CPU Protocol Err (Processor): [Unknown]
    89: CPU Bus PERR (Processor): [Unknown]
    90: CPU Init Err (Processor): [Unknown]
    91: CPU Machine Chk (Processor): [Unknown]
    92: Memory Spared (Memory): [Unknown]
    93: Memory Mirrored (Memory): [Unknown]
    94: Memory RAID (Memory): [Unknown]
    95: Memory Added (Memory): [Unknown]
    96: Memory Removed (Memory): [Unknown]
    97: PCIE Fatal Err (Critical Interrupt): [Unknown]
    98: Chipset Err (Critical Interrupt): [Unknown]
    99: Err Reg Pointer (OEM Reserved): [Unknown]

    Take note of the fan lines (d’oh). Record the sensor numbers, fan names and thresholds (the first value in parentheses is the lower critical threshold). You’ll need them later to identify your system.

  2. Download the latest BMC firmware
    Go to http://support.dell.com/support/downloads/ and get the latest BMC firmware for your system. Select any Linux OS; the BMC firmware should be listed under something like Embedded Server Management. On the download page, select the .BIN package. In my case the file was called BMC_FRMW_LX_R223079.BIN. Download it!

  3. Fix and extract .BIN package
    In my case the .BIN package did not work properly (presumably because its #!/bin/sh script needs bash, while /bin/sh on Ubuntu is dash). I had to fix it first, and then extract it. For this, open a terminal and go to the folder you downloaded the package to.

    Then execute:

    you@server$ sed -i 's/#!\/bin\/sh/#!\/bin\/bash/' BMC_FRMW_LX_R223079.BIN  # fix interpreter bug
    you@server$ chmod 755 BMC_FRMW_LX_R223079.BIN                              # make executable
    you@server$ sudo mkdir bmc_firmware                                        # create dir as root
    you@server$ sudo ./BMC_FRMW_LX_R223079.BIN --extract bmc_firmware          # yes, you have to do this as root! :(
    you@server$ cd bmc_firmware

    This should extract your firmware. Check that you have a file called payload/bmcflsh.dat inside bmc_firmware. If not, game over: your system isn’t compatible. If yes, yay!

  4. Patch firmware
    Next, download the program I wrote for patching the firmware. Then, use the program on the firmware as shown below:

    you@server$ wget https://raw.github.com/arnuschky/dell-bmc-firmware/master/adjust-fan-thresholds/dell-adjust-fan-thresholds.py
    you@server$ chmod 755 dell-adjust-fan-thresholds.py
    you@server$ ./dell-adjust-fan-thresholds.py payload/bmcflsh.dat

    The program is a Python (version >= 2.6) script that first lets you choose a system from the ones available in the firmware and then adjusts the fan thresholds of this system. Yes, a single firmware can support multiple systems. You recorded the fan values before? Now you know why: you need them to identify your system among the ones the script shows you. Just use the number of fans, their names and thresholds to identify your system. Maybe you’re lucky and the system name has already been identified and is displayed directly.

    In the next step you can select fans and change their thresholds. Just remember that the result is always a multiple of 75 RPM. Half the usual speed has proven to be a good value. I’ve never tested what happens if you set it to 0, but that would be quite stupid, as the BMC could then no longer detect broken fans.

    If the program displays a code at the end and asks you to report back, please do so! That way we can identify the other systems by their code (for example, the code of a PowerEdge 2800 is “K_C”).

  5. Flash firmware
    Finally, flash the firmware as shown below.

    Disclaimer: I am not responsible for any damage you do to your system! If you flash this firmware, you might render your PowerEdge server unusable. It might even be unrecoverable. Additionally, badly set thresholds might cause overheating.

    Additionally, use the usual caution when flashing (do not interrupt power, do not flash over a network link, do not be stupid).

    you@server$ LD_LIBRARY_PATH=./hapi/opt/dell/dup/lib:$LD_LIBRARY_PATH ./bmcfl32l -i=payload/bmcflsh.dat -f

    Cross your fingers. The flasher should accept the firmware. If it doesn’t and complains about the CRC, something went wrong. Don’t worry if, during the flash, the fans spin up to full speed and then stop; that’s normal. The system should stabilize afterwards. There is no need to reboot.

  6. Check the sensors
    Check that everything is in order:

    you@server$ ipmi-sensors

    That’s it. Enjoy your silent PowerEdge!
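Steps 1 and 4 involve two bits of bookkeeping: reading the fan thresholds out of the ipmi-sensors output, and choosing a new threshold that is a multiple of 75. The following is a minimal C sketch of both, not part of my script. It assumes the ipmi-sensors line format shown in step 1 (other FreeIPMI versions may format their output differently), and it assumes the BMC stores each threshold as a raw byte scaled by 75 RPM, which is consistent with every fan value in the output above but not confirmed by Dell. The names fan_sensor_t, parse_fan_line, rpm_to_raw and raw_to_rpm are hypothetical.

```c
#include <stdint.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical helper type: one fan line from ipmi-sensors. */
typedef struct {
    int    number;       /* sensor number, e.g. 42 */
    char   name[32];     /* sensor name, e.g. "FAN 5 RPM" */
    double lowerThresh;  /* lower critical threshold in RPM */
} fan_sensor_t;

/* Parse a line such as "42: FAN 5 RPM (Fan): 1800.00 RPM (900.00/NA): [OK]".
 * Returns 1 on success, 0 for non-fan lines or fans whose threshold is NA. */
int parse_fan_line(const char *line, fan_sensor_t *out)
{
    char *end;
    long num = strtol(line, &end, 10);
    if (end == line || strncmp(end, ": ", 2) != 0)
        return 0;

    const char *nameStart = end + 2;
    const char *tag = strstr(nameStart, " (Fan):");
    if (tag == NULL)
        return 0;
    size_t nameLen = (size_t)(tag - nameStart);
    if (nameLen == 0 || nameLen >= sizeof(out->name))
        return 0;

    /* The last "(" opens the "(lower/upper)" threshold pair; it must come
     * after the " (Fan):" tag, otherwise the line carries no thresholds. */
    const char *paren = strrchr(line, '(');
    if (paren == NULL || paren < tag + strlen(" (Fan):"))
        return 0;
    double thresh = strtod(paren + 1, &end);
    if (end == paren + 1)              /* "NA": no numeric threshold */
        return 0;

    out->number = (int)num;
    memcpy(out->name, nameStart, nameLen);
    out->name[nameLen] = '\0';
    out->lowerThresh = thresh;
    return 1;
}

/* ASSUMPTION: reported RPM = 75 * raw byte value. */
#define RPM_PER_STEP 75u

/* Desired RPM -> raw byte, rounded down and clamped to 0..255. */
uint8_t rpm_to_raw(unsigned rpm)
{
    unsigned raw = rpm / RPM_PER_STEP;
    return raw > 255u ? 255u : (uint8_t)raw;
}

/* Raw byte -> RPM as the BMC would report it. */
unsigned raw_to_rpm(uint8_t raw)
{
    return raw * RPM_PER_STEP;
}
```

For example, a fan that normally runs at about 1800 RPM gives a half-speed target of 900 RPM, i.e. raw value 12. Because this sketch rounds down, a request for 950 RPM also ends up at 900.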

Trivia

Some things that I learned while messing with the firmware:

  • There can be multiple systems per firmware
  • Generally it’s quite well engineered
  • I’ve found Dell’s default password root/calvin. What is the 444444 for?
  • Dell server systems seem to be named internally after cities. BER, LOND, OSLO etc are easy enough to guess. But what the hell is K_C??? (my system)
  • The firmware package is probably the most horribly over-engineered script I’ve ever encountered on Linux
  • Dell uses CRC-16 for checksums – two different algorithms in the same firmware!

Update 1: I created a project page for my server.

Update 2: I wrote this article that explains how to get a recent Ubuntu version running (without installing anything on your harddisk!). This is for all the Windows users out there!

Update 3: I moved the code of this project into a GitHub repository: https://github.com/arnuschky/dell-bmc-firmware. GitHub is great because people can easily collaborate, fork, submit issues and patches and so on.

Please don’t ask me basic Linux questions! Google is your friend. If you don’t know what you are doing, you shouldn’t be doing it as you might damage your server!

The battle against the BMC – Part 2

Update 4/11/2011: I managed to find out some more info about the packaging scheme Dell uses for their BMC firmware files, and deciphered most of the container format. I am currently testing modifications, but for the moment I have replaced the tool below with a new version. You can also download the program directly here: dell-extract-bmc-firmware.tar.gz.

Firmware header

Deciphering the firmware header.

As mentioned earlier, I started to look into hacking the BMC firmware in order to solve my problem of the hard-coded failure thresholds of my PowerEdge 2800.

I had a look into the firmware flash file and noticed that it seems to consist of several files (as usual for BIOS/firmware images). As this might improve my chances of not bricking my BMC, I decided to separate the individual files for starters. I couldn’t find a program that does that (the firmware tools of the Dell Linux community are closed-source, unfortunately), so I grabbed a hex editor and deciphered (more or less) the firmware’s header. Here’s the corresponding C program:

// vim: ts=4 ai noexpandtab nopaste
/**
 * This program can extract and check the different files contained in a firmware file
 * for a Dell PowerEdge BMC.
 */

#include <stdlib.h>
#include <stdio.h>
#include <stdint.h>
#include <string.h>
#include <unistd.h>
#include <arpa/inet.h>

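// File header. The program reads the structs straight from the file, so it relies
// on the compiler's natural alignment: on typical x86 GCC builds, 2 padding bytes
// follow numBlocks, giving sizeof(header_t) == 20.
typedef struct
{
    uint8_t     hex02;      // always 0x02
    uint8_t     numBlocks;  // number of subfiles in system
    uint32_t    filesize;   // total size of the firmware file in bytes
    uint16_t    zero;
    char        dellHeaderStr[9];   // "DELL_INC"
} header_t;

// One descriptor per subfile, directly following the header.
typedef struct
{
    uint8_t     zero1;
    uint8_t     type;       // 0x0b -> SD_${system}.FLC
    uint8_t     zero2;
    uint8_t     system;     // 0, 1, 2
    uint8_t     zeros[3];
    uint16_t    unknownFixedData;
    uint16_t    crc16;
    uint32_t    length;
    uint32_t    offset;
    char        filename[32];
} flc_block_t;
// The fields sum to 4x1+3x1+2+2+4+4+32 = 51 bytes; with natural alignment one
// padding byte precedes unknownFixedData, so sizeof(flc_block_t) == 52 on x86.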

uint16_t endian_swap16(uint16_t x)
{
    return (x>>8) |
           (x<<8);
}

uint32_t endian_swap32(uint32_t x)
{
    return (x>>24) |
            ((x<<8) & 0x00FF0000) |
            ((x>>8) & 0x0000FF00) |
            (x<<24);
}

/** CRC table for the CRC-16. The poly is 0x8005 (x^16 + x^15 + x^2 + 1) */
uint16_t const crc16_table[256] = {
        0x0000, 0xC0C1, 0xC181, 0x0140, 0xC301, 0x03C0, 0x0280, 0xC241,
        0xC601, 0x06C0, 0x0780, 0xC741, 0x0500, 0xC5C1, 0xC481, 0x0440,
        0xCC01, 0x0CC0, 0x0D80, 0xCD41, 0x0F00, 0xCFC1, 0xCE81, 0x0E40,
        0x0A00, 0xCAC1, 0xCB81, 0x0B40, 0xC901, 0x09C0, 0x0880, 0xC841,
        0xD801, 0x18C0, 0x1980, 0xD941, 0x1B00, 0xDBC1, 0xDA81, 0x1A40,
        0x1E00, 0xDEC1, 0xDF81, 0x1F40, 0xDD01, 0x1DC0, 0x1C80, 0xDC41,
        0x1400, 0xD4C1, 0xD581, 0x1540, 0xD701, 0x17C0, 0x1680, 0xD641,
        0xD201, 0x12C0, 0x1380, 0xD341, 0x1100, 0xD1C1, 0xD081, 0x1040,
        0xF001, 0x30C0, 0x3180, 0xF141, 0x3300, 0xF3C1, 0xF281, 0x3240,
        0x3600, 0xF6C1, 0xF781, 0x3740, 0xF501, 0x35C0, 0x3480, 0xF441,
        0x3C00, 0xFCC1, 0xFD81, 0x3D40, 0xFF01, 0x3FC0, 0x3E80, 0xFE41,
        0xFA01, 0x3AC0, 0x3B80, 0xFB41, 0x3900, 0xF9C1, 0xF881, 0x3840,
        0x2800, 0xE8C1, 0xE981, 0x2940, 0xEB01, 0x2BC0, 0x2A80, 0xEA41,
        0xEE01, 0x2EC0, 0x2F80, 0xEF41, 0x2D00, 0xEDC1, 0xEC81, 0x2C40,
        0xE401, 0x24C0, 0x2580, 0xE541, 0x2700, 0xE7C1, 0xE681, 0x2640,
        0x2200, 0xE2C1, 0xE381, 0x2340, 0xE101, 0x21C0, 0x2080, 0xE041,
        0xA001, 0x60C0, 0x6180, 0xA141, 0x6300, 0xA3C1, 0xA281, 0x6240,
        0x6600, 0xA6C1, 0xA781, 0x6740, 0xA501, 0x65C0, 0x6480, 0xA441,
        0x6C00, 0xACC1, 0xAD81, 0x6D40, 0xAF01, 0x6FC0, 0x6E80, 0xAE41,
        0xAA01, 0x6AC0, 0x6B80, 0xAB41, 0x6900, 0xA9C1, 0xA881, 0x6840,
        0x7800, 0xB8C1, 0xB981, 0x7940, 0xBB01, 0x7BC0, 0x7A80, 0xBA41,
        0xBE01, 0x7EC0, 0x7F80, 0xBF41, 0x7D00, 0xBDC1, 0xBC81, 0x7C40,
        0xB401, 0x74C0, 0x7580, 0xB541, 0x7700, 0xB7C1, 0xB681, 0x7640,
        0x7200, 0xB2C1, 0xB381, 0x7340, 0xB101, 0x71C0, 0x7080, 0xB041,
        0x5000, 0x90C1, 0x9181, 0x5140, 0x9301, 0x53C0, 0x5280, 0x9241,
        0x9601, 0x56C0, 0x5780, 0x9741, 0x5500, 0x95C1, 0x9481, 0x5440,
        0x9C01, 0x5CC0, 0x5D80, 0x9D41, 0x5F00, 0x9FC1, 0x9E81, 0x5E40,
        0x5A00, 0x9AC1, 0x9B81, 0x5B40, 0x9901, 0x59C0, 0x5880, 0x9841,
        0x8801, 0x48C0, 0x4980, 0x8941, 0x4B00, 0x8BC1, 0x8A81, 0x4A40,
        0x4E00, 0x8EC1, 0x8F81, 0x4F40, 0x8D01, 0x4DC0, 0x4C80, 0x8C41,
        0x4400, 0x84C1, 0x8581, 0x4540, 0x8701, 0x47C0, 0x4680, 0x8641,
        0x8201, 0x42C0, 0x4380, 0x8341, 0x4100, 0x81C1, 0x8081, 0x4040
};

static inline uint16_t crc16_byte(uint16_t crc, const uint8_t data)
{
    return (crc >> 8) ^ crc16_table[(crc ^ data) & 0xff];
}

uint16_t calccrc16(uint8_t const *buffer, size_t len)
{
    uint16_t crc = 0x0000;

    while (len--)
        crc = crc16_byte(crc, *buffer++);
    return crc;
}

int main(int argc, char *argv[])
{
    if (argc != 2)
    {
        fprintf(stderr, "Usage: %s <firmware>\n", argv[0]);
        exit(1);
    }
    FILE* flashFile = fopen(argv[1], "rb");
    if (flashFile == NULL)
    {
        perror(argv[1]);
        exit(1);
    }

    // get filesize
    fseek(flashFile, 0, SEEK_END);
    uint32_t filesize = ftell(flashFile);
    fseek(flashFile, 0, SEEK_SET);

    // read the header
    header_t header;
    if (fread(&header, sizeof(header_t), 1, flashFile) == 0)
    {
        fprintf(stderr, "Error: Can't read header.\n");
        exit(1);
    }

    // check that it's a valid header as far as we know
    if (header.hex02 != 0x02 ||
        header.zero != 0 ||
        filesize != header.filesize ||
        strncmp(header.dellHeaderStr, "DELL_INC", 8) != 0)
    {
        fprintf(stderr, "Error: Header not valid.\n");
        exit(1);
    }

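    // calculate header CRC (file header plus all block descriptors)
    fseek(flashFile, 0, SEEK_SET);
    uint16_t totalHeaderSize = sizeof(flc_block_t) * header.numBlocks + sizeof(header_t);
    uint8_t headerBuf[totalHeaderSize];
    if (fread(headerBuf, totalHeaderSize, 1, flashFile) != 1)
    {
        fprintf(stderr, "Error: Can't read block descriptors.\n");
        exit(1);
    }
    uint16_t headerCRC16 = calccrc16(headerBuf, totalHeaderSize);

    // calculate the CRC over the whole file except the last two bytes, which hold
    // Dell's CRC16; use the heap since the image may be too large for the stack
    fseek(flashFile, 0, SEEK_SET);
    uint8_t* fileBuf = malloc(header.filesize - 2);
    if (fileBuf == NULL || fread(fileBuf, header.filesize - 2, 1, flashFile) != 1)
    {
        fprintf(stderr, "Error: Can't read firmware image.\n");
        exit(1);
    }
    uint16_t fileCRC16 = calccrc16(fileBuf, header.filesize - 2);
    free(fileBuf);
    uint16_t fileCRC16Dell;
    if (fread(&fileCRC16Dell, 2, 1, flashFile) != 1)
    {
        fprintf(stderr, "Error: Can't read stored CRC16.\n");
        exit(1);
    }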
    printf("\n\n");
    printf("Valid Dell PowerEdge BMC firmware header found:\n\n");
    printf("  - number of blocks : %d\n",   header.numBlocks);
    printf("  - oemstr (fixed)   : %.9s\n", header.dellHeaderStr);
    printf("  - total file size  : %u\n",   (unsigned) header.filesize);
    printf("  - total header size: %d\n",   totalHeaderSize);
    printf("  - header CRC16     : 0x%04x\n",   headerCRC16);
    printf("  - total file CRC16 : 0x%04x\n\n", fileCRC16);
    if (fileCRC16 == fileCRC16Dell)
       printf("  * CRC16 check OK\n");
    else
       printf("  * CRC16 check FAILED, actual CRC16 is 0x%04x instead of 0x%04x\n", fileCRC16, fileCRC16Dell);

    printf("\n\n");

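    // read all block descriptors, which directly follow the header
    fseek(flashFile, sizeof(header_t), SEEK_SET);
    flc_block_t flcBlock[header.numBlocks];
    if (fread(flcBlock, sizeof(flc_block_t), header.numBlocks, flashFile) != header.numBlocks)
    {
        fprintf(stderr, "Error: Can't read block descriptors.\n");
        exit(1);
    }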

    uint8_t i;
    for (i = 0; i < header.numBlocks; i++)
    {
        // check if our understanding of format is correct
        if (flcBlock[i].zero1 != 0 || flcBlock[i].zero2 != 0 || flcBlock[i].zeros[0] != 0 ||
            flcBlock[i].zeros[1] != 0 || flcBlock[i].zeros[2] != 0)
        {
            fprintf(stderr, "Error: Block %d not valid.\n", i);
            exit(1);
        }

        printf("Block %d:\n\n", i);
        printf("  - type     : %d/0x%02x (defines block type, 0x0b is sensor data table)\n", flcBlock[i].type, flcBlock[i].type);
        printf("  - system # : %d/0x%02x (running number for systems in this firmware file)\n", flcBlock[i].system, flcBlock[i].system);
        printf("  - unknown  : %d/0x%04x (always same for all blocks in a single firmware file)\n", flcBlock[i].unknownFixedData, flcBlock[i].unknownFixedData);
        printf("  - offset   : %u\n", (unsigned) flcBlock[i].offset);
        printf("  - length   : %u\n", (unsigned) flcBlock[i].length);
        printf("  - filename : %.32s\n\n", flcBlock[i].filename);

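        // extract the block according to the offset and length given in the block descriptor
        printf("  * extracting block...");
        uint8_t* blockData = malloc(flcBlock[i].length);
        fseek(flashFile, flcBlock[i].offset, SEEK_SET);
        if (flcBlock[i].length > 0 &&
            (blockData == NULL || fread(blockData, flcBlock[i].length, 1, flashFile) != 1))
        {
            fprintf(stderr, "Error: Can't read block %d.\n", i);
            exit(1);
        }

        // the 32-byte filename field need not be NUL-terminated, so copy it first
        char filename[sizeof(flcBlock[i].filename) + 1];
        memcpy(filename, flcBlock[i].filename, sizeof(flcBlock[i].filename));
        filename[sizeof(flcBlock[i].filename)] = '\0';

        FILE* blockFile = fopen(filename, "wb");
        if (blockFile == NULL)
        {
            perror(filename);
            exit(1);
        }
        fwrite(blockData, flcBlock[i].length, 1, blockFile);
        uint16_t blockCRC16 = calccrc16(blockData, flcBlock[i].length);
        fclose(blockFile);
        free(blockData);
        printf("done.\n");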

        if (blockCRC16 == flcBlock[i].crc16)
          printf("  * CRC16 check OK\n");
        else
          printf("  * CRC16 check FAILED, actual CRC16 is 0x%04x instead of 0x%04x\n", blockCRC16, flcBlock[i].crc16);

        printf("\n\n");
    }

    fclose(flashFile);
    exit(0);
}
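A note on the checksum: although the comment in the program names the polynomial 0x8005, the table itself is built for its bit-reflected form 0xA001 (the entry for byte 0x01 is 0xC0C1), and calccrc16() starts from 0. That combination is the variant usually catalogued as CRC-16/ARC, whose standard check value for the ASCII string "123456789" is 0xBB3D. A bitwise equivalent is handy for sanity-checking the table or for reimplementing the check in another tool. This is a sketch; crc16_arc_bitwise is a hypothetical name, and it only covers the algorithm used in the program above.

```c
#include <stddef.h>
#include <stdint.h>

/* Bitwise CRC-16: polynomial 0x8005 processed LSB-first (reflected constant
 * 0xA001), initial value 0, no final XOR. Gives the same result as the
 * table-driven calccrc16() in the program above. */
uint16_t crc16_arc_bitwise(const uint8_t *buf, size_t len)
{
    uint16_t crc = 0x0000;
    while (len--) {
        crc ^= *buf++;
        for (int bit = 0; bit < 8; bit++)
            crc = (crc & 1u) ? (uint16_t)((crc >> 1) ^ 0xA001u)
                             : (uint16_t)(crc >> 1);
    }
    return crc;
}
```

For instance, running it over the first filesize - 2 bytes of the firmware file should reproduce the CRC stored in the last two bytes, just like the program’s own check.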

The names of the individual files are listed below. They are organized in blocks (that’s what I call them), and apparently by function. Get the latest BMC firmware (30/6/2009, v1.83, A10) and run my program on it to retrieve the individual files.

  • block 0 (code, big files):
    • BB.FLC
    • OB.FLC
    • ID.FLC
    • OEM_DEF.FLC
  • block 1 (*_BB files):
    • SD_BB.FLC
    • FI_BB.FLC
    • TOC_BB.FLC
    • IO_BB.FLC
    • IS_BB.FLC
    • OEM_BB.FLC
  • block 2 (*_K_C files):
    • SD_K_C.FLC
    • FI_K_C.FLC
    • TOC_K_C.FLC
    • IO_K_C.FLC
    • IS_K_C.FLC
    • OEM_K_C.FLC

The BMC seems to be little-endian (which only makes sense, I guess). I’ve scanned the different files for occurrences of the threshold values (900 and 2025, i.e. 0x0384 and 0x07e9 as integers, and 0x44610000 for 900.0 as a float). To no avail. Darn. Either I am doing something wrong or the thresholds are not hard-coded in the firmware (I had my hopes up when I saw the OEM_DEF.FLC file, which actually contains the default BMC password and the like). Maybe the thresholds are stored in the configuration flash after all – but how can we access it?
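A scan like this is easier to get right if the byte patterns are generated rather than typed by hand. The sketch below assumes little-endian storage (as suspected above) and 32-bit IEEE-754 floats; put_le16, put_le_float and find_bytes are hypothetical helpers, not part of the extraction program.

```c
#include <stddef.h>
#include <stdint.h>
#include <string.h>

/* Write v into out[0..1] in little-endian byte order. */
void put_le16(uint8_t out[2], uint16_t v)
{
    out[0] = v & 0xff;
    out[1] = v >> 8;
}

/* Write the bit pattern of f into out[0..3] in little-endian byte order.
 * ASSUMPTION: float is a 32-bit IEEE-754 single-precision value. */
void put_le_float(uint8_t out[4], float f)
{
    uint32_t bits;
    memcpy(&bits, &f, sizeof bits);
    for (int i = 0; i < 4; i++)
        out[i] = (bits >> (8 * i)) & 0xff;
}

/* Return the offset of the first occurrence of pat in buf, or -1. */
long find_bytes(const uint8_t *buf, size_t len, const uint8_t *pat, size_t patLen)
{
    if (patLen == 0 || patLen > len)
        return -1;
    for (size_t i = 0; i + patLen <= len; i++)
        if (memcmp(buf + i, pat, patLen) == 0)
            return (long)i;
    return -1;
}
```

With these, 900 becomes the bytes 84 03, 2025 becomes E9 07, and 900.0 as a float becomes 00 00 61 44; each pattern can then be searched for in every extracted .FLC file.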

Update

I finally managed to adjust the critical fan thresholds by patching the BMC firmware! Here’s the howto. Additionally, I created a project page for my server.

The battle against the BMC – Part 1

Earlier, I wrote about the problem of the noisy fans in my Dell PowerEdge 2800. Since then, I have investigated a bit more. Just as a reminder: I can't run silent fans because they spin at a lower RPM than a hard-coded panic threshold of the PowerEdge. *grrr*

Brainstorm:

  1. make the fans faster/buy faster fans
  2. make the system hotter
  3. hack the fans into reporting more RPM than they actually do
    A. hack the fans themselves
    B. alter their tacho signal
  4. hack the BMC
  5. find out how the OEM sets these thresholds

Well, as I mentioned earlier, options 1 and 2 are unsatisfactory for obvious reasons.


A couple of fans taken apart. Notice the blob of brown paint on the ring magnet of the fan on the right.

Concerning 3A, I took a couple of fans apart to see how they create the tacho signal. Almost all fans I opened (luckily I have a whole stack of noisy, throw-away fans lying around) have a sensor sitting just under the ring magnet, which is part of the rotor, as can be seen in the photo on the right. I had no idea what this sensor might be, but I noticed one or more blobs of brownish-red paint on all of the fans. I figured that this paint might be used to create a signal for the sensor, and I did some tests with other paint in order to replicate the effect (left fan in the photo, with magnetic paint applied). Well, nice idea, but total bullshit, as it turns out. The sensor is a Hall sensor that senses the change in the magnetic field of the ring magnet, which inevitably changes 2 times per revolution. I figure that the paint applied to the rotor is used for calibrating the fans… Well, it was a nice idea.

3B might be an option, but it would require either a microcontroller or some analogue circuit – not really what I want to fiddle into the fan trays of 6 fans.

Concerning option 5, I thought that there might be hidden IPMI OEM commands for configuring the thresholds. I dug around the Dell OEM extensions for ipmitool (which can be retrieved from the Dell Linux Community Repositories). This code has officially earned the title of worst code of the year – I completely understand why the ipmitool maintainers flat out refuse to integrate that piece of crap. It's a hacked-up collection of extensions, seemingly written on the fly to fix customer problems. Horrible. Even more so as it does not seem to be able to set the thresholds either. After a few hours of digging in the code I managed to query the BMC sensors with Dell's OEM commands, and the returned capability flags do indicate that the thresholds cannot be changed. Darn.

Now I am back to hacking the BMC firmware – but that’s for another post…

Update

I finally managed to adjust the critical fan thresholds by patching the BMC firmware! Here’s the howto. Additionally, I created a project page for my server.

Dell DRAC/Remote Console discoveries

1. Logout else lockout

The system usually has a limited number of user sessions, and if you don't log out properly several times, you find yourself in a position where all slots are taken. In this case, you have to wait for a timeout…

2. Firefox & Java

On Linux, the only supported browser is Firefox. On a standard Ubuntu install, this does not work in combination with the DRAC, as Ubuntu installs the IcedTea Java plugin by default. Switch to the Sun Java plugin:

sudo update-alternatives --set mozilla-javaplugin.so /usr/lib/jvm/java-6-sun/jre/lib/i386/libnpjp2.so

3. Firewalls

On many systems, you will have problems accessing the DRAC (or at least the remote console) because of firewalls in the way. You can forward the whole thing by using a host inside the network as a proxy. For this, tunnel the connections through SSH as follows (sudo is needed because binding the local port 443 requires root privileges):

sudo ssh -L 443:<DRAC_IP>:443 -L 5900:<DRAC_IP>:5900 -L 5901:<DRAC_IP>:5901 -l <LOGIN> -N <SSH_PROXY_SERVER> -o ExitOnForwardFailure=yes

Update: I created a project page for my server.

Those darn fans!

As I explained before, I changed all the fans of my PowerEdge 2800 to low-noise, low-RPM models.


The fans of my PowerEdge 2800. In front, the CPU fans (stock Delta on the left, replacement Everflow on the right); at the back, the 120mm case fans (Arctic on the left, stock Nidec on the right).

This was nice and shiny, and the system was quiet and quite cool. Too cool, unfortunately, because the BMC throttles the fans so much that their RPM values drop below the configured thresholds. Then the system goes into panic mode, spins all fans up to 100%, and gradually slows them down below the threshold again. Repeat. :)

I ended up putting the Deltas back in, because you cannot configure the lower failure threshold of the fans. So, lots of work for nothing, and the system is still way too loud. Very unsatisfactory.

Thus, I started thinking how to get around this problem. Possible ways I see:

  1. make the fans faster/buy faster fans
  2. make the system hotter
  3. hack the fans into reporting more RPM than they actually do
  4. hack the BMC
  5. find out how the OEM sets these thresholds

Well, options 1 and 2 are unsatisfactory for obvious reasons.

Regarding 3: I thought about getting the sensor in the fan to report a higher value. Most fans actually contain a ring magnet. The sensor lies just below it, with the rotating magnet passing over it. At 1-4 places on the magnet, the manufacturers apply some paint. I guess it's some sort of EMI shield and the sensor detects the change in the field – but I couldn't find any paint that would reproduce the effect. This would be a very neat and nice solution.

Regarding 4: The BMC is implemented in some microcontroller on the motherboard of the PowerEdge. I haven't found out yet where it is. I am not even sure what architecture it uses, so reverse engineering the BMC firmware hasn't been possible. Damn, I don't even know whether it is little-endian or not.

Regarding 5: The manual states that these values are read-only and are configured by the OEM. But how? I doubt that they build a new firmware for each combination of fan manufacturers they use. So my guess is that there are some hidden OEM IPMI commands that allow setting the thresholds. I asked the guys over at FreeIPMI if they've got a clue, but they don't know of any such functionality.

Anyone out there with some hints?

Update

I continued my search for a solution.

Update 2

I finally managed to adjust the critical fan thresholds by patching the BMC firmware! Here’s the howto. Additionally, I created a project page for my server.

Replacing the fans on a PowerEdge 2800


The fans of my PowerEdge 2800. In front, the CPU fans (stock Delta on the left, replacement Everflow on the right); at the back, the 120mm case fans (Arctic on the left, stock Nidec on the right).

My “new” PowerEdge was way too loud. Like, jumbo-loud. I’ve chosen to replace all the fans with quieter models, but oh boy, this turned out to be a major pain! As always, I didn’t follow the most important rule when tinkering: do stuff incrementally, and test after each step. Well, maybe I’ll learn someday.

For reference, here are the fan models:

Function                          | Dell                            | Replacement
4x Memory/Disk, 120x120x32mm, PWM | Nidec Beta V TA350DC, M34789-35 | Arctic F9 PWM
2x PSU, 60x60x25mm, Tacho         | Nidec Beta V TA225DC, B34605-33 | Akasa AK-192BKT-B
2x CPU, 60x60x35mm, PWM           | Delta AFB0612EHE                | Everflow F126025BU

I changed all the fans in one go and ended up with a system that didn't boot anymore. It didn't even get to the end of the BIOS initialization; it shut down right away. I gathered that it must be the power supply's fans (otherwise I would have gotten a warning message during POST). I checked and realized that I had bought some fans with built-in thermal control: a sensor that slows down the fan. Well, as I had them already, I hacked them to ignore the sensor (cracked open the sensor casing and soldered two pins together; zero resistance makes the fan act as if the sensor measured a very high temperature).


The connector with the soldered cables of the Everflow fan. Here, the yellow and blue cables on the connector have been swapped.

Next problem: all fans spun at full speed. Impressively noisy, even for a seasoned cluster admin like me (I could hear the vibration 2 floors below). The reason was that the PWM and tacho pins were swapped (as I found out after spending a night trying to install Dell OpenManage 6.4 on Ubuntu 11.04 64bit). I don't know why, but I needed a different pin configuration than every other source I found on the Internet. Here's what I used in the end (looking at the bottom of the connector):

+---+---+
| 2 | 1 |-+
+---+---+ |
| 4 | 3 |-+
+---+---+
Pin | Function    | Dell   | Everflow | Arctic
1   | VCC         | red    | yellow   | red
2   | Control/PWM | blue   | blue     | blue
3   | Sense/Tacho | yellow | green    | yellow
4   | Ground      | black  | black    | black

So everything is exactly as in the datasheets, except for the swapped Control and Sense pins. After I swapped them, the server became quite quiet and I became quite happy. :D Actually, my desktop is now noisier. Argh! BTW, thanks a lot to Brent Ozar for his blog entry on making a PowerEdge quieter.

Update

Now I seem to have another problem, as reported by Brent Ozar and others:

One problem shown above is that sometimes fans spin slow enough that they trigger Dell’s thresholds for slow-moving fans. Gotta figure out how to fix that for good one of these days.

Damn. Now I either have to find a way to make the fans a little bit faster or to change the threshold. See the follow-up post for more info on the threshold problem.

Update 2

I finally managed to adjust the critical fan thresholds by patching the BMC firmware! Here’s the howto. Additionally, I created a project page for my server.

Installing Dell OpenManage on Ubuntu 11.04 64bit

Installing Dell OpenManage 6.50 on Ubuntu 11.04 64bit (Meerkat? Olifant? damn names) was a pain. Mostly because Debian-based systems have only recently become supported, so there is a lot of outdated info out there (mostly grisly hacks for forcing the RPM-based install to work on your system).

Thus, here a very quick summary:

  1. Follow the instructions on this page: http://linux.dell.com/repo/community/deb/latest/
  2. Afterwards, in order to be able to authenticate, edit /etc/pam.d/omauth as follows:
    -auth required /lib32/security/pam_unix.so nullok
    -auth required /lib32/security/pam_nologin.so
    -account required /lib32/security/pam_unix.so nullok
    +auth required /lib/x86_64-linux-gnu/security/pam_unix.so nullok
    +auth required /lib/x86_64-linux-gnu/security/pam_nologin.so
    +account required /lib/x86_64-linux-gnu/security/pam_unix.so nullok

Surprisingly painless (if only I had had this information 5 hours earlier). Ah yes, don't forget to open TCP port 1311 in your firewall… :)

Update: I created a project page for my server.