Sometimes when you have a server closet, the cables find a way to creep and crawl into a very messy ball of spaghetti. Servers start stacking up, and before you know it the closet is running hot and working in it becomes a risky job. This is the story of just how much of a challenge organizing such a mess can be.
Let's step back a few weeks to describe exactly what the situation was and how we intended to remedy it. Over the years, after adding new users and moving users around, as well as running new cable drops and removing old ones, the server closet had built up a mismatched assortment of patch cables of various lengths and colors. This is pretty normal for small businesses, as there usually isn't a cabling standard set in stone, and eventually, when it comes time to troubleshoot the physical network layer, it becomes a bit of guesswork and trial and error.
As with any electronic device, dust is a huge problem, and when you have 7 servers running as well as two rack-mount UPS units, you tend to accumulate even more dust than normal. Since we couldn't afford the downtime during most of the year, there was always the worry that a heat sink would get clogged or a fan would fail. On top of that, the gigabit switches we had installed were making funny sounds, which scared us, because when those things fail it is always at the most inopportune time.
After assessing the problems we faced, we had all the information needed to redo the closet. We decided to stack all of the rack-mount servers at the bottom, just above the UPS units, so that they would be in a thermally optimal position, leaving space above them for more servers to be added later. The two tower servers would then be stacked next to the rack, which made cable lengths for each server essentially the same. Each of the current dumb switches would be replaced with fancier D-Link DGS-1100-24 smart switches, which have all the awesome L2 features that efficient IT people need!
Now this obviously wouldn't be the task that it was without having to swap out ALL of the patch cables in the closet. There were a total of 96 patch panel ports in addition to 7 servers, 4 switches, 1 router, 1 modem, and 2 APC units, each of which had at least one, if not two, NIC ports. To reach an acceptable level of standardization, we decided to color code just about everything and label each cable (except for the patch panel cables). For each server, eth0 would be red and eth1 would be green. The links between switches would be yellow, while the internet backbone would be orange. All of the patch panel cables would be blue.
The reasoning behind our color coding reflects the severity of what could go wrong if something was unplugged. If you unhooked a server even for a second, certain things such as a SQL sync could get messed up, which would be very bad, but unhooking the patch panel, or even some of the network links themselves, wouldn't have as severe an impact.
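As an aside, a scheme like this is easy to capture in a tiny lookup table so nobody has to memorize it. The colors and roles below match our convention; the code itself is just an illustrative sketch, not anything we actually ran in the closet:

```python
# Our cable color-coding convention, captured as a lookup table.
# Colors and roles are from the scheme described above.
CABLE_COLORS = {
    "red": "server eth0",
    "green": "server eth1",
    "yellow": "switch-to-switch link",
    "orange": "internet backbone",
    "blue": "patch panel run",
}

def role_of(color):
    """Return the role a cable color signifies, or 'unknown' if unmapped."""
    return CABLE_COLORS.get(color.lower(), "unknown")

# e.g. role_of("Red") -> "server eth0"
```

Printing a sheet like this and taping it inside the closet door means the convention survives staff turnover.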
Since this was going to be a long, drawn-out endeavor, we decided to come in on a Sunday morning at around 8:30 am. This would give us ample time to complete everything and work out any hiccups that came up along the way. When I arrived, my boss was already in the process of removing the servers from the rack. Luckily I had brought in some doughnuts, so he was in a very good mood after that! We got all of the servers out and removed all of the cable clutter. There was also a shelf we used to keep a monitor hooked up to the KVM; we removed that too, so we could have even more room in that tiny closet.
With everything out of the closet and only the patch panel cables left, I made sure to document each remaining cable and what it went to (you'll see why this was important later on). Each server had to be dusted out and checked for anything physically wrong. We dusted each server and everything went off without a hitch, until it came to our RDP server. The battery for the RAID controller was swollen, which was quite the terrifying find. We ordered a spare from Dell, and until it arrived we had to make do, so we carefully got the old battery back in. That is when the worst possible thing happened.
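For what it's worth, the cable documentation was nothing fancy: just a port-to-destination map. Something along these lines is all it takes; the port numbers and destinations here are made up for illustration, and the code is only a sketch of the idea:

```python
import csv
import io

# Hypothetical excerpt of the patch-panel documentation: one record
# per panel port, noting what was plugged into it at teardown time.
PORT_MAP_CSV = """\
port,destination,notes
12,front office wall jack,
37,hold room wall jack,odd run - investigate
48,conference room,
"""

def load_port_map(text):
    """Parse the CSV notes into a dict keyed by port number."""
    return {int(row["port"]): row for row in csv.DictReader(io.StringIO(text))}

ports = load_port_map(PORT_MAP_CSV)
# ports[48]["destination"] -> "conference room"
```

Even a paper list would do; the point is that every port you strip gets written down before the cable comes out.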
The server was booted to check that all was still well with it, but it wasn't. After POST, the dreaded degraded-RAID text flashed on the screen and the server showed an amber light. Sure enough, the array had entered a degraded state. This sort of thing always happens when a computer is shut down and rebooted after years of uptime; there is just some sort of magic associated with it. After a few quick Google searches to see what options we had, we realized that we could simply hot swap the degraded disk back in and pray that it was a simple sector error and the array would rebuild with no problems. We held off on doing that until everything else had been replaced.
For the most part, everything else was installed without so much as a hiccup. The switches all went in well, the servers were back in the rack, and the new KVM was hooked up. It was really starting to look like a new closet! The final step in all of this (besides checking the degraded RAID) was to redo the patch panel. As I was removing the cables, I found a few Cat3 cables that were run into the punch-downs for the phone system. This was obviously odd and not something that should ever have been done, but we figured that whatever it was, we would eventually find out by way of someone running into our office and screaming that "xyz" wasn't working. We took note of the port number and continued on. After about an hour of playing with the cables, we had all of them beautifully tidied up.
Our luck really panned out at the end of the day, because the RAID array was able to rebuild itself using the same disk. We do have a hot spare just in case that disk decides to crap out, but hey, fingers crossed! As for those cables? The next day someone from the retail store came into our office screaming about how the phone in the hold room wasn't working. Yes, someone had actually run a phone line into a network wall jack, through the patch panel, and into a phone punch-down. The best part was when a tech came in from the company that provides our phone service: they pointed out that they were the ones who ran it (by request of the old, old, old IT admin at the time).
All in all, it was a very long but rewarding day. Some of the takeaways I learned: always plan for at least one failure, and dust builds up really fast! Never settle for less when it comes to your server closet. The wiring standard I implemented has made troubleshooting a breeze. Tracing a cable takes only a second because of the color coding and because everything is labeled on both ends, even the power cables! There is absolutely no guesswork involved. So the next time you have to redo your closet, just remember to give it your all and be prepared for something to go wrong!