Backing Up Lots of WordPress Clients: How to Make Life Easier

The Task

Recently at work we were handed the task of making sure that all of our client sites were backed up. For a while some other people were handling that, but it was passed over to the developers since we are the ones who actually maintain the websites. We manage about 30 sites, so that is a lot to back up.

Some Considerations

Before we could really tackle the backups we had to think through a few things. The first was understanding exactly what we needed to back up. Next we had to find the best method for running the backups themselves, because doing them by hand eats up a lot of time. We also needed to know what sort of controls and monitoring we needed to have over these jobs.

Why is this thing so slow?

The internet at the office isn’t the greatest: an old T1 connection that gets the job done but not much more. Originally the backups were done by connecting to each site over FTP and downloading all the files, which is a problem for several reasons. First, that many FTP downloads eat up a lot of time on our limited bandwidth. Second, unless you automate the process, someone has to manually visit each site and pull the files down. There is also the question of what happens if the power or the internet goes out that night, and it wouldn’t be unheard of for that many long-running connections to bog down the network, even if only for a few seconds.

The Options We Looked At

There were several options on the table when we talked through this process. First we considered writing a script for each site and using cron to run them. While that is probably the best approach, especially from a power-user standpoint, I’m the only one here with that level of backend server administration experience, and my concern was that nobody else could fix an issue if I wasn’t around (bus-factor thinking).  Next we discussed doing the backups internally via FTP or a backup program, but as I mentioned before there are far too many issues with that type of backup. On top of that, keeping backup copies locally doesn’t help you in the event of a fire unless they are also copied to another medium and stored in a fire safe.
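For anyone curious what the scripted route could have looked like, here is a minimal sketch of a per-site backup script meant to be run from cron. The site name, the paths, and the trick of pulling credentials out of wp-config.php are placeholders and assumptions, not the setup we actually deployed.

```bash
#!/usr/bin/env bash
# Rough per-site backup sketch (hypothetical -- not what we ended up using).
# Dumps the database nightly and tars the site files when passed "files".
set -euo pipefail

SITE="example-client"                       # placeholder site name
WP_ROOT="/var/www/${SITE}"                  # WordPress install directory
DEST="/backups/${SITE}"                     # local staging area for archives
STAMP="$(date +%F)"

mkdir -p "${DEST}"

# Pull the database credentials straight out of wp-config.php
DB_NAME="$(grep "DB_NAME" "${WP_ROOT}/wp-config.php" | cut -d "'" -f 4)"
DB_USER="$(grep "DB_USER" "${WP_ROOT}/wp-config.php" | cut -d "'" -f 4)"
DB_PASS="$(grep "DB_PASSWORD" "${WP_ROOT}/wp-config.php" | cut -d "'" -f 4)"

# Nightly database dump, gzipped to keep the file size down
mysqldump -u "${DB_USER}" -p"${DB_PASS}" "${DB_NAME}" \
    | gzip > "${DEST}/${SITE}-db-${STAMP}.sql.gz"

# Weekly file backup, only when the script is called with "files"
if [[ "${1:-}" == "files" ]]; then
    tar -czf "${DEST}/${SITE}-files-${STAMP}.tar.gz" \
        -C "$(dirname "${WP_ROOT}")" "$(basename "${WP_ROOT}")"
fi
```

Rotation and pushing the archives off-site would still have to be handled separately, which is part of why we leaned away from rolling our own.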

The final option we looked at was using a plugin, since the sites we build are all on WordPress. A plugin would make it easy for anyone to configure as well as troubleshoot. We looked into several and finally decided on WPBackup Pro. While it isn’t the cheapest plugin, it certainly has a lot of worthwhile features.

What We Liked About WPBackup Pro

One of the things we really liked about the plugin is that it lets you back up to cloud storage providers like Dropbox and Google Drive. We use Google for email, so we already have storage with them as well.  What makes it so perfect is that it centralizes the backups in a place where we control access, we know it is relatively secure, and it is far more redundant than any local server we have. Scheduling the backups is really easy too; you can literally run the Job Wizard and have a backup going in minutes. The Google Drive setup does take a little figuring out, just because the API requires a few extra steps to connect, but once you have done it once, each new site goes faster and faster.

Our Setup

The setup we followed uses these principles:

  • Files should be kept for 90 days. This gives us three months of files. In the event of an unknown malware infection it gives us enough of a time window to hopefully restore from a version before the infection.
  • Databases should be backed up nightly. This helps in situations where data  becomes corrupted.  Since the database gets updated a lot, it makes sense to back it up a lot.
  • Site files should be backed up weekly. They don’t change very often so there isn’t a point in doing it frequently.
  • All plugins and upper-level directories should be backed up. Sometimes there are directories above the WordPress install that we use, so we should back those up too.
  • Alerts should be sent whether the job succeeds or not. This is something I learned in IT: if you only get alerts when a job fails, you have no way of knowing whether a job is stuck unless you go look. That might be easy to spot in a centralized backup application, but you don’t have that with 30 client websites. Success alerts let us quickly count the emails and confirm we have our 30 successes from the previous day.
  • Tar/gzip (.tar.gz) archives should be used as much as possible to limit file size.
  • Backups should be staggered to limit server load. Each job is given a 5-minute window (unless it needs longer) so that no two jobs ever overlap (see the rough schedule sketch after this list).
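For illustration, here is roughly what that schedule looks like if you imagine it as a plain crontab instead of the plugin’s own scheduler. The client names, the paths, and the wp-backup.sh script (the per-site sketch from earlier in the post) are all hypothetical.

```bash
# Hypothetical crontab showing the staggering and retention rules above.

# Nightly database dumps, each in its own 5-minute window
0  1 * * * /usr/local/bin/wp-backup.sh client-one
5  1 * * * /usr/local/bin/wp-backup.sh client-two
10 1 * * * /usr/local/bin/wp-backup.sh client-three

# Weekly file backups on Sundays, staggered the same way
0  3 * * 0 /usr/local/bin/wp-backup.sh client-one files
5  3 * * 0 /usr/local/bin/wp-backup.sh client-two files

# Enforce the 90-day retention window
30 5 * * * find /backups -type f -mtime +90 -delete
```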

All in all, after setting this up it seems to be working really well. There were a few hiccups along the way where a job or two didn’t run, but after resetting those jobs the problem went away. This was definitely a great investment because of the amount of time it saves, as well as how easy it is for anyone to go in and do a backup.  I’m glad we didn’t go the manual-FTP or scripted route. If you manage a lot of sites, I highly recommend checking out the WPBackup Pro plugin.


Nagios – What A Pain in the Pi!

Working in the IT world you deal with a lot of servers (unless you are lucky and virtualize most of what you have). With all of that power comes great responsibility…and the need to monitor everything.  Wouldn’t it be great if there were a way to know about problems BEFORE they happened? Luckily there are a lot of solutions out there.  Being that I work at an SMB, my budget is small, as in $0 small.  That leaves me with a lot of issues when it comes to implementing worthwhile solutions.  That is where Nagios comes in.

Nagios is a free (unless you pay for the extras) network and server monitoring solution.  That makes it great for people in situations like mine, so long as you understand Linux well enough to set it up and get it running. Since I have no problem working in Linux, that normally isn’t an issue. However, since we didn’t have the money or spare hardware for a server to run Nagios on, we opted to use a spare Raspberry Pi my boss had lying around.

Being the super tech-savvy *nix nerd I am, I opted to do a simple sudo apt-get install nagios3, not realizing what I was getting myself into.  I thought I would be avoiding a whole messy compile of the program in favor of a clean and easy install.  Instead I got one messy Pi to clean up after.

After the install completed I tried to log in to the web interface, with no luck. I tried to start the service again, and that is when I started seeing errors about missing .cfg files.  Apparently when you install Nagios on Raspbian, most of the .cfg files are commented out in the main configuration and you have to uncomment them.  I’m not going to go into detail about fixing the issue, as this is more of a rant/vent than a How-To, and besides, it was a fun experience learning how to get it all working.
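If you run into the same thing, a quick sanity check looks something like this (the paths assume the Debian/Raspbian nagios3 package layout; adjust to taste):

```bash
# See which cfg_file/cfg_dir entries are active versus commented out
grep -nE '^#? *(cfg_file|cfg_dir)=' /etc/nagios3/nagios.cfg

# Verify the configuration before restarting the service
sudo nagios3 -v /etc/nagios3/nagios.cfg
sudo service nagios3 restart
```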

Once I finally had all the config files loading correctly, as well as a few plugins for SQL server I wanted to try out, I installed a new theme and called it a day.  I’m still setting Nagios up, but so far I have learned more about Linux than I ever knew before.  This is a great example of why you should never give up on a project just because it doesn’t work at first.
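To give a flavor of where the monitoring config is heading, a basic host and service definition in Nagios looks roughly like the snippet below. The host name, the address, and the check_mysql command are placeholders; which plugin we end up using for the SQL checks is still up in the air.

```
# Hypothetical definitions, e.g. dropped into /etc/nagios3/conf.d/db.cfg
define host {
    use          generic-host     ; template shipped with the Debian package
    host_name    dbserver         ; placeholder host
    address      192.168.1.50     ; placeholder IP
}

define service {
    use                  generic-service
    host_name            dbserver
    service_description  Database availability
    check_command        check_mysql     ; assumes the stock plugins package
}
```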


From A Rat’s Nest Into A Server Closet: How We Cleaned Up A Major Mess

Sometimes when you have a server closet, the cables find a way to creep and crawl their way into a very messy ball of spaghetti. Servers start stacking up and before you know it the closet is running hot and working in it becomes a risky job.  This is a story of how much of a challenge organizing such a mess can be.

Let’s step back a few weeks to describe exactly what the situation was and how we intended to remedy it.  Over the years, after adding new users and moving users around, as well as running new cable drops and removing old ones, the server closet had built up a mismatched assortment of patch cables of various lengths and colors.  This is normal for small businesses, as there usually isn’t a cabling standard set in stone that gets followed, and eventually, when it comes time to troubleshoot the physical network layer, it becomes a bit of guesswork or trial and error.

[Photo: the server closet before the cleanup]

As with any electronic device, dust is a huge problem, and when you have 7 servers running as well as two rack-mount UPS units you tend to have even more dust accumulation than normal. Since we couldn’t afford the downtime during most of the year, there was always the worry that a heat sink would get clogged or a fan would fail.  On top of that, the gigabit switches we had installed were making funny sounds, which scared us because whenever those things fail it is always at the most inopportune time.

After assessing the problems we faced, we had all the information we needed to redo the closet.  We decided to stack all of the rack-mount servers at the bottom, just above the UPS units, so that they would be in a thermally optimal position, leaving space above them for further servers to be added. The two tower servers would then be stacked next to the rack, which made the cable lengths for each server essentially the same.  Each of the current dumb switches would be replaced with fancier D-Link DGS-1100-24 smart switches, which have all the awesome L2 features that efficient IT people need!

Now, this obviously wouldn’t be the task that it was without having to switch out ALL of the patch cables in the closet. There were a total of 96 patch panel ports, in addition to 7 servers, 4 switches, 1 router, 1 modem, and 2 APC units, each of which had at least one if not two NIC ports.  To get to an acceptable level of standardization we decided to color code just about everything as well as label each cable (except for the patch panel cables).  For each server, eth0 would be red and eth1 would be green.  The links between the switches would be yellow, while the internet backbone would be orange. As for the patch panel, all of those cables would be blue.

The reasoning behind our color coding comes down to the severity of what could go wrong if something got unplugged.  If you unhook a server even for a second, certain things such as a SQL sync could get messed up, which would be very bad, but unhooking the patch panel, or even some of the network links themselves, wouldn’t have as severe an impact.

Since this was going to be a very long and drawn out endeavor, we decided to come in on a Sunday morning at around 8:30 am.  This would give us more than ample time to complete everything and work out any hiccups that might come up during the process.  When I arrived my boss was already in the process of removing the servers from the rack. Luckily I had brought some doughnuts in, so he was in a very good mood after that! We got all of the servers out and removed all of the cable clutter. There was also a shelf that we used to keep a monitor hooked up to the KVM; we removed that so we could have even more room in that tiny closet.

With everything out of the closet and only the remaining patch panel cables left, I made sure to document each cable that was left behind and what it went to (you’ll see why this was important later on).  Each server had to be dusted out and checked for anything physically wrong. We dusted each server out and everything went without a hitch until it came to our RDP server. The battery for the RAID controller was swollen, which was quite the terrifying find. We had to order a spare from Dell, so until that arrived we made do and got the existing battery back in. That is when the worst possible thing to happen, happened.

The server was booted to check that all was still well with it, but it wasn’t. After the POST, the dreaded degraded RAID text flashed on the screen and the server had an amber light.  Sure enough, the array had entered a degraded state. This sort of thing always seems to happen when a computer is shut down and rebooted after years of uptime; there is just some sort of magic associated with it.  After a few quick Google searches to see what options we had, we realized that we could simply hot swap the degraded disk back in and pray that it was a simple sector error and the array would rebuild with no problems.  We held off on doing that until everything else had been replaced.

For the most part, everything else was installed without so much as a hiccup. The switches all went in well, the servers were all back in the rack, and the new KVM was hooked up. It was really starting to look like a new closet! The final step in all of this (besides checking the degraded RAID) was to redo the patch panel. As I was removing the cables I found a few Cat3 cables that were run into the punch-downs for the phone system.  This was obviously odd and not something that should ever have been done, but we figured that whatever it was, we would eventually find out by way of someone running into our office and screaming that “xyz” wasn’t working.  We took note of the port numbers and continued on.  After about an hour of playing with the cables, we had all of them beautifully tidied up and were done.

Our luck really panned out at the end of the day, because the RAID array was able to rebuild itself using the same disk.  We do have a hot spare just in case the other disk decides to crap out, but hey, fingers crossed! As for those cables? The next day someone from the retail store came into our office screaming about how the phone in the hold room wasn’t working.  Yes, someone had actually run a phone into a network wall jack, into the patch panel, and into a phone punch-down.  The best part was when a tech came in from the company that provides our phone service: they pointed out that they were the ones who had run that phone (at the request of the old, old, old IT admin at the time).

All in all, it was a very long but rewarding day.  Some of the takeaways were to always plan for at least one failure, and that dust builds up really fast! Never settle for less when it comes to your server closet. The wiring standard I implemented has made troubleshooting a breeze; tracing a cable takes only a second because of the color coding and because everything is labeled on both ends, even the power cables! There is absolutely no guesswork involved.  So the next time you have to redo your closet, just remember to give it your all and be prepared for something to go wrong!

[Photo: the server closet after the cleanup]