Data center management: two words that encompass far more than most people realize. Faster than a speeding SSD! More powerful than hot-swap power supplies! Able to leap tall … well, most managers wouldn’t be able to jump over a full-size rack. With so many different elements coming together to build, develop and maintain a data center, a data center manager has to be fluent in a number of different technical disciplines as well as being a good boss. This is not an easy role to fill by any means, but for those that can handle it, the rewards are massive.
Level 1 — Closet Versus Cavern
1. What would you consider a data center?
This actually isn’t as straightforward a question as you would think. Some people would call anything that has commercial-grade storage a data center, while others would claim you have to have 80 racks, three different types of dedicated cooling and armed security to qualify as a data center. Virtualization doesn’t make defining things any easier, as you could theoretically have a location that only has a set of four or five physical VM hosts and a small commercial air-conditioner, but effectively have hundreds of servers onsite.
2. What does managing a data center mean to you?
Regardless of the scale involved, managing a data center comes down to making sure that the wheels keep spinning for the organization. The specific responsibilities that involves vary wildly depending on scale and availability requirements, but keeping the data safe, secure and available are all top priorities.
3. Have you ever had to deal with an employee that wasn’t performing their duties adequately?
Data center managers deal with a huge number of departments and specializations, each with their own needs. However, if you are responsible for making sure that a project gets completed on time, your team members each need to do their part to get the job done. If a critical employee doesn’t do their portion in a timely fashion, this could potentially have catastrophic repercussions. Sometimes they just need some help wrapping their heads around a concept, while other times they just don’t care. Unfortunately, the right action in this situation may not be a nice one, but the job still needs to be done.
4. How have you balanced management and technical responsibilities in the past?
This is not easy. Some organizations have literally written into employee contracts that managers are not permitted to work on projects, while others expect them to lead from the front. Still others demand that even if all of the employees working for a manager are re-assigned, the project still get done on time — making it their responsibility to quickly get up to speed in elements that they may not be completely familiar with. Find out exactly what the organization expects of you before accepting the position, as you may be signing up for a whole lot more than you bargained for.
5. How would you define “cage-level redundancy”?
Let’s say for a moment that a meteor drops out of the sky and completely destroys one of your racks with a ridiculous amount of equipment in it … and everything keeps spinning with potentially a blip or two. Having cage-level redundancy means that an entire server rack full of equipment can be powered off, blown up, hauled out into traffic and run over but the affected systems will still remain up and available. As a result, when storage or server applications have a priority of MUST NOT GO DOWN EVER, this can be an incredibly effective tool. Granted, the exact manner in which this is achieved changes based on vendor, application and scale, and it certainly is not cheap, but in these cases it absolutely is worth it.
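One way to sanity-check this kind of redundancy on paper is to pretend each rack fails in turn and see which services would go dark. The sketch below is only an illustration: the `placement` map, the rack names and both function names are hypothetical, and a real environment would pull this information from its inventory or orchestration tooling.

```python
def services_lost_if_rack_fails(placement, failed_rack):
    """Return the services whose replicas ALL live in the failed rack."""
    return sorted(
        service
        for service, racks in placement.items()
        if set(racks) <= {failed_rack}
    )


def single_points_of_failure(placement):
    """Map each rack to the services that would go down if that rack were lost."""
    all_racks = {rack for racks in placement.values() for rack in racks}
    report = {}
    for rack in sorted(all_racks):
        lost = services_lost_if_rack_fails(placement, rack)
        if lost:
            report[rack] = lost
    return report


# Hypothetical layout: service name -> racks holding a copy of it.
placement = {
    "db": ["rack1", "rack2"],     # survives the loss of either rack
    "mail": ["rack3"],            # only one copy
    "web": ["rack1", "rack1"],    # two copies, but in the same rack
}
print(single_points_of_failure(placement))
```

An empty report means no single rack takes a service down with it. Note that "web" is flagged even though it has two copies, because both sit in the same rack.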
6. Have you ever used patch management?
Most organizations manage updates in one form or another. With a great many servers to take care of, not having to touch every single one during an update cycle is a huge time savings and a must for data centers. But while keeping servers current with security and performance updates is critical, making sure that those same servers do not reboot during production times or become unusable due to bad updates is even more important. Certain vendors have been having a bit of a run of bad updates lately, meaning that making sure that patches are actually ready for production instead of applying them as soon as they are available is crucial to data center stability.
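The "is this patch actually ready for production?" check can be boiled down to two questions: has it run in a test environment long enough, and are we inside a maintenance window? Here is a minimal sketch under those assumptions. The soak period, the window times and the function names are all made up for illustration and do not come from any particular patch-management product.

```python
from datetime import date, datetime, time, timedelta

# Hypothetical policy values; adjust to your organization's requirements.
SOAK_DAYS = 14              # days a patch must run in test before production
WINDOW_START = time(1, 0)   # maintenance window opens at 01:00
WINDOW_END = time(5, 0)     # maintenance window closes at 05:00


def has_soaked(tested_since: date, today: date, soak_days: int = SOAK_DAYS) -> bool:
    """True if the patch has been running in test for at least soak_days."""
    return today - tested_since >= timedelta(days=soak_days)


def in_maintenance_window(now: datetime) -> bool:
    """True if 'now' falls inside the nightly maintenance window."""
    return WINDOW_START <= now.time() < WINDOW_END


def may_install(tested_since: date, now: datetime) -> bool:
    """Allow a production install only if the patch has soaked AND we are in the window."""
    return has_soaked(tested_since, now.date()) and in_maintenance_window(now)


# A patch in test since January 1, checked at 02:30 on January 20.
print(may_install(date(2024, 1, 1), datetime(2024, 1, 20, 2, 30)))
```

The point of keeping the two checks separate is that each one guards against a different failure: the soak period against bad updates, the window against reboots during production hours.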
7. How can moving from a mostly physical environment to a virtualized environment change a data center?
The most obvious element in this case is size: physical servers take up a significant amount of rack space. Removing physical boxes from a data center helps to condense what needs to be there, freeing up space for additional growth and future projects.
8. What critical environmental situations have you dealt with? How have you dealt with them?
Hypothetically, let’s say that somebody was performing maintenance on two of your heat exchangers and, instead of putting one into maintenance mode to work on it, accidentally turned both off. This meant that for however long it took to bring the units back online, your data center was completely without cooling. What would you do?
Or a truck driving down the street takes out half a dozen electrical poles, knocking out street power to the entire block. What would you do?
Or a dedicated line connecting two of your locations together has been torn apart by a backhoe. What would you do?
Environmental surprises can come from a massive number of potential sources. As the data center manager, it will be your responsibility to make sure that as many of these elements have redundancies or backups available as is feasible, up to and including the aforementioned meteor strikes.
9. How have you dealt with difficult department heads?
“Yes, I know that’s the company rule, but I want this.” “Look, we bought this, and you’re going to make it function.” “I don’t care what you think, this is going on the network.” A data center manager usually has to have the final say for anything that lives there. Granted, a large number of these decisions may be made by Security or Development, but when something strange pops up, that’s when things start going up the chain. And for some department heads used to getting their way, that can be a major culture shock. Like it or not, welcome to politics.
10. Why does doing your homework before presenting a spending proposal help everyone in the long run?
When it comes to major purchases (and there are an awful lot of them for data centers), doing research ahead of time is going to save you massive headaches as well as gain you brownie points with upper management. Most of the time they don’t care if something is going to give a 0.25% better response time per query than their competitors. They’re going to care about whether this is the best bang for buck available, how long it’ll be good for and whether it will be able to support the organization for an extended period.
Far more importantly, though, they’ll want to know how much of a break they’re getting compared to the list price. If you can supply the answers to all of the questions they may ask, they’re going to start trusting your opinions considerably more.
Level 2 — Roll For Sanity Check
11. How have you kept from taking on too many responsibilities?
Data center managers wear a number of hats. With these hats come a large number of responsibilities, with the traditional accessories of long nights and lousy sleep. Thankfully, though, no one has to do this alone. Making sure that you have the right people in the right places can reduce the strain that the job will take on you and reduce the load you have to think about at any given time. Yes, there will still be issues coming up from time to time that will require either your opinion or your expertise, but for day-to-day operations this can make a world of difference.
12. How do you manage storage requirements?
According to Domo’s “Data Never Sleeps 5.0,” in 2017 the world generated 2.5 quintillion (18 zeroes) bytes of data per day, which works out to around 2.5 billion gigabytes PER DAY. Most situations won’t require this kind of capacity growth, but if you do, I really hope you invest in deduplication.
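For anyone who wants to double-check that conversion, here is the arithmetic as a quick sketch. It assumes decimal gigabytes (10^9 bytes); the binary figure is included only for comparison, and the constant and function names are just for illustration.

```python
BYTES_PER_DAY = 25 * 10**17   # 2.5 quintillion bytes, kept as an exact integer

GB = 10**9                    # decimal gigabyte
GIB = 2**30                   # binary gibibyte, for comparison


def per_day(unit_bytes: int) -> float:
    """Convert the daily byte count into the given unit."""
    return BYTES_PER_DAY / unit_bytes


print(per_day(GB))             # 2.5 billion decimal gigabytes
print(f"{per_day(GIB):.3e}")   # somewhat fewer binary gibibytes
```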
Data dedupe allows for the automatic eliminating of multiple copies of the same thing on advanced storage. This can be extremely helpful if someone has copied their hard drive multiple times to the network — each time they get a new system, for example. Quotas are also tremendously helpful as they can allow a specific amount of storage space per department, with automatic warnings if they start to approach the maximum allowed used space.
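To make both ideas concrete, here is a toy sketch of deduplication and quota warnings. It assumes whole files are compared by their SHA-256 hash, which is a simplification chosen for readability; storage products have their own methods, often working on smaller chunks than whole files. The function names and the 80% warning threshold are hypothetical.

```python
import hashlib


def dedupe(blobs):
    """Keep one copy of each unique payload, keyed by its SHA-256 digest.

    Returns (store, index): store maps digest -> payload,
    index maps each original name -> digest of its content.
    """
    store = {}
    index = {}
    for name, data in blobs.items():
        digest = hashlib.sha256(data).hexdigest()
        store.setdefault(digest, data)   # only the first copy is kept
        index[name] = digest
    return store, index


def quota_status(used_gb, quota_gb, warn_at=0.8):
    """Return 'ok', 'warning' or 'over' for a department's storage usage."""
    if used_gb > quota_gb:
        return "over"
    if used_gb >= warn_at * quota_gb:
        return "warning"
    return "ok"


# Two identical "drive images" and one different file.
store, index = dedupe({
    "old_laptop.img": b"same bytes" * 100,
    "new_laptop.img": b"same bytes" * 100,
    "report.doc": b"something else",
})
print(len(index), "files stored as", len(store), "unique payloads")
print(quota_status(used_gb=85, quota_gb=100))
```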
13. How have you dealt with backups?
“At the end of the day, our job is keeping data safe.” Several people I have known live by this creed, and over the years I’ve come to realize just how right they are. Everything else (fixing issues, replacing components, blowing out cobwebs, all the long nights) is just peripheral. If the data is intact, everything else can be replaced, and this means making sure that backups are available.
With virtualized environments, things are a bit more interesting than they used to be: being able to bring back up an entire lost server within a few minutes is still nothing short of phenomenal. But for large-scale devices, the data is more important than the system. This means that whether you use advanced storage, tape backups, multi-site, cloud storage or a combination of multiple solutions, whatever you decide to go with needs to be reliable, effective and as fast as possible.
14. How would you rate your existing team?
This one can be a bit tricky, because they may or may not want you to be fully honest about your opinions. A manager fights for their staff but is also there to correct them. A manager believes in their people but makes sure they know what they’re doing. Be ready to adjust your answer to match the mood of the interviewer and how things have gone to this point.
15. How would you recommend changes to policies or procedures if you were brought on board?
New eyes and a different frame of reference can be fantastic for improving the way things are done. However, there are also reasons why things are done the way that they are. Without this additional information, it may be possible to cause far more damage than you intend to fix, along with alienating your staff. Taking the time to understand, making sure that your staff can wrap their heads around what you mean and giving them the time to tell you whether this would work or not, instead of just firing off orders, is very important for getting the support of your new staff.
16. Do you understand the on-call requirements?
I’m not talking about calls like “Hey, can I get a new mouse?” or “Hey, this person on the phone wants to know if we want a subscription to Cosmo.” On-call at this level means “IT’S ON FIRE SEND HELP AHHHHHHH” and a call at 2 AM saying “So, um, yeah, I just deleted this entire share and need it back up by 8 AM for a shareholder meeting.” Again, it doesn’t necessarily mean that you personally are going to have to touch these elements, but depending on what’s going on, you very likely will have to be in the loop and that means a long night.
17. How do you “unplug”?
With always-connected and always-on-call policies in place, being able to unplug and get away from everything for a while becomes a priority to be able to maintain your effectiveness and sanity. Sometimes this means having a three-day weekend every once in a while, while other times it’s a vacation in an area that has zero cell reception. Whatever method you choose, make sure that your staff is prepared for anything they might come up against so that you can actually enjoy your break.
18. What would you consider a “failure” for a project?
To some, a failure is just having a project not completed on time, while to others it can be a successful project that gets shut down immediately afterwards due to outside influences. Understanding which causes of failure can be controlled and learning how to make sure they don’t happen again can be a massive step forward in managing stress.
19. What is your personal policy on off-site services?
Some organizations demand everything remain in-house, while others want to outsource everything they possibly can. Both come with their own risks and rewards, so understanding what upper management wants and judging where your preferences lie can help to plan for future projects much more easily.
20. How have you handled BYOD in the past?
Going hand-in-hand with off-site services, Bring-Your-Own-Device also has good and bad aspects. On the one hand, you have a much more available employee base right out of the gate, while on the other, it may mean a much more vulnerable infrastructure. While some organizations go completely in one direction or the other, most exist somewhere in the middle where certain staff need to be accessible at all times and that means connecting them up in one form or another. These organizations may not even advertise that such a thing is a possibility and may deny access to most who request it. This is one of those fields where policies will have to be reviewed every few years as technology and needs change.
Level 3 — “I’m afraid we need to use … MATH.”
21. What does trusting your staff mean to you?
Trust isn’t easy — it is difficult to earn, yet easy to lose. When it comes to responsibilities at a data center, however, things are a little bit different. You may not trust this person with your banking information, but do you trust them to remember or have written down instructions for how to deal with a specific situation? If you were the one that hired them, do you trust their capacity to learn and perform the job as assigned? Trusting your people doesn’t necessarily mean that you are giving them complete autonomy, but it does mean that you understand that you can’t do everything yourself, and you need to be able to trust them to do their tasks.
22. Are you prepared to do odd hour shifts?
Just like on-call issues can happen at any time of day or night, there are some situations that come up at data centers that can only occur during non-work hours. This may mean starting your day at 8 AM and working until 8 AM the following morning, and this is before catastrophic events come into play — it’s not unheard of to have issues come up where entire holiday weekends are just gone due to situations far beyond anyone’s control.
23. How do you deal with cable routing?
When you take a look at Google’s or Rackspace’s data centers, you see row after row of clean racks and servers like something out of a Matrix movie. What you don’t see, however, is the rat’s nest that can happen if the people maintaining that data center stop consciously routing cables in a manner that is clean and out of the way. It certainly doesn’t take long, either: a couple of months and you’ve got a mess that looks like birds have started nesting on the email server.
24. How would you brief a help desk on an upcoming issue?
Cross-department communication tends to fall apart without deliberate effort, and data center project implementations can end up falling on help desk personnel the hardest when they have no information and have nothing to give users contacting them for assistance. If something is known to be coming down the pipe, it really does end up helping out everybody if information is provided to these people so they can give informed responses.
25. What is your experience in contacting end users?
This legitimately can become a major issue for larger organizations, where rules or tradition force IT staff not to contact end users to confirm issues or work on fixes. Finding out what this organization’s policy is before attempting to make contact would be a good idea.
26. Top management wants to start pushing for site-level redundancy. How would you begin planning this out?
Checking out your existing infrastructure is a great start — seeing what has built-in replication and what already needs to be updated in the next 12-18 months that could be virtualized or clustered. Once you have this baseline, start planning out the upgrades that are already penciled in for the next few years for storage and processing capacity.
Once you’ve got these figures, start shopping around for geographically separate locations that would be safely out of harm’s way should anything happen to your current data center. This will start giving you an idea of where to look, and with some feedback from upper management, you can check to see if the organization already has property nearby that could be upgraded to data center-grade. Alternatively, the organization may wish to create a new purpose-built location for a new data center or go with a co-located center if the cost involved would be too massive.
27. You’re on vacation right now and receive a phone call from one of your immediate staff. Do you answer it?
Certain managers treat PTO as something to never interrupt — if you’re on vacation, you’re not reachable. Unfortunately, this also means that if a call is coming through from somebody that knows this, it very likely is a major problem. On the flip side of that argument, however, some vacations are granted with the express understanding that you need to be available, just in case the worst happens. Therefore, context is critical here: are you supposed to be available, or are you supposed to be unreachable?
28. You’ve been asked to start reducing the amount of electricity used by the servers at the data center. What are some options you could use?
Winding down physical servers is one of the easiest methods to reduce electricity use. For example, if you have 12 physical servers that were spec’d out years ago to handle a job that could now be filled by three virtual servers, you would have a massive net gain. As these physical servers go away the load on the virtual server hosts will go up a small amount, but not nearly as much as is being removed. With fewer objects in the data center generating heat, HVAC doesn’t need to work as hard to maintain the environment, thus resulting in more savings.
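To put rough numbers on that, here is a back-of-the-envelope sketch. Every figure in it is an assumption for illustration only: the per-server wattage, the extra load on the virtual hosts and the cooling overhead (extra facility power spent per watt of server load) should all be replaced with measurements from your own environment.

```python
HOURS_PER_YEAR = 24 * 365


def annual_kwh(watts):
    """Energy used in a year by a constant load, in kilowatt-hours."""
    return watts * HOURS_PER_YEAR / 1000


def consolidation_savings_kwh(servers_removed, watts_per_server,
                              added_host_watts, cooling_overhead=0.5):
    """Estimate net annual kWh saved by retiring physical servers.

    added_host_watts is the extra draw on the virtual hosts that absorb the work.
    cooling_overhead is an assumed fraction of extra power needed for HVAC.
    """
    server_watts_saved = servers_removed * watts_per_server - added_host_watts
    return annual_kwh(server_watts_saved * (1 + cooling_overhead))


# Hypothetical: 12 servers at 300 W each, with 150 W of extra load on the hosts.
saved = consolidation_savings_kwh(12, 300, 150)
print(f"Estimated savings: {saved:,.0f} kWh per year")
```

Multiply the result by your cost per kWh and you have a figure that upper management will actually care about.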
29. A company has just been purchased and their assets will be rolled into our data center. How would you start migrating them?
Before anything starts, it would be best to look at their existing setup and get the opinions of their current IT staff. They may very well have a virus-infested crypto-mining setup that would explode across your network if it got attached. Or, just like we mentioned before, there may be 20 servers active where only two will actually be needed post-migration. Getting a migration done right the first time can not only save costs in new hardware and services; it can also pay massive dividends with fewer licenses and support contracts.
30. Do you have any questions for us?
This is one of the most often overlooked questions, but when you’re a manager it is vital. Asking about current issues they’re fighting and future projects coming down the pipe, and providing feedback about situations that your existing position may have already solved, can be huge in seeing what kind of person both you and the interviewer are. One of the best interviews I ever had almost felt more like it was an hour of talking shop rather than a traditional back-and-forth.
Data centers are the heart of any organization: it’s not really exaggerating to say that without them, the organization would slow significantly, if not stop. Getting the right person in charge to make sure that everything remains running is vital for their future, so this is going to be a challenge before the interview ever begins.
Because there is just so much that a data center manager needs to understand and be ready for, being able to research any given topic is a fantastic skill to have. You might want to check out Skillset.com, which has over a hundred thousand questions and answers covering a huge range of topics. When a new project is coming down the pipe, find out any weak spots you may have, do the homework and be better prepared to make those key decisions.
A data center manager interview starts way before you actually are speaking with the interviewer. Be sure to research the organization, find out as much as possible about their existing setup and be ready with notes and observations you’ve seen in the past that they may have come across. This preparation will help you to be less nervous and that will come across in your answers and mannerisms — potentially the edge you need to get that offer.