

An ominous dark cloud and a wake-up call for backup and recovery: Are you sure that your backups are secure?

10 May 2021, Felix Zech

If there’s one thing most of us tend to take for granted, it’s the safety and reliability of data stored in the cloud. Until recently, many companies, IT professionals and most users simply assumed it to be true.

Unfortunately, that assumption was proven wrong in the aftermath of a disastrous datacenter fire in March at OVH, one of the largest cloud service providers in Europe. Fortunately, there were no casualties.

Although incidents of this magnitude are rare, it’s clear that not every cloud backup is automatically safe. For IT administrators, the fire should serve as a wake-up call to take a critical look at their own backup and recovery strategies and procedures.

We know from our experience in working with partners and customers, and in our own internal operations, that most organizations use a combination of backup systems matched to the importance of the data and the frequency of access needed. Our experience also tells us that those best practices are not universally or consistently applied at every company.
 

A must-have: protection of the backup

Every file server and backup server needs particularly strong, multi-layered safeguards in place, not least to protect against unauthorized access. You can imagine the nasty consequences if an attacker gained access to the domain controller's backup files. Protecting the file servers and other company servers from intrusion while leaving the backup system exposed is, obviously, a very bad idea.
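As one small, concrete layer of that protection, here is a minimal sketch in Python that flags backup files which anyone other than their owner can read or change. It assumes a POSIX file server, and the backup path is a placeholder for your own environment:

```python
import os
import stat

# Hypothetical backup location; adjust to your environment.
BACKUP_ROOT = "/srv/backups"

def find_overly_permissive(root: str):
    """Yield backup files that are readable or writable by group/others."""
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            mode = os.stat(path).st_mode
            # Any group/other permission bit set is a red flag for backup data.
            if mode & (stat.S_IRWXG | stat.S_IRWXO):
                yield path, stat.filemode(mode)

if __name__ == "__main__":
    for path, mode in find_overly_permissive(BACKUP_ROOT):
        print(f"WARNING: {mode} {path}")
```

This only covers file permissions, of course; network access controls and separate credentials for the backup system are just as important.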

This also means that it is not enough simply to back up the data! You need to know exactly what was backed up, where and how; what access options the IT teams have; and, of course, how to recover the data during an emergency or other, less dire incidents of data loss.
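A simple way to always know what was backed up, where and when, is to write a manifest alongside every backup run. A minimal sketch; the JSON layout and paths are my assumptions, not a prescribed format:

```python
import hashlib
import json
import os
from datetime import datetime, timezone

def sha256_of(path: str) -> str:
    """Stream the file so large backups don't exhaust memory."""
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def write_manifest(backup_dir: str, manifest_path: str) -> None:
    """Record what was backed up, where, and a checksum for later verification."""
    entries = []
    for dirpath, _dirs, files in os.walk(backup_dir):
        for name in files:
            path = os.path.join(dirpath, name)
            entries.append({
                "path": os.path.relpath(path, backup_dir),
                "size": os.path.getsize(path),
                "sha256": sha256_of(path),
            })
    manifest = {
        "created": datetime.now(timezone.utc).isoformat(),
        "source": backup_dir,
        "files": entries,
    }
    with open(manifest_path, "w") as f:
        json.dump(manifest, f, indent=2)
```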

A lot of work goes into setting up backups and ensuring they actually run as intended. But many companies take a much more casual approach to testing data recovery procedures, including the time it takes to restore mission-critical systems in an emergency. Only if the functionality of the backup and recovery processes is regularly and thoroughly tested can you actually realize the benefits of the work you put into set-up and maintenance. If you have to dig out the backup system's manual to remind yourself what to do in an emergency, then you either lead a charmed life with unbelievably good luck or you're about to make a bad situation much, much worse.
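The unglamorous core of such a test can be as simple as restoring into a scratch directory and comparing checksums against the manifest from the previous sketch:

```python
import hashlib
import json
import os

def verify_restore(restore_dir: str, manifest_path: str) -> bool:
    """Compare a restored tree against the manifest written at backup time."""
    with open(manifest_path) as f:
        manifest = json.load(f)
    ok = True
    for entry in manifest["files"]:
        path = os.path.join(restore_dir, entry["path"])
        if not os.path.exists(path):
            print(f"MISSING: {entry['path']}")
            ok = False
            continue
        digest = hashlib.sha256()
        with open(path, "rb") as f:
            for chunk in iter(lambda: f.read(1 << 20), b""):
                digest.update(chunk)
        if digest.hexdigest() != entry["sha256"]:
            print(f"CORRUPT: {entry['path']}")
            ok = False
    return ok
```

A checksum match only proves the bytes came back; whether the applications can actually use the restored data is a separate test, which we come back to below.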
 

Where should the data go?

Another important question is where to securely store the ever-growing volumes of backup data. The OVH example shows that backing up to the cloud alone is not enough. Geographical redundancy is needed, at least for critical systems. If it is financially feasible, we recommend storing all data in at least two locations.
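In its simplest form, geographical redundancy means every backup lands on at least two independent targets. A sketch that copies an archive to two placeholder destinations, reusing the checksum helper from above, and refuses to report success unless both copies verify:

```python
import hashlib
import shutil

# Placeholder targets; in practice these would be mounts or endpoints
# in two physically separate locations.
TARGETS = ["/mnt/site-a/backups", "/mnt/site-b/backups"]

def sha256_of(path: str) -> str:
    digest = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(1 << 20), b""):
            digest.update(chunk)
    return digest.hexdigest()

def replicate(archive: str) -> None:
    """Copy a backup archive to every target and verify each copy."""
    expected = sha256_of(archive)
    for target in TARGETS:
        copy = shutil.copy(archive, target)  # returns the destination path
        if sha256_of(copy) != expected:
            raise RuntimeError(f"Copy to {target} does not match the source")
        print(f"OK: {copy}")
```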

Alternatively, though not quite as secure, you can use fireproof data storage. Fireproof safes are ideal for long-term backups of less-critical data; for example, you can keep one in your own IT datacenter. But you always have to consider worst-case scenarios, including how long it may take to locate and open the fireproof container before you regain access to your backups. You'd be wise to consider additional storage methods and locations.

Time intervals

There’s no standard for how often you should test your backup and recovery processes. From my point of view, testing should be regular, with a frequency that reflects the nature of the data in the backup. What’s “regular”? You might say that annually or once every decade is “regular,” but realistically the periods should be much shorter. My recommendation (encoded as a small scheduling sketch after the list) is:

  • Critical systems once a month
  • Less-critical systems once a quarter
  • Non-critical systems every six months
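These intervals are easy to encode so that overdue restore tests surface automatically. A minimal sketch; the interval mapping comes from the list above, while the inventory of systems and last-test dates is an assumed example:

```python
from datetime import date, timedelta

# Test intervals per criticality tier, per the recommendation above.
INTERVALS = {
    "critical": timedelta(days=30),       # once a month
    "less-critical": timedelta(days=91),  # once a quarter
    "non-critical": timedelta(days=182),  # every six months
}

# Assumed inventory: system name -> (tier, date of last successful restore test).
LAST_TESTED = {
    "domain-controller": ("critical", date(2021, 4, 1)),
    "file-server": ("less-critical", date(2021, 2, 15)),
    "archive": ("non-critical", date(2020, 12, 1)),
}

def overdue(today: date):
    """Yield systems whose restore test is past due for their tier."""
    for system, (tier, last) in LAST_TESTED.items():
        due = last + INTERVALS[tier]
        if today > due:
            yield system, tier, due

if __name__ == "__main__":
    for system, tier, due in overdue(date.today()):
        print(f"OVERDUE: {system} ({tier}), was due {due}")
```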

Murphy’s Law applies. Testing should accurately simulate various emergency scenarios to ensure that both the systems and the IT staff’s procedures work when they’re needed most. A key metric is how quickly and completely you can restore lost or damaged data. This means the IT teams should go through the entire recovery process at the intervals above, which necessarily includes verifying the restored data in all related core applications: does it read, write and behave as expected? After all, the process is not just about data recovery but business recovery. Testing should ensure that all IT processes function as intended and that users can resume normal operations in the shortest time possible.
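Because recovery speed is that key metric, it pays to time every drill against an agreed target. A sketch in which the restore step and the recovery time objectives are placeholders for your own tooling and agreements:

```python
import time

# Assumed recovery time objectives per system, in seconds.
RTO = {
    "domain-controller": 2 * 3600,
    "file-server": 8 * 3600,
}

def timed_drill(system: str, restore_fn) -> None:
    """Run a restore drill and compare its duration with the agreed RTO."""
    start = time.monotonic()
    restore_fn()  # the actual restore procedure is wired in here
    elapsed = time.monotonic() - start
    target = RTO[system]
    status = "within" if elapsed <= target else "OVER"
    print(f"{system}: restored in {elapsed:.0f}s, {status} the {target}s target")

if __name__ == "__main__":
    # Placeholder restore step; replace with the real procedure for each drill.
    timed_drill("file-server", restore_fn=lambda: time.sleep(1))
```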
 
