Data integrity and backups
Data integrity means that a given data set is accurate, complete, consistent, unchanged (original or a true copy) and trustworthy throughout its entire lifecycle, which is ensured through a combination of processes, rules and standards.
To achieve data integrity, companies need a data management strategy covering data storage and migration, tested in accordance with established best practices.
Backing up data is intrinsically connected to data integrity, because reliable backups help ensure that data integrity is maintained.
Data security helps preserve data integrity. According to the CIA triad model, information security has three building blocks (confidentiality, integrity and availability), and integrity is one of them. Data integrity is therefore a conditio sine qua non (a necessary condition) for a database to be considered secure; however, data integrity is the result of data security, not the other way around.
Two main categories of data integrity exist: physical integrity and logical integrity. Physical integrity concerns external factors that can endanger the physical media on which databases reside: natural disasters, power outages, maintenance issues, aging storage hardware and design flaws can all impair the storing and retrieval of data. Logical integrity has more to do with human errors and software defects. There are four subcategories related to the logical integrity of databases:
- Entity integrity – each record of data is uniquely identifiable
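- Referential integrity – references between records are stored and used consistently; for example, every foreign key must point to an existing record in the referenced table
- Domain integrity – each value placed in a database must belong to its correct domain (e.g., the right data type, format and range for its column)
- User-defined integrity – additional rules and constraints that users define to meet specific business needs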
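The logical integrity subcategories above map directly onto database constraints. The following is a minimal sketch using Python's built-in sqlite3 module; the customers/orders schema is hypothetical and invented only for illustration. A primary key enforces entity integrity, a foreign key enforces referential integrity, and a CHECK constraint enforces domain integrity; each rejected insert surfaces as sqlite3.IntegrityError.

```python
import sqlite3

# Hypothetical schema, invented for illustration.
SCHEMA = """
CREATE TABLE customers (
    id    INTEGER PRIMARY KEY,           -- entity integrity: each row uniquely identifiable
    email TEXT NOT NULL UNIQUE
);
CREATE TABLE orders (
    id          INTEGER PRIMARY KEY,
    customer_id INTEGER NOT NULL REFERENCES customers(id),  -- referential integrity
    quantity    INTEGER NOT NULL CHECK (quantity > 0)       -- domain integrity
);
"""


def make_db() -> sqlite3.Connection:
    conn = sqlite3.connect(":memory:")
    # SQLite enforces foreign keys only when this pragma is on for the connection.
    conn.execute("PRAGMA foreign_keys = ON")
    conn.executescript(SCHEMA)
    return conn


def try_insert(conn: sqlite3.Connection, sql: str) -> bool:
    """Return True if the statement succeeds, False if a constraint rejects it."""
    try:
        with conn:  # commits on success, rolls back on error
            conn.execute(sql)
        return True
    except sqlite3.IntegrityError:
        return False


if __name__ == "__main__":
    conn = make_db()
    print(try_insert(conn, "INSERT INTO customers VALUES (1, 'a@example.com')"))  # True
    print(try_insert(conn, "INSERT INTO customers VALUES (1, 'b@example.com')"))  # False: duplicate id
    print(try_insert(conn, "INSERT INTO orders (customer_id, quantity) VALUES (42, 1)"))  # False: no customer 42
    print(try_insert(conn, "INSERT INTO orders (customer_id, quantity) VALUES (1, 0)"))   # False: out of domain
```

Pushing these rules into the database, rather than relying only on application code, means every program that writes to the tables is held to the same constraints.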
A data set that has data integrity possesses all of the following features:
- Retrievability – the data is easy to access and, among other things, searchable
- Traceability – data points can tell you about every touchpoint an organization creates while interacting with its customers
- Reliability – the data set is consistent and trustworthy, which is what data integrity is all about
Protection of data integrity (Checklist)
- Validate Input: verify the accuracy of all input, whether it is entered by users or supplied by other systems
- Validate Data: confirm that data processes work correctly and have not corrupted the data
Every transfer or replication of data is an opportunity for its integrity to be compromised in one way or another. Error checking and validation procedures are therefore an effective way to preserve data integrity in these cases, and all data should travel together with its corresponding metadata and appropriate validation data.
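The validation data mentioned above can be as simple as a checksum computed before a transfer and re-checked afterwards. A minimal sketch, assuming a local file copy and SHA-256 as the checksum; the function names and the `.sha256` sidecar file are illustrative choices, not a prescribed format:

```python
import hashlib
import shutil
from pathlib import Path


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash the file in chunks so large files need not fit in memory."""
    h = hashlib.sha256()
    with open(path, "rb") as f:
        for chunk in iter(lambda: f.read(chunk_size), b""):
            h.update(chunk)
    return h.hexdigest()


def verified_copy(src: Path, dst: Path) -> str:
    """Copy src to dst and confirm the copy hashes to the same SHA-256 digest.

    Also writes the digest to a sidecar file (e.g. copy.bin.sha256) so the
    validation data travels with the data. Raises OSError on a mismatch.
    """
    expected = sha256_of(src)
    shutil.copyfile(src, dst)
    actual = sha256_of(dst)
    if actual != expected:
        raise OSError(f"checksum mismatch for {dst}: {actual} != {expected}")
    sidecar = dst.with_name(dst.name + ".sha256")
    sidecar.write_text(f"{expected}  {dst.name}\n")  # same layout as sha256sum output
    return expected
```

Re-reading the destination immediately after writing may be served from the operating system's cache, so a stricter check would recompute the digest later, or on the receiving system.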
- Remove Duplicate Data: remove stray files and duplicate copies of data
- Back Up Data: backups can be critical for preventing permanent data loss
- Access Controls: all of the measures above could be rendered useless without access controls that keep the data accurate, unchanged and trustworthy. A least-privilege model is the top recommendation in terms of cybersecurity. Do not neglect physical security and physical access either
- Keep an Audit Trail: an audit trail can direct your attention to the source of a problem. It should be generated automatically and record every event on a system (whether data was created, deleted, read or modified), together with a timestamp.
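An audit trail of the kind described in the checklist can be sketched as an append-only log. In this minimal sketch the AuditTrail class is hypothetical, and, as an added assumption beyond the checklist, each entry also stores the hash of the previous one, so editing an earlier entry becomes detectable. A production system would persist the log to write-once storage rather than keep it in memory.

```python
import hashlib
import json
from datetime import datetime, timezone


class AuditTrail:
    """Hypothetical append-only, in-memory audit log with hash chaining."""

    ACTIONS = {"create", "read", "modify", "delete"}
    GENESIS = "0" * 64  # placeholder "previous hash" for the first entry

    def __init__(self):
        self.entries = []

    def record(self, user, action, record_id):
        """Append one timestamped event; called automatically by data-access code."""
        if action not in self.ACTIONS:
            raise ValueError(f"unknown action: {action}")
        prev_hash = self.entries[-1]["hash"] if self.entries else self.GENESIS
        entry = {
            "timestamp": datetime.now(timezone.utc).isoformat(),
            "user": user,
            "action": action,
            "record_id": record_id,
            "prev_hash": prev_hash,
        }
        entry["hash"] = self._digest(entry)
        self.entries.append(entry)
        return entry

    @staticmethod
    def _digest(entry):
        # Hash every field except the hash itself, in a canonical (sorted) order.
        body = {k: v for k, v in entry.items() if k != "hash"}
        return hashlib.sha256(json.dumps(body, sort_keys=True).encode()).hexdigest()

    def verify(self):
        """False if any entry was altered, removed from the middle, or reordered."""
        prev_hash = self.GENESIS
        for entry in self.entries:
            if entry["prev_hash"] != prev_hash or entry["hash"] != self._digest(entry):
                return False
            prev_hash = entry["hash"]
        return True
```

One limitation: deleting entries from the end of the log is not detected unless the latest hash is also stored somewhere an attacker cannot reach.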
Although the terms data backup and data archive are often used interchangeably, they serve different purposes in the data integrity process. A data backup is essentially a temporary copy of full data sets stored at a secondary site (See Note 1).
A data archive, on the other hand, is better organized (e.g., indexed and searchable) and provides long-term data preservation. Moreover, archiving is closely related to backup retention, whose implementation typically has to follow strict industry-specific requirements, such as those governing healthcare or financial institutions.
Note 1: Setting a schedule for how often data is backed up is recommended. Back up more frequently if necessary.
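A backup schedule and retention rule like the ones discussed above can be turned into a concrete policy. The sketch below assumes a grandfather-father-son style rotation, a common scheme that the text does not prescribe; the counts (7 daily, 4 weekly, 6 monthly) are illustrative defaults, not regulatory values. Backups not selected are candidates for deletion or for moving to an archive.

```python
from datetime import date, timedelta


def _newest_per_period(newest_first, period_key, limit):
    """Pick the newest backup in each of the `limit` most recent periods."""
    picked, periods = [], set()
    for d in newest_first:
        key = period_key(d)
        if key not in periods:  # first hit in a period is its newest backup
            if len(periods) == limit:
                break
            periods.add(key)
            picked.append(d)
    return picked


def backups_to_keep(backup_dates, daily=7, weekly=4, monthly=6):
    """Return the backup dates kept by a grandfather-father-son style policy."""
    newest_first = sorted(set(backup_dates), reverse=True)
    keep = set(newest_first[:daily])  # sons: most recent daily backups
    keep.update(_newest_per_period(
        newest_first, lambda d: d.isocalendar()[:2], weekly))  # fathers: one per ISO week
    keep.update(_newest_per_period(
        newest_first, lambda d: (d.year, d.month), monthly))   # grandfathers: one per month
    return keep


if __name__ == "__main__":
    end = date(2024, 3, 31)
    history = [end - timedelta(days=i) for i in range(90)]  # 90 daily backups
    kept = backups_to_keep(history)
    print(len(kept))  # 12
    to_delete = sorted(set(history) - kept)
```

Industry-specific retention rules, such as those mentioned for healthcare and finance, would replace these default counts.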
Backup as a compliance mechanism (US and EU)
Today, the overwhelming majority of patient information is stored electronically and resides in the IT systems of healthcare organizations. Backup and recovery of electronic health record (EHR) data is therefore essential for achieving HIPAA compliance. Under the HIPAA (Health Insurance Portability and Accountability Act of 1996) Security Rule, all covered entities in the healthcare sector are required to:
- Back up data frequently enough to be able to recover from any loss of data
- Keep backups in two off-site storage locations (See Note 2)
- Secure data at rest by encrypting it or, when it is disposed of, destroying it (See Note 3)
- Encrypt data in transit
- Draft written policies and procedures on data backup and have a recovery plan
- Test the recovery plan
Note 2: Popular remote storage options that remain connected to the IT infrastructure include a dedicated physical server or a cloud-based server.
Note 3: Encrypting backups throughout the entire storage period ensures data confidentiality rather than data integrity, because encrypted data can still be modified.
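Note 3's point, that encrypted data can still be modified, is easy to demonstrate with a stream-cipher construction. The sketch below uses a deliberately toy cipher (XOR with a SHAKE-256 keystream, chosen only because it is in Python's standard library; it is not a vetted cipher) and assumes the attacker knows where a field sits in the record. Flipping ciphertext bits changes the decrypted value without the key, while a separate HMAC-SHA256 tag over the ciphertext exposes the change. Real systems would use an authenticated encryption mode such as AES-GCM instead.

```python
import hashlib
import hmac
import secrets


# WARNING: toy cipher for demonstration only, NOT secure. It XORs the data with
# a SHAKE-256 keystream, mimicking how stream modes such as AES-CTR behave.
def toy_encrypt(key: bytes, nonce: bytes, data: bytes) -> bytes:
    keystream = hashlib.shake_256(key + nonce).digest(len(data))
    return bytes(d ^ k for d, k in zip(data, keystream))


toy_decrypt = toy_encrypt  # XOR with the same keystream undoes itself


def flip(ciphertext: bytes, offset: int, old: bytes, new: bytes) -> bytes:
    """Attacker's edit: turn known plaintext `old` at `offset` into `new`
    without the key, by XOR-ing the difference into the ciphertext."""
    buf = bytearray(ciphertext)
    for i, (o, n) in enumerate(zip(old, new)):
        buf[offset + i] ^= o ^ n
    return bytes(buf)


def make_tag(mac_key: bytes, nonce: bytes, ciphertext: bytes) -> bytes:
    # Encrypt-then-MAC: authenticate nonce and ciphertext with a separate key.
    return hmac.new(mac_key, nonce + ciphertext, hashlib.sha256).digest()


def tag_ok(mac_key: bytes, nonce: bytes, ciphertext: bytes, tag: bytes) -> bool:
    return hmac.compare_digest(make_tag(mac_key, nonce, ciphertext), tag)


if __name__ == "__main__":
    enc_key, mac_key = secrets.token_bytes(32), secrets.token_bytes(32)
    nonce = secrets.token_bytes(16)
    record = b"patient=1042;dose=010mg"  # hypothetical record layout
    ct = toy_encrypt(enc_key, nonce, record)
    tag = make_tag(mac_key, nonce, ct)

    tampered = flip(ct, record.index(b"010"), b"010", b"090")
    print(toy_decrypt(enc_key, nonce, tampered))  # b'patient=1042;dose=090mg'
    print(tag_ok(mac_key, nonce, ct, tag))        # True
    print(tag_ok(mac_key, nonce, tampered, tag))  # False
```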
HIPAA prescribes several requirements for backing up electronic health records (EHRs):
- Technical requirements – encryption with a minimum key length of 128 bits (See Note 4) and an adequate system for disposing of data
- Physical requirements – secure-access areas and physical locks on rooms where EHRs are kept
- Administrative requirements – for example, a security management process, information access management, security awareness training and emergency planning
Note 4: The gold standard for backing up data is to apply signed cryptographic hashes (e.g., SHA-256) to each file. Because this measure protects against both accidental and malicious modification of the underlying data, it ties in well with the principle of data integrity.
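Note 4's signed per-file hashes can be sketched as a manifest. One substitution to flag: Python's standard library has no public-key signatures, so this sketch signs the manifest with HMAC-SHA256 under a secret key instead; a real deployment would more likely use an asymmetric signature so that verifiers do not need the signing key. The function names are hypothetical.

```python
import hashlib
import hmac
import json
from pathlib import Path


def build_manifest(root: Path) -> dict:
    """Map each file's path (relative to root) to its SHA-256 hex digest."""
    return {
        str(p.relative_to(root)): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in sorted(root.rglob("*")) if p.is_file()
    }


def sign_manifest(manifest: dict, key: bytes) -> str:
    # HMAC-SHA256 over a canonical JSON encoding; stands in for a real signature.
    payload = json.dumps(manifest, sort_keys=True).encode()
    return hmac.new(key, payload, hashlib.sha256).hexdigest()


def verify_backup(root: Path, manifest: dict, signature: str, key: bytes) -> list:
    """Return a list of problems; an empty list means the backup checks out."""
    if not hmac.compare_digest(sign_manifest(manifest, key), signature):
        return ["manifest signature invalid"]  # the manifest itself was altered
    current = build_manifest(root)
    problems = [f"missing: {name}" for name in manifest if name not in current]
    problems += [f"modified: {name}" for name in manifest
                 if name in current and current[name] != manifest[name]]
    problems += [f"unexpected: {name}" for name in current if name not in manifest]
    return problems
```

build_manifest reads each file fully into memory, which is acceptable for a sketch; large backups would hash files in chunks.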
To comply with the GDPR, whose six main principles include integrity and confidentiality, organizations need appropriate measures in place to protect personal data.
Backups are important for data integrity and data availability
Having a proper backup procedure and system in place is an essential element in the management process and security posture of every organization, and companies like Garmin, which experienced a ransomware attack in July 2020, have learned this lesson the hard way.
No security solution can guarantee that nothing will ever happen to your data. The only way to be sure you can recover if something does go wrong is to back up your data regularly and properly.
On 31 January 2017, the multimillion-dollar tech company GitLab.com experienced a series of problems. First, in a scenario resembling a DDoS attack, spammers bombarded the website, which made GitLab's database unstable and eventually resulted in a couple of hours of downtime. Then, while trying to fix database replication, an engineer accidentally deleted data from the primary (production) database instead of the secondary one, and when the team later tried to restore the full records, they realized they could not. “Out of 5 backup/replication techniques deployed none are working reliably or set up in the first place. We ended up restoring a 6 hours old backup,” reads GitLab's incident report.
This case makes it crystal clear that backups are useless if they cannot restore data. In addition to having a workable disaster recovery plan, contingency procedure or business continuity policy (whatever you want to call it), IT teams need to make a habit of testing their backups and recovery capability. In the end, achieving data integrity may well require an IT team with plenty of people of integrity in its ranks.
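The lesson that a backup only counts once it has been restored can be automated. A minimal sketch, assuming a tar.gz archive as the backup format and SHA-256 digests recorded at backup time; the function names are hypothetical. The restore check extracts into a scratch directory and compares every file's digest with the recorded ones.

```python
import hashlib
import io
import tarfile
import tempfile
from pathlib import Path


def tree_digests(root: Path) -> dict:
    """SHA-256 of every regular file under root, keyed by relative POSIX path."""
    return {
        p.relative_to(root).as_posix(): hashlib.sha256(p.read_bytes()).hexdigest()
        for p in root.rglob("*") if p.is_file()
    }


def create_backup(source: Path, archive: Path) -> dict:
    """Write source's files into a tar.gz and return the digests of what was written."""
    digests = {}
    with tarfile.open(archive, "w:gz") as tar:
        for p in sorted(source.rglob("*")):
            if not p.is_file():
                continue  # empty directories are not captured in this sketch
            name = p.relative_to(source).as_posix()
            data = p.read_bytes()  # hash exactly the bytes that go into the archive
            digests[name] = hashlib.sha256(data).hexdigest()
            info = tarfile.TarInfo(name)
            info.size = len(data)
            tar.addfile(info, io.BytesIO(data))
    return digests


def verify_restore(archive: Path, expected: dict) -> bool:
    """Restore into a scratch directory and compare every file with the recorded digests."""
    with tempfile.TemporaryDirectory() as scratch:
        with tarfile.open(archive, "r:gz") as tar:
            # The 'data' filter (Python 3.12+, and recent security releases of
            # older versions) rejects unsafe member paths; use it when available.
            extra = {"filter": "data"} if hasattr(tarfile, "data_filter") else {}
            tar.extractall(scratch, **extra)
        return tree_digests(Path(scratch)) == expected
```

A corrupt or truncated archive makes extraction raise an exception, which a scheduled restore job should also treat as a failed test.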