Saving Analytical Data Without Violating GDPR – Part 1: Data Minimization and Masking

Don Loden, Managing Director Data and Analytics
Ernie Phelps, Senior Manager Data and Analytics

With an effective date less than four months away, the General Data Protection Regulation (GDPR), known officially as “REGULATION (EU) 2016/679 OF THE EUROPEAN PARLIAMENT AND OF THE COUNCIL of 27 April 2016,” is becoming a pressing concern for companies inside and outside the European Union (EU). Broadly, the regulation specifies that personal data protection of natural persons residing in the EU (aka EU data subjects) is a fundamental right. Personal data has a broad definition in the EU, applying to typical personal identifiers (national number identifier, passport number, etc.) as well as broader categories like location data and online identifiers (IP address, cookies). GDPR goes on to outline severe measures for non-compliance, including fines up to the greater of 20 million euros or 4 percent of total worldwide annual revenue for the preceding financial year.

The GDPR spells out a number of restrictions for the use, storage, removal and access to personal data. This can have potentially significant effects on analytical data (enterprise data warehouse, data mart, data lakes, report systems, etc.) as data removal and rectification requests can change historical reporting, introduce data gaps and complicate backup and ETL processes (“ETL” refers to three database functions – extract, transform and load – that are combined into a single tool designed to pull data out of one database and place it into another database).

Mitigation Techniques

There are several possible strategies for reducing the impact of GDPR on a company’s analytical data. Since compliance will be required for a large number of companies by May 25, the best methods are those that can either utilize processes already in place or that can be implemented with as small an effort as possible. Each company will need to look at the strategies below and make decisions on which strategy to apply and to which data elements to apply it. Below, we discuss two of those techniques – minimization and masking.

Minimization

The simplest way to comply with GDPR is to remove any non-essential personal data from analytical systems. The lower the number of data elements that identify a unique individual, the easier it is to deal with any remaining elements. The viability of this strategy will vary widely, but in many cases companies have taken the approach that it is better to have data and not need it than need it and not have it. GDPR turns this axiom on its head but it also provides an opportunity to take a hard look at what the company is storing and what the use case is for keeping it in an increasingly privacy-centric international environment.

Minimization will likely not be a standalone solution. Most companies cannot simply remove all personal data and still use the data for the business purposes it was originally designed to satisfy. However, minimization will reduce the number of data elements that need to be addressed by other strategies and thus should be strongly considered as a first priority.

Masking

Masking is replacing some or all of the characters in a data field with data that is not tied to the original string. These can be random or static, depending on the situation (i.e. 999-99-2479) but should always remove the ability to uniquely identify the record even when combined with other elements from the company’s records.

Masking is probably the least desirable solution from a security standpoint, since in many cases it does not sufficiently de-identify the record. If a phone number, for example, has its area or city code digits masked but is associated with a person’s place of residence, one would only need to know the area or city code(s) of the place of residence to unmask the identity of the person. Even if the entity masks some of the non-area digits, the number of possible exchanges may still be low enough that an automated hacking algorithm can uncover the number.

There are some cases when masking can still be useful or can augment other strategies. If the company has transactional data sets that must be retained for statutory, business or other exception cases, masking can help control data access by limiting the data shown based on existing access control mechanisms. In other cases with more possible combinations (credit card number, street address, etc.), masking can be used situationally to satisfy GDPR requirements.

In Part II of this topic, we will review data aggregation and anonymization.

Add comment