Metadata discovery software vendor Silwood Technology has conducted research into five of the largest and most widely used application packages to understand the scale of the challenge encountered by their customers when locating personal data for GDPR compliance.
It is vital to perform this 'discovery' work for any GDPR project. Without a clear understanding of where personal data is located in each of the systems in an enterprise, it will not be a straightforward task to carry out any of the steps to reach GDPR compliance.
The research reveals that the task facing organizations in the coming few months is significant. In SAP alone there are over 900,000 fields, any of which may or may not contain personal information, so all of them fall within the scope of data discovery and risk assessment. The size and complexity of the databases mean that businesses that are not well-advanced in data discovery or are undertaking manual discovery processes may not be ready on time for GDPR.
Using Safyr, Silwood Technology's metadata discovery software, the research team selected the top five application packages based on customer base and size: SAP, JD Edwards, Microsoft Dynamics AX 2012, Siebel and Oracle E-Business Suite. The terms Date of Birth and Social Security Number were selected for test purposes, and searches were performed to see how often they appeared.
The researchers using Safyr were able to conduct these searches across whole systems in just a few minutes.
This is due to Safyr's unique ability to retrieve metadata about each application from the application layer itself – including any customizations made by the customer. Safyr is designed specifically to make the discovery of metadata in ERP and CRM packages easy, fast and accurate.
Founder and Technical Director of Silwood Technology, Nick Porter, commented: "Whilst GDPR needs to be considered for any 'system' that potentially stores information about individuals (including paper-based systems), much of the data in a medium to large sized organization will be found in one or more of the major application packages from SAP, Oracle or Microsoft.
"With GDPR coming, those application packages that have been modified or customized will be the most difficult in which to locate personal data information. Whilst SAP is the biggest of the ERP vendors (exact figures are hard to come by, but it is generally accepted that there are around 30,000 SAP ERP customers), Oracle and Microsoft also have a significant presence."
What is Personal Data in GDPR terms?
The 'Data' in the General Data Protection Regulation is what the guidance calls Personal Data. For example, if a living individual can be identified from any data being processed, it is covered by GDPR. This might be a single piece of information, like a Social Security Number, or several pieces of data that can be combined to identify someone (e.g. Name and Date of Birth).
Exactly what constitutes Personal Data will vary from customer to customer, depending on the industry type. For example, in the healthcare sector, Patient Number would be a means to identify a person, but this would be irrelevant in, say, manufacturing.
Personal Data in five leading application packages
Silwood Technology used its Safyr metadata discovery software to conduct a deep dive into five of the largest and most widely used ERP and CRM application packages.
When looking at the five leading application packages, Silwood Technology's focus was to locate the personal data fields in their databases. Databases are composed of tables, which, for those unfamiliar with the relational database model, are like files in a physical storage system. Each table has a number of columns, which are like the fields on a physical form and are referred to as fields in this analysis.
If, for example, Date of Birth appears 100 times in the tables, each of the 100 occurrences needs to be reviewed to determine whether it constitutes personal data from the enterprise's (and hence a GDPR) perspective. In another package, the fact that the Date of Birth appears only ten times might sound 'better' – but only if there is an efficient way to find the ten mentions amid the thousands of other fields in the system.
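To make the idea of searching metadata concrete, the following is a minimal sketch in Python. It assumes the field metadata has already been exported to a flat list of (table, field name, description) entries; the table and field names are invented for illustration, and this is not a description of how Safyr itself works. The synonyms Tax ID and Tax Code are the package-specific names mentioned in the results for JD Edwards and Microsoft Dynamics.

```python
from collections import defaultdict

# Hypothetical metadata export: (table name, field name, field description).
# These names are invented for illustration and come from no real package.
FIELDS = [
    ("EMPLOYEE", "BIRTH_DT", "Date of Birth"),
    ("EMPLOYEE", "TAX_ID", "Tax ID"),
    ("EMPLOYEE", "DEPT", "Department"),
    ("CUSTOMER", "DOB", "Date of birth of contact"),
    ("INVOICE", "TAX_CODE", "Tax Code"),
    ("INVOICE", "AMOUNT", "Invoice amount"),
]

# Search terms per personal data category, including package-specific synonyms.
SEARCH_TERMS = {
    "Date of Birth": ["date of birth"],
    "Social Security Number": ["social security number", "tax id", "tax code"],
}


def find_tables(fields, search_terms):
    """Return {category: sorted list of tables with at least one matching field}."""
    hits = defaultdict(set)
    for table, _name, description in fields:
        text = description.lower()
        for category, terms in search_terms.items():
            # Case-insensitive substring match against the field description.
            if any(term in text for term in terms):
                hits[category].add(table)
    return {category: sorted(tables) for category, tables in hits.items()}


results = find_tables(FIELDS, SEARCH_TERMS)
for category, tables in results.items():
    print(f"{category}: {len(tables)} table(s): {', '.join(tables)}")
```

Every hit is only a candidate. In this invented data, the Tax Code field on the INVOICE table may well be a sales tax code rather than a personal identifier, which is why each occurrence still has to be reviewed.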
The team wanted to research the frequency with which certain personal data categories occurred in the chosen applications.
Several instances of each package were examined and the statistics presented give an indication of how many occurrences of each field will be found in a typical system.
Silwood Technology selected Date of Birth and Social Security Number as examples for test purposes. However, these packages, and others like them, have a host of other Personal Data fields that would also need to be considered in any GDPR compliance programme. The results for each package are below.
SAP
SAP is by far the largest ERP application package in terms of its market presence, size of customer base, breadth of functionality and the sheer number of tables in its database.
According to Panorama Consulting Solutions*, SAP has over 20% of the ERP market share.
Silwood found that:
- There are typically in excess of 90,000 tables in an SAP system and over 900,000 fields
- Social Security Number, or its equivalent, appears in over 900 tables
- Date of Birth appears in over 80 tables.
Nick Porter said: "Less than 1% of a typical SAP system contains the personal data that could cause GDPR breaches that cost your organization up to 4% of its annual turnover.
"It's often medium-sized businesses that attempt manual data discovery. On average, an SAP implementation will take more than 20 times longer to locate personal data using traditional approaches, compared with an automated solution."
JD Edwards
JD Edwards offers ERP functionality that, at a superficial level, provides similar features to SAP but at a much lower cost of ownership. Like SAP, JD Edwards' strengths are in manufacturing. JD Edwards is one of a number of packages offered by Oracle, which also include PeopleSoft, Siebel and Oracle EBS. Overall, Oracle has nearly 14% market share, second only to SAP*.
JD Edwards does not have the depth of industry-specific applications offered by SAP and has a much smaller metadata footprint, but locating personal data in it is still very challenging.
Silwood found that:
- There are approximately 5,000 tables and 140,000 fields
- Social Security Number (JDE calls it Tax ID) appears in over 170 tables
- Date of Birth in over 210 tables.
Microsoft Dynamics AX 2012
Microsoft Dynamics AX 2012 is an ERP system suitable for midsize to large enterprises. The solution has particular strengths in manufacturing and distribution. There are a number of differing packages that fall under the 'Dynamics' umbrella, and together these give Microsoft nearly 10% market share*, putting it in third place in Panorama Consulting's ranking.
Silwood found that:
- There are approximately 7,000 tables and 100,000 fields
- Social Security Number (Microsoft Dynamics calls it Tax Code) is located in over 150 tables
- Date of Birth is in approximately 10 tables.
Siebel
Whilst Siebel has been largely overtaken by Salesforce as the leading CRM package, it retains a large user base.
Silwood found that:
- There are around 5,000 tables in a typical Siebel system and approximately 170,000 fields
- Social Security Number was found in 14 tables
- Date of Birth in over 6 tables.
Being a CRM system, Siebel and similar systems will be a prime target for GDPR.
Oracle E-Business Suite
Oracle E-Business Suite is another of Oracle's package offerings with strong functionality across the range of ERP applications.
Silwood found that:
- There are around 22,000 tables and approximately 570,000 fields
- Social Security Number was found in 5 tables
- Date of Birth in over 40 tables.
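To put the five sets of figures side by side, the short Python sketch below works out what share of each package's tables contains each test field. It treats "over", "around" and "approximately" as the stated number, so the percentages are indicative only. The two shares should not be added together, because the article does not say how many tables contain both fields.

```python
# Approximate figures as reported above; qualifiers such as "over" and
# "approximately" are treated as the stated number, so results are indicative.
PACKAGES = {
    "SAP": {"tables": 90_000, "ssn": 900, "dob": 80},
    "JD Edwards": {"tables": 5_000, "ssn": 170, "dob": 210},
    "Microsoft Dynamics AX 2012": {"tables": 7_000, "ssn": 150, "dob": 10},
    "Siebel": {"tables": 5_000, "ssn": 14, "dob": 6},
    "Oracle E-Business Suite": {"tables": 22_000, "ssn": 5, "dob": 40},
}


def share(count, total):
    """Percentage of tables containing the field, rounded to two decimals."""
    return round(100 * count / total, 2)


# Share of tables per package that contain each test field.
SHARES = {
    name: {
        "ssn_pct": share(figures["ssn"], figures["tables"]),
        "dob_pct": share(figures["dob"], figures["tables"]),
    }
    for name, figures in PACKAGES.items()
}

for name, pct in SHARES.items():
    print(f"{name}: SSN in {pct['ssn_pct']}% of tables, "
          f"Date of Birth in {pct['dob_pct']}% of tables")
```

On these figures, each test field sits in only a small fraction of any package's tables, at most a few percent, which is why finding those tables among thousands of others is the hard part.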
The task of locating Personal Data is part of the 'Information Audit' phase of a GDPR project. This will inform other steps in the process of becoming compliant, such as delivering Data Subjects' Rights for Rectification, Deletion and Access.
Unfortunately, no ERP or CRM application specialist will be familiar with all the tables in their databases. And the Social Security Number and Date of Birth are just two types of data – there are tens if not hundreds that need to be located and recorded. Therefore some form of automation is required to make the task achievable.
Nick Porter concluded: "The GDPR becomes enforceable across the EU in May 2018, and not since Y2K has there been so much confusion and hype around a single business issue. Every software company and consulting firm that even remotely plays in the data governance space is jumping onto the GDPR bandwagon. The reality is that there is no one GDPR 'solution' and any company saying they have one is probably overplaying their capabilities – unless of course throwing bodies at the task is considered to be a solution.
"The scale of the issue means that businesses that are not well-advanced in data discovery or are undertaking manual discovery processes will struggle to be ready on time for GDPR."
To assist SAP customers who are trying to find personal data in their ERP systems, Silwood has released a Safyr GDPR Starter Pack, which will accelerate the information audit process for them. Further Starter Packs for other applications are planned for the near future.
* Panorama Consulting Solutions – 2017 Top 10 ERP System Ranking.