On May 25th 2018, the General Data Protection Regulation (GDPR) went into effect, two years after the European Parliament adopted it following extensive deliberation and public debate. The regulation was triggered by an alarming trend in data breaches and privacy violations globally. GDPR declares the protection and privacy of personal data to be a fundamental right of all European people, and thus requires any company dealing with EU customers to comply with it. While essential, achieving compliance is not trivial: Gartner estimates a compliance rate of less than 50% by the end of 2018.
Why is compliance challenging? First, GDPR’s goal of data protection by design and by default sits at odds with the traditional system design goals of optimizing for performance, cost, and reliability. As our first investigation reveals, several design principles and operational practices widely followed in the real world conflict with the proposed regulations. These deep-rooted tussles are hard to fix via cosmetic changes. Second, unlike other privacy regulations such as HIPAA and FERPA, GDPR is comprehensive: it defines personal data as any information relating to an identifiable natural person, and it gives people a broad set of rights over their personal data. As we demonstrate in our second project, current-generation storage systems incur significant overhead to support the queries that these rights require. Finally, though GDPR is clear in its high-level goals, it is intentionally vague in its technical specifications. As a result, system designers do not have well-defined targets, making it harder to achieve and maintain compliance.
What are our goals? Our work focuses on understanding and quantifying the impact of GDPR on storage systems and cloud platforms. We are currently developing a benchmark whose primary goal is to provide a measurable interpretation of GDPR compliance for storage systems. Our long-term goal is to design and implement mechanisms and abstractions that make it easy for people to exercise their privacy rights, and for companies to achieve compliance efficiently.
Summary: In this paper, we review GDPR from a system design perspective, and identify how its regulations conflict with the design, architecture, and operation of modern systems. We illustrate these conflicts via the seven privacy sins: storing data forever; reusing data indiscriminately; walled gardens and black markets; risk-agnostic data processing; hiding data breaches; making unexplainable decisions; and treating security as a secondary goal. Our findings reveal a deep-rooted tussle between GDPR requirements and how modern systems have evolved. We believe that achieving compliance requires comprehensive, ground-up solutions, and that anything short of that would amount to fixing a leaky faucet in a burning building.
Summary: Motivated by the finding that more than 30% of GDPR articles are related to storage, we investigate the impact of GDPR compliance on storage systems. We illustrate the challenges of retrofitting existing systems for compliance by modifying Redis to be GDPR-compliant. We show that although only a small set of new features needs to be introduced, strict real-time compliance lowers Redis’ throughput by ∼95%. Our work reveals that GDPR allows compliance to be a spectrum, and explores what this implies for system designers. We discuss the technical challenges that need to be solved before strict compliance can be achieved efficiently.
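To make the idea of a compliance spectrum more concrete, here is a minimal sketch of a toy in-memory key-value store that can run at two points on that spectrum. It is an illustration only: it is not Redis, it is not the modified Redis described above, and every name in it (`ToyStore`, `Record`, `strict`, `erase_owner`, `stored_count`) is hypothetical. In the assumed "strict" mode, every operation first scans for records past their retention deadline, so no expired record survives an operation. In the assumed "lazy" mode, an expired record is dropped only when someone touches it.

```python
import time
from dataclasses import dataclass


@dataclass
class Record:
    value: str
    owner: str         # data subject the record belongs to (hypothetical metadata)
    expires_at: float  # deadline (clock units) after which the record must be gone


class ToyStore:
    """Hypothetical toy store; not Redis and not the authors' modified Redis."""

    def __init__(self, strict, clock=time.monotonic):
        self.strict = strict  # True: purge on every operation; False: purge on access
        self.clock = clock    # injectable so the behaviour can be tested deterministically
        self.data = {}        # key -> Record

    def _purge_expired(self):
        # Full scan: cost grows with the number of stored records.
        now = self.clock()
        for key in [k for k, r in self.data.items() if r.expires_at <= now]:
            del self.data[key]

    def put(self, key, value, owner, ttl):
        if self.strict:
            self._purge_expired()
        self.data[key] = Record(value, owner, self.clock() + ttl)

    def get(self, key):
        if self.strict:
            self._purge_expired()
        rec = self.data.get(key)
        if rec is None:
            return None
        if rec.expires_at <= self.clock():
            # Lazy mode reaches this branch: the record is removed only now.
            del self.data[key]
            return None
        return rec.value

    def erase_owner(self, owner):
        """Delete every record of one data subject; return how many were removed."""
        keys = [k for k, r in self.data.items() if r.owner == owner]
        for k in keys:
            del self.data[k]
        return len(keys)

    def stored_count(self):
        # Number of records physically held, including expired ones not yet purged.
        return len(self.data)
```

In both modes a read never returns expired data. The difference is how long expired data physically remains in the store, and how much extra work each operation performs to bound that time. The full scan in strict mode is a deliberately naive choice for clarity; it should not be read as the mechanism behind the throughput numbers reported above.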
We are a group of CS and law researchers investigating privacy regulations from a systems perspective. We would love to hear from you. Please send a note to Supreeth or Vijay if you have any questions, feedback, or ideas for collaboration.