It has been almost impossible to avoid reading about the numerous large-scale data breaches reported on a seemingly daily basis. Stories of bad actors getting their hands on personal data are terrifying and always result in bad press for the breached company.
However, not all data exposure scenarios make the news, and many can go unnoticed or unreported for years. Data leaks are far more common yet no less dangerous than sinister data breaches.
What is a data leak?
A data leak differs from a data breach in that the former usually happens through omission or faulty practices rather than overt action, and may be so slight that it is never detected. While a data breach usually means that sensitive data has been harvested by someone who should not have accessed it, a data leak is a situation where such sensitive information might have been inadvertently exposed.
I ran into an example of this while preparing my company’s GDPR strategy. I use WordPress for my professional websites, and came across an exposé on how the use of Gravatar on WordPress sites can inadvertently expose the names and email addresses of folks who have commented on blog posts or other content. While names and email addresses are not typically considered as sensitive as health information or financial data, they are personally identifying and should be protected (especially in the shadow of GDPR).
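To make that kind of exposure concrete, here is a minimal sketch in Python. It assumes the classic Gravatar avatar URL scheme, in which the URL embeds an MD5 hash of the commenter's trimmed, lowercased email address; the exposé mentioned above may describe a different mechanism. The function names and the sample addresses are hypothetical. The point is only that an unsalted hash of an email address can be matched against a list of guessed addresses.

```python
import hashlib


def gravatar_hash(email: str) -> str:
    """Hash an email as the classic Gravatar URL scheme is assumed to:
    trim whitespace, lowercase, then take the MD5 hex digest."""
    normalized = email.strip().lower()
    return hashlib.md5(normalized.encode("utf-8")).hexdigest()


def match_emails(exposed_hash: str, candidates: list[str]) -> list[str]:
    """Return every candidate address whose hash equals the exposed hash."""
    return [email for email in candidates if gravatar_hash(email) == exposed_hash]


if __name__ == "__main__":
    # Hypothetical data: a hash taken from a public avatar URL, and guesses.
    exposed = gravatar_hash("jane.doe@example.com")
    guesses = ["john@example.com", "Jane.Doe@example.com ", "jd@example.org"]
    print(match_emails(exposed, guesses))
```

Because the hash is unsalted and deterministic, anyone holding a list of likely addresses can test them one by one, so a hash in a public URL should not be treated as anonymous.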
On a larger scale, Twitter and GitHub each discovered that they had been logging user passwords in plain text in some of their internal logs. A California-based nonprofit inadvertently left unsecured an AWS S3 bucket containing millions of documents with highly sensitive information. Although there were no reported cases of misuse of data in these leaks, the fact remains that someone could have innocently stumbled across this data.
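Plain-text passwords in logs are one kind of leak that software can help guard against. The following is a rough sketch, not a description of how Twitter or GitHub addressed their issues: a Python logging filter that masks values following a few common secret field names before a record is written. The pattern, the field names, and the class name are all assumptions made for illustration, and a real deployment would need to cover its own log formats (this pattern does not handle quoted JSON keys, for example).

```python
import logging
import re

# Hypothetical pattern: key=value or key: value pairs for common secret names.
SECRET_PATTERN = re.compile(
    r"\b(password|passwd|secret|token)\b(\s*[=:]\s*)(\S+)",
    re.IGNORECASE,
)


def redact(text: str) -> str:
    """Replace the value part of any matched secret field with a marker."""
    return SECRET_PATTERN.sub(r"\1\2[REDACTED]", text)


class RedactSecretsFilter(logging.Filter):
    """Logging filter that rewrites each record's message with secrets masked."""

    def filter(self, record: logging.LogRecord) -> bool:
        # Render the final message first so secrets passed as args are caught.
        record.msg = redact(record.getMessage())
        record.args = ()
        return True  # keep the record, now redacted


if __name__ == "__main__":
    handler = logging.StreamHandler()
    handler.addFilter(RedactSecretsFilter())
    logger = logging.getLogger("auth")
    logger.addHandler(handler)
    logger.setLevel(logging.INFO)
    logger.info("login attempt user=%s password=%s", "bob", "hunter2")
```

Attaching the filter to the handler rather than to a single logger means records from any logger that reaches that handler are masked. A filter like this is a safety net; the safer practice is not to pass secrets to the logger at all.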
I recall a situation with a former employer where my own personal information was being transferred in a highly insecure manner. I had visited our benefits office for a question regarding insurance, and our benefits rep contacted the insurance company to get an answer to my question. To do so, she had used my social security number to identify me, and had even put my SSN in the subject line of the email. Although nothing bad came of this event, such a data leak could have exposed me to identity theft, a fact that I politely but firmly pointed out to the parties that had exchanged the information in this insecure manner.
A recent study suggested that fewer than half of all enterprises can detect a major breach within one hour. For data leaks, however, I suspect the detection percentage is far smaller than that. Small leaks of sensitive data happen far more often than is ever reported, because many of them go undetected. While the tools for securing data and detecting breaches are getting better, there will likely always be the opportunity for simple human failures to inadvertently leak data.
Where are your data leaks?
The takeaway here is that we must always be vigilant about how data is stored, processed, and shared. We as individual data consumers and users must fully understand where our data comes from and how it is being used downstream. Those who guard the data must be ever vigilant about trickling data egress (as opposed to the more newsworthy data floods). Institutions must also build in the protections – using both software tools and policy/procedure declarations – to reduce the possibility of data leakage.
Author’s note: This post originally appeared on my Data Geek Newsletter.