Your most sensitive data is likely exposed online. These people try to find it

Justin Paine sits in a pub in Oakland, California, searching the internet for your most sensitive data. It doesn't take him long to find a promising lead.

He opens Shodan, a searchable index of cloud servers and other internet-connected devices, on his laptop. Then, he types the keyword "Kibana," which reveals more than 15,000 databases stored online. Paine starts digging through the results, a plate of chicken tenders and fries growing cold next to him.

"This one's from Russia. This one's from China," Paine said. "This one is just wide open."

From there, Paine can sift through each database and check its contents. One database appears to have information about hotel room service. If he keeps looking deeper, he might find credit card or passport numbers. That isn't far-fetched. In the past, he's found databases containing patient information from drug addiction treatment centers, as well as library borrowing records and online gambling transactions.

Paine is part of an informal army of web researchers who indulge an obscure passion: scouring the internet for unsecured databases. The databases -- unencrypted and in plain sight -- can contain all sorts of sensitive information, including names, addresses, telephone numbers, bank details, Social Security numbers and diagnoses. In the wrong hands, the data could be exploited for fraud, identity theft or blackmail.

The data-hunting community is both eclectic and global. Some of its members are professional security experts; others are hobbyists. Some are advanced programmers; others can't write a line of code. They're in Ukraine, Israel, Australia, the US and just about any country you name. They share a common purpose: spurring database owners to lock down your info.

The pursuit of unsecured data is a sign of the times. Any organization -- a private company, a nonprofit or a government agency -- can store data on the cloud easily and cheaply. But many software tools that help put databases on the cloud leave the data exposed by default. Even when the tools do make data private from the start, not every organization has the expertise to know they should leave those protections in place. Often, the data just sits there in plain text waiting to be read. That means there will always be something for people like Paine to find. In April, researchers in Israel found demographic details, including addresses, ages and income level, on more than 80 million US households.

No one knows how big the problem is, says Troy Hunt, a cybersecurity expert who has chronicled the issue of exposed databases on his blog. There are far more unsecured databases than those publicized by researchers, he says, but you can only count the ones you can see. What's more, new databases are constantly added to the cloud.

"It's one of those tip-of-the-iceberg situations," Hunt said.

To hunt databases, you have to have a high tolerance for boredom and a higher one for disappointment. Paine said it would take hours to find out whether the hotel room service database was actually a cache of exposed sensitive data. Poring over databases can be mind-numbing, and the work tends to be full of false leads. It isn't like searching for a needle in a haystack; it's like searching fields of haystacks hoping one might contain a needle. What's more, there's no guarantee hunters will be able to prompt the owners of an exposed database to fix the problem. Sometimes, the owner will threaten legal action instead.

Database jackpot

The payoff, however, can be a thrill. Bob Diachenko, who hunts databases from his office in Ukraine, used to work in public relations for a company called Kromtech, which learned that it had a data breach from a security researcher. The experience intrigued him, and he dove into hunting databases with no experience. In July, he found records on thousands of US voters in an unsecured database, simply by using the keyword "voter."

"If me, a guy with no technical background, can find this data," Diachenko said, "then anybody in the world can find this data."

In January, Diachenko found 24 million financial documents related to US mortgages and banking on an exposed database. The publicity generated by the find, as well as others, helps Diachenko promote SecurityDiscovery.com, a cybersecurity consulting business he set up after leaving his previous job.

Publicizing a problem

Chris Vickery, a director of cyber risk research at UpGuard, says big finds raise awareness and help drum up business from companies anxious to make sure their names aren't associated with sloppy practices. Even if the companies don't choose UpGuard, he says, the public nature of discoveries helps his field grow.

Earlier this year, Vickery looked for something big by searching for "data lake," a term for large compilations of data stored in multiple file formats.

The search helped his team make one of the biggest finds to date, a cache of 540 million Facebook records that included users' names, Facebook ID numbers and about 22,000 passwords stored with no encryption on the cloud. The data had been stored by third-party companies, not Facebook itself.

"I was swinging for the fences," Vickery said, describing the process.

Getting it secured

Facebook said it acted swiftly to get the data removed. But not all companies are as responsive.

When database hunters can't get a company to respond, they sometimes turn to a security writer who uses the pen name Dissent. She used to hunt unsecured databases herself, but now spends her time prompting companies to respond to data exposures that other researchers find.

"An optimal response is, 'Thank you for letting us know. We're securing it and we're notifying patients or customers and the relevant regulators,'" said Dissent, who asked to be identified by her pen name to protect her privacy.

Not every company understands what it means for data to be exposed, something Dissent has documented on her website Databreaches.net. In 2017, Diachenko sought her help in reporting to a New York City hospital that a financial software vendor had exposed its health records.

The hospital described the exposure as a hack, even though Diachenko had simply found the data online and didn't break any passwords or encryption to see it. Dissent wrote a blog post explaining that a hospital contractor had left the data unsecured. The hospital hired an external IT company to investigate.

Tools for good or bad

The search tools database hunters use are powerful.

Sitting in the pub, Paine shows me one of his techniques, which he said was "hacked together with various different tools" that let him find exposed data on Amazon Web Services databases. The makeshift approach is necessary because data stored on Amazon's cloud service isn't indexed on Shodan.

First, he opens a tool called Bucket Stream, which searches through public logs of the security certificates that websites need in order to use encryption. The logs let Paine find the names of new "buckets," or containers for data, stored by Amazon, and check whether they're publicly viewable.

Then he uses a separate tool to create a searchable database of his findings.
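For readers curious about the mechanics, the steps Paine describes can be roughly sketched in Python. This is not Bucket Stream's code or Paine's own setup, only an illustration under stated assumptions: the function names, the naming guesses and the table layout are all hypothetical. It turns a hostname seen in a certificate log into a few candidate bucket names, interprets the HTTP status code an anonymous request to a bucket would typically return (200 for a publicly listable bucket, 403 for one that exists but is locked, 404 for none), and saves the results in a small searchable SQLite database. The network request itself is deliberately left out.

```python
import sqlite3


def candidate_bucket_names(hostname, suffixes=("backup", "data")):
    """Guess bucket names from a certificate hostname (hypothetical heuristic)."""
    host = hostname.lower().strip()
    if host.startswith("*."):  # wildcard certificates look like *.example.com
        host = host[2:]
    labels = [label for label in host.split(".") if label]
    if len(labels) < 2:
        return []
    site = labels[-2]  # naive guess at the site name; wrong for e.g. .co.uk
    names = [site, "-".join(labels)] + [f"{site}-{s}" for s in suffixes]
    unique = []
    for name in names:  # drop duplicates while keeping order
        if name not in unique:
            unique.append(name)
    return unique


def classify_status(status_code):
    """Interpret the status of an anonymous GET on a bucket URL."""
    if status_code == 200:
        return "public"   # contents can be listed by anyone
    if status_code == 403:
        return "private"  # bucket exists but access is denied
    if status_code == 404:
        return "missing"  # no bucket by that name
    return "unknown"


def save_findings(conn, findings):
    """Store (bucket name, exposure) pairs in a searchable table."""
    conn.execute(
        "CREATE TABLE IF NOT EXISTS buckets (name TEXT PRIMARY KEY, exposure TEXT)"
    )
    conn.executemany("INSERT OR REPLACE INTO buckets VALUES (?, ?)", findings)
    conn.commit()


def find_buckets(conn, exposure):
    """Return the names of stored buckets with the given exposure label."""
    rows = conn.execute(
        "SELECT name FROM buckets WHERE exposure = ? ORDER BY name", (exposure,)
    )
    return [row[0] for row in rows]


if __name__ == "__main__":
    conn = sqlite3.connect(":memory:")
    names = candidate_bucket_names("*.assets.example.com")
    # Simulated status codes stand in for real HTTP responses.
    simulated = dict(zip(names, [403, 404, 200, 404]))
    save_findings(conn, [(n, classify_status(c)) for n, c in simulated.items()])
    print(find_buckets(conn, "public"))
```

In practice, most guesses would come back as missing or private. The point of saving everything is that a researcher can later query the results for the few buckets marked public.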

For someone who searches for caches of personal data in the couch cushions of the internet, Paine doesn't display glee or dismay as he examines the results. This is just the reality of the internet. It's filled with databases that should be locked behind a password and encrypted, but aren't.

Ideally, companies would hire experts to do the work he does, he says. Companies, he says, should "make sure your data isn't leaking."

If that happened more often, Paine would have to get a new hobby. But that might be hard for him.

"It's a little bit like a drug," he said, before digging into his fries and chicken.