Google Keeps Your Data Forever - Unlocking The Future Transparency Of Your Past
Posted on 06/21/2010 11:46:31 AM PDT by SeekAndFind
By Tom Foremski - March 8, 2010
Wayne Rosing, when he was VP of Engineering at Google, once told me that Google saves every bit of data from people's searches, puts it onto tapes and ships it off to a storage facility.
Why does Google collect all that data? I asked. We don't know, but we collect it all, he said.
These days Google has a better answer but it continues to save all that data.
Yes, Google will tell people that it removes data after 18 months but that is not strictly true. It removes the data that can be used to easily identify a person but the rest of the data is kept.
Google says it keeps the data to help advertisers with behavioral targeting. Or rather, it's to help Google serve up ads to users based on their behavior.
Nate Anderson, at Ars Technica, reports:
Search data is mined to "learn from the good guys," in Google's parlance, by watching how users correct their own spelling mistakes, how they write in their native language, and what sites they visit after searches. That information has been crucial to Google's famously algorithm-driven approach to problems like spell check, machine language translation, and improving its main search engine. Without the algorithms, Google Translate wouldn't be able to support less-used languages like Catalan and Welsh.
Data is also mined to watch how the "bad guys" run link farms and other Web irritants so that Google can take countermeasures.
Google eventually anonymizes the data:
The last octet of the IP address is wiped after nine months, which means there are 256 possibilities for the IP address in question. After 18 months, Google anonymizes the unique cookie data stored in these logs.
This isn't especially ambitious; Europe's data protection supervisors have called for IP anonymization after six months and competing search engines like Bing do just that (and Bing removes the entire IP address, not just the last octet). Yahoo scrubs its data after 90 days.
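To make the "256 possibilities" figure concrete, here is a minimal Python sketch of what wiping the last octet does to an IPv4 address. The function names and the sample address (taken from a documentation-only range) are made up for illustration; this is not Google's actual code.

```python
import ipaddress

def wipe_last_octet(ip: str) -> str:
    """Zero the last octet of an IPv4 address, e.g. 203.0.113.77 -> 203.0.113.0."""
    addr = ipaddress.IPv4Address(ip)  # raises ValueError on malformed input
    network = ipaddress.ip_network(f"{addr}/24", strict=False)
    return str(network.network_address)

def candidates(masked_ip: str) -> int:
    """Count the addresses that share the same first three octets."""
    return ipaddress.ip_network(f"{masked_ip}/24").num_addresses

masked = wipe_last_octet("203.0.113.77")  # hypothetical visitor address
print(masked)              # 203.0.113.0
print(candidates(masked))  # 256
```

A wiped address still narrows the user down to one block of 256 neighbors, which is why removing the entire address is the stronger measure.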
But this data could still be traced to individual users.
This is what happened when AOL released search data on roughly 650,000 search users in August 2006. The data had been anonymized but it was easy for reporters to find the actual users from clues in their searches, such as zip codes and town names.
You can search for what AOL users searched at these sites:
The AOL searches revealed a glimpse into the unguarded thoughts of the digital haves.
In one instance, it looks as if a wife and a husband are using the same computer, each hiding their extramarital affairs from the other, then later looking for help online to deal with the pain of a failed relationship.
And there are real soap operas, tracked over a period of months... from the excitement of first meetings:
"how to get rid of nervousness of meeting a blind date 23 Apr, 12:27"
"if your spouse has an affair should you contact the other person's spouse and let them know : 07 May, 09:58"
And the same user account asks:
"i had sex with my best friend and now he treats me differently :26 May, 13:58"
There are also "how to kill your wife" searches and more.
All this data was anonymized but all the searches from a single computer were kept together and that means they can eventually be traced. A New York Times reporter quickly managed to find one of the searchers.
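A rough Python sketch of why keeping one user's searches together matters: once queries are grouped under the same pseudonymous ID, the clues pile up. The log rows, IDs and helper names below are invented for illustration and are not taken from the AOL release.

```python
import re
from collections import defaultdict

# Invented log rows: (pseudonymous user ID, query). Not real AOL data.
LOG = [
    (1001, "plumbers in springfield 62704"),
    (2002, "cheap flights to denver"),
    (1001, "smith family reunion springfield"),
    (1001, "lakeside high school class of 1975"),
]

ZIP = re.compile(r"\b\d{5}\b")  # naive pattern for a 5-digit US zip code

def group_by_user(log):
    """Collect every query under the pseudonymous ID that issued it."""
    by_user = defaultdict(list)
    for user_id, query in log:
        by_user[user_id].append(query)
    return dict(by_user)

def zip_clues(queries):
    """Return the distinct zip-code-like strings found in a user's queries."""
    return sorted({z for q in queries for z in ZIP.findall(q)})

profiles = group_by_user(LOG)
print(profiles[1001])             # three queries: a town, a surname, a school
print(zip_clues(profiles[1001]))  # ['62704']
```

No single query names anyone, but a zip code, a town, a surname and a graduation year under one ID leave very few candidates.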
Welcome to the future transparency of your past.
In the future, there will be vast databases of anonymized data from a variety of sources: search engines, credit card companies, cell phones, geo-location data, etc. It will be possible to triangulate that data, and if one piece of that data is linked to a user, it will unlock everything else.
Yes, it would take quite a bit of data mining but we have the technologies to do it today.
While each silo of information might technically be anonymous, in aggregate it would help identify users from their behavior. Each digital interaction throughout your day, whether through mobile, or desktop, or bank, leaves a trace and that can eventually be tracked and matched with an identifiable person.
And Google, with its dominance of your life (search, email, docs, Buzz, photos, video, etc.), is collecting huge amounts of your behavioral data, and it will be one of the main keys in unlocking your privacy.
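The triangulation idea can be sketched in a few lines of Python: take an "anonymized" silo and a silo that carries names, and match rows on the attributes they share. All records, field names and the `link` helper here are hypothetical; real linkage would need far more data and fuzzier matching.

```python
# Invented records, for illustration only.
searches = [  # "anonymized" search log: pseudonym plus coarse attributes
    {"pseudonym": "u-17", "zip": "62704", "hour": 9, "query": "knee surgery recovery"},
    {"pseudonym": "u-42", "zip": "80202", "hour": 22, "query": "used trucks"},
]
purchases = [  # a second silo that happens to carry a name
    {"name": "A. Example", "zip": "62704", "hour": 9},
    {"name": "B. Sample", "zip": "80202", "hour": 14},
]

def link(anon_rows, named_rows, keys):
    """Return {pseudonym: name} where exactly one named row matches on all keys."""
    linked = {}
    for a in anon_rows:
        matches = [n for n in named_rows if all(a[k] == n[k] for k in keys)]
        if len(matches) == 1:  # a unique match re-identifies the pseudonym
            linked[a["pseudonym"]] = matches[0]["name"]
    return linked

print(link(searches, purchases, ["zip", "hour"]))  # {'u-17': 'A. Example'}
```

Once one pseudonym is tied to a name, every other record filed under that pseudonym comes along with it.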
Welcome to the future transparency of your past.
Be careful what you type and where you go when you use Google. Your keystrokes are being tracked and kept and may be used against you in future for any reason at all.
Google is not anyone’s friend...except government.
I watched in horror as Google “disappeared” half a million hits on image searches for “obama birth certificate” in mid July 2008.
I use Bing now.
Awesome! And Limbaugh wants folks to pay Carbonite for the same service.
RE: Awesome! And Limbaugh wants folks to pay Carbonite for the same service.
I read news that Google cooperates with the FBI when they ask for SEARCH and other information on suspects who commit crimes. I don’t think Carbonite does that.
Sadly, this goes beyond actual use of Google.
Every time you have a screen that shows a YouTube video embedded in it, such as here at Free Republic, they know it’s the same person who has the Google cookie and who did all of the following searches over the past ten years on Google or one of its partners using Google search. It also knows you looked at certain maps and looked up certain shopping information.
Not only that, but by default Google is aware of every article you’ve looked at on Free Republic, because Jim uses Google’s Analytics to rate his pages. The cookies are only a fall-back if your IP address changes. But honestly, since ISPs like Comcast and AT&T give you an IP address out of 256 possibilities in your immediate location, Google knows that it’s you all the time, because out of the 256 IP addresses you might have changed to once every few months when you reboot your modem, you are the only one who keeps going to Free Republic and DrudgeReport.
Yes, even if you never saved cookies, Google is recording IP addresses. If you have ANY YouTube or Gmail or other Google login, guess what, they now can tie your name and address to everything you’ve ever touched of theirs.
We need to starve the Beast: erase cookies, turn off scripts (a hassle, but more easily done in Firefox than IE) and reboot your modem more often (waiting at least a minute to ensure you get a different address rather than a reissue of the one you had).
Nothing you do online is anonymous. Nothing is so aggravating as people who publish blogs and forums, and then whine about their privacy, as if their published stuff is really just meant to exist inside their head.
You CAN be anonymous. It takes a minimal amount of work and knowledge. But it is up to you to do it, because everyone else will just pick your bones clean.
Bing didn’t systematically eliminate images and links critical of OilBama. Bing didn’t systematically push links critical of Bush.
I don’t trust that anything I type on-line will ever disappear, regardless of the form or format.
Even if an individual website promptly erases its records, I’m sure our enemies have an essentially unlimited ability to intercept and record every bit of raw data.
Should someone become sufficiently inconvenient, a nation-state has the ability to focus immense resources on sorting through their archives and detecting and decrypting all of the peon’s traffic.
Is anybody knowledgeable about anonymous proxies? Are they a solution to protect your privacy?
If a leftist US administration wanted to set up a Gestapo, would it outsource? Say, the muscle end to SEIU and the information-gathering end to Google?
Good search engine. No tracking
Al looks more and more like the other Al (Baldwin) every day.
>”Al looks more and more like the other Al (Baldwin) every day.”
Man, I know!
Some trivia for ya.
One of them is a “thoughtless little pig”. The other called his daughter that.
One “Wants to buy a Filipina wife”
One has now been banned from ever visiting the Philippines again.
I can’t stand Google or the way they do business. But I use them for things like “what is a formulary equivalent for (insert prescription drug here)?” or “How many casualties in the Battle of Savo Island?”
I work with a guy who is pretty liberal, nice guy, but he worships Google.
They give me the creeps.
I like Scroogle as well.