You Know What About Me? Decoding My Digital Trail Across Major Platforms

Under European rules, users can request personal data from platforms, but few do, and the results are often hard to use. I accessed and parsed data from TikTok, Amazon, Google, and Instagram, uncovering surprising insights and useful tips.

We often hear the phrase “if you’re not paying for the product, you are the product.” Yet despite our concerns about data privacy and corporate surveillance, very few of us know what data these platforms collect about us. Even fewer take advantage of our legal right to access this information.

While European regulations are sometimes criticized for focusing on mundane issues like standardizing charging cables, there’s one European rule that has quietly spread worldwide and genuinely empowers users: the right of access under GDPR Article 15, complemented by the right to data portability under Article 20.

The Right You Often Don’t Remember You Have

Under Article 15 of the GDPR, individuals (data subjects) have the right to obtain a copy of their personal data held by a data controller, as well as information on how and why it is being processed.

This isn’t just a European privilege. Most global services implement the same data takeout process for all users worldwide, regardless of location, because maintaining separate systems is too complicated.

The process is straightforward but asynchronous. Companies have one month to provide your data (extendable by two further months for complex or numerous requests), and in practice it arrives in a structured, machine-readable format, typically CSV, JSON, or plain text files. Best of all, it’s usually free of charge unless your requests are excessive.

From Duolingo to Dropbox, from small German cloud services to tech giants like Google, virtually every digital service now offers some form of data takeout. You simply navigate to your account settings, request your data, and receive download links via email within days or weeks.

The Reality: Nobody’s Looking

Despite this powerful right, actual usage remains remarkably low. In my experience working as a software architect implementing these systems, only a tiny fraction of users ever request their data. The reasons are twofold: users don’t know about this option or find it cumbersome, while service providers have no incentive to promote it. After all, data egress from cloud services is expensive, and encouraging users to download their data represents a pure cost with no business benefit.

But the results can be eye-opening. File sizes range from a few megabytes for simple services to multiple terabytes for users with extensive cloud storage or long platform histories.

 951825487  Dec  5 12:04  data-takeout-amazon-renato-losio-20241205.zip
   5195593  May  6 19:51  duolingo.zip
   2514113  Nov 30 12:55  TikTok_Data_1732739019.zip
   2426598  Apr  6 14:59  TikTok_Data_1743792620.zip
6546575709  Mar 29  2018  gmail.zip
1102106991  Mar 28  2018  google-20180328T154946Z-001.zip
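Before unpacking an archive of this size, it can help to see what is inside. The following is a minimal sketch, not a method tied to any particular platform: it assumes only that the takeout is a regular zip file, and the function name inventory is my own.

```python
import zipfile
from collections import Counter
from pathlib import PurePosixPath


def inventory(zip_path):
    """Return (file count per top-level folder, uncompressed bytes per extension)."""
    files_per_folder = Counter()
    bytes_per_extension = Counter()
    with zipfile.ZipFile(zip_path) as archive:
        for info in archive.infolist():
            if info.is_dir():
                continue  # directory entries carry no data
            path = PurePosixPath(info.filename)
            # Files stored directly at the archive root have a single path part.
            top = path.parts[0] if len(path.parts) > 1 else "(root)"
            files_per_folder[top] += 1
            extension = path.suffix.lower() or "(none)"
            bytes_per_extension[extension] += info.file_size
    return files_per_folder, bytes_per_extension
```

Calling inventory("TikTok_Data_1732739019.zip") and printing the most_common() entries of both counters gives a quick overview of which folders and file types dominate, without extracting anything to disk.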

Inside the Black Box: What TikTok Collects

TikTok, despite its controversial reputation, actually provides one of the most user-friendly data takeouts I’ve encountered. The folder structure is intuitive, with clearly labeled directories like “Activity” containing the most interesting information.

The login history alone reveals extensive tracking: every login is recorded with location data, network type (WiFi vs. mobile), carrier information, and device details. But the real revelation lies in the watch history data.

Date: 2024-10-06 05:43:41
IP: 109.42.113.107
Device Model: iPhone13,3
Device System: iOS 17.5.1
Network Type: 4G
Carrier:
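
Records in this "Key: value" layout are easy to turn into dictionaries. The sketch below assumes that records are separated by blank lines, which I have not verified for every TikTok export; the second sample record is invented for illustration and uses a documentation-only IP address.

```python
from collections import Counter
from datetime import datetime

SAMPLE = """\
Date: 2024-10-06 05:43:41
IP: 109.42.113.107
Device Model: iPhone13,3
Device System: iOS 17.5.1
Network Type: 4G
Carrier:

Date: 2024-10-06 19:02:10
IP: 203.0.113.7
Device Model: iPhone13,3
Device System: iOS 17.5.1
Network Type: Wi-Fi
Carrier:
"""


def parse_records(text):
    """Split blank-line separated 'Key: value' blocks into a list of dicts."""
    records, current = [], {}
    for line in text.splitlines():
        line = line.strip()
        if not line:
            if current:
                records.append(current)
                current = {}
            continue
        # Split on the first colon only, so times like 05:43:41 stay intact.
        key, separator, value = line.partition(":")
        if separator:
            current[key.strip()] = value.strip()
    if current:
        records.append(current)
    return records


def login_time(record):
    """Parse the 'Date' field; the export does not state a time zone here."""
    return datetime.strptime(record["Date"], "%Y-%m-%d %H:%M:%S")


def logins_per_network(records):
    """Count logins by the 'Network Type' field."""
    return Counter(r.get("Network Type", "") for r in records)
```

With parse_records(SAMPLE), an empty field such as Carrier simply becomes an empty string, so missing values do not break the parsing.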

For one account I analyzed, the numbers were staggering:

  • 188,000 videos shown through the feed
  • 43,000 videos watched to completion (about 23%)
  • An average of over 500 videos per day
  • Peak days reaching nearly 2,000 videos



(Note: data may have delays of up to several days)
Videos shared since account registration: 166
Videos watched to the end since account registration: 43494
Videos commented on since account registration: 1375

Using simple data analysis tools (even ChatGPT for non-developers), you can uncover viewing patterns by hour and day of the week, identify inactive periods, and calculate session durations. Most surprisingly, the claim that TikTok’s algorithm only needs 260 videos to create addiction proved conservative; this threshold could be reached in less than 17 minutes of usage.
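
For readers who prefer a script over a chatbot, here is a minimal sketch of that kind of analysis. It assumes the watch history has already been loaded into a list of datetime objects, and the five-minute gap that separates two sessions is an arbitrary choice of mine, not a value taken from TikTok.

```python
from collections import Counter
from datetime import datetime, timedelta


def views_per_hour(timestamps):
    """Count views by hour of day (0-23)."""
    return Counter(t.hour for t in timestamps)


def views_per_weekday(timestamps):
    """Count views by weekday (0 = Monday, 6 = Sunday)."""
    return Counter(t.weekday() for t in timestamps)


def sessions(timestamps, max_gap=timedelta(minutes=5)):
    """Group views into sessions; a gap longer than max_gap starts a new one.

    Returns a list of (first_view, last_view, view_count) tuples.
    """
    ordered = sorted(timestamps)
    if not ordered:
        return []
    result = []
    start = previous = ordered[0]
    count = 1
    for moment in ordered[1:]:
        if moment - previous > max_gap:
            result.append((start, previous, count))
            start, count = moment, 0
        previous = moment
        count += 1
    result.append((start, previous, count))
    return result
```

Session duration is then last_view minus first_view for each tuple. For scale, 260 videos in 17 minutes works out to roughly 3.9 seconds per video.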

Google’s Omniscient Eye

Google’s data takeout is perhaps the most comprehensive, covering its vast ecosystem of services. For location data alone, the scale is breathtaking. A typical Android user’s location history contains hundreds of thousands of data points spanning years.

In one analysis of 10 years of location data, I found:

  • 332,000 location data points
  • Roughly 90 location pings per day on average
  • Timestamps recorded to the millisecond
  • Activity inference (in vehicle, walking, stationary)
  • Complete movement patterns mapping every journey

This data is so detailed that you could reconstruct someone’s entire decade of movement, identify their home and work locations, track their travel patterns, and even infer their lifestyle habits. The precision is both impressive and unsettling.

{
  "timestampMs" : "1505486494021",
  "latitudeE7" : 525277253,
  "longitudeE7" : 133817500,
  "accuracy" : 20,
  "altitude" : 75,
  "activity" : [ {
    "timestampMs" : "1505486658517",
    "activity" : [ {
      "type" : "STILL",
      "confidence" : 71
    }, {
      "type" : "UNKNOWN",
      "confidence" : 18
    }, {
      "type" : "IN_VEHICLE",
      "confidence" : 12
    } ]
  } ]
}
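
A record like this needs a little decoding before it is readable. The sketch below assumes, as the field names suggest, that timestampMs is a Unix timestamp in milliseconds and that the E7 fields are degrees multiplied by ten million; the helper names are my own.

```python
import json
from datetime import datetime, timedelta, timezone

RECORD = json.loads("""
{
  "timestampMs": "1505486494021",
  "latitudeE7": 525277253,
  "longitudeE7": 133817500,
  "accuracy": 20,
  "altitude": 75,
  "activity": [{
    "timestampMs": "1505486658517",
    "activity": [
      {"type": "STILL", "confidence": 71},
      {"type": "UNKNOWN", "confidence": 18},
      {"type": "IN_VEHICLE", "confidence": 12}
    ]
  }]
}
""")

EPOCH = datetime(1970, 1, 1, tzinfo=timezone.utc)


def parse_location(record):
    """Convert one raw record into a UTC time and decimal degrees."""
    milliseconds = int(record["timestampMs"])
    return {
        # Integer milliseconds avoid floating-point rounding of the timestamp.
        "time": EPOCH + timedelta(milliseconds=milliseconds),
        "latitude": record["latitudeE7"] / 1e7,
        "longitude": record["longitudeE7"] / 1e7,
        "accuracy": record.get("accuracy"),
    }


def top_activity(record):
    """Return the activity type with the highest confidence, or None."""
    best = None
    for sample in record.get("activity", []):
        for guess in sample.get("activity", []):
            if best is None or guess["confidence"] > best["confidence"]:
                best = guess
    return best["type"] if best else None
```

Applied to every entry in the location history, parse_location yields rows that can be grouped by day or plotted on a map, and top_activity condenses the nested activity guesses into a single label.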

Amazon’s Data Labyrinth

Amazon’s data takeout presents a stark contrast to TikTok’s user-friendly approach. The folder structure is bewildering, with dozens of cryptically named directories requiring careful exploration to understand their contents.

Hidden within this maze are surprising insights:

  • Complete lists of advertising audiences you’ve been placed in
  • Prime Video location tracking showing which countries you’ve streamed from
  • Every PDF invoice from decades of purchases
  • Detailed records of all product searches and browsing history

For one user’s Prime Video data over a two-year period, location tracking revealed that 75% of their time was spent in Germany, 18% in Italy, and the remaining 7% in various other locations, creating an accurate picture of their international movement based solely on streaming activity.
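
A breakdown like this is simple to compute once the streaming records are loaded. The sketch below is an illustration only: it assumes each record has already been reduced to a (country, seconds watched) pair, which is a hypothetical shape and not Amazon’s actual export format, and the sample values are made up to mirror the proportions above.

```python
from collections import Counter


def share_by_country(sessions):
    """Return the percentage of total watch time per country, largest first."""
    seconds = Counter()
    for country, duration in sessions:
        seconds[country] += duration
    total = sum(seconds.values())
    if total == 0:
        return {}
    return {
        country: round(100 * amount / total, 1)
        for country, amount in seconds.most_common()
    }


# Invented sample data, not taken from a real export.
SAMPLE_SESSIONS = [("DE", 5000), ("IT", 1800), ("DE", 2500), ("FR", 700)]
```

Weighting by duration rather than counting sessions matches the "share of time" framing; counting sessions instead would only require adding 1 per record.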

Beyond Complaint: Taking Action

The power of data takeouts extends beyond satisfying curiosity. This information enables:

Better Privacy Awareness: Understanding exactly what data companies collect helps inform decisions about privacy settings and service usage.

Parental Conversations: Having concrete data about social media usage patterns provides a factual foundation for discussions with children about screen time and digital habits.

Personal Insights: Takeouts help identify usage patterns and behavioral trends, and can even recover lost information (like old receipts buried in years of purchase history).

Corporate Accountability: Companies can only be held accountable for data practices when users understand what data is being collected.

Practical Tips for Data Exploration

For those interested in exploring their digital footprint:

Start Simple: Begin with smaller services before tackling comprehensive platforms like Google or Amazon.

Sanitize Sensitive Data: When using AI tools for analysis, remove personal identifiers and URLs to protect privacy.

Focus on Patterns: Look for trends and usage patterns rather than processing every data point.

Use Available Tools: Non-developers can leverage ChatGPT or similar tools to analyze data patterns without coding skills.

Expect Variety: There’s no standard format across platforms, but most use JSON, CSV, or plain text files that are reasonably accessible.
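
The "Sanitize Sensitive Data" tip above can be partly automated. The following is a rough sketch with deliberately simple regular expressions of my own; it masks URLs, email addresses, and IPv4 addresses, but it will not catch names, street addresses, IPv6 addresses, or device identifiers, so a manual check is still needed before sharing anything.

```python
import re

URL_RE = re.compile(r"https?://\S+")
EMAIL_RE = re.compile(r"[\w.+-]+@[\w-]+(?:\.[\w-]+)+")
IPV4_RE = re.compile(r"\b(?:\d{1,3}\.){3}\d{1,3}\b")


def sanitize(text):
    """Replace URLs, email addresses, and IPv4 addresses with placeholders."""
    # URLs go first so that addresses embedded in them are masked as one unit.
    text = URL_RE.sub("[URL]", text)
    text = EMAIL_RE.sub("[EMAIL]", text)
    text = IPV4_RE.sub("[IP]", text)
    return text
```

For example, sanitize("IP: 109.42.113.107") returns "IP: [IP]". Note that the IPv4 pattern also matches any other four-part dotted number, such as some version strings, which errs on the side of masking too much.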

The Path Forward

Privacy begins with awareness. We cannot meaningfully discuss data rights, corporate responsibility, or digital privacy without first understanding what information we’re sharing. The tools exist (legally mandated and technically accessible) to pull back the curtain on our digital footprints.

Rather than simply complaining about data issues, we can take concrete action. Request your data, explore what’s there, and use that knowledge to make informed decisions about your digital life. The companies won’t advertise this option, but they’re legally required to provide it when asked.

Your data belongs to you. The first step in taking control is knowing what’s there to control.

The author presented this session at re:publica 25, demonstrating live data analysis from major platforms while protecting personal privacy through data sanitization techniques. The first draft of this article was generated from the session audio using Amazon Transcribe and Claude Sonnet 4.