Why Better Platform Data Won’t Show Regulators What They Need to See
Claire Stravato Emes / Mar 19, 2026
Alan Warburton / Social Media / © BBC / CC-BY 4.0
The Digital Services Act gave Europe something genuinely new: a legal right for researchers to access platform data. Article 40 replaced years of precarity — scraping, sock puppets, begging for API keys — with a standardized, legally protected process. The €120 million fine against X in December 2025, which included €40 million for researcher data-access failures, showed the provision has teeth.
But what happens after that fight is won? Even in the best-case scenario, where the platform is cooperating and provides improved APIs with full access for researchers, a critical category of evidence the DSA requires will still be missing. Platform data, however complete, can show what the system did. It cannot tell you what that did to people. Both matter for enforcing Article 34, but only one is currently being made available under the law.
The good news is that the other can be.
What the current framework sees — and where it stops
Recent empirical studies document several structural limitations in Article 40's data access provisions. Research APIs omit roughly 50% of the content shown to users on Instagram's Explore feed and 25% of videos on TikTok's For You feed, and some filter out up to 83% of contextual metadata compared with what is transmitted to the web interface. Engagement metrics retrieved through APIs can differ significantly from the live values displayed to users, and coverage is systematically skewed toward Anglophone male creators. Data availability also fluctuates over time: audits show that between 17.7% and 23.3% of posts become inaccessible within weeks, a problem compounded by Meta and TikTok's requirements that researchers routinely delete content the platforms later remove or invalidate.
Beyond data access, auditing the underlying algorithmic systems presents its own challenges. Recommender systems continuously evolve, so findings that accurately describe a system at the time of observation may no longer hold months later. Minor methodological variations (differences in account parameters, session durations, or content seeding strategies) can lead researchers to draw materially different conclusions about platform behavior. Analysis of TikTok's compliance audit found that auditors could not verify whether system modifications were made across the evaluation period. Enforcement proceedings take one to three years, and by the time a case concludes, the platform being challenged may no longer exist in the form that was studied.
From data donation to participatory evidence
Before the DSA, researchers had already begun looking for alternatives to platform-controlled access. Data donation, for example, emerged as a legally grounded alternative to scraping or closed APIs. It also allowed individuals to exercise their GDPR data access rights to obtain and share what platforms hold about them. Projects like Mozilla's YouTube Regrets and The Markup's Citizen Browser have shown that when platform data is collected from the user's side, it is possible to gain better insights into recommendation patterns, demographic disparities and harms otherwise invisible to API-based research.
But changing who requests the data does not change where it comes from. GDPR access requests are often incomplete, inconsistently formatted, and unstable. Studies of TikTok data packages found a severe "links-only problem": Platforms export raw lists of timestamped URLs without the video content or contextual metadata needed for meaningful analysis, and once those links expire, the interpretive context is lost permanently.
Data donation may bypass the API gate, but not the platform's underlying decisions about what to retain, how to structure it, and what to release. Most critically, a log showing that a user was served a sequence of body-image videos at 11 PM tells the researcher nothing about whether that sequence amounted to a harmful rabbit hole or a deliberate search. Data donation records what the platform delivered, not what that delivery did to the person who received it.
The deeper pattern: systems evaluated without the people they affect
These evidentiary gaps are part of a broader pattern. Across the EU, digital governance systems that shape people's lives are increasingly evaluated in ways that exclude the very people they affect.
The AI Act's regulatory sandboxes (Articles 57–59) are collaborative instruments between regulators and companies –controlled environments for mutual learning in which users are not present. Even the Act's provisions for testing in real-world conditions (Articles 60–61), which require freely given informed consent from test subjects, cast the user as someone the system is tested on, not someone who participates in assessing what it did. Consent is required; voice is not.
What Professor Sofia Ranchordas has called "administrative blindness" (the structural inability of governance institutions to see certain citizens) extends to the DSA's evidence architecture: Platforms report, researchers analyze, auditors certify, regulators act, and the people whose fundamental rights are at stake contribute nothing to the evidentiary record on which their protection depends.
The experimentalism is real — sandboxes, iterative enforcement, graduated compliance. But it applies to how regulators govern, not to the evidence they use to govern.
Why the user needs to become an evidence producer
This matters all the more as platforms do not produce fixed, predictable outputs. Their effects emerge from the interaction between algorithmic design and human behavior. What a user sees depends on what they did last, which in turn shapes what the system offers next, and that, in turn, reshapes what they do after that. Even the companies that built these systems cannot reliably predict the outcomes. If the system's behavior cannot be separated from the encounter that produced it, then evidence that records only the system's side will always be incomplete. The user cannot remain merely a subject of regulation or a donor of platform-generated traces. The user needs to become an active producer of regulatory evidence, generating independent data that the platform inherently lacks.
This is not as radical as it sounds. In fields that study complex systems (clinical psychology, public health, ecology), researchers developed new methods in the 1990s precisely because retrospective and aggregate methods failed to capture what happens at the point of encounter between a system and a person. Such approaches use brief, high-frequency self-reports collected in everyday environments to generate data on subjective states, behaviors, and contextual factors, with high compliance rates even among vulnerable populations.
Applied to platform governance, such methods could capture what no API or data donation package can: how algorithmic exposure interacts with individual vulnerability, in context, over time. When a participant reports elevated distress after an extended viewing session, or when body-image indicators shift over days of tracked exposure, that is evidence the platform never produced because it originates with the person, not the system.
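To make the pairing concrete, here is a minimal, purely illustrative sketch of how user-side evidence of this kind might be structured. Every name, field, scale, and threshold below (the `ExposureEvent` and `SelfReport` records, the one-hour window, the five-exposure and distress-level cutoffs) is a hypothetical assumption for illustration, not a schema drawn from any existing study or regulatory instrument:

```python
from dataclasses import dataclass
from datetime import datetime, timedelta

# Hypothetical records: an exposure event logged on the user's own device,
# and a brief momentary self-report prompted after a viewing session.
@dataclass
class ExposureEvent:
    timestamp: datetime
    topic: str       # e.g. "body-image"; how topics are labeled is assumed, not specified

@dataclass
class SelfReport:
    timestamp: datetime
    distress: int    # 1 (none) to 5 (severe), a typical momentary-assessment scale

def flag_sessions(events, reports, window=timedelta(hours=1),
                  topic="body-image", min_exposures=5, distress_threshold=4):
    """Pair each self-report with the exposure events in the preceding window,
    and flag reports where elevated distress follows dense topical exposure."""
    flagged = []
    for report in reports:
        recent = [e for e in events
                  if report.timestamp - window <= e.timestamp <= report.timestamp
                  and e.topic == topic]
        if len(recent) >= min_exposures and report.distress >= distress_threshold:
            flagged.append((report.timestamp, len(recent), report.distress))
    return flagged
```

The point of the sketch is the join itself: the distress score originates with the person, the exposure log originates independently of the platform's API, and only by pairing the two can an analyst say anything about what a sequence of recommendations did to the person who received it.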
Building a second instrument
User-side observability is not a replacement for Article 40; it is what Article 40 needs beside it. Platform-side data establishes the mechanism (how the system behaves), while user-side data establishes the impact (what that behavior does to the people who encounter it). Together, they supply the proportionality assessment that DSA enforcement actually requires. None of this requires legislative change: User-side evidence could be recognized in enforcement proceedings, and Digital Services Coordinators could begin triangulating platform self-reporting against independently generated impact data within the current architecture.
The people closest to the harm are currently the furthest from the evidence base. Producing regulatory evidence today requires API access, legal capacity, and specialist computational skills — requirements that systematically exclude the civil society organizations and communities best placed to identify how harm actually materializes.
But producing good evidence is not enough; it must be evidence that institutions will treat as authoritative. The evidence formats that regulatory institutions currently privilege (quantitative, platform-compatible, computationally derived) are the formats platforms are best equipped to produce. That preference is an artifact of the same informational asymmetry the DSA was designed to correct.
The systems that organize our information, shape our attention, and mediate our public life are being evaluated in closed rooms, using evidence they themselves produced, without meaningful input from the people they affect. That is not a technical limitation. It is a political choice and one that the ambitions of both the DSA and the AI Act should compel us to revisit.