GDPR compliance for analytics

The GDPR doesn’t ban analytics — it asks you to have a reason, be transparent, and stay in control of the data. Here’s the legal framework applied to product analytics: lawful basis, consent, data-subject rights, international transfers, and breach duties.

The General Data Protection Regulation (GDPR, applicable since 25 May 2018) governs how you process the personal data of people in the EU and EEA. Its reach is extraterritorial: a company anywhere that offers goods or services to people in the EU, or monitors their behaviour there, is in scope. It doesn’t prohibit analytics. It requires that you have a lawful reason to collect data, tell people what you’re doing, collect no more than you need, and honour their rights over it. This guide maps that framework onto product analytics. It is not legal advice: GDPR is nuanced and enforced differently across member states, so treat this as a starting point and confirm with counsel. For the principle-level practices (cookieless tracking, data minimisation, residency), see the companion privacy-first analytics guide; if you also serve users in India, the DPDP Act guide is the parallel read.

Who’s who under GDPR

  • Data subject — the individual the personal data is about.
  • Controller — whoever decides the why and how of processing. If you run the product and choose what to track, that’s you, and the accountability sits with you.
  • Processor — anyone processing on the controller’s behalf, like a hosted analytics vendor. You need a data-processing agreement (Article 28) with each one.
  • Data Protection Officer — required for some organisations (large-scale monitoring, sensitive data at scale, public authorities).

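For analytics this distinction is the crux: sending events to a third-party cloud makes that vendor your processor and leaves you accountable as controller. Self-hosting on infrastructure you operate keeps the chain in-house, although a cloud provider you rent servers from still counts as a processor.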

Personal data is broader than you think

GDPR’s definition of personal data covers online identifiers — cookies, device IDs, advertising IDs, and in most cases IP addresses — even before anyone logs in. So the assumption that “anonymous” traffic data sits outside GDPR is usually wrong. Product analytics, which ties events to a person on purpose, is squarely in scope the moment an event is linked to an identifiable user.

The most defensible event is the one you never needed to collect. Start from the questions you must answer, then capture only what those require.
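One way to enforce that in code is a per-event allowlist, so properties nobody planned for are dropped before they leave the client. This is a minimal sketch: the event names, the `trackingPlan` map, and the `minimise` helper are hypothetical illustrations, not part of any SDK.

```typescript
type Properties = Record<string, unknown>;

// Hypothetical tracking plan: each event lists the only properties it may carry.
const trackingPlan = new Map<string, readonly string[]>([
  ["checkout_completed", ["plan", "currency", "amount"]],
  ["feature_used", ["feature"]],
]);

// Return a copy of `props` holding only the keys the plan allows for `event`.
// Events missing from the plan return null so the caller can drop them entirely.
function minimise(event: string, props: Properties): Properties | null {
  const allowed = trackingPlan.get(event);
  if (allowed === undefined) return null;
  const kept: Properties = {};
  for (const key of allowed) {
    if (Object.prototype.hasOwnProperty.call(props, key)) kept[key] = props[key];
  }
  return kept;
}

const cleaned = minimise("checkout_completed", {
  plan: "pro",
  amount: 49,
  email: "ada@example.com", // not in the plan, so it is stripped
});
console.log(cleaned); // { plan: 'pro', amount: 49 }
```

Making the allowlist the only route to the wire means adding a new trait requires a deliberate change to the plan, which is also a natural place to record why the trait is needed.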

Lawful basis: you need one before you start

GDPR gives six lawful bases for processing: consent, contract, legal obligation, vital interests, public task, and legitimate interests. For behavioural analytics, two are realistic in practice — consent or legitimate interests — and which applies depends on how intrusive the tracking is and the expectations of your users. Pick the basis before you instrument, and document it.

There’s a second, separate gate that trips up a lot of teams: the ePrivacy Directive (the “cookie law”). It generally requires prior consent before storing or reading non-essential cookies or local storage on a device — regardless of which GDPR lawful basis you later rely on. Most analytics touches device storage, so most analytics needs that consent first. Cookieless, minimal setups can reduce or remove the banner, but the requirement varies by jurisdiction.

Consent done properly

When you do rely on consent, GDPR sets a high bar: it must be freely given, specific, informed, and unambiguous, by a clear affirmative action — no pre-ticked boxes, no bundling. And it must be as easy to withdraw as it was to give. The practical requirement for an analytics stack is simple to state and easy to get wrong: don’t collect until you have consent, and stop the moment it’s withdrawn. A consent-first SDK makes that the default — start with tracking denied so nothing fires, then opt in:

init(projectId, { apiKey, defaultTrackingConsent: 'denied' })

// only after the user agrees, via clear affirmative action:
optInTracking()

The Pug Web SDK behaves this way: with consent denied, no listeners attach and manual track()/identify() calls are dropped rather than quietly queued for later replay, and revoking consent tears the listeners back down.
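To make the "dropped, not queued" behaviour concrete, here is a small sketch of a consent gate placed in front of a sender. It illustrates the pattern only and is not the Pug SDK's implementation; `ConsentGate`, `AnalyticsEvent`, and the `send` callback are hypothetical names.

```typescript
type Consent = "granted" | "denied";
type AnalyticsEvent = { name: string; props?: Record<string, unknown> };

class ConsentGate {
  // Default to denied so nothing fires until the user opts in.
  private consent: Consent = "denied";
  private readonly send: (event: AnalyticsEvent) => void;

  constructor(send: (event: AnalyticsEvent) => void) {
    this.send = send;
  }

  optIn(): void {
    this.consent = "granted";
  }

  optOut(): void {
    this.consent = "denied";
  }

  // Returns true if the event was forwarded, false if it was dropped.
  // Dropped events are discarded; there is no queue to replay later.
  track(event: AnalyticsEvent): boolean {
    if (this.consent !== "granted") return false;
    this.send(event);
    return true;
  }
}

const sent: string[] = [];
const gate = new ConsentGate((e) => {
  sent.push(e.name);
});

gate.track({ name: "page_view" }); // dropped: no consent yet
gate.optIn();
gate.track({ name: "signup" }); // forwarded
gate.optOut();
gate.track({ name: "upgrade" }); // dropped again after withdrawal
console.log(sent); // [ 'signup' ]
```

The absence of a buffer is deliberate: replaying events captured before consent would mean the data was collected without it.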

The principles you instrument around

  • Purpose limitation: use the data only for the purpose you disclosed — don’t repurpose analytics events for ad targeting or model training.
  • Data minimisation: collect only what the purpose needs; every extra trait is extra risk.
  • Storage limitation: keep data only as long as needed, then erase it — set and document retention windows.
  • Accuracy, integrity & confidentiality: keep it correct and keep it secure.
  • Accountability: be able to show how you meet all of the above.
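Storage limitation is the easiest of these to turn into code. The sketch below filters out events older than a retention window; the `StoredEvent` shape, the `purgeExpired` helper, and the 90-day window are assumptions for illustration, and a real deployment would run the equivalent delete against its database on a schedule.

```typescript
type StoredEvent = { name: string; receivedAt: Date };

const MS_PER_DAY = 24 * 60 * 60 * 1000;

// Keep only events received within the last `retentionDays` days, measured from `now`.
function purgeExpired(events: StoredEvent[], retentionDays: number, now: Date): StoredEvent[] {
  const cutoff = now.getTime() - retentionDays * MS_PER_DAY;
  return events.filter((e) => e.receivedAt.getTime() >= cutoff);
}

const now = new Date("2025-06-30T00:00:00Z");
const events: StoredEvent[] = [
  { name: "old_login", receivedAt: new Date("2025-01-01T00:00:00Z") },
  { name: "recent_login", receivedAt: new Date("2025-06-01T00:00:00Z") },
];

// With a 90-day window the cutoff is 2025-04-01, so only the June event survives.
const kept = purgeExpired(events, 90, now);
console.log(kept.map((e) => e.name)); // [ 'recent_login' ]
```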

Auditing your tracking plan for personal and sensitive fields before they ship is a good habit. The free PII event auditor flags identifiers in an event schema in the browser as a fast first pass — a helper, not a compliance sign-off.
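A first-pass audit of this kind can be as simple as matching field names against known identifier patterns. The sketch below is not the PII event auditor itself; the `piiPatterns` list and `flagPiiFields` helper are hypothetical, and name matching produces both false positives and misses, so it needs human review.

```typescript
// Hypothetical name patterns; a real audit needs a broader list.
const piiPatterns: ReadonlyArray<{ label: string; pattern: RegExp }> = [
  { label: "email", pattern: /e-?mail/i },
  { label: "phone", pattern: /phone|mobile/i },
  { label: "ip address", pattern: /(^|_)ip(_|$)|ip_?address/i },
  { label: "name", pattern: /(first|last|full)_?name/i },
];

// Map each suspicious field name to the label of the first pattern it matches.
function flagPiiFields(fieldNames: string[]): Map<string, string> {
  const flagged = new Map<string, string>();
  for (const field of fieldNames) {
    const hit = piiPatterns.find((p) => p.pattern.test(field));
    if (hit !== undefined) flagged.set(field, hit.label);
  }
  return flagged;
}

const flagged = flagPiiFields(["plan", "user_email", "client_ip", "firstName", "amount"]);
flagged.forEach((label, field) => console.log(`${field}: looks like ${label}`));
```

Here `user_email`, `client_ip`, and `firstName` are flagged, while `plan` and `amount` pass.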

Data-subject rights and DSARs

GDPR gives individuals rights to access, rectification, erasure (the “right to be forgotten”), restriction, portability, and objection. Requests must generally be answered within one month, extendable by two further months for complex or numerous requests. For analytics, access and erasure are the ones with teeth: a data-subject access request (DSAR) means you must locate a person’s data and provide a copy, and a valid erasure or objection request means you must delete the data or stop processing it. That’s far easier when you own the data store: with a self-hosted setup, events live in your own database where you can query, export, and purge by identity. The SDK’s reset() clears identity on the client at logout, but honouring a DSAR is a server-side operation on data you hold.
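The two server-side operations, export by identity and purge by identity, can be sketched against an in-memory store. Everything here is hypothetical (`EventStore`, `EventRow`, the method names); in practice these would be queries against your own database, and backups, logs, and derived tables would need the same treatment.

```typescript
type EventRow = { userId: string; name: string; at: string };

class EventStore {
  private rows: EventRow[] = [];

  insert(row: EventRow): void {
    this.rows.push(row);
  }

  // Access and portability: a machine-readable copy of one person's events.
  exportFor(userId: string): string {
    return JSON.stringify(this.rows.filter((r) => r.userId === userId), null, 2);
  }

  // Erasure: delete one person's events and report how many rows were removed.
  eraseFor(userId: string): number {
    const before = this.rows.length;
    this.rows = this.rows.filter((r) => r.userId !== userId);
    return before - this.rows.length;
  }
}

const store = new EventStore();
store.insert({ userId: "u1", name: "signup", at: "2025-05-01T10:00:00Z" });
store.insert({ userId: "u2", name: "signup", at: "2025-05-02T11:00:00Z" });
store.insert({ userId: "u1", name: "upgrade", at: "2025-05-03T12:00:00Z" });

console.log(store.exportFor("u1")); // JSON array with u1's two events
console.log(store.eraseFor("u1")); // 2
console.log(store.eraseFor("u1")); // 0, nothing left to delete
```

Returning a count from the erase call gives you something to log as evidence that the request was fulfilled.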

International transfers

Chapter V restricts sending personal data outside the EU/EEA unless a valid transfer mechanism is in place: an adequacy decision (the EU–US Data Privacy Framework is one, covering certified US recipients) or appropriate safeguards such as Standard Contractual Clauses (SCCs). The 2020 Schrems II judgment invalidated the previous US arrangement, the EU–US Privacy Shield, and forced a hard look at transfers to US-based cloud services, which is exactly why standard Google Analytics setups drew rulings from several EU authorities. The cleanest way to take the transfer question off the table is to not transfer: keep event data on infrastructure in a region you choose. This is where self-hosting earns its keep.

Security, breaches, and penalties

Controllers must apply appropriate security measures. On a personal-data breach they must notify the supervisory authority without undue delay and, where feasible, within 72 hours of becoming aware of it, unless the breach is unlikely to put individuals at risk; affected individuals must also be told when the risk to them is high. The penalties are why GDPR has teeth: for the most serious infringements, up to €20 million or 4% of global annual turnover, whichever is higher. The practical takeaway for analytics is the same as everywhere else: the less personal data you hold and the more control you have over where it lives, the smaller your exposure.
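Both numbers are simple arithmetic, sketched below. The helper names are hypothetical. The 72-hour clock is counted from the moment the controller becomes aware of the breach, and the ceiling is only the statutory maximum for the upper tier of fines, not a prediction of any actual penalty.

```typescript
const HOURS_72_MS = 72 * 60 * 60 * 1000;

// Latest time to notify the supervisory authority, counted from awareness of the breach.
function notificationDeadline(awareAt: Date): Date {
  return new Date(awareAt.getTime() + HOURS_72_MS);
}

// Upper-tier ceiling: the higher of EUR 20 million or 4% of global annual turnover.
function maxFineEur(globalAnnualTurnoverEur: number): number {
  return Math.max(20_000_000, (globalAnnualTurnoverEur * 4) / 100);
}

const deadline = notificationDeadline(new Date("2025-03-03T09:30:00Z"));
console.log(deadline.toISOString()); // 2025-03-06T09:30:00.000Z

console.log(maxFineEur(100_000_000)); // 20000000 (4% would be only 4 million)
console.log(maxFineEur(2_000_000_000)); // 80000000 (4% exceeds 20 million)
```

The second example shows why the "whichever is higher" wording matters: for a small company the fixed amount dominates, and for a large one the percentage does.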

A practical GDPR checklist for analytics

  • Pick a lawful basis (usually consent or legitimate interests) and document it before you instrument.
  • Get cookie consent first where ePrivacy applies; default tracking to off.
  • Make withdrawal real: stop collecting the instant consent is revoked.
  • Minimise: capture only the events and traits your purpose needs — audit the plan for PII.
  • Honour rights: be able to fulfil access, erasure, and portability requests.
  • Set retention: keep data only as long as needed, and document why.
  • Control transfers: avoid sending data outside the EU/EEA, or rely on a valid mechanism.
  • Sign a DPA with every processor; plan for 72-hour breach notification.

Where open source and self-hosting fit

Open source helps on two fronts GDPR cares about: you can read exactly how data is handled, and you can self-host so it stays on servers you control, in a region you choose. Pug is AGPL-3.0 and self-hostable for exactly this reason — events stay in your own infrastructure, the consent-first SDK keeps collection off until a user agrees, and you hold the data store where retention, access, and erasure actually happen. None of that is automatic compliance, and none of it is legal advice. But a lawful basis, consent-first collection, minimisation, and self-hosting together give you a strong, defensible foundation — the same one that helps with the DPDP Act in India.

Common questions

Is Google Analytics GDPR-compliant?

It’s complicated, and that’s the point. Several EU data-protection authorities have ruled that standard Google Analytics implementations transferred personal data to the US unlawfully after the Schrems II judgment. The EU–US Data Privacy Framework changed the picture again in 2023, but the safest way to remove transfer risk entirely is to keep analytics data on infrastructure you control. This isn’t legal advice — confirm your setup with counsel.

Do I need consent for analytics under GDPR?

Two separate questions apply. The ePrivacy Directive (the “cookie law”) generally requires prior consent before storing or reading non-essential cookies or local storage — which covers most analytics. Separately, GDPR needs a lawful basis to process the resulting personal data; for behavioural analytics that’s usually consent, sometimes legitimate interests. Cookieless, minimal setups can reduce the consent burden, but requirements vary by jurisdiction.

What’s the difference between a controller and a processor?

The controller decides why and how personal data is processed — if you run the product and choose what to track, that’s you. A processor acts on the controller’s instructions, like a hosted analytics vendor. Controllers carry the primary accountability and must have a data-processing agreement (Article 28) with each processor.

How does self-hosting help with GDPR?

Self-hosting keeps event data inside your own infrastructure, in a region you choose, which removes most cross-border-transfer questions and gives you direct control over retention, access, and erasure. It’s a strong foundation for compliance, not automatic compliance on its own.

Is this legal advice?

No. This is general guidance on applying GDPR principles to product analytics. GDPR is nuanced and enforcement varies by member state — review your specific obligations with a qualified professional.
