Analytics Tracking: Privacy-First Event Collection

This article explains how we track user behavior while respecting privacy and avoiding bot traffic.

The Problem: Understanding User Behavior

We need to know:

  • Which pages users visit

  • Which products they view

  • Where traffic comes from (Google Ads, organic, social)

  • Which campaigns drive conversions

But we must avoid:

  • Tracking bots and crawlers

  • Storing personally identifiable information (PII)

  • Violating privacy regulations

The Solution: Client-Side + Server-Side Tracking

Client-Side: JavaScript Tracking

Visitor ID: Random ID stored in a cookie (365 days)

Session ID: Random ID stored in sessionStorage (until the tab or window closes)

Campaign params: Extracted from URL and stored in sessionStorage

Tracked parameters:

  • gclid - Google Click ID (Search ads)

  • gbraid - Google Ads click ID (iOS, app-to-web measurement)

  • wbraid - Google Ads click ID (iOS, web-to-app measurement)

  • fbclid - Facebook click ID

  • srsltid - Google organic search result ID

  • utm_source, utm_medium, utm_campaign, utm_term, utm_content

Storage: Parameters are also stored in cookies (30 minutes) for WhatsApp/phone click attribution
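
The parameter extraction described above could be sketched as follows. This is a minimal illustration, not the production code: the function name extractCampaignParams and the sessionStorage key in the trailing comment are assumptions, while the parameter names come from the list above.

```typescript
// Parameter names are taken from the "Tracked parameters" list.
const TRACKED_PARAMS: readonly string[] = [
  "gclid", "gbraid", "wbraid", "fbclid", "srsltid",
  "utm_source", "utm_medium", "utm_campaign", "utm_term", "utm_content",
];

// Returns only the tracked parameters that are present and non-empty.
// `search` is a query string such as window.location.search.
function extractCampaignParams(search: string): Record<string, string> {
  const query = new URLSearchParams(search);
  const found: Record<string, string> = {};
  for (const key of TRACKED_PARAMS) {
    const value = query.get(key);
    if (value) found[key] = value;
  }
  return found;
}

// In a browser this might be used as (storage key is hypothetical):
//   const params = extractCampaignParams(window.location.search);
//   sessionStorage.setItem("campaign_params", JSON.stringify(params));
```

Keeping the extraction a pure function of the query string makes it easy to test outside a browser.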

Server-Side: Enrichment

The server enriches events with:

GeoIP data: Country, region, city from IP address

User-Agent parsing: Browser, OS, device type

Timestamp: Server time (UTC)

Bot detection: Filters known bot user-agents
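
A rough sketch of the enrichment step is shown below. All type and function names are hypothetical, and the GeoIP lookup and User-Agent parser are injected as plain functions because the article does not say which libraries are used.

```typescript
interface RawEvent {
  type: string;
  visitorId: string;
  sessionId: string;
}

interface GeoInfo { country: string; region: string; city: string; }
interface ClientInfo { browser: string; os: string; deviceType: string; }

interface EnrichedEvent extends RawEvent {
  geo: GeoInfo | null;   // null when the IP cannot be resolved
  client: ClientInfo;
  receivedAt: string;    // server time, ISO 8601 in UTC
}

// Dependencies are injected so any GeoIP database or UA parser can be plugged in.
interface Enrichers {
  lookupGeo(ip: string): GeoInfo | null;
  parseUserAgent(userAgent: string): ClientInfo;
  now(): Date;
}

function enrichEvent(
  raw: RawEvent,
  ip: string,
  userAgent: string,
  deps: Enrichers,
): EnrichedEvent {
  return {
    ...raw,
    geo: deps.lookupGeo(ip),
    client: deps.parseUserAgent(userAgent),
    receivedAt: deps.now().toISOString(), // toISOString always reports UTC
  };
}
```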

Event Types

Page view: User visits a page

Product view: User views a product page

Add to cart: User adds a product to the cart

Checkout: User initiates checkout

Purchase: User completes purchase

WhatsApp click: User clicks WhatsApp button

Phone click: User clicks phone number
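
One way to keep the event types consistent between client and server is a shared string union. The identifiers below (for example "page_view") are assumed names for the events listed above; the article does not specify the actual wire format.

```typescript
// Hypothetical identifiers for the seven event types listed above.
const EVENT_TYPES = [
  "page_view",
  "product_view",
  "add_to_cart",
  "checkout",
  "purchase",
  "whatsapp_click",
  "phone_click",
] as const;

type EventType = (typeof EVENT_TYPES)[number];

// Type guard the API could use to reject unknown event types.
function isEventType(value: string): value is EventType {
  return (EVENT_TYPES as readonly string[]).indexOf(value) !== -1;
}
```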

Data Flow

sequenceDiagram
    participant User
    participant JS as JavaScript
    participant API as /api/analytics
    participant Firehose as Kinesis Firehose
    participant S3
    
    User->>JS: Visit page
    JS->>JS: Extract URL params (gclid, utm_*, etc.)
    JS->>JS: Store in sessionStorage
    JS->>API: POST event + params
    API->>API: Enrich with GeoIP
    API->>API: Parse User-Agent
    API->>API: Filter bots
    API->>Firehose: Send enriched event
    Firehose->>S3: Store in analytics bucket

Bot Detection

We filter bot traffic using multiple signals:

User-Agent patterns: Known bot strings (Googlebot, Bingbot, etc.)

Behavior patterns: Requests that arrive too quickly or in too high a volume

Missing JavaScript: Bots often don't execute JS, so they never send client-side events

Exclusion cookie: tv_exclude=true stops all tracking
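
The User-Agent and exclusion-cookie checks could look roughly like this. The pattern list is illustrative only, the function names are hypothetical, and treating a missing User-Agent as a bot is an assumption. The behavior-based and missing-JavaScript signals are not covered here.

```typescript
// Illustrative patterns only; a real deny-list would be much longer.
const BOT_PATTERNS: readonly RegExp[] = [
  /googlebot/i,
  /bingbot/i,
  /crawler/i,
  /spider/i,
  /headless/i,
];

function isBotUserAgent(userAgent: string | undefined): boolean {
  if (!userAgent) return true; // assumption: no User-Agent is treated as a bot
  return BOT_PATTERNS.some((pattern) => pattern.test(userAgent));
}

// Checks a Cookie header (or document.cookie string) for tv_exclude=true.
function isExcluded(cookieHeader: string): boolean {
  return cookieHeader
    .split(";")
    .map((part) => part.trim())
    .some((part) => part === "tv_exclude=true");
}

// An event is tracked only if the visitor has not opted out and is not a known bot.
function shouldTrack(userAgent: string | undefined, cookieHeader: string): boolean {
  return !isExcluded(cookieHeader) && !isBotUserAgent(userAgent);
}
```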

Privacy Protection

No PII: We never store names, emails, phone numbers

Anonymized IPs: Last octet removed before storage

No cross-site tracking: Cookies are first-party only

Opt-out: Users can set the exclusion cookie (tv_exclude=true)

Data retention: Events deleted after 90 days
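
As an illustration of the IP anonymization rule, the sketch below zeroes the last octet of an IPv4 address. Interpreting "removed" as "replaced with 0" is an assumption, as is the function name. IPv6 addresses are not handled and return null.

```typescript
// Zeroes the last octet of a dotted-quad IPv4 address.
// Returns null for anything that is not a valid IPv4 address.
function anonymizeIpv4(ip: string): string | null {
  const octets = ip.split(".");
  if (octets.length !== 4) return null;
  const valid = octets.every(
    (octet) => /^\d{1,3}$/.test(octet) && Number(octet) <= 255,
  );
  if (!valid) return null;
  return `${octets[0]}.${octets[1]}.${octets[2]}.0`;
}
```

With this rule, 203.0.113.42 would be stored as 203.0.113.0, so the address can only be narrowed to a block of 256 hosts.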

Conditional Pixel Loading

We only load tracking pixels when relevant:

Google Ads pixel: Only if gclid, gbraid, or wbraid present

LinkedIn pixel: Only if msclkid present

Facebook pixel: Only if fbclid present

Benefit: Faster page loads, less tracking overhead
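
The loading rules above amount to a small decision function. The sketch below only decides which pixels to load, following the parameter-to-pixel mapping exactly as listed; the pixel identifiers and the function name are hypothetical, and injecting the vendor scripts is left out.

```typescript
type Pixel = "google_ads" | "linkedin" | "facebook";

// `params` holds the campaign parameters captured from the landing URL.
function pixelsToLoad(params: Record<string, string>): Pixel[] {
  const pixels: Pixel[] = [];
  if (params["gclid"] || params["gbraid"] || params["wbraid"]) {
    pixels.push("google_ads");
  }
  if (params["msclkid"]) pixels.push("linkedin");
  if (params["fbclid"]) pixels.push("facebook");
  return pixels;
}
```

A visitor arriving without any click ID gets an empty list, so no third-party pixel script is requested at all.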

Traffic Source Detection

We detect traffic source from URL parameters:

Google Ads: gclid, gbraid, wbraid → utm_source=google_ads

Google Organic: srsltid → utm_source=google_search

Facebook: fbclid → utm_source=facebook

LinkedIn: msclkid → utm_source=linkedin

Direct: No parameters → utm_source=direct
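
The mapping above can be sketched as a single function. The precedence between click IDs and an explicit utm_source is not stated in this article, so the order used here (click IDs first, then an explicit utm_source, then direct) is an assumption, as is the function name.

```typescript
// Derives a utm_source value from captured URL parameters.
function detectTrafficSource(params: Record<string, string>): string {
  if (params["gclid"] || params["gbraid"] || params["wbraid"]) return "google_ads";
  if (params["srsltid"]) return "google_search";
  if (params["fbclid"]) return "facebook";
  if (params["msclkid"]) return "linkedin";

  // Assumption: an explicit utm_source is kept when no click ID is present.
  const explicit = params["utm_source"];
  if (explicit) return explicit;

  return "direct";
}
```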

Conversion Tracking

We track conversions through the funnel:

Product view → Add to cart → Checkout → Purchase

Each step includes:

  • Visitor ID (for attribution)

  • Session ID (for session analysis)

  • Campaign params (for ROI calculation)

  • Product SKU (for product analysis)
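
To show how these fields support funnel analysis, the sketch below counts distinct visitors at each step. The event shape and step identifiers are hypothetical; in practice this kind of aggregation would more likely run as a query over the stored events.

```typescript
type FunnelStep = "product_view" | "add_to_cart" | "checkout" | "purchase";

interface FunnelEvent {
  step: FunnelStep;
  visitorId: string;                 // for attribution
  sessionId: string;                 // for session analysis
  campaign: Record<string, string>;  // for ROI calculation
  sku: string;                       // for product analysis
}

// Counts distinct visitors seen at each funnel step.
function funnelCounts(events: FunnelEvent[]): Record<FunnelStep, number> {
  const seen: Record<FunnelStep, Set<string>> = {
    product_view: new Set<string>(),
    add_to_cart: new Set<string>(),
    checkout: new Set<string>(),
    purchase: new Set<string>(),
  };
  for (const e of events) seen[e.step].add(e.visitorId);
  return {
    product_view: seen.product_view.size,
    add_to_cart: seen.add_to_cart.size,
    checkout: seen.checkout.size,
    purchase: seen.purchase.size,
  };
}
```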

Lead Touch Tracking

When users contact us (WhatsApp, phone, email), we capture:

Contact method: WhatsApp, phone, email

Campaign params: From cookies (30-minute window)

Product context: Which product page they were on

Benefit: Attribute offline conversions to online campaigns
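
A minimal sketch of the 30-minute attribution cookie and the resulting lead-touch record follows. The cookie name tv_campaign, the JSON encoding, and all function and type names are assumptions made for illustration.

```typescript
type ContactMethod = "whatsapp" | "phone" | "email";

interface LeadTouch {
  method: ContactMethod;
  campaign: Record<string, string>;
  productPath: string; // product page the user was on
}

const ATTRIBUTION_COOKIE = "tv_campaign"; // hypothetical cookie name
const ATTRIBUTION_WINDOW_SECONDS = 30 * 60;

// Serializes campaign params into a first-party cookie string that expires after 30 minutes.
function buildAttributionCookie(params: Record<string, string>): string {
  const value = encodeURIComponent(JSON.stringify(params));
  return `${ATTRIBUTION_COOKIE}=${value}; Max-Age=${ATTRIBUTION_WINDOW_SECONDS}; Path=/; SameSite=Lax`;
}

// Reads the params back from a cookie string; returns {} if absent or malformed.
function readAttributionCookie(cookieString: string): Record<string, string> {
  const prefix = `${ATTRIBUTION_COOKIE}=`;
  for (const part of cookieString.split(";")) {
    const trimmed = part.trim();
    if (trimmed.indexOf(prefix) !== 0) continue;
    try {
      return JSON.parse(decodeURIComponent(trimmed.slice(prefix.length))) as Record<string, string>;
    } catch {
      return {};
    }
  }
  return {};
}

function buildLeadTouch(
  method: ContactMethod,
  cookieString: string,
  productPath: string,
): LeadTouch {
  return { method, campaign: readAttributionCookie(cookieString), productPath };
}
```

Because the browser drops the cookie after Max-Age elapses, a contact made more than 30 minutes after the ad click would carry an empty campaign object.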

Rate Limiting

The analytics endpoint is rate-limited:

Limit: 100 requests per 10 minutes per IP

Benefit: Prevents abuse and bot floods
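
The limit above could be enforced with a simple fixed-window counter, sketched below. Whether the real endpoint uses a fixed window, a sliding window, or a gateway-level rule is not stated, so this is only one possible approach. The class name is hypothetical and the state is kept in memory, which would not be shared across multiple server instances.

```typescript
class FixedWindowRateLimiter {
  private readonly limit: number;
  private readonly windowMs: number;
  private readonly windows = new Map<string, { start: number; count: number }>();

  // Defaults mirror the documented limit: 100 requests per 10 minutes.
  constructor(limit: number = 100, windowMs: number = 10 * 60 * 1000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  // Returns true if the request is allowed. `nowMs` is injectable for testing.
  allow(ip: string, nowMs: number = Date.now()): boolean {
    const current = this.windows.get(ip);
    if (!current || nowMs - current.start >= this.windowMs) {
      // First request from this IP, or the previous window has expired.
      this.windows.set(ip, { start: nowMs, count: 1 });
      return true;
    }
    if (current.count >= this.limit) return false;
    current.count += 1;
    return true;
  }
}
```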

Storage

Events are stored in S3 via Kinesis Firehose:

Format: JSON lines (one event per line)

Partitioning: By date (year/month/day/hour)

Compression: Gzip

Retention: 90 days
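
The storage layout can be illustrated with two small helpers: one that derives the year/month/day/hour prefix from a UTC timestamp, and one that serializes a batch as JSON lines. Function names are hypothetical, and the exact key layout in the bucket is an assumption based on the partitioning described above.

```typescript
function pad2(n: number): string {
  return n < 10 ? `0${n}` : String(n);
}

// Builds a year/month/day/hour key prefix from a UTC timestamp.
function partitionPrefix(timestamp: Date): string {
  const year = timestamp.getUTCFullYear();
  const month = pad2(timestamp.getUTCMonth() + 1); // getUTCMonth is zero-based
  const day = pad2(timestamp.getUTCDate());
  const hour = pad2(timestamp.getUTCHours());
  return `${year}/${month}/${day}/${hour}/`;
}

// Serializes events as JSON lines: one JSON object per line, newline-terminated.
function toJsonLines(events: object[]): string {
  return events.map((e) => JSON.stringify(e)).join("\n") + "\n";
}
```

Partitioning by hour lets a query that targets a single day read only the objects under that day's prefixes instead of scanning the full 90 days.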

Querying

Events are queried via AWS Athena:

Schema: Defined in Glue Data Catalog

Queries: SQL on S3 data

Use cases: Campaign ROI, product popularity, traffic sources

Summary

Our analytics system tracks user behavior while respecting privacy:

Client-side:

  • ✅ Extract campaign params from URL

  • ✅ Store in sessionStorage (session-scoped)

  • ✅ Store in cookies (30 min for attribution)

  • ✅ Send events to API

Server-side:

  • ✅ Enrich with GeoIP and User-Agent

  • ✅ Filter bot traffic

  • ✅ Send to Kinesis Firehose

  • ✅ Store in S3 (partitioned by date)

Privacy:

  • ✅ No PII stored

  • ✅ Anonymized IPs

  • ✅ First-party cookies only

  • ✅ Opt-out available

  • ✅ 90-day retention

Conditional loading:

  • ✅ Google Ads pixel only if gclid, gbraid, or wbraid present

  • ✅ LinkedIn pixel only if msclkid present

  • ✅ Facebook pixel only if fbclid present

This approach balances insights with privacy and performance.

