Table of Contents

  1. Architecture overview
  2. Layer 1 — IP intelligence
  3. Layer 2 — User-agent & header analysis
  4. Layer 3 — JavaScript behavioral fingerprinting
  5. The scoring engine
  6. Page routing decision
  7. Performance considerations
  8. How platforms try to evade cloakers

Architecture Overview

A modern cloaking system is a multi-layer visitor classification engine that sits between the ad click and the destination page. It processes incoming requests in under 50 milliseconds and makes a binary decision: serve the safe page or the money page.

Ad click → Cloaking server receives HTTP request
  ↓
Layer 1: IP intelligence check (< 5ms)
  ↓
Layer 2: User-agent / header analysis (< 2ms)
  ↓
Layer 3: JS behavioral fingerprinting (async, 200–800ms)
  ↓
Scoring engine: combine signals → classification score
  ↓
Score ≥ threshold → BOT → Safe page
Score < threshold → HUMAN → Money page

The architecture is almost always implemented at the edge (reverse proxy, CDN worker, or a dedicated server) so latency stays minimal. Layers 1 and 2 are synchronous — they happen before the HTML response is sent. Layer 3 is asynchronous and client-side — it runs after the initial page load.

Layer 1 — IP Intelligence

The first layer checks the visitor's IP address against several databases simultaneously:

ASN (Autonomous System Number) Matching

Every publicly routed IP address is announced by an autonomous system, a network run by a single organization and identified by an ASN. Ad platforms like Meta operate their own ASNs (AS32934, AS63293). When an IP from one of these ASNs hits the cloaking server, it's immediately flagged as a potential bot. This is fast, reliable, and covers the vast majority of automated crawlers.

Datacenter IP Detection

Cloud providers (AWS, Google Cloud, Azure, DigitalOcean, Hetzner, OVH) maintain well-documented IP ranges. Real users almost never access websites from datacenter IPs — these are used by servers, bots, and VPN users. Any IP classified as datacenter is scored heavily toward "bot."

Known Proxy / VPN Detection

Commercial VPN and proxy services (NordVPN, ExpressVPN, residential proxy networks) maintain IP pools that are tracked by IP intelligence services. A visitor coming from a known proxy exit node gets a high bot score, since ad platform reviewers use these to mask their identity.

IP Reputation Scoring

Beyond the above, quality IP intelligence services like MaxMind, IPinfo, or proprietary databases maintained by cloaking providers assign each IP a fraud/bot probability score based on historical behavior patterns. This provides a granular signal rather than a binary flag.

Database freshness is critical: Ad platforms rotate their crawler IPs regularly. A cloaker using a static IP list from 6 months ago will have a high false negative rate for the newest platform IPs. Top cloakers update their databases daily or in near-real-time.

Layer 2 — User-Agent & Header Analysis

After the IP check, the system examines the full HTTP request headers:

User-Agent String Parsing

The User-Agent header identifies the browser and operating system. Known crawler UAs (facebookexternalhit/1.1, Googlebot/2.1, TikTokBot, LinkedInBot) are directly flagged. Headless Chrome (HeadlessChrome substring) and Puppeteer/Playwright-generated UAs are also detectable.

Header Consistency Analysis

Real browsers send a consistent set of headers with their requests. For example, a Chrome browser on Windows 11 will always send Sec-Ch-Ua, Sec-Ch-Ua-Platform, and Sec-Fetch-Dest headers with specific expected values. Bots that spoof user-agent strings often fail to spoof all accompanying headers consistently. Detecting these inconsistencies is highly reliable.

Accept-Language and Accept-Encoding

The Accept-Language header indicates the user's preferred language, which should match their geo. An IP geolocated to France with Accept-Language: zh-CN is anomalous. Similarly, bots often send minimal Accept-Encoding values compared to real browsers.

Layer 3 — JavaScript Behavioral Fingerprinting

This is the most sophisticated — and most important — layer. It runs client-side via a JavaScript snippet loaded on the page, and collects signals that only real human interaction can generate.

Mouse Movement Entropy

Real users move their mouse in curved, irregular paths influenced by their hand's physical inertia. Bots move in straight lines or jump directly between coordinates. The fingerprinting script records mouse trajectory data and computes the entropy (randomness) of the movement. Low entropy = likely bot.

Touch Event Analysis (Mobile)

On mobile, real users swipe and tap with natural variation in velocity, pressure, and coordinates. Device farm operators tapping rapidly through pages produce rhythmically consistent touch events. The script measures touch event timing variance and flags overly regular patterns.

Scroll Behavior

Real users scroll to read content — they pause at sections, scroll at human speed, and show acceleration/deceleration patterns. Bots either don't scroll at all, or scroll at a constant uniform velocity. Scroll depth combined with time-on-page is a strong combined signal.

Browser API Consistency

The script probes browser APIs to detect headless environments.

Session Velocity

If the same IP (or IP subnet) generates 50 page visits in 10 minutes with near-identical behavioral patterns, this is a review sweep, not organic traffic. Session velocity analysis identifies coordinated bot traffic even when individual sessions look clean.

The Scoring Engine

Each signal from all three layers produces a sub-score. The scoring engine aggregates these using a weighted model:

Signal                     | Weight      | Why It Matters
ASN = known platform       | Very High   | Direct indicator of ad platform crawler
Datacenter IP              | High        | Real users rarely use datacenter IPs
Known proxy/VPN            | High        | Reviewer masking technique
navigator.webdriver = true | Very High   | Direct headless browser signal
WebGL = SwiftShader        | High        | Headless Chrome rendering engine
Mouse entropy low          | Medium      | Automated interaction pattern
No scroll / instant scroll | Medium      | Non-human reading behavior
Session velocity spike     | High        | Coordinated review sweep
Header inconsistency       | Medium-High | Spoofed UA without matching headers

The final score is a number between 0 (definitely human) and 100 (definitely bot). The classification threshold is typically configurable. A higher threshold (e.g., 60) misclassifies fewer humans as bots (lower false positive rate) but lets more reviewers through (higher false negative rate). A lower threshold (e.g., 40) does the opposite.

Page Routing Decision

Once the classification is made, the routing is straightforward.

The best practice is to serve the safe page in-place (not redirect to a different URL), because a redirect adds a visible URL change that a human reviewer or automated monitoring tool could flag as suspicious.

Performance Considerations

Cloaking has to be fast. A cloaking layer that adds 2 seconds to your page load will destroy your Quality Score on Google Ads and hurt conversions everywhere.

How Platforms Try to Evade Cloakers

Ad platforms are constantly evolving their detection evasion techniques. Here's what they currently deploy against cloaking:

This is why the behavioral fingerprinting layer (Layer 3) has become the most important. A residential IP running a full JavaScript-capable browser can bypass Layers 1 and 2 — only behavioral signals can catch it.
