Cloudflare Worker (manual setup)

Before you start

Quick checklist—then follow the numbered steps below.

Cloudflare in front of your store. The hostname shoppers use must be DNS-proxied (orange cloud) in the same Cloudflare account where you create this Worker.
Storefront routes. You’ll wire hostnames to this Worker in step 4 (often apex, www, and/or wildcards).
API key. An sm_live_… key from SearchMention Dashboard → Settings for the project that should receive AI Traffic for this store (plan limits apply).
Worker script. Copy worker.js from the block below (same file as cloudflare-worker/worker.js). Optional: integration README for troubleshooting.

1 Create and deploy a starter Worker

In the Cloudflare dashboard, open Workers & Pages from the sidebar (under Build), then click Create application (top right).

Cloudflare Workers and Pages with Create application button — Start from the Workers & Pages list.

When asked how to begin, pick Start with Hello World! to use Cloudflare’s starter template.

Create a Worker screen with Start with Hello World option — Simplest path for dashboard-only setup.

On the Deploy Hello World screen you only choose a Worker name and click Deploy. This step does not open the full code editor—you’re just publishing the starter so the Worker exists on workers.dev. You’ll open Edit code in step 2, then paste SearchMention’s script in step 3.

Deploy Hello World step with Worker name and Deploy button — Name + Deploy only—the editable IDE comes after deploy (next section).

2 Open the code editor

From your Worker’s Overview, click Edit code. That opens the full dashboard IDE—you can’t replace the whole script from the Deploy Hello World screen in step 1.

Worker overview with Edit code button highlighted — Open the editor from Overview—real editing happens in the next step.

3 Paste worker.js and deploy

In the editor’s worker.js tab, select all existing code and replace it with SearchMention’s script from the copy block below (same file as cloudflare-worker/worker.js). Then click Deploy in the editor toolbar. Until you add the secret in step 5, the Worker skips reporting and only passes requests through to your origin; that’s expected until SEARCHMENTION_API_KEY exists.

worker.js — copy entire file

Select all text in the box (⌘A / Ctrl+A), copy, paste into the Cloudflare editor, then Deploy.

/**
 * SearchMention AI Traffic Tracker — Cloudflare Worker
 *
 * Detects three classes of AI traffic on ecommerce sites and reports
 * them to the SearchMention API:
 *
 *   1. bot_training     — Crawlers harvesting content for model training
 *                         (robots.txt respected, long-term visibility signal)
 *   2. ai_search_fetch  — User-triggered fetchers: an AI assistant is
 *                         loading this page RIGHT NOW to answer a user's
 *                         question (ChatGPT-User, Perplexity-User, etc.)
 *   3. human_referral   — A real human clicked a link in an AI interface
 *                         (chatgpt.com, gemini.google.com, etc.)
 *
 * Detection is prioritized by 2026 ecommerce traffic share:
 *   ChatGPT (~78% of AI referrals) > Gemini (~9%) > Perplexity (~7%)
 *   > Copilot (~3%) > Claude (~3%) > others.
 *
 * Never blocks or modifies the response to the visitor.
 */

/* ---------- Bot user-agent detection ---------- */

// Training crawlers: harvest content for model training. Respect robots.txt.
// Business meaning: long-term brand presence in future model weights.
const TRAINING_BOTS = [
  { name: "GPTBot", pattern: /GPTBot/i, vendor: "OpenAI" },
  { name: "ClaudeBot", pattern: /ClaudeBot/i, vendor: "Anthropic" },
  { name: "anthropic-ai", pattern: /anthropic-ai/i, vendor: "Anthropic" },
  { name: "Google-Extended", pattern: /Google-Extended/i, vendor: "Google" },
  { name: "Applebot-Extended", pattern: /Applebot-Extended/i, vendor: "Apple" },
  { name: "Meta-ExternalAgent", pattern: /Meta-ExternalAgent/i, vendor: "Meta" },
  { name: "Bytespider", pattern: /Bytespider/i, vendor: "ByteDance" },
  { name: "CCBot", pattern: /CCBot/i, vendor: "CommonCrawl" },
  { name: "Amazonbot", pattern: /Amazonbot/i, vendor: "Amazon" },
  { name: "cohere-ai", pattern: /cohere-ai/i, vendor: "Cohere" },
  { name: "DeepSeekBot", pattern: /DeepSeek(?!.*User)/i, vendor: "DeepSeek" },
];

// Search/retrieval crawlers and user-triggered fetchers.
// Business meaning: immediate visibility in AI answers. For ecommerce,
// these hits often correlate with "AI agent is shopping on behalf of a user".
const SEARCH_FETCH_BOTS = [
  // OpenAI
  { name: "ChatGPT-User", pattern: /ChatGPT-User/i, vendor: "OpenAI" },
  { name: "OAI-SearchBot", pattern: /OAI-SearchBot/i, vendor: "OpenAI" },
  // Anthropic
  { name: "Claude-User", pattern: /Claude-User/i, vendor: "Anthropic" },
  { name: "Claude-SearchBot", pattern: /Claude-SearchBot/i, vendor: "Anthropic" },
  // Google
  { name: "Google-CloudVertexBot", pattern: /Google-CloudVertexBot/i, vendor: "Google" },
  { name: "Google-NotebookLM", pattern: /Google-NotebookLM/i, vendor: "Google" },
  { name: "GoogleAgent-Mariner", pattern: /Google-Agent|GoogleAgent|Mariner/i, vendor: "Google" },
  // Perplexity
  { name: "PerplexityBot", pattern: /PerplexityBot/i, vendor: "Perplexity" },
  { name: "Perplexity-User", pattern: /Perplexity-User/i, vendor: "Perplexity" },
  // Meta
  { name: "Meta-ExternalFetcher", pattern: /Meta-ExternalFetcher/i, vendor: "Meta" },
  // DuckDuckGo / Mistral / DeepSeek
  { name: "DuckAssistBot", pattern: /DuckAssistBot/i, vendor: "DuckDuckGo" },
  { name: "MistralAI-User", pattern: /MistralAI-User|Mistral-User/i, vendor: "Mistral" },
  { name: "DeepSeek-User", pattern: /DeepSeek-User/i, vendor: "DeepSeek" },
];

/* ---------- Human referral detection ---------- */

// Ordered by 2026 ecommerce referral share. Lookups go through the
// REFERRER_HOST_INDEX map below, so order only decides which platform
// wins if the same host is listed twice.
const AI_REFERRER_DOMAINS = [
  {
    name: "ChatGPT",
    // Covers chatgpt.com, chat.openai.com, and the Atlas browser's
    // in-chat origin (chatgpt.com/c/...)
    domains: ["chatgpt.com", "chat.openai.com", "chatgpt.openai.com"],
  },
  {
    name: "Gemini",
    // gemini.google.com is the chat surface. google.com AI Mode appears
    // with gemini or AI-specific query params but comes from google.com,
    // so we handle that via UTM fallback below rather than blanket-match
    // google.com (which would false-positive regular Google organic).
    domains: ["gemini.google.com"],
  },
  {
    name: "Perplexity",
    domains: ["perplexity.ai", "www.perplexity.ai"],
  },
  {
    name: "Copilot",
    domains: [
      "copilot.microsoft.com",
      "www.bing.com/chat",
      "bing.com/chat",
    ],
  },
  {
    name: "Claude",
    domains: ["claude.ai", "www.claude.ai", "claude.com"],
  },
  {
    name: "Meta-AI",
    domains: ["meta.ai", "www.meta.ai"],
  },
  {
    name: "Grok",
    domains: ["grok.com", "www.grok.com", "x.ai", "grok.x.ai"],
  },
  {
    name: "DeepSeek",
    domains: ["chat.deepseek.com", "deepseek.com"],
  },
  {
    name: "Mistral",
    domains: ["chat.mistral.ai"],
  },
];

// Build a hostname -> platform lookup once per worker isolate.
const REFERRER_HOST_INDEX = (() => {
  const index = new Map();
  for (const platform of AI_REFERRER_DOMAINS) {
    for (const domain of platform.domains) {
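      // Strip any path fragment (e.g. "bing.com/chat") and key on host only.
      // Note: the path is not checked later, so any referrer from that host
      // (for example regular Bing search) maps to the platform.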
      const host = domain.split("/")[0].toLowerCase();
      if (!index.has(host)) index.set(host, platform.name);
    }
  }
  return index;
})();

// UTM fallback: when a real human clicks an AI link, the referrer header
// is often stripped (mobile apps, in-app browsers, no-referrer policy).
// ChatGPT, Perplexity, and Gemini frequently append utm_source tags.
// Treat these as a weaker signal — separate visit_type so downstream can
// distinguish confirmed referrers from inferred ones.
const UTM_SOURCE_MAP = new Map([
  ["chatgpt.com", "ChatGPT"],
  ["chatgpt", "ChatGPT"],
  ["openai", "ChatGPT"],
  ["perplexity.ai", "Perplexity"],
  ["perplexity", "Perplexity"],
  ["gemini.google.com", "Gemini"],
  ["gemini", "Gemini"],
  ["google_ai_mode", "Gemini"],
  ["copilot.microsoft.com", "Copilot"],
  ["copilot", "Copilot"],
  ["claude.ai", "Claude"],
  ["claude", "Claude"],
  ["meta.ai", "Meta-AI"],
  ["grok", "Grok"],
  ["x.ai", "Grok"],
]);

/* ---------- Detection functions ---------- */

function detectBot(userAgent) {
  if (!userAgent) return null;
  for (const bot of SEARCH_FETCH_BOTS) {
    if (bot.pattern.test(userAgent)) {
      return { name: bot.name, vendor: bot.vendor, category: "ai_search_fetch" };
    }
  }
  for (const bot of TRAINING_BOTS) {
    if (bot.pattern.test(userAgent)) {
      return { name: bot.name, vendor: bot.vendor, category: "bot_training" };
    }
  }
  return null;
}

function detectAiReferrer(referer) {
  if (!referer) return null;
  try {
    const host = new URL(referer).hostname.toLowerCase();
    const platform = REFERRER_HOST_INDEX.get(host);
    return platform ? { platform, host } : null;
  } catch (_) {
    return null;
  }
}

function detectUtmAiSource(url) {
  try {
    const u = new URL(url);
    const source = (u.searchParams.get("utm_source") || "").toLowerCase().trim();
    if (!source) return null;
    const platform = UTM_SOURCE_MAP.get(source);
    return platform ? { platform, raw: source } : null;
  } catch (_) {
    return null;
  }
}

/* ---------- Privacy ---------- */

// Truncate IPv4 to /24 and IPv6 to /64. This is the standard approach
// for GDPR-compliant analytics — preserves geographic signal while
// removing user identifiability.
function anonymizeIp(ip) {
  if (!ip) return null;
  if (ip.includes(".")) {
    const parts = ip.split(".");
    if (parts.length === 4) return `${parts[0]}.${parts[1]}.${parts[2]}.0`;
    return null;
  }
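  if (ip.includes(":")) {
    // Expand a "::" shorthand first so compressed addresses
    // (e.g. "2001:db8::1") are cut on real hextet boundaries.
    const [head, tail] = ip.split("::");
    const headParts = head ? head.split(":") : [];
    const tailParts = tail ? tail.split(":") : [];
    const missing = Math.max(0, 8 - headParts.length - tailParts.length);
    const parts = ip.includes("::")
      ? [...headParts, ...Array(missing).fill("0"), ...tailParts]
      : headParts;
    // First 4 hextets = /64
    return parts.slice(0, 4).join(":") + "::";
  }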
  return null;
}

/* ---------- Debug helpers ---------- */

function isDebugEnabled(env) {
  const v = env.SEARCHMENTION_DEBUG;
  return v === "1" || v === "true" || v === "yes";
}

function debugLog(env, message, detail) {
  if (!isDebugEnabled(env)) return;
  if (detail !== undefined) {
    console.log("[searchmention-ai-tracker]", message, detail);
  } else {
    console.log("[searchmention-ai-tracker]", message);
  }
}

/* ---------- Reporting ---------- */

async function reportVisit(env, request, response, detection) {
  const endpoint =
    env.SEARCHMENTION_ENDPOINT || "https://searchmention.com/api/v1/visits";
  const apiKey = env.SEARCHMENTION_API_KEY;
  if (!apiKey) {
    debugLog(env, "beacon skipped: SEARCHMENTION_API_KEY is not set");
    return;
  }

  // Optional sampling — useful when a client gets a viral spike and
  // you don't want to hammer the API. Value is 0..1, default 1 (report all).
  const sampleRate = parseFloat(env.SEARCHMENTION_SAMPLE_RATE || "1");
  if (sampleRate < 1 && Math.random() > sampleRate) {
    debugLog(env, "beacon skipped: sampled out", { sampleRate });
    return;
  }

  const userAgent = request.headers.get("user-agent") || "";
  const rawIp = request.headers.get("cf-connecting-ip") || null;

  const cf = request.cf || {};
  const payload = {
    visits: [
      {
        url: request.url,
        user_agent: userAgent,
        visit_type: detection.visit_type,
        platform: detection.platform || null,
        bot_name: detection.bot_name || null,
        vendor: detection.vendor || null,
        referrer: detection.referrer || null,
        referrer_host: detection.referrer_host || null,
        method: request.method,
        status_code: response.status,
        ip_address: anonymizeIp(rawIp),
        country: cf.country || null,
        city: cf.city || null,
        visited_at: new Date().toISOString(),
        source: "cloudflare",
      },
    ],
  };

  // Abort the beacon if the API is slow — don't eat worker CPU on spikes.
  const controller = new AbortController();
  const timeout = setTimeout(() => controller.abort(), 3000);

  try {
    const res = await fetch(endpoint, {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      body: JSON.stringify(payload),
      signal: controller.signal,
    });
    if (isDebugEnabled(env)) {
      const bodyPreview = await res.text();
      debugLog(env, "beacon response", {
        status: res.status,
        visitType: detection.visit_type,
        platform: detection.platform,
        body: bodyPreview.slice(0, 300),
      });
    }
  } catch (err) {
    debugLog(env, "beacon fetch failed", String(err && err.message ? err.message : err));
  } finally {
    clearTimeout(timeout);
  }
}

/* ---------- Main handler ---------- */

export default {
  async fetch(request, env, ctx) {
    // Kick off the origin fetch immediately — detection runs in parallel
    // so we don't add latency to the visitor's response.
    const responsePromise = fetch(request);

    const userAgent = request.headers.get("user-agent") || "";
    const referer = request.headers.get("referer") || "";

    const bot = detectBot(userAgent);
    const aiReferrer = !bot ? detectAiReferrer(referer) : null;
    const utmReferrer = !bot && !aiReferrer ? detectUtmAiSource(request.url) : null;

    let detection = null;
    if (bot) {
      detection = {
        visit_type: bot.category, // "ai_search_fetch" | "bot_training"
        platform: bot.vendor,
        bot_name: bot.name,
        vendor: bot.vendor,
      };
    } else if (aiReferrer) {
      detection = {
        visit_type: "human_referral",
        platform: aiReferrer.platform,
        referrer: referer,
        referrer_host: aiReferrer.host,
      };
    } else if (utmReferrer) {
      detection = {
        visit_type: "human_referral_utm",
        platform: utmReferrer.platform,
        referrer: null,
        referrer_host: null,
      };
    }

    debugLog(env, "request", {
      method: request.method,
      url: request.url,
      detection,
      userAgent: userAgent.slice(0, 200),
    });

    const response = await responsePromise;

    if (detection) {
      ctx.waitUntil(reportVisit(env, request, response, detection));
    }

    return response;
  },
};

Worker code editor showing SearchMention script and Deploy button — After pasting, use Deploy in this IDE to publish your script.
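
To make the script's precedence concrete, here is a condensed sketch of its classification order: user-agent match first, then the Referer host, then utm_source. The tables are trimmed to a few entries taken from the script above, and classify is a hypothetical helper name that does not exist in worker.js (the real script builds the same result inline in its fetch handler).

```javascript
// Condensed from worker.js: only a few entries per table, for illustration.
const SEARCH_FETCH = [{ name: "ChatGPT-User", pattern: /ChatGPT-User/i }];
const TRAINING = [{ name: "GPTBot", pattern: /GPTBot/i }];
const REFERRER_HOSTS = new Map([["chatgpt.com", "ChatGPT"], ["perplexity.ai", "Perplexity"]]);
const UTM_SOURCES = new Map([["chatgpt", "ChatGPT"], ["perplexity", "Perplexity"]]);

// Hypothetical helper: returns the visit_type the script would report, or null.
function classify({ url, userAgent = "", referer = "" }) {
  // 1. Bots win: search/fetch agents are checked before training crawlers.
  if (SEARCH_FETCH.some((b) => b.pattern.test(userAgent))) return "ai_search_fetch";
  if (TRAINING.some((b) => b.pattern.test(userAgent))) return "bot_training";

  // 2. Referer host from a known AI surface.
  try {
    if (referer && REFERRER_HOSTS.has(new URL(referer).hostname.toLowerCase())) {
      return "human_referral";
    }
  } catch (_) {
    // Malformed Referer: fall through to the UTM check.
  }

  // 3. utm_source as a weaker, inferred signal.
  try {
    const source = (new URL(url).searchParams.get("utm_source") || "").toLowerCase().trim();
    if (UTM_SOURCES.has(source)) return "human_referral_utm";
  } catch (_) {
    // Malformed URL: nothing to infer.
  }
  return null; // ordinary visitor: nothing is reported
}

console.log(classify({ url: "https://yourstore.com/p/1", userAgent: "Mozilla/5.0 (compatible; GPTBot/1.0)" })); // bot_training
console.log(classify({ url: "https://yourstore.com/p/1?utm_source=chatgpt", referer: "https://chatgpt.com/" })); // human_referral
console.log(classify({ url: "https://yourstore.com/p/1?utm_source=chatgpt" })); // human_referral_utm
console.log(classify({ url: "https://yourstore.com/p/1", referer: "https://www.google.com/" })); // null
```

A confirmed Referer outranks a utm_source tag, so the second call reports human_referral even though the URL also carries a UTM hint.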

4 Attach routes to your storefront

Routes for every hostname you serve. In Cloudflare you’ll usually add more than one route so the Worker runs wherever shoppers land—for example yourstore.com/* and www.yourstore.com/*, or a wildcard like *.yourstore.com/* if you use subdomains. Visits are stored against the SearchMention project tied to your API key (step 5)—not by matching the visit URL’s host to a single saved domain in SearchMention.
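
To see why a single pattern rarely covers every hostname, the sketch below approximates route matching with a simple wildcard-to-regex conversion. It is a rough illustration only, not Cloudflare's actual matcher, and routeMatches is a hypothetical helper; check Cloudflare's route documentation for the exact rules.

```javascript
// Hypothetical helper: approximates a route pattern by treating "*" as
// "any run of characters" and comparing against host + path + query.
function routeMatches(pattern, url) {
  const u = new URL(url);
  const target = u.hostname + u.pathname + u.search;
  // Escape regex metacharacters except "*", then expand the wildcards.
  const escaped = pattern.replace(/[.+?^${}()|[\]\\]/g, "\\$&");
  const regex = new RegExp("^" + escaped.replace(/\*/g, ".*") + "$", "i");
  return regex.test(target);
}

const routes = ["yourstore.com/*", "*.yourstore.com/*"];
for (const url of ["https://yourstore.com/cart", "https://www.yourstore.com/cart"]) {
  const hits = routes.filter((r) => routeMatches(r, url));
  console.log(url, "->", hits);
}
```

Under this approximation the apex pattern and the *. wildcard each match only one of the two URLs, which is why a store reachable on both hostnames needs both routes.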

Go to the Worker’s Settings tab. Under Domains & Routes, choose Add for each pattern that should run this Worker. Repeat Add route until every storefront hostname you care about is covered (see above).

Worker Settings tab with Domains and Routes Add control — Routes live under Settings → Domains & Routes.

Use Fail open (proceed). When the Worker errors or hits limits, Cloudflare should still send traffic to your origin. For a live store, avoid “Fail closed”—this Worker only reports AI-related visits and must not block checkout or catalog pages.

In the route panel, select your zone, enter the route pattern that covers your storefront (wildcard patterns are fine when that matches how customers browse), select Fail open (proceed), then save the route.

Route dialog with zone, route pattern, and Fail open selected — Match your real domain; fail open keeps the site reachable.

5 Add your SearchMention API key

Still on Settings, open Variables and Secrets and click Add. Create a Secret named exactly SEARCHMENTION_API_KEY and paste your sm_live_… key as the value. Save with Deploy so the secret applies to the Worker.

Settings Variables and Secrets section with Add button — Secrets are encrypted and not shown again after save.

Secret dialog with SEARCHMENTION_API_KEY name and Deploy — Variable name must match what the script expects.

Self-hosted API. If your SearchMention app is not at the default host, add another secret SEARCHMENTION_ENDPOINT with your full ingest URL (for example https://your-domain.com/api/v1/visits) — see the worker comments and repo README for details.
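
As a minimal sketch of how the script above turns SEARCHMENTION_API_KEY and the optional SEARCHMENTION_ENDPOINT into a request, here is a condensed version of the logic in its reportVisit function. buildBeaconRequest is a hypothetical helper name used only here, and the key and visit values are placeholders.

```javascript
const DEFAULT_ENDPOINT = "https://searchmention.com/api/v1/visits";

// Hypothetical helper condensed from reportVisit in worker.js.
// Returns null when the API key secret is missing (the script skips the beacon).
function buildBeaconRequest(env, visit) {
  const apiKey = env.SEARCHMENTION_API_KEY;
  if (!apiKey) return null;

  return {
    // SEARCHMENTION_ENDPOINT, when set, replaces the default ingest URL.
    endpoint: env.SEARCHMENTION_ENDPOINT || DEFAULT_ENDPOINT,
    init: {
      method: "POST",
      headers: {
        "Content-Type": "application/json",
        Authorization: `Bearer ${apiKey}`,
      },
      // The script wraps each visit in a { visits: [...] } envelope.
      body: JSON.stringify({ visits: [visit] }),
    },
  };
}

const visit = { url: "https://yourstore.com/p/1", visit_type: "bot_training", bot_name: "GPTBot" };

// Placeholder key and an example self-hosted URL, for illustration only.
const selfHosted = buildBeaconRequest(
  {
    SEARCHMENTION_API_KEY: "sm_live_PLACEHOLDER",
    SEARCHMENTION_ENDPOINT: "https://your-domain.com/api/v1/visits",
  },
  visit
);
console.log(selfHosted.endpoint);
console.log(buildBeaconRequest({}, visit)); // null: no key, no beacon
```

If the secret name is misspelled, the first check fails and nothing is sent, so the name has to match exactly.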

6 Confirm and test

Your Settings page should list the storefront routes you added and show SEARCHMENTION_API_KEY as an encrypted secret. Then open Dashboard → AI Traffic in SearchMention: after real AI bot or referral traffic hits your routes, events appear for that project. Normal visitors alone won’t create rows—only matched bots, referrers, or allowed UTM hints.
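
One way to exercise each visit type without waiting for real AI traffic is to send a few hand-crafted requests through a routed hostname. The sketch below only builds the request descriptions. sampleRequests is a hypothetical helper, the user-agent strings are simplified stand-ins rather than the vendors' exact strings, and it is an assumption that neither Cloudflare's bot protection nor SearchMention filters such spoofed requests. The expected visit types follow from the patterns in the script above.

```javascript
// Hypothetical helper: describes one test request per visit type.
function sampleRequests(storeUrl) {
  const withUtm = new URL(storeUrl);
  withUtm.searchParams.set("utm_source", "chatgpt");

  return [
    { expect: "bot_training", url: storeUrl, headers: { "User-Agent": "Mozilla/5.0 (compatible; GPTBot/1.0)" } },
    { expect: "ai_search_fetch", url: storeUrl, headers: { "User-Agent": "Mozilla/5.0 (compatible; ChatGPT-User/1.0)" } },
    { expect: "human_referral", url: storeUrl, headers: { Referer: "https://chatgpt.com/" } },
    { expect: "human_referral_utm", url: withUtm.toString(), headers: {} },
  ];
}

for (const req of sampleRequests("https://yourstore.com/")) {
  // Send each one with any HTTP client, for example curl with -A for the
  // user agent or -H for the Referer header.
  console.log(req.expect, req.url, req.headers);
}
```

Replace https://yourstore.com/ with a hostname covered by one of your routes; a request to an unrouted hostname never reaches the Worker.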

Settings summary showing custom routes and encrypted SEARCHMENTION_API_KEY secret — Routes + secret in one place for a final sanity check.
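
If events do not show up, two optional variables read by the script above can help narrow things down: SEARCHMENTION_DEBUG turns on console logging, and SEARCHMENTION_SAMPLE_RATE controls what fraction of detected visits is reported. The sketch below mirrors how the script interprets them; readOptions is a hypothetical helper name, not part of worker.js.

```javascript
// Hypothetical helper mirroring the checks in worker.js.
function readOptions(env) {
  const v = env.SEARCHMENTION_DEBUG;
  const rate = parseFloat(env.SEARCHMENTION_SAMPLE_RATE || "1");
  return {
    // Only these exact strings enable debug logging.
    debug: v === "1" || v === "true" || v === "yes",
    // 0..1; unset means 1 (report every detected visit).
    sampleRate: rate,
    // The script only drops visits when the rate is below 1.
    sampling: rate < 1,
  };
}

console.log(readOptions({}));
console.log(readOptions({ SEARCHMENTION_DEBUG: "true", SEARCHMENTION_SAMPLE_RATE: "0.25" }));
```

A sample rate below 1 means some detected visits are intentionally never sent, so it is simplest to leave it unset while testing.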