 ### Does this just block all bots?
 
 No, and that's deliberate. The goal isn't bot elimination — it's
-*operator control over the terms of access*. Search-engine
+*site control over the terms of access*. Search-engine
 crawlers, LLM training bots, archival crawlers, monitoring
-agents, partner integrations — many of these are bots an operator
-*wants* to reach the content, but on conditions the operator sets:
+agents, partner integrations — many of these are bots a site
+*wants* to reach the content, but on conditions the site sets:
 when, how often, which paths, with what rate cap, with what
 attribution. mod_botshield's primitives are built around setting
 those terms:
@@ -17,7 +17,7 @@ those terms:
   configured CIDR ranges are loaded, verified crawlers (UA-and-IP
   match against the published ranges) bypass the score ladder
   entirely. The built-in seed list covers Googlebot, Bingbot,
-  and Applebot; operators add others via `BotShieldAllowBot` and
+  and Applebot; you add others via `BotShieldAllowBot` and
   refresh ranges out of band with `tools/refresh-bot-ranges.sh`.
 - **Robots.txt enforcement.** A bot that ignores your `Disallow`
   rules gets enforced at the policy layer — robots.txt is no
@@ -36,7 +36,7 @@ those terms:
 A site that wanted to block every bot could do that with much
 less than mod_botshield offers. The reason this module exists is
 that "block everything that isn't a real human browser" is the
-*wrong* answer for most operators — they want search-engine
+*wrong* answer for most sites — they want search-engine
 indexing, want LLM crawlers to cite them under controlled terms,
 want monitoring to reach health endpoints, want partner bots to
 hit their API. mod_botshield is the policy surface for saying
@@ -62,7 +62,7 @@ What Cloudflare gives you that mod_botshield doesn't:
   application-layer attacks orders of magnitude bigger than a
   single Apache instance can survive.
 - **Managed challenges and turnkey product.** No tuning, no
-  capacity sizing, no operator log-grepping. Pay the bill, get a
+  capacity sizing, no log-grepping. Pay the bill, get a
   policy.
 
 What mod_botshield gives you that Cloudflare doesn't:
@@ -72,7 +72,7 @@ What mod_botshield gives you that Cloudflare doesn't:
 - **No vendor lock-in or recurring cost.** Bot mitigation that
   goes beyond simple known-bad-IP blocking is typically a paid
   tier with managed CDNs.
-- **Operator control.** You write the rules in Apache config you
+- **Direct control.** You write the rules in Apache config you
   already understand; you don't have to learn a separate dashboard
   or wait for a vendor to add a feature.
 
@@ -304,7 +304,7 @@ For challenged requests, yes, the user experiences friction
 of clicked PoW; captcha for as long as the provider takes). That's
 the entire point: *make scraping expensive without making real use
 expensive*. The threshold tuning workflow in
-[staging](../staging/index.html) is the operator handle on where
+[staging](../staging/index.html) is your handle on where
 that line falls.
 
 ### Does it work with PHP / FastCGI / mod_php / mod_proxy / nginx upstream?
@@ -342,7 +342,7 @@ Yes — it's just an Apache module. The constraints are:
 ### How much memory does it use?
 
 Default: 16 MiB SHM segment shared across all workers, growing as
-operators raise capacity directives. Per-process overhead is
+sites raise capacity directives. Per-process overhead is
 negligible (the `.so` is a few hundred KB; no per-request heap
 allocation outside of Apache's `r->pool` which is freed at request
 end).
@@ -355,15 +355,15 @@ sizing is documented in [deployment](../deployment/index.html#capacity-sizing).
 ### Does it phone home? Send data anywhere?
 
 No. The module makes outbound network calls in two
-operator-controlled categories:
+configured categories:
 
 1. **Captcha siteverify.** When `BotShieldCaptchaProvider` is
    configured, mod_botshield makes one HTTPS POST per verify
    attempt to the configured provider's siteverify URL with the
    client's captcha token (and the client IP as the `remoteip`
    field). This fires from three paths: the `/captcha-verify`
-   endpoint, the silent-tier embedded-verify endpoint when an
-   operator pairs silent with a captcha provider, and the
+   endpoint, the silent-tier embedded-verify endpoint when a
+   site pairs silent with a captcha provider, and the
    form-captcha fixup. No siteverify call ever happens without
    a captcha provider explicitly configured on the scope.
 
@@ -376,8 +376,8 @@ operator-controlled categories:
 2. **Bot-range refresh script.** `tools/refresh-bot-ranges.sh`
    fetches published JSON from search-engine providers
    (Googlebot, Bingbot, etc.) and rewrites the CIDR files in
-   `/var/lib/botshield/bots/`. This runs only when the operator
-   invokes it (cron or manual); the module itself never makes
+   `/var/lib/botshield/bots/`. This runs only when you
+   invoke it (cron or manual); the module itself never makes
    these calls at runtime.
 
 No telemetry. No analytics. No phoning the project. The module is
@@ -415,7 +415,7 @@ The module's data footprint (client IP + flag bits + TTL) is
 generally classified as personal data under GDPR. Considerations:
 
 - **Lawful basis.** Bot mitigation is generally a "legitimate
-  interest" — preventing scraping of operator data. Document the
+  interest" — preventing scraping of site data. Document the
   basis in your privacy notice.
 - **Retention.** Flagged-IP entries expire after the configured
   TTL (default 1 hour for honeypot hits). The Bloom filter
@@ -456,11 +456,11 @@ mod_botshield fails open for siteverify timeouts. If
 provider responds, the verification path treats the request as
 passing — same outcome it would get without the provider. A
 WARNING-level log line carries the literal string `failing open`
-so operators can grep / alert on it. The Prometheus metrics
+so you can grep / alert on it. The Prometheus metrics
 count these as `outcome=failopen`.
 
 The reasoning: a third-party provider outage shouldn't black-hole
-legitimate traffic. Operators preferring fail-closed semantics
+legitimate traffic. Sites preferring fail-closed semantics
 can wrap the provider in a circuit breaker (e.g. require captcha
 tier through a different path that doesn't fail-open) but that
 isn't the default.
@@ -472,9 +472,9 @@ mod_botshield degrades gracefully:
 - Periodic state-file snapshots stop. The graceful-shutdown save
   still runs.
 - The capacity headroom watchdog stops emitting NOTICE/WARN
-  lines. Operators can read the same data from the on-demand
+  lines. You can read the same data from the on-demand
   Prometheus gauges (`botshield_shm_flagged_used` etc.).
-- The robots.txt mtime-poller stops. Operators must reload Apache
+- The robots.txt mtime-poller stops. You must reload Apache
   to pick up robots.txt changes.
 - The load sampler stops; load triggers won't fire.
 
@@ -578,7 +578,7 @@ permanent denial better than mod_botshield does.
 ## Where to next
 
 - Install + minimal config: [getting-started](../getting-started/index.html).
-- Tier model and scoring: [operator-model](../operator-model/index.html).
+- Tier model and scoring: [site model](../site-model/index.html).
 - Allow lists, triggers, robots: [policy](../policy/index.html).
 - Captcha + app-bridge: [captcha](../captcha/index.html).
 - Common operational issues: [troubleshooting](../troubleshooting/index.html).