Changes for version 0.003 - 2026-06-11

  • Detect: captcha signal no longer false-positives on cookie-banner / privacy-policy mentions of reCAPTCHA. A content-rich page now only walls when a markdown captcha marker co-occurs with captcha-PROMPT language ("complete the captcha to continue", "I'm not a robot", "verify you are human", …); thin pages still wall on any marker (JS-rendered gate) and html-only markers on rich pages still never wall
  • Detect: signals now also flags WAF / bot-management gates that REDIRECT to a challenge URL instead of embedding a widget. When the page's final_url matches a known challenge endpoint, blocked is raised for Cloudflare / DataDome / PerimeterX (/cdn-cgi/challenge, __cf_chl, /challenge-platform/, datadome, geo.captcha-delivery.com, /px/captcha, perimeterx) and captcha for provider verification endpoints (google.com/recaptcha, /recaptcha/api, hcaptcha.com). Additive and OR-ed in; cosmetic redirects (http->https, www, trailing slash) and an absent final_url never trigger

Documentation

probe Crawl4AI / CloakBrowser / proxy reachability and print the chain
run the full WWW::Crawl4AI strategy chain against one URL

Modules

Perl client and fallback orchestrator for Crawl4AI
one strategy attempt in a WWW::Crawl4AI fallback chain
UA-agnostic REST client for the Crawl4AI Docker API
breadth-first iterator for deep_crawl, separating frontier management from crawl logic
service detection and content-quality classification for Crawl4AI
structured error class for WWW::Crawl4AI
markdown field resolution across Crawl4AI response shapes
builds Crawl4AI /crawl and /md request payloads
normalized result of a WWW::Crawl4AI strategy chain
role for a single crawl strategy in the WWW::Crawl4AI fallback chain
Crawl4AI strategy with full JS rendering (wait for networkidle)
last-resort Crawl4AI strategy delegating to a user coderef
Crawl4AI strategy attaching to CloakBrowser over CDP
cheapest Crawl4AI strategy — headless text mode, no escalation
Crawl4AI strategy routing through a configured proxy
Crawl4AI strategy with enable_stealth and randomized fingerprint
ordered list of strategy objects, pluggable at construction time