Changes for version 6.02 - 2026-05-21

  • WWW::RobotRules::AnyDBM_File::agent() no longer truncates the on-disk cache with an untie/tie(O_TRUNC) sequence. Stale data is now reset through the tied hash's CLEAR method, which eliminates a symlink-following race that a local attacker with write access to the cache directory could exploit to overwrite arbitrary files writable by the crawler user.
  • The on-disk cache file mode has been tightened from 0640 to 0600.
  • t/rules-dbm.t has been hardened against symlink attacks on its tempfile during package builds.
  • A new SECURITY CONSIDERATIONS POD section documents the residual caller-trust requirement: the constructor's tie still follows symlinks because AnyDBM_File cannot portably pass O_NOFOLLOW through to the underlying DBM open, so the caller must store the cache file in a directory writable only by the user that runs the code.
  • References: CWE-377, CWE-378, CWE-379.
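
The caller-trust requirement above can be checked before the cache is opened. The following is a minimal sketch, not code from the distribution: the helper name cache_dir_is_private is hypothetical, and it assumes a POSIX-style system where directory ownership and the group/other write bits are meaningful. It uses only core modules (Fcntl, File::Temp).

```perl
use strict;
use warnings;
use Fcntl qw(:mode);          # S_ISDIR, S_IWGRP, S_IWOTH
use File::Temp qw(tempdir);

# Hypothetical helper (an assumption for illustration, not part of
# WWW::RobotRules): returns 1 only if $dir is a real directory (not a
# symlink), is owned by the effective user, and is not writable by
# group or others. Returns 0 otherwise.
sub cache_dir_is_private {
    my ($dir) = @_;
    my @st = lstat($dir) or return 0;          # lstat does not follow a symlink
    my ($mode, $uid) = @st[2, 4];
    return 0 unless S_ISDIR($mode);            # a symlink or plain file fails here
    return 0 unless $uid == $>;                # must be owned by the effective UID
    return 0 if $mode & (S_IWGRP | S_IWOTH);   # no group/other write permission
    return 1;
}

# Demonstration on a throwaway directory.
my $dir = tempdir(CLEANUP => 1);
chmod 0700, $dir or die "chmod failed: $!";
print "0700: ", (cache_dir_is_private($dir) ? "private" : "not private"), "\n";

chmod 0777, $dir or die "chmod failed: $!";
print "0777: ", (cache_dir_is_private($dir) ? "private" : "not private"), "\n";
```

A caller would run a check like this on the directory that will hold the cache file and refuse to continue if it fails, then pass a path inside that directory to the constructor. The check covers only the final directory; it assumes the parent directories are themselves not writable by other users.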

Modules

database of robots.txt-derived permissions
Persistent RobotRules
Parse robots.txt files using a disk cache

Provides

in lib/WWW/RobotRules.pm