Hi all, some resources are "not found", and I would like to get rid of this.
For some unknown reason, some bots crawling my site have picked up two bad habits:
1) Appending parameters directly with &, without a ?
For example: domain.tld/file.ext&argument=...
This causes all kinds of errors and is easily trapped in an .htaccess file, for example with:
RewriteCond %{the_request} .*\.html&.*
RewriteRule ^(.*)&(.*)$ http://%{HTTP_HOST}/$1? [R=301,L]
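A possible tightening of this rule, sketched under the assumption that it sits in the document-root .htaccess (or the server config) with mod_rewrite enabled; it has not been tested against this site. Because (.*) is greedy, ^(.*)&(.*)$ captures everything up to the last &, so a request like /file.html&a=1&b=2 is first redirected to /file.html&a=1 and only cleaned up by a second redirect. Stopping the capture at the first & does it in one step:

```apache
# Sketch: strip "&..." junk appended to .html URLs in a single redirect
RewriteEngine On
# Only act when ".html" is immediately followed by "&" in the raw request line
RewriteCond %{THE_REQUEST} \.html&
# "/?" lets the same rule work in .htaccess (no leading slash) and in the
# server config (leading slash); "[^&]*" stops the capture at the first "&".
# The trailing "?" discards any query string on the redirect target.
RewriteRule ^/?([^&]*)& http://%{HTTP_HOST}/$1? [R=301,L]
```

Like the original rule, this assumes a document-root .htaccess; in a subdirectory's .htaccess, $1 would lose the directory prefix.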
2) Now my current problem. They also append invalid paths after the file name, and I just can't find the right syntax to catch them. This makes httpd eat up CPU like mad for a while and then display a "weird" page, because some resources are "not found".
For example: domain.tld/valid_file.html will receive the request:
domain.tld/valid_file.html/some/path
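Apache has a core directive aimed at this case: AcceptPathInfo. With AcceptPathInfo Off, a request is only accepted if it maps to a path that actually exists, so /valid_file.html/some/path should return 404 instead of serving valid_file.html. Apache's default handler for plain static files already rejects such trailing paths, so if these pages are served anyway, the .html files are probably handled by something that accepts path info (for example SSI or a PHP handler); that is an assumption about this setup. A minimal sketch:

```apache
# Reject extra path segments after a real .html file,
# e.g. /valid_file.html/some/path -> 404 Not Found.
# Scoped to .html so scripts that genuinely rely on PATH_INFO keep working.
# In .htaccess this needs AllowOverride FileInfo (or All) for the directory.
<FilesMatch "\.html$">
    AcceptPathInfo Off
</FilesMatch>
```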
I have been going round in circles, and no rule catches it.
RewriteCond %{query_string} \/ does not work, because there is no query string.
RewriteCond %{the_request} html[/]+
and all variations on the variable just do not work. The associated action is to return a 404 error (which works in other rules).
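The condition on the request line should itself match something like "GET /valid_file.html/some/path HTTP/1.1". One possible reason it never fires: mod_rewrite matches the RewriteRule pattern first and only evaluates the conditions if that pattern matched, so a condition attached to a rule such as ^(.*)&(.*)$ from item 1 is never checked for a URL without an &. A sketch pairing the condition with a catch-all pattern, using the same R=404 action that already works in other rules:

```apache
RewriteEngine On
# THE_REQUEST is the raw request line, e.g.
#   GET /valid_file.html/some/path HTTP/1.1
# Look for ".html/" in the path part only (before any "?" or space),
# so a query string that happens to contain ".html/" is not caught.
RewriteCond %{THE_REQUEST} ^[A-Z]+\s[^?\s]*\.html/
# "^" matches every URL, so the condition above makes the decision;
# "-" means no substitution, and R=404 returns Not Found.
RewriteRule ^ - [R=404,L]
```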
If someone has an idea, I am all ears.
TIA
Sincerely,
JGA
Maybe... I'm not sure whether a plain something.html would also match the following RewriteCond:
RewriteCond %{REQUEST_URI} html.*
RewriteRule ^(.*)&(.*)$ http://%{HTTP_HOST}/$1? [R=404,L,NC]
Maybe html..* would be better?
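On the html.* versus html..* question: .* can match nothing, so html.* also matches a plain something.html, while html..* requires at least one more character. Anchoring on the slash is tighter still. Note too that the RewriteRule above keeps the ^(.*)&(.*)$ pattern, which only matches URLs containing &, so it would not fire for /valid_file.html/some/path. A sketch that relies on the rule pattern alone, assuming mod_rewrite in either the server config or .htaccess (in per-directory context the pattern is compared against the URI plus PATH_INFO, so the trailing path is still visible):

```apache
RewriteEngine On
# "\.html/" needs a literal ".html" followed by "/", so it matches
# "valid_file.html/some/path" but not a plain "something.html".
RewriteRule \.html/ - [R=404,L]
```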