• Cannot detect slash in htaccess (not a trailing slash question)

    From john.gallet@wanadoo.fr@21:1/5 to All on Fri Aug 21 04:41:37 2015
    Hi all,

    For some unknown reason, some bots crawling my site have picked up two bad habits:

    1) appending parameters directly with & instead of ?
    For example: domain.tld/file.ext&argument=...
    This causes all kinds of errors and is easily trapped in an .htaccess, for example with:

    RewriteCond %{the_request} .*\.html&.*
    RewriteRule ^(.*)&(.*)$ http://%{HTTP_HOST}/$1? [R=301,L]

    2) Now my current problem. They also send invalid paths after the name of the file, and I just can't get the syntax right to catch that. This makes httpd eat up CPU like mad for some time and then display a "weird" page because some resources are "not found", and I would like to get rid of this.

    For example: instead of domain.tld/valid_file.html, the server receives the request domain.tld/valid_file.html/some/path

    I have been going around in circles and no rule catches it.
    RewriteCond %{query_string} \/ does not work because there is no query string.

    RewriteCond %{the_request} html[/]+

    and all variations on the variable just do not work. The associated action is to generate a 404 error (and that action works in other rules).

    If someone has an idea, I am all ears.

    TIA
    Sincerely,
    JGA

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From I R A Darth Aggie@21:1/5 to 6c83aeac-471f-4d76-a644-bd45ad83f94 on Wed Sep 2 17:46:41 2015
    On Fri, 21 Aug 2015 04:41:37 -0700 (PDT),
    john.gallet@wanadoo.fr <john.gallet@wanadoo.fr>, in <6c83aeac-471f-4d76-a644-bd45ad83f941@googlegroups.com> wrote:
    > Hi all,
    >
    > For some unknown reason, some bots crawling my site have picked up two bad habits:
    >
    > 1) appending parameters directly with & instead of ?
    > For example: domain.tld/file.ext&argument=...
    > This causes all kinds of errors and is easily trapped in an .htaccess, for example with:
    >
    > RewriteCond %{the_request} .*\.html&.*
    > RewriteRule ^(.*)&(.*)$ http://%{HTTP_HOST}/$1? [R=301,L]
    >
    > 2) Now my current problem. They also send invalid paths after the name of the file, and I just can't get the syntax right to catch that. This makes httpd eat up CPU like mad for some time and then display a "weird" page because some resources are "not found", and I would like to get rid of this.
    >
    > For example: instead of domain.tld/valid_file.html, the server receives the request domain.tld/valid_file.html/some/path
    >
    > I have been going around in circles and no rule catches it.
    > RewriteCond %{query_string} \/ does not work because there is no query string.
    >
    > RewriteCond %{the_request} html[/]+
    >
    > and all variations on the variable just do not work. The associated action is to generate a 404 error (and that action works in other rules).
    >
    > If someone has an idea, I am all ears.
    >
    > TIA
    > Sincerely,
    > JGA

    Maybe...I'm not sure if something.html will match the following RewriteCond:

    RewriteCond %{REQUEST_URI} html.*
    RewriteRule ^(.*)&(.*)$ http://%{HTTP_HOST}/$1? [R=404,L,NC]

    Maybe a ..* would be better?
    --
    Consulting Minister for Consultants, DNRC
    I can please only one person per day. Today is not your day. Tomorrow
    isn't looking good, either.
    I am BOFH. Resistance is futile. Your network will be assimilated.

  • From john.gallet@wanadoo.fr@21:1/5 to All on Fri Sep 25 05:19:39 2015
    Hi,
    Thanks for your answer.

    You are right that, since "html" does not appear anywhere else in any URL on the site, ANYTHING after "html" can actually be trashed.

    > Maybe...I'm not sure if something.html will match the following RewriteCond:
    >
    > RewriteCond %{REQUEST_URI} html.*
    > RewriteRule ^(.*)&(.*)$ http://%{HTTP_HOST}/$1? [R=404,L,NC]
    >
    > Maybe a ..* would be better?

    This rewrite rule produces an HTTP 500, so I am using:
    RewriteRule ^$ [R=404,L]

    RewriteCond %{REQUEST_URI} html.*
    or
    RewriteCond %{REQUEST_URI} html..*
    do not work, i.e. they do not redirect http....somefile.html/some/path

    I am just banging my head against the / on this one.

    John
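
    One possible way to trap these requests, sketched under a few assumptions: the rules live in a per-directory .htaccess, mod_rewrite is enabled, and every real page ends in .html with nothing after it. Two details of the rules quoted in this thread look like the likely culprits. A RewriteRule takes a pattern, then a substitution, then the flags, so in RewriteRule ^$ [R=404,L] the text [R=404,L] is read as the substitution rather than as flags. The pattern ^$ also only matches an empty path, never somefile.html/some/path. The sketch below uses ^ (matches any request) with - (no substitution) instead. The commented-out AcceptPathInfo line is an alternative that needs no rewrite rules at all, assuming the host allows that directive in .htaccess.

```apache
RewriteEngine On

# Option 1: answer 404 to any request whose path continues after ".html/".
# REQUEST_URI holds the requested path, e.g. /valid_file.html/some/path.
RewriteCond %{REQUEST_URI} \.html/
# "^" matches every request; "-" means "do not substitute anything".
# With a non-3xx status, R drops the substitution and stops rewriting.
RewriteRule ^ - [R=404,L]

# Option 2 (assumption: directive permitted by AllowOverride on this host):
# have the core reject trailing path information itself.
# AcceptPathInfo Off
```

    With either option in place, a request for domain.tld/valid_file.html/some/path should get a 404 while domain.tld/valid_file.html is still served. Server variable names are documented in upper case (%{THE_REQUEST}, %{QUERY_STRING}), so the lower-case spellings in the first post may be worth changing as well.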
