• where to store web files in a dir tree

    From Helmut Richter@21:1/5 to All on Mon Jan 2 15:35:20 2023
    The following question pertains to a web server with an underlying file system that is used in such a way that path elements in the URL designate directories and files in that file system. I am aware that this is not so in many CMSs,
    but that is not the context of my question.

    On my website, I have the habit of avoiding file extensions in URLs where the
    user of the page need not know them. For instance, it makes no difference whether the file is .html, .php, or anything else delivering HTML code. Also,
    I do not mark .txt files as such so that I can convert them to .html without changing the URL. It is enough that the web server delivers all
    content with the correct content type in the headers.

    Now I have two ways to store web contents in the file system:

    classical: the URL <domain>/a/b/c is served from file <doc-root>/a/b/c/index.ext
    with a suitable file extension (as a means to determine the content type), typically .html

    abbreviated: the URL <domain>/a/b/c is served from file <doc-root>/a/b/c.ext provided that there is no directory /a/b/c
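
To make the two mappings concrete, here is a minimal Python sketch of how a URL path would be turned into a file under each scheme. The function names, the list of index names, and the list of extensions are hypothetical and chosen only for illustration; a real web server does considerably more (content negotiation, redirects, access checks).

```python
from pathlib import Path
import tempfile


def resolve_classical(doc_root: Path, url_path: str,
                      index_names=("index.html", "index.txt")):
    """classical: /a/b/c -> <doc-root>/a/b/c/index.ext"""
    directory = doc_root / url_path.strip("/")
    for name in index_names:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None


def resolve_abbreviated(doc_root: Path, url_path: str,
                        extensions=(".html", ".txt")):
    """abbreviated: /a/b/c -> <doc-root>/a/b/c.ext, only if /a/b/c is no directory"""
    target = doc_root / url_path.strip("/")
    if target.is_dir():
        return None  # the rule applies only when no such directory exists
    for ext in extensions:
        candidate = target.with_name(target.name + ext)
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    # Build a throwaway document root holding one page per scheme.
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a/b/c").mkdir(parents=True)
        (root / "a/b/c/index.html").write_text("classical page")
        (root / "a/b/d.html").write_text("abbreviated page")
        print(resolve_classical(root, "/a/b/c"))
        print(resolve_abbreviated(root, "/a/b/d"))
```

The sketch also shows where the ambiguity comes from: once both schemes are allowed, the same URL /a/b/c can mean either a directory or a file, depending on what happens to exist on disk.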

    The abbreviated method saves quite a few directories that have no other use than holding one single file each, all of them with the same file name. The classical method can also be error-prone: when you copy a file index.html from one directory to another, it is easy to confuse which of the hundreds of files with that
    name it is.

    The classical method allows every directory to be treated the same, no matter how many files of what file type it contains. This kind of internal consistency
    may well compensate for the aforementioned inconveniences.

    Are there any other reasons to prefer one of the methods over the other?

    --
    Helmut Richter

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Arno Welzel@21:1/5 to All on Thu Jan 5 18:18:04 2023
    Helmut Richter, 2023-01-02 15:35:

    > [...]
    > Now I have two ways to store web contents in the file system:
    >
    > classical: the URL <domain>/a/b/c is served from file <doc-root>/a/b/c/index.ext
    > with a suitable file extension (as a means to determine the content type), typically .html
    >
    > abbreviated: the URL <domain>/a/b/c is served from file <doc-root>/a/b/c.ext provided that there is no directory /a/b/c
    > [...]
    > Are there any other reasons to prefer one of the methods over the other?

    I would stick with <doc-root>/a/b/c/index.ext and not allow the
    alternative <doc-root>/a/b/c.ext for the same URL, to avoid confusion
    about what <doc-root>/a/b/c means in your filesystem.

    --
    Arno Welzel
    https://arnowelzel.de

  • From Helmut Richter@21:1/5 to Arno Welzel on Sat Jan 7 17:23:23 2023
    On Thu, 5 Jan 2023, Arno Welzel wrote:

    > Helmut Richter, 2023-01-02 15:35:
    >
    >> [...]
    >> Now I have two ways to store web contents in the file system:
    >>
    >> classical: the URL <domain>/a/b/c is served from file <doc-root>/a/b/c/index.ext
    >> with a suitable file extension (as a means to determine the content type), typically .html
    >>
    >> abbreviated: the URL <domain>/a/b/c is served from file <doc-root>/a/b/c.ext
    >> provided that there is no directory /a/b/c
    >> [...]
    >> Are there any other reasons to prefer one of the methods over the other?
    >
    > I would stick with <doc-root>/a/b/c/index.ext and not allow the
    > alternative <doc-root>/a/b/c.ext for the same URL, to avoid confusion
    > about what <doc-root>/a/b/c means in your filesystem.

    Thank you. After some testing with different configurations, I have come to the same conclusion: this feature is really confusing.

    For example, if <doc-root>/a/b/c.html is requested via the path /a/b/c, relative URLs (href="xyz") are relative to the fictitious directory /a/b/c. If, however, the same file is requested as /a/b/c.html, relative URLs are, of course, relative to the real parent directory /a/b. Same file contents, same file
    name, same place in the dir tree, different semantics. Moreover, you can even request the same file as /a/b/c/index.html even though the directory /a/b/c does not exist;
    the spurious file name is then treated as PATH_INFO.
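
How a relative link resolves depends only on the URL the page was requested under, not on the file that served it, and this can be checked without a server. The sketch below uses Python's urllib.parse.urljoin, which follows the RFC 3986 resolution rules; the host example.org and the helper name resolve_from are placeholders. Note that the last path segment only acts as a directory when the request path ends in a slash or continues past it, so the "fictitious directory" case is shown here as /a/b/c/ and /a/b/c/index.html (an assumption about the request forms meant above).

```python
from urllib.parse import urljoin


def resolve_from(request_path: str, href: str = "xyz",
                 origin: str = "https://example.org") -> str:
    """Resolve a relative href against the URL the page was requested under."""
    return urljoin(origin + request_path, href)


if __name__ == "__main__":
    # Four ways the same file c.html could be reached when MultiViews
    # and PATH_INFO are both permitted.
    for path in ("/a/b/c.html", "/a/b/c", "/a/b/c/", "/a/b/c/index.html"):
        print(f"{path:22} -> {resolve_from(path)}")
```

The first two request forms resolve the link inside /a/b/, the last two inside /a/b/c/, although the same file would answer all four.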

    So it should indeed not be allowed, as you suggest. It took me a while to
    find out why weird things like paths to nonexistent directories and files
    were allowed in the first place, and what to do to disallow them. The main culprit was the MultiViews option, which causes most of the mess.

    I will now use the following options as defaults:

    # "All" enables every option except MultiViews
    Options All
    Options -Indexes
    # add further index file names as needed
    DirectoryIndex index.html index.txt
    # switch on only where there is a good reason
    AcceptPathInfo Off

    Then the path in the URL must be a valid file path that leads either to a directory (in which case one of the DirectoryIndex files must exist there) or to an existing file. In all other cases, a 404 or 403 status is returned. Simple rules, fewer problems.
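
The resulting rule is simple enough to state as a small Python function. This is only a sketch of the intended behaviour, not of how Apache implements it: the name strict_status and the INDEX_NAMES tuple are hypothetical, the function does no sanitising of ".." segments, and it ignores the redirect that mod_dir normally issues for a directory URL lacking its trailing slash.

```python
from pathlib import Path
import tempfile

# Mirrors the DirectoryIndex line above (assumed list, extend as needed).
INDEX_NAMES = ("index.html", "index.txt")


def strict_status(doc_root: Path, url_path: str) -> int:
    """Expected status when MultiViews, Indexes and PATH_INFO are all off."""
    target = doc_root / url_path.strip("/")
    if target.is_file():
        # A trailing slash after a file name counts as extra path info here.
        return 404 if url_path.endswith("/") else 200
    if target.is_dir():
        if any((target / name).is_file() for name in INDEX_NAMES):
            return 200
        return 403  # directory without index file and no listing allowed
    return 404      # nothing at that path, including paths "through" a file


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a").mkdir()
        (root / "a/index.html").write_text("index page")
        (root / "b").mkdir()
        (root / "c.html").write_text("plain page")
        for path in ("/a", "/b", "/c.html", "/c", "/c.html/extra"):
            print(path, strict_status(root, path))
```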

    --
    Helmut Richter
