Forum: >>> Magnum BBS <<<

where to store web files in a dir tree

From Helmut Richter@21:1/5 to All on Mon Jan 2 15:35:20 2023

The following question pertains to a web server backed by a file system, where the path elements of a URL designate directories and files in that file system. I am aware that many CMSs work differently, but that is not the context of my question.

On my website, I make a habit of avoiding file extensions in URLs where the user of the page need not know them. For instance, it makes no difference whether the file is .html, .php, or anything else that delivers HTML code. Likewise, I do not mark .txt files as such, so that I can convert them to .html later without changing the URL. It is enough that the web server delivers all content with the correct content type in the headers.
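
To illustrate the point about content types: the extension can stay on the stored file and still decide the header, even though it never appears in the URL. The following is only a minimal sketch using Python's standard mimetypes module; the function name content_type_for and the fallback type are assumptions made for this example, not part of any server. A .php file is a different case, since there the script itself produces the output and its type.

```python
import mimetypes


def content_type_for(stored_file: str,
                     default: str = "application/octet-stream") -> str:
    """Guess a Content-Type value from the extension of the file on disk.

    The URL may omit the extension; only the stored file name matters here.
    """
    guessed, _encoding = mimetypes.guess_type(stored_file)
    return guessed or default


# The same extensionless URL /notes could be backed by either file.
print(content_type_for("notes.txt"))
print(content_type_for("notes.html"))
```

Converting notes.txt to notes.html then changes the header but not the URL.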

Now I have two ways to store web contents in the file system:

classical: the URL <domain>/a/b/c is served from file <doc-root>/a/b/c/index.ext
with a suitable file extension (as a means to determine the content type), typically .html

abbreviated: the URL <domain>/a/b/c is served from file <doc-root>/a/b/c.ext provided that there is no directory /a/b/c
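
The two layouts can be written down as lookup rules. This is a rough sketch under stated assumptions, not how any particular web server implements it: the names resolve_classical, resolve_abbreviated, INDEX_NAMES and EXTENSIONS are invented for the example, and the code does nothing to guard against ".." in the path.

```python
import tempfile
from pathlib import Path
from typing import Optional

# Assumed for illustration; a real server takes these from its configuration.
INDEX_NAMES = ("index.html", "index.txt")
EXTENSIONS = (".html", ".txt")


def resolve_classical(doc_root: Path, url_path: str) -> Optional[Path]:
    """Classical: /a/b/c is served from <doc-root>/a/b/c/index.ext."""
    directory = doc_root / url_path.strip("/")
    for name in INDEX_NAMES:
        candidate = directory / name
        if candidate.is_file():
            return candidate
    return None


def resolve_abbreviated(doc_root: Path, url_path: str) -> Optional[Path]:
    """Abbreviated: /a/b/c is served from <doc-root>/a/b/c.ext,
    but only if there is no directory <doc-root>/a/b/c."""
    base = doc_root / url_path.strip("/")
    if base.is_dir():
        return None  # the directory exists, so this rule does not apply
    for ext in EXTENSIONS:
        candidate = base.parent / (base.name + ext)
        if candidate.is_file():
            return candidate
    return None


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a/b/c").mkdir(parents=True)
        (root / "a/b/c/index.html").write_text("classical layout")
        (root / "a/b/d.txt").write_text("abbreviated layout")
        print(resolve_classical(root, "/a/b/c"))
        print(resolve_abbreviated(root, "/a/b/d"))
```

Note that the abbreviated rule needs the extra directory check, which is exactly the condition "provided that there is no directory /a/b/c".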

The abbreviated method avoids a large number of directories that serve no purpose other than holding a single file each, all of those files having the same name. With the classical method, copying index.html from one directory to another is error-prone, because it is easy to confuse which of the hundreds of files with that name you are handling.

The classical method allows every directory to be treated the same way, no matter how many files of which types it contains. This kind of internal consistency may well compensate for the aforementioned inconveniences.

Are there any other reasons to prefer one of the methods over the other?

--
Helmut Richter

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Arno Welzel@21:1/5 to All on Thu Jan 5 18:18:04 2023

Helmut Richter, 2023-01-02 15:35:

> [...]
>
> Now I have two ways to store web contents in the file system:
>
> classical: the URL <domain>/a/b/c is served from file <doc-root>/a/b/c/index.ext
> with a suitable file extension (as a means to determine the content type), typically .html
>
> abbreviated: the URL <domain>/a/b/c is served from file <doc-root>/a/b/c.ext provided that there is no directory /a/b/c
>
> [...]
>
> Are there any other reasons to prefer one of the methods over the other?

I would stick with <doc-root>/a/b/c/index.ext and not allow the
alternative <doc-root>/a/b/c.ext for the same URL, to avoid confusion
about what <doc-root>/a/b/c means in your filesystem.

--
Arno Welzel
https://arnowelzel.de


From Helmut Richter@21:1/5 to Arno Welzel on Sat Jan 7 17:23:23 2023

On Thu, 5 Jan 2023, Arno Welzel wrote:

> Helmut Richter, 2023-01-02 15:35:
>
>> [...]
>>
>> Now I have two ways to store web contents in the file system:
>>
>> classical: the URL <domain>/a/b/c is served from file <doc-root>/a/b/c/index.ext
>> with a suitable file extension (as a means to determine the content type), typically .html
>>
>> abbreviated: the URL <domain>/a/b/c is served from file <doc-root>/a/b/c.ext
>> provided that there is no directory /a/b/c
>>
>> [...]
>>
>> Are there any other reasons to prefer one of the methods over the other?
>
> I would stick with <doc-root>/a/b/c/index.ext and not allow the
> alternative <doc-root>/a/b/c.ext for the same URL, to avoid confusion
> about what <doc-root>/a/b/c means in your filesystem.

Thank you. After some testing with different configurations, I have come to the same conclusion: this feature is really confusing.

E.g.: if <doc-root>/a/b/c.html is requested via the path /a/b/c, relative URLs (href="xyz") are relative to the fictitious directory /a/b/c. If, however, the same file is requested as /a/b/c.html, relative URLs are, of course, relative to the real parent directory /a/b. Same file contents, same file name, same place in the dir tree, different semantics. Moreover, you can even request the same file as /a/b/c/index.html even though the directory /a/b/c does not exist; the spurious file name is then treated as PATH_INFO.
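
The effect can be reproduced without a server, because a browser resolves a relative reference against the URL it requested and never sees which file was served. The sketch below uses urljoin from Python's standard library, which follows the usual RFC 3986 resolution rules; the host example.org and the helper name resolve are placeholders for this example. As far as standard resolution goes, the base is everything up to the last slash of the request path, so it is worth checking the trailing slash when repeating the test: a bare /a/b/c resolves like /a/b/c.html, while /a/b/c/ and /a/b/c/index.html resolve inside the fictitious directory.

```python
from urllib.parse import urljoin

BASE = "https://example.org"  # placeholder host, not from the thread


def resolve(request_path: str, href: str = "xyz") -> str:
    """Resolve a relative href against the URL that was requested."""
    return urljoin(BASE + request_path, href)


# Four request paths that can all end up serving <doc-root>/a/b/c.html
# when MultiViews and PATH_INFO handling are enabled.
for path in ("/a/b/c.html", "/a/b/c", "/a/b/c/", "/a/b/c/index.html"):
    print(f"{path:20} -> {resolve(path)}")
```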

So it should indeed not be allowed, as you suggest. It took me a while to find out why weird things like paths to nonexistent directories and files were allowed in the first place, and what to do to disallow them. The main culprit was the MultiViews option, which causes most of the mess.

I will now use the following options as defaults:

# "All" enables every option except MultiViews
Options All
Options -Indexes
# add further index file names as needed
DirectoryIndex index.html index.txt
# switch on only where there is a good reason
AcceptPathInfo Off

Then the path in the URL must be a valid file path that resolves either to a directory (in which case one of the index.* files must exist there) or to an existing file. In all other cases, a 404 or 403 status is returned. Simple rules, fewer problems.
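
The resulting rule is small enough to state as code. What follows is a simplified model of the intended behaviour, not Apache's actual request handling: the function name status_for and the INDEX_NAMES tuple are assumptions for this sketch, and it ignores details such as the redirect a server may send when a directory is requested without a trailing slash.

```python
import tempfile
from pathlib import Path

# Mirrors the DirectoryIndex line above; assumed for illustration.
INDEX_NAMES = ("index.html", "index.txt")


def status_for(doc_root: Path, url_path: str) -> int:
    """Return the HTTP status the strict rules would give for url_path."""
    target = doc_root / url_path.strip("/")
    if target.is_file():
        return 200  # an existing file is served as is
    if target.is_dir():
        if any((target / name).is_file() for name in INDEX_NAMES):
            return 200  # directory with an index file
        return 403  # no index file, and listings are off (Options -Indexes)
    return 404  # no guessing of extensions, no PATH_INFO


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        root = Path(tmp)
        (root / "a/b/c").mkdir(parents=True)
        (root / "a/b/c/index.html").write_text("page")
        (root / "a/b/d.html").write_text("page")
        (root / "a/empty").mkdir()
        for path in ("/a/b/c", "/a/b/d.html", "/a/b/d",
                     "/a/b/d.html/extra", "/a/empty"):
            print(path, status_for(root, path))
```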

--
Helmut Richter

