David W. Hodgins <dwhodgins@nomail.afraid.org> wrote:
...
Keep in mind: when sorting a file, the last line in the input may end up becoming the first line in the output. The sort cannot write anything to the pipe or output file until it has sorted the entire input. With a pipe, the temporary file is in RAM rather than being a named file on disk.
This actually raises an interesting point. Pipes are not infinite in size, and a write could, theoretically, block if enough is written on the write end [...]
Something to keep in mind if you ever decide to sort very large files in a pipeline. It is probably a better idea not to do so, and instead to sort it all at once, using multiple key specifications on the command line.
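
As a rough illustration of the "sort it all at once" advice, here is a minimal Python sketch rather than a shell command. The record layout (name, age) and the function names are made up for the example. It compares two chained sorts, where the later sort must be stable to preserve the earlier ordering, with a single sort on a compound key.

```python
from operator import itemgetter

# Hypothetical sample records: (name, age)
records = [("carol", 30), ("alice", 25), ("bob", 30), ("alice", 31), ("bob", 25)]

def sort_chained(rows):
    """Two passes: secondary key first, then a stable sort on the primary key."""
    by_age = sorted(rows, key=itemgetter(1))
    return sorted(by_age, key=itemgetter(0))  # sorted() is stable

def sort_once(rows):
    """One pass with a compound key (name, then age)."""
    return sorted(rows, key=itemgetter(0, 1))
```

Both give the same order here, but the single pass handles the data only once, which is the point of giving several keys to one sort invocation.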
This won't be a concern here. You need the whole data to sort something,
so the sort utility must read until EOF anyways before doing its work.
So, the real concern is whether you'll have enough RAM.
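
The "pipes are not infinite" remark can be checked with a small sketch, assuming a POSIX system and Python 3.5 or later. The helper name pipe_capacity is made up for this example. It fills a fresh pipe that nobody reads from and counts how many bytes the kernel accepts before a write would block.

```python
import os

def pipe_capacity(chunk_size: int = 4096) -> int:
    """Return how many bytes a fresh pipe accepts while nobody reads from it."""
    read_fd, write_fd = os.pipe()
    os.set_blocking(write_fd, False)  # fail with an error instead of blocking
    chunk = b"x" * chunk_size
    total = 0
    try:
        while True:
            try:
                total += os.write(write_fd, chunk)
            except BlockingIOError:
                # The pipe is full: a blocking writer would stall here
                # until a reader drained some data.
                return total
    finally:
        os.close(read_fd)
        os.close(write_fd)

if __name__ == "__main__":
    print(pipe_capacity())
```

The number printed depends on the system (64 KiB is a common default on Linux). A writer only stalls at that limit while the reader is not reading.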
On 23.04.2023 18:21, Felix Palmen wrote:
This won't be a concern here. You need the whole data to sort something,
so the sort utility must read until EOF anyways before doing its work.
See my recent reply on a different view.
So, the real concern is whether you'll have enough RAM.
Not if sorting is (alternatively or also) done over files.
It seems the assumption behind this was that the whole data to be sorted
must fit into the pipe buffer. But this isn't the case.
s/doing/finishing/
It boils down to this; sorting can _start_ sorting with fewer data
(something like a pipe-full), it can also _continue_ sorting with
more parts of data, and to _finish_ sorting it naturally must have
had all data available.
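
The start/continue/finish view can be sketched as an external merge sort. This is a minimal illustration under assumptions, not how any particular sort utility is implemented. The function name external_sort and the chunk_size parameter are made up for the example. Each chunk is sorted as soon as it has been read, written to a temporary file as a sorted run, and the runs are merged only once the input has ended.

```python
import heapq
import tempfile

def external_sort(lines, chunk_size=1000):
    """Yield the input lines in sorted order, buffering chunk_size lines at a time."""
    runs = []    # temporary files, each holding one sorted run
    chunk = []   # lines read but not yet written out

    def flush():
        chunk.sort()                          # sorting starts before EOF
        run = tempfile.TemporaryFile(mode="w+")
        run.writelines(line + "\n" for line in chunk)
        run.seek(0)
        runs.append(run)
        chunk.clear()

    for line in lines:                        # "continue": more data, more runs
        chunk.append(line.rstrip("\n"))
        if len(chunk) >= chunk_size:
            flush()
    if chunk:
        flush()

    try:
        # "finish": only possible once every run exists, i.e. after EOF
        streams = [(l.rstrip("\n") for l in run) for run in runs]
        yield from heapq.merge(*streams)
    finally:
        for run in runs:
            run.close()
```

The first output line appears only after the last input line has been read, yet memory use is bounded by the chunk size plus one line per run. A real implementation would also limit how many run files are open at once.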
* Janis Papanagnou <janis_papanagnou+ng@hotmail.com>:
On 23.04.2023 18:21, Felix Palmen wrote:
This won't be a concern here. You need the whole data to sort something,
so the sort utility must read until EOF anyways before doing its work.
See my recent reply on a different view.
So, even if it starts working on "chunks", this won't change anything:
the data from the pipe must be read in order to work with it, so the
size of the pipe won't be a problem here.
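
A small sketch of this point, assuming a POSIX system. The function name sort_via_pipe is made up for the example. A producer thread writes far more data than a pipe can hold while the consumer reads until EOF and then sorts. The producer blocks whenever the pipe is full and resumes as the consumer drains it.

```python
import os
import threading

def sort_via_pipe(lines):
    """Send lines through an OS pipe, read them all back, return them sorted."""
    read_fd, write_fd = os.pipe()

    def producer():
        # Closing the write end on exit is what signals EOF to the reader.
        with os.fdopen(write_fd, "w") as writer:
            for line in lines:
                writer.write(line + "\n")

    thread = threading.Thread(target=producer)
    thread.start()
    with os.fdopen(read_fd) as reader:
        received = reader.read().splitlines()  # returns only at EOF
    thread.join()
    return sorted(received)
```

With a few hundred thousand lines the total is well beyond typical pipe capacities, and the call still completes, because the data never has to fit in the pipe all at once.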
It seems the assumption behind this was that the whole data to be sorted
must fit into the pipe buffer. But this isn't the case.
* Janis Papanagnou <janis_papanagnou+ng@hotmail.com>:
s/doing/finishing/
Agreed.
It boils down to this; sorting can _start_ sorting with fewer data
(something like a pipe-full), it can also _continue_ sorting with
more parts of data, and to _finish_ sorting it naturally must have
had all data available.
All correct, but I really doubt the relevance of the parentheses. The
size of the pipe will never be of much interest (except maybe for
performance), mostly because you can't seek a pipe anyways.
On 23.04.2023 18:58, Felix Palmen wrote:
* Janis Papanagnou <janis_papanagnou+ng@hotmail.com>:
On 23.04.2023 18:21, Felix Palmen wrote:
This won't be a concern here. You need the whole data to sort something,
so the sort utility must read until EOF anyways before doing its work.
s/doing/finishing/
See my recent reply on a different view.
So, even if it starts working on "chunks", this won't change anything:
the data from the pipe must be read in order to work with it, so the
size of the pipe won't be a problem here.
It seems the assumption behind this was that the whole data to be sorted
must fit into the pipe buffer. But this isn't the case.
It boils down to this; sorting can _start_ sorting with fewer data
(something like a pipe-full), it can also _continue_ sorting with
more parts of data, and to _finish_ sorting it naturally must have
had all data available.
If I'm reading it right, it always uses temporary files, doing a
sort/merge. Given that it started in 1988, it's not surprising that it's
designed to work in a low-RAM environment.
In fact, I suspect, a pipe doesn't have to store anything. It can be a
pure rendezvous. The write() call can block until the reader performs a
read(), or vice versa, at which time MIN(read_size, write_size) bytes
can be transferred directly between their respective buffers, that value
then being returned from the read and write.
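
The rendezvous idea can be modelled in user space. This is a sketch with threads and a condition variable, not a description of how any kernel implements pipes. The class RendezvousPipe and the helper write_all are made up for the example, and EOF handling is left out.

```python
import threading

class RendezvousPipe:
    """Unbuffered channel: write() and read() each block until the other side arrives."""

    def __init__(self):
        self._cond = threading.Condition()
        self._pending = None  # the writer's buffer while it waits
        self._taken = None    # byte count the reader consumed

    def write(self, data: bytes) -> int:
        with self._cond:
            while self._pending is not None:   # another writer is waiting
                self._cond.wait()
            self._pending = data
            self._cond.notify_all()
            while self._taken is None:         # block until a reader shows up
                self._cond.wait()
            n = self._taken
            self._pending = None
            self._taken = None
            self._cond.notify_all()
            return n                           # may be less than len(data)

    def read(self, size: int) -> bytes:
        with self._cond:
            while self._pending is None or self._taken is not None:
                self._cond.wait()              # block until a writer shows up
            n = min(size, len(self._pending))  # MIN(read_size, write_size)
            chunk = self._pending[:n]
            self._taken = n
            self._cond.notify_all()
            return chunk

def write_all(pipe: RendezvousPipe, data: bytes) -> None:
    """Retry short writes until everything has been handed over."""
    while data:
        data = data[pipe.write(data):]
```

As with a real pipe, a short write is possible, so the writer loops. Nothing is ever stored in the channel itself; the bytes go straight from the writer's buffer to the reader.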