• How to read an identified part of a huge text file?

    From Janis Papanagnou@21:1/5 to All on Sun Apr 2 15:51:49 2023
    I want to read identified content from a huge text file that resides in
    the file system. (My JavaScript code is embedded in an HTML page. I am
    running all code client side and have no application servers or
    database systems running.)

    I've found a suggestion using 'require("fs")', but the samples load the
    whole file content, so that doesn't seem to fit my multi-megabyte data
    file, which I strictly want to avoid loading as a whole.

    My data file is actually structured as <key> <TAB> <text-data> lines
    and I just want to extract the <text-data> given the respective <key>.
    Is there some simple standard way to achieve that extraction?

    The second question is whether it is possible to find the <key>s given
    a text match (a string match or, ideally, a regular expression match) on
    the respective <text-data> in the external file.

    For a solution/workaround to both questions it might also be useful to
    call an external extractor (awk, perl, ...) from JavaScript and read in
    the output of such an external tool invocation. Is that possible?

    Thanks for any hints.

    Janis

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jon Ribbens@21:1/5 to Janis Papanagnou on Sun Apr 2 14:49:27 2023
    On 2023-04-02, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
    > I want to read identified content from a huge text file that resides in
    > the file system. (My JavaScript code is embedded in an HTML page. I am
    > running all code client side and have no application servers or
    > database systems running.)

    > I've found a suggestion using 'require("fs")', but the samples load the
    > whole file content, so that doesn't seem to fit my multi-megabyte data
    > file, which I strictly want to avoid loading as a whole.

    require('fs') is a Node.js thing, which is not going to work if you're
    using in-browser JavaScript.

    > My data file is actually structured as <key> <TAB> <text-data> lines
    > and I just want to extract the <text-data> given the respective <key>.
    > Is there some simple standard way to achieve that extraction?

    I think in a modern browser you might be able to use the fetch and
    streams APIs to read the file a chunk at a time, e.g.:

    // top-level await needs <script type="module">; otherwise use an async function
    const response = await fetch('myfile.txt')
    for await (const chunk of response.body) {
      // Do something with each chunk (a Uint8Array of raw bytes)
    }
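
    A sketch of how those chunks could be turned into lines and searched for one <key>, assuming the <key> <TAB> <text-data> layout from the original post and UTF-8 text. The names `linesFromChunks` and `findTextByKey` are made up for illustration. `TextDecoder` with `{ stream: true }` keeps a multi-byte character that straddles two chunks intact, and the search returns at the first match, so the rest of the file need not be read. Two hedged caveats: async iteration of `response.body` is not supported in every browser (a `reader.read()` loop is the portable form), and when the page itself is opened from a file:// URL, most browsers refuse to fetch() local files, so the data file would typically have to be served over HTTP, even by a local static server.

```javascript
// Hypothetical helper: turns an async iterable of Uint8Array chunks
// (e.g. a fetch() response body) into text lines without the line ending.
async function* linesFromChunks(chunks) {
  const decoder = new TextDecoder('utf-8')
  let pending = '' // text after the last newline seen so far
  for await (const chunk of chunks) {
    // stream: true buffers a multi-byte character split across two chunks
    pending += decoder.decode(chunk, { stream: true })
    const lines = pending.split('\n')
    pending = lines.pop() // the last piece may be an incomplete line
    for (const line of lines) yield line.replace(/\r$/, '')
  }
  pending += decoder.decode() // flush any buffered bytes
  if (pending !== '') yield pending.replace(/\r$/, '')
}

// Hypothetical lookup: returns the <text-data> stored under `key`,
// or undefined if no line has that key. Stops reading at the first match.
async function findTextByKey(chunks, key) {
  for await (const line of linesFromChunks(chunks)) {
    const tab = line.indexOf('\t')
    if (tab !== -1 && line.slice(0, tab) === key) return line.slice(tab + 1)
  }
  return undefined
}
```

    Usage, in a module script or async function: `const text = await findTextByKey((await fetch('myfile.txt')).body, 'someKey')`. Any async iterable of Uint8Array chunks works as input.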

    > The second question is whether it is possible to find the <key>s given
    > a text match (a string match or, ideally, a regular expression match) on
    > the respective <text-data> in the external file.

    Yes? I'm not sure I understand that question.

    > For a solution/workaround to both questions it might also be useful to
    > call an external extractor (awk, perl, ...) from JavaScript and read in
    > the output of such an external tool invocation. Is that possible?

    Not from inside a browser, no.

  • From Janis Papanagnou@21:1/5 to Jon Ribbens on Sun Apr 2 20:05:31 2023
    Thanks for your hints and insights thus far!

    On 02.04.2023 16:49, Jon Ribbens wrote:
    > On 2023-04-02, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
    >> My data file is actually structured as <key> <TAB> <text-data> lines

    >> The second question is whether it is possible to find the <key>s given
    >> a text match (a string match or, ideally, a regular expression match) on
    >> the respective <text-data> in the external file.

    > Yes? I'm not sure I understand that question.

    Where my first question can be described (informally) as something like

    Select <text-data> From <text-file> Where <key> Equals <search-key>

    the second one operates on the data and returns the keys that
    identify the matching data records, like

    Select <keys> From <text-file> Where <text-data> Matches <s1> And <s2>

    with the option to either get all the s1/s2-matching key identifiers
    in one returned set, or to get these keys sequentially, or to operate
    on the matching records (identified by their keys).

    Basically, in both questions I want access (line-wise, record-wise)
    to the data: either the <text-data> selected by <key>, or the <keys>
    whose <text-data> match a search criterion.
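
    Both queries can be phrased as a single pass over the lines that keeps only the current line in memory. A minimal sketch, assuming the lines arrive as an (async) iterable of "<key>\t<text-data>" strings from whatever line reader is used; `selectKeysMatching` and `collectKeys` are hypothetical names. The generator hands out keys one at a time (the sequential variant), `collectKeys` gathers them into one array (the "one returned set" variant), and "Matches <s1> And <s2>" becomes a list of regular expressions that must all match.

```javascript
// Hypothetical streaming version of
//   Select <keys> From <text-file> Where <text-data> Matches <s1> And <s2>
// `lines` is any (async) iterable of "<key>\t<text-data>" strings.
async function* selectKeysMatching(lines, patterns) {
  for await (const line of lines) {
    const tab = line.indexOf('\t')
    if (tab === -1) continue // skip lines without a TAB
    const text = line.slice(tab + 1)
    // Every pattern must match; resetting lastIndex keeps /g or /y regexps
    // from carrying state over between lines.
    const allMatch = patterns.every((re) => {
      re.lastIndex = 0
      return re.test(text)
    })
    if (allMatch) yield line.slice(0, tab)
  }
}

// Collects all matching keys into one array ("one returned set").
async function collectKeys(lines, patterns) {
  const keys = []
  for await (const key of selectKeysMatching(lines, patterns)) keys.push(key)
  return keys
}
```

    The first query is the same loop with an equality test on the key instead of the pattern test.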

    The point is: once the data is read into memory accessible to JS I can
    do everything (including matching), but the bottleneck is the mass of
    data in the file, so I need to preselect the desired records (so as not
    to have to load it completely into memory).

    (I hope that clears it up and doesn't muddy it further.)

    The suggestion of using await fetch('myfile.txt') sounds like it's
    a raw (byte-oriented) data function (not a line/record-oriented one),
    but I will be looking into that as well. Thanks again.

    Janis

  • From Jon Ribbens@21:1/5 to Janis Papanagnou on Sun Apr 2 19:58:25 2023
    On 2023-04-02, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
    > The suggestion of using await fetch('myfile.txt') sounds like it's
    > a raw (byte-oriented) data function (not a line/record-oriented one),
    > but I will be looking into that as well. Thanks again.

    Yes, although there's an example of how to use it to read line-by-line
    here:

    https://developer.mozilla.org/en-US/docs/Web/API/ReadableStreamDefaultReader/read#example_2_-_handling_text_line_by_line
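
    A minimal sketch of the portable `reader.read()` loop that example is built on, wrapped as an async generator so it can be used with `for await` even in browsers where `response.body` itself is not async-iterable. `readChunks` is a hypothetical name, and cancelling on early exit is a design choice: stopping once the wanted key is found should then also stop the download.

```javascript
// Hypothetical adaptor: yields the chunks of a ReadableStream using the
// portable reader.read() loop instead of `for await` over the stream itself.
async function* readChunks(stream) {
  const reader = stream.getReader()
  let finished = false
  try {
    while (true) {
      const { done, value } = await reader.read()
      if (done) {
        finished = true
        return
      }
      yield value // a Uint8Array for a fetch() response body
    }
  } finally {
    // If the consumer stopped early (e.g. the wanted key was found),
    // cancel the stream so the rest of the file is not downloaded.
    if (!finished) await reader.cancel()
    reader.releaseLock()
  }
}
```

    Usage: `for await (const chunk of readChunks(response.body)) { ... }`, then split the decoded text into lines as in the MDN example.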

    I think the only solution available to you in a browser is to use
    IndexedDB. On the plus side though, it's quite a good solution.
    Basically, write a function in JavaScript to read and parse the
    file and load it into an in-browser database:

    https://developer.mozilla.org/en-US/docs/Web/API/IndexedDB_API

    and then you can search this indexed database of objects, which
    should be very fast and efficient. You just need to make sure that
    your code checks for the existence of the database and re-creates
    it from the file if it doesn't exist due to the browser having
    decided to expire it.
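
    A rough sketch of that approach, assuming the same <key> <TAB> <text-data> lines arrive from a line reader. The database name, store name, record shape `{ key, text }` and all function names are hypothetical. Records are written in batches because an IndexedDB transaction becomes inactive once control returns to the event loop, so awaiting the next network chunk inside an open transaction would fail. IndexedDB cannot index a regular expression, so the second query is sketched as a cursor scan that looks at one record at a time; an empty store is taken as the hint that the data was evicted and has to be reloaded.

```javascript
// Sketch of the IndexedDB approach. Names and record shape are hypothetical.
const DB_NAME = 'textfile-cache'
const STORE = 'records'

// Parses one "<key>\t<text-data>" line; returns null for lines without a TAB.
function parseRecord(line) {
  const tab = line.indexOf('\t')
  return tab === -1 ? null : { key: line.slice(0, tab), text: line.slice(tab + 1) }
}

// Wraps an IDBRequest in a Promise.
function requestToPromise(request) {
  return new Promise((resolve, reject) => {
    request.onsuccess = () => resolve(request.result)
    request.onerror = () => reject(request.error)
  })
}

// Opens the database, creating the store on first use or after eviction.
function openRecordsDb(idb = globalThis.indexedDB) {
  const request = idb.open(DB_NAME, 1)
  request.onupgradeneeded = () => {
    request.result.createObjectStore(STORE, { keyPath: 'key' })
  }
  return requestToPromise(request)
}

// Writes one batch of records in a single readwrite transaction.
function putBatch(db, records) {
  return new Promise((resolve, reject) => {
    const tx = db.transaction(STORE, 'readwrite')
    const store = tx.objectStore(STORE)
    for (const record of records) store.put(record)
    tx.oncomplete = () => resolve()
    tx.onerror = () => reject(tx.error)
    tx.onabort = () => reject(tx.error)
  })
}

// Loads an (async) iterable of lines, batchSize records per transaction.
// Nothing else is awaited while a transaction is open.
async function loadRecords(db, lines, batchSize = 1000) {
  let batch = []
  for await (const line of lines) {
    const record = parseRecord(line)
    if (record) batch.push(record)
    if (batch.length >= batchSize) {
      await putBatch(db, batch)
      batch = []
    }
  }
  if (batch.length > 0) await putBatch(db, batch)
}

// 0 means the store is new or was evicted, so the file must be loaded again.
function countRecords(db) {
  return requestToPromise(db.transaction(STORE).objectStore(STORE).count())
}

// Query 1: the <text-data> for a key (undefined if absent), via the primary key.
async function getText(db, key) {
  const record = await requestToPromise(db.transaction(STORE).objectStore(STORE).get(key))
  return record === undefined ? undefined : record.text
}

// Query 2: keys whose <text-data> match `pattern` (use no g or y flag),
// found by walking all records with a cursor, one record at a time.
function findKeys(db, pattern) {
  return new Promise((resolve, reject) => {
    const keys = []
    const request = db.transaction(STORE).objectStore(STORE).openCursor()
    request.onsuccess = () => {
      const cursor = request.result
      if (!cursor) return resolve(keys)
      if (pattern.test(cursor.value.text)) keys.push(cursor.value.key)
      cursor.continue()
    }
    request.onerror = () => reject(request.error)
  })
}
```

    Typical flow: `const db = await openRecordsDb()`, then `if ((await countRecords(db)) === 0) await loadRecords(db, lines)`, then `getText(db, 'someKey')` or `findKeys(db, /s1/)`.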

  • From JJ@21:1/5 to Janis Papanagnou on Mon Apr 3 06:08:52 2023
    On Sun, 2 Apr 2023 20:05:31 +0200, Janis Papanagnou wrote:

    > The suggestion of using await fetch('myfile.txt') sounds like it's
    > a raw (byte-oriented) data function (not a line/record-oriented one),
    > but I will be looking into that as well. Thanks again.
    >
    > Janis

    If you want to get only specific lines without having to read the whole file, you can use Fetch/XHR with the `Range` HTTP request header, but you'll need a pre-generated index file for the text file's lines. The index file would contain the byte offset of each line in the text file, so that you know the byte range in which a specific line is located.
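
    A sketch of that idea, assuming UTF-8 text with one record per line. `buildOffsetIndex` would be run once ahead of time (an external tool could produce the same offsets) and its result saved as the index file; the function names and the `{ start, end }` entry shape are hypothetical. Offsets are counted in bytes, not string characters, because Range addresses bytes, and the Range end is inclusive, hence `end - 1`. The server has to honour Range requests (answering 206 Partial Content); a 200 answer means it is sending the whole file, which the sketch treats as an error. For a file the user selects via <input type="file">, the same offsets should work with `Blob.slice()`.

```javascript
// Hypothetical index builder: maps each <key> to the byte range
// [start, end) of its line, excluding the line ending.
function buildOffsetIndex(bytes) {
  const decoder = new TextDecoder('utf-8')
  const index = new Map()
  let start = 0
  for (let i = 0; i <= bytes.length; i++) {
    if (i < bytes.length && bytes[i] !== 0x0a) continue // find '\n' or EOF
    let end = i
    if (end > start && bytes[end - 1] === 0x0d) end-- // drop '\r' of CRLF
    const tab = bytes.subarray(start, end).indexOf(0x09) // '\t'
    if (tab !== -1) {
      index.set(decoder.decode(bytes.subarray(start, start + tab)), { start, end })
    }
    start = i + 1
  }
  return index
}

// HTTP Range end offsets are inclusive, hence end - 1.
function rangeHeader(start, end) {
  return `bytes=${start}-${end - 1}`
}

// Strips "<key>\t" from one indexed line, leaving the <text-data>.
function textOf(line) {
  return line.slice(line.indexOf('\t') + 1)
}

// Fetches only one line; requires a server that honours Range requests.
async function fetchRecord(url, entry) {
  const response = await fetch(url, { headers: { Range: rangeHeader(entry.start, entry.end) } })
  // 200 would mean the server ignored Range and is sending the whole file.
  if (response.status !== 206) throw new Error(`Range not honoured (HTTP ${response.status})`)
  return textOf(await response.text())
}

// Same offsets with a user-selected File (a Blob): slice() end is exclusive.
async function readRecordFromFile(file, entry) {
  return textOf(await file.slice(entry.start, entry.end).text())
}
```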

  • From V@21:1/5 to Janis Papanagnou on Mon Apr 3 06:55:12 2023
    groups.google.com/g/sci.math/c/sFUlUZ47ZGs
  • From Michael Haufe (TNO)@21:1/5 to Janis Papanagnou on Tue Apr 4 17:59:48 2023
    In the latest browsers there is a feature called the Origin Private File System (OPFS):

    <https://developer.mozilla.org/en-US/docs/Web/API/File_System_Access_API#origin_private_file_system>

    This provides a FileSystemSyncAccessHandle:

    <https://developer.mozilla.org/en-US/docs/Web/API/FileSystemSyncAccessHandle>

    which has a `read()` method:

    <https://developer.mozilla.org/en-US/docs/Web/API/FileSystemSyncAccessHandle/read>

    That method, with an appropriately sized buffer (the size being your record size), will let you read from a specific location in the file.
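
    A minimal sketch under these assumptions: the code runs in a dedicated Web Worker (createSyncAccessHandle() is only available there); the data file has already been copied into the origin private file system, which is a per-origin sandbox managed by the browser rather than the user's regular file system; and the byte offsets of the wanted line come from an offset index like the one JJ describes, since <key> <TAB> <text-data> lines are variable-length rather than fixed-size records. `readRange`, `lookupInOpfs` and the file name argument are hypothetical.

```javascript
// Reads bytes [start, end) through a FileSystemSyncAccessHandle
// (or anything with the same read(buffer, { at }) method) and decodes them.
function readRange(access, start, end) {
  const buffer = new Uint8Array(end - start)
  const bytesRead = access.read(buffer, { at: start }) // may be short at EOF
  return new TextDecoder('utf-8').decode(buffer.subarray(0, bytesRead))
}

// Hypothetical lookup of one line stored in OPFS. Must run in a dedicated
// Web Worker: createSyncAccessHandle() is not available on the main thread.
async function lookupInOpfs(fileName, start, end) {
  const root = await navigator.storage.getDirectory()
  const fileHandle = await root.getFileHandle(fileName)
  const access = await fileHandle.createSyncAccessHandle()
  try {
    const line = readRange(access, start, end)
    return line.slice(line.indexOf('\t') + 1) // drop "<key>\t"
  } finally {
    access.close() // releases the exclusive lock on the file
  }
}
```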
