• Site search

    From Tuxedo@21:1/5 to All on Tue Dec 13 19:20:37 2022
    I'm searching for an HTML keyword indexing solution which can be used to
    search a set of regular (internationalised) HTML pages and return an
    abstract for each.

    It's not for external site indexing, not for a huge number of pages
    (100-200 or so), and not for an extremely busy site, so each search could
    be performed live rather than against a pre-built index. Results should
    appear in the usual search-results manner: a linked
    <title>Title page</title> along with a description taken either from the
    meta description (if one exists) or from the text surrounding the
    keyword(s) found in the HTML body.

    What may the Perl module repository have in store? Any recommendations?

    Many thanks,
    Tuxedo

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jim Gibson@21:1/5 to Tuxedo on Fri Dec 16 06:03:12 2022
    On Dec 13, 2022 at 10:20:37 AM PST, "Tuxedo" <tuxedo@mailinator.net> wrote:

    > I'm searching for an HTML keyword indexing solution which can be used to
    > search a set of regular (internationalised) HTML pages and return an
    > abstract for each.
    >
    > It's not for external site indexing, not for a huge number of pages
    > (100-200 or so), and not for an extremely busy site, so each search could
    > be performed live rather than against a pre-built index. Results should
    > appear in the usual search-results manner: a linked
    > <title>Title page</title> along with a description taken either from the
    > meta description (if one exists) or from the text surrounding the
    > keyword(s) found in the HTML body.
    >
    > What may the Perl module repository have in store? Any recommendations?

    I use the LWP::UserAgent, HTTP::Request, and URI::URL modules to fetch web
    pages from servers. I then use HTML::TokeParser to parse the fetched pages
    and break each page down into tags, text, and comments. This works pretty
    well, but it is low-level stuff. I would guess you are going to have to
    program a lot of logic yourself to accomplish the indexing.
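
    To give an idea of the parsing side, here is a minimal sketch, not a
    finished search engine. It assumes each page is already in memory as a
    decoded string (read from disk, or fetched with LWP::UserAgent). The sub
    names page_summary, snippet, and search_page and the 40-character context
    radius are made up for illustration; only the HTML::TokeParser and
    HTML::Entities calls come from the modules themselves.

```perl
use strict;
use warnings;
use HTML::TokeParser;
use HTML::Entities qw(decode_entities);

# Walk the token stream once and collect the title, the meta description,
# and the visible text (everything outside <title>, <script> and <style>).
sub page_summary {
    my ($html) = @_;
    my $p = HTML::TokeParser->new(\$html) or die "cannot create parser";
    my %page = (title => '', description => '', text => '');
    my ($skip, $in_title) = (0, 0);

    while (my $tok = $p->get_token) {
        # Slot 1 is the tag name for S/E tokens and the raw text for T tokens.
        my ($type, $second, $attr) = @$tok;
        if ($type eq 'S') {
            if    ($second eq 'script' || $second eq 'style') { $skip++ }
            elsif ($second eq 'title')                        { $in_title = 1 }
            elsif ($second eq 'meta'
                   && lc($attr->{name} // '') eq 'description') {
                $page{description} = $attr->{content} // '';
            }
        }
        elsif ($type eq 'E') {
            if    ($second eq 'script' || $second eq 'style') { $skip-- if $skip }
            elsif ($second eq 'title')                        { $in_title = 0 }
        }
        elsif ($type eq 'T' && !$skip) {
            my $text = decode_entities($second);   # token text is undecoded
            if ($in_title) { $page{title} .= $text }
            else           { $page{text}  .= " $text" }
        }
    }
    # Collapse runs of whitespace and trim both ends.
    for (@page{qw(title text)}) { s/\s+/ /g; s/^ //; s/ $//; }
    return \%page;
}

# Return the text around the first case-insensitive match, or nothing.
sub snippet {
    my ($text, $keyword, $radius) = @_;
    $radius //= 40;                                # assumed context width
    return unless $text =~ /\Q$keyword\E/i;
    my $start = $-[0] - $radius;
    $start = 0 if $start < 0;
    return substr($text, $start, $+[0] - $start + $radius);
}

# One search result: the title plus the meta description if the page has
# one, otherwise the text surrounding the keyword.
sub search_page {
    my ($html, $keyword) = @_;
    my $page = page_summary($html);
    my $hit  = snippet($page->{text}, $keyword);
    return unless defined $hit;
    return {
        title    => $page->{title},
        abstract => length($page->{description}) ? $page->{description} : $hit,
    };
}

my $demo = '<html><head><title>Opening hours</title></head>'
         . '<body><p>The shop is open on weekdays from nine to five.</p></body></html>';
if (my $result = search_page($demo, 'weekdays')) {
    print "$result->{title}: $result->{abstract}\n";
}
```

    For 100-200 pages you could simply call search_page on every file for
    each query and print the hits as links. Ranking, multiple keywords, and
    character-set handling for internationalised pages are the extra logic
    you would still have to write.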


    --
    Jim Gibson
