• Obtaining the DOM tree for a *.htm file on a local disc?

    From John Stockton@21:1/5 to All on Thu Feb 16 15:25:34 2023
    As feared, MS IE 11 is now not available in Windows 10. I do have a work-round - I have a Windows XP PC with internet connection removed and with several older browsers, and I would transfer files via a USB memory stick. But that's not nearly as convenient as calling IE 11 from the batch file that I am running.

    Is there a readily-available function which can be called from JavaScript running in a normal modern browser which, given as an argument the name (relative to the current directory, or maybe absolute) of an HTML file, will return a reference to the parsed DOM tree of the BODY of the HTML, including the Links, Anchors, etc. arrays?

    HISTORY - starting with Win XP pro sp3 + Firefox 3.6.13 or earlier, I discovered that a Web-type page on a local drive could have such a function which reads the specified file into an IFRAME element of the page, and then returned the desired reference - in any browser that I then had.

    So, starting with any file of the local copy of a Web-type site (usually INDEX.HTM), I could read the links of the DOM tree, and recurse wildly through the whole rats-nest of linked local files (never reading any of them more than once, of course) - and doing that I could check that all the files linked to were present and all the cited anchors too, collecting the faults and other things of interest. All at local-machine speed, and not using the Internet (when I started this, I had a dial-up connection).
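
    The crawl described above (follow every link, read each file only once, then report missing files and anchors) might be sketched roughly as follows. This is not the LINXCHEK code: splitLink, checkSite and the getPage loader are hypothetical names, and getPage is assumed to return the links and anchor names of a file, or null if the file is absent. A real loader would probably be asynchronous; the sketch keeps it synchronous for brevity.

```javascript
// Split an href into the absolute file URL and the fragment (anchor) name.
function splitLink(href, baseUrl) {
  const u = new URL(href, baseUrl);
  const anchor = decodeURIComponent(u.hash.slice(1));
  u.hash = '';
  return { file: u.href, anchor };
}

// getPage(fileUrl) is a hypothetical loader returning
// { links: [href, ...], anchors: [name, ...] } or null for a missing file.
function checkSite(startUrl, getPage) {
  const pages = new Map(); // fileUrl -> page info (or null); each file is read once
  const wanted = [];       // every link seen, checked after the crawl
  const queue = [startUrl];
  while (queue.length > 0) {
    const url = queue.shift();
    if (pages.has(url)) continue; // already read, never read twice
    const page = getPage(url);
    pages.set(url, page);
    if (!page) continue;
    for (const href of page.links) {
      const { file, anchor } = splitLink(href, url);
      wanted.push({ from: url, file, anchor });
      if (!pages.has(file)) queue.push(file);
    }
  }
  const faults = [];
  for (const { from, file, anchor } of wanted) {
    const page = pages.get(file);
    if (!page) {
      faults.push(`${from}: missing file ${file}`);
    } else if (anchor && !page.anchors.includes(anchor)) {
      faults.push(`${from}: missing anchor #${anchor} in ${file}`);
    }
  }
  return faults;
}
```

    Links that point outside the local site are not filtered here; the loader would have to decide what to do with them.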

    Something like
    Txt = DOC.body.textContent || DOC.body.innerText // latter for IE8-
    was probably involved

    After a while, newer versions of browsers blocked such local access through an (IMHO) ill-considered implementation of the same-origin policy; but it still worked in IE 11 - until this evening.

    I've not yet managed to get IE mode working in MS Edge.


    --
    (c) John Stockton, near London, UK. Using Google Groups. |

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Jon Ribbens@21:1/5 to John Stockton on Fri Feb 17 01:06:34 2023
    On 2023-02-16, John Stockton <dr.j.r.stockton@gmail.com> wrote:
    As feared, MS IE 11 is now not available in Windows 10. I do have a work-round - I have a Windows XP PC with internet connection removed
    and with several older browsers, and I would transfer files via a USB
    memory stick. But that's not nearly as convenient as calling IE 11
    from the batch file that I am running.

    Is there a readily-available function which can be called from
    JavaScript running in a normal modern browser which, given as an
    argument the name (relative to the current directory, or maybe
    absolute) of an HTML file, will return a reference to the parsed DOM
    tree of the BODY of the HTML, including the Links, Anchors, etc.
    arrays?

    https://github.com/jsdom/jsdom ?

  • From Dr.Kral@nyc.rr.com@21:1/5 to dr.j.r.stockton@gmail.com on Thu Feb 16 19:46:43 2023
    On Thu, 16 Feb 2023 15:25:34 -0800 (PST), John Stockton <dr.j.r.stockton@gmail.com> wrote in <7cb3d7c8-74e8-49d2-ba7d-ab6df813f0c1n@googlegroups.com>:

    <snip>

    As feared, MS IE 11 is now not available in Windows 10.
    After a while, newer versions of browsers blocked such local access through an (IMHO) ill-considered implementation of the same-origin policy; but it still worked in IE 11 - until this evening.

    Look into creating a Simple Python HTTP(S) Server in your local computer
    which avoids the same-origin policy issues.
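
    The suggestion above names Python. To stay in this group's own language, the same idea can be sketched with the http module built into Node.js instead; that is a swapped-in technique, not what was proposed. The names resolveRequestPath and startServer, the small type table, and the port in the usage comment are all assumptions made for illustration.

```javascript
const http = require('node:http');
const fs = require('node:fs');
const path = require('node:path');

// Minimal content-type table (an assumption; extend as needed).
const TYPES = {
  '.htm': 'text/html', '.html': 'text/html', '.css': 'text/css',
  '.js': 'text/javascript', '.gif': 'image/gif', '.png': 'image/png',
  '.jpg': 'image/jpeg',
};

// Map a request URL to a file under root, or null if it is malformed
// or would escape the root directory.
function resolveRequestPath(root, requestUrl) {
  let pathname;
  try {
    pathname = decodeURIComponent(new URL(requestUrl, 'http://localhost').pathname);
  } catch (e) {
    return null;
  }
  const base = path.resolve(root);
  const full = path.resolve(base, '.' + pathname);
  return full === base || full.startsWith(base + path.sep) ? full : null;
}

// Serve the files under root on the loopback interface only.
function startServer(root, port) {
  return http.createServer((req, res) => {
    const file = resolveRequestPath(root, req.url);
    if (!file) {
      res.writeHead(403);
      res.end('Forbidden');
      return;
    }
    fs.readFile(file, (err, data) => {
      if (err) {
        res.writeHead(404);
        res.end('Not found');
        return;
      }
      const type = TYPES[path.extname(file).toLowerCase()] || 'application/octet-stream';
      res.writeHead(200, { 'Content-Type': type });
      res.end(data);
    });
  }).listen(port, '127.0.0.1');
}

// Usage (not run automatically): startServer('C:/HOMEPAGE', 8000);
```

    With such a server running, pages that were opened as file:// become reachable under http://127.0.0.1:8000/ and so count as a single origin.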

    It is not necessary to use a browser to check links. Look at "Xenu link sleuth" which, although old, still works.

    HTH

    K

  • From JJ@21:1/5 to John Stockton on Fri Feb 17 08:34:20 2023
    On Thu, 16 Feb 2023 15:25:34 -0800 (PST), John Stockton wrote:
    As feared, MS IE 11 is now not available in Windows 10. I do have a work-round - I have a Windows XP PC with internet connection removed and
    with several older browsers, and I would transfer files via a USB memory stick. But that's not nearly as convenient as calling IE 11 from the
    batch file that I am running.

    Have you tried the method which uses VBScript via Windows Script Host to
    open Internet Explorer? (long URL warning)

    https://beebom.com/how-enable-and-use-internet-explorer-windows-11/#h-create-a-vbs-shortcut-to-open-internet-explorer-on-windows-11

    Is there a readily-available function which can be called from JavaScript running in a normal modern browser which, given as an argument the name (relative to the current directory, or maybe absolute) of an HTML file,
    will return a reference to the parsed DOM tree of the BODY of the HTML, including the Links, Anchors, etc. arrays?

    HISTORY - starting with Win XP pro sp3 + Firefox 3.6.13 or earlier, I discovered that a Web-type page on a local drive could have such a
    function which reads the specified file into an IFRAME element of the
    page, and then returned the desired reference - in any browser that I
    then had.

    So, starting with any file of the local copy of a Web-type site (usually INDEX.HTM), I could read the links of the DOM tree, and recurse wildly through the whole rats-nest of linked local files (never reading any of
    them more than once, of course) - and doing that I could check that all
    the files linked to were present and all the cited anchors too,
    collecting the faults and other things of interest. All at local-machine speed, and not using the Internet (when I started this, I had a dial-up connection).

    Something like
    Txt = DOC.body.textContent || DOC.body.innerText // latter for IE8-
    was probably involved

    After a while, newer versions of browsers blocked such local access through an (IMHO) ill-considered implementation of the same-origin
    policy; but it still worked in IE 11 - until this evening.

    I've not yet managed to get IE mode working in MS Edge.

    If you want a complete DOM tree including the HTML namespace, you can use `document.write()` on an `about:blank` IFRAME's document.
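
    A minimal sketch of that suggestion, assuming the HTML source is already available as a string (obtaining that string from a local file is the part that modern browsers restrict). parseViaIframe and its hostDoc parameter are hypothetical names; hostDoc exists only so that the function can be tried with a stand-in document. Note that document.write() will run any scripts contained in the source.

```javascript
// Write an HTML string into a hidden, src-less IFRAME and return its document.
function parseViaIframe(html, hostDoc) {
  var doc = hostDoc || document;
  var frame = doc.createElement('iframe');
  frame.style.display = 'none';
  doc.body.appendChild(frame); // once attached, the frame holds an about:blank document
  var inner = frame.contentDocument || frame.contentWindow.document;
  inner.open();
  inner.write(html);
  inner.close();
  return inner; // inner.links, inner.anchors, inner.body, ... are now populated
}
```

    The caller is responsible for removing the IFRAME again when the tree is no longer needed.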

  • From J𝕒𝕞𝕖𝕤 𝕂𝕚𝕣𝕜@21:1/5 to John Stockton on Fri Feb 17 15:13:38 2023
    In Message <7cb3d7c8-74e8-49d2-ba7d-ab6df813f0c1n@googlegroups.com>
    On Thursday Feb 16, 2023 6:25 pm -05:00
    John Stockton <dr.j.r.stockton@gmail.com> wrote:

    As feared, MS IE 11 is now not available in Windows 10.

    [snip]

    MS just installed an add-on IEToEdge BHO which can not be disabled.

    Internet Options - Programs tab - Manage add-ons

    If you rename ie_to_edge_stub.exe to ie_to_edge_stub.exe.bak you may be
    able to run as before.

    On my Windows 10 it is located in "C:\Program Files (x86)\Microsoft\Edge\Application\110.0.1587.49\BHO\"

    --
    J𝕒𝕞𝕖𝕤 𝕂𝕚𝕣𝕜

  • From Arno Welzel@21:1/5 to All on Sun Feb 19 00:33:59 2023
    John Stockton, 2023-02-17 00:25:

    As feared, MS IE 11 is now not available in Windows 10. I do have a

    No, not "as feared" - this is a good thing, since IE is a mess for
    security.

    work-round - I have a Windows XP PC with internet connection removed
    and with several older browsers, and I would transfer files via a USB
    memory stick. But that's not nearly as convenient as calling IE 11
    from the batch file that I am running.

    Then call another browser from the batch file. There are many alternatives.

    Is there a readily-available function which can be called from
    JavaScript running in a normal modern browser which, given as an
    argument the name (relative to the current directory, or maybe
    absolute) of an HTML file, will return a reference to the parsed DOM
    tree of the BODY of the HTML, including the Links, Anchors, etc.
    arrays?

    See <https://developer.mozilla.org/en-US/docs/Web/API/DOMParser>.

    The problem is that you may not be able to read the existing HTML file
    into a string, for security reasons: JavaScript in a browser is not
    allowed to read local files on your computer.

    However, using Node.js you can do just that - then you would not even
    need a browser:

    <https://www.npmjs.com/package/jsdom>


    --
    Arno Welzel
    https://arnowelzel.de

  • From Thomas 'PointedEars' Lahn@21:1/5 to John Stockton on Tue Feb 21 22:13:45 2023
    John Stockton wrote:

    Is there a readily-available function which can be called from JavaScript running in a normal modern browser which, given as an argument the name (relative to the current directory, or maybe absolute) of an HTML file,
    will return a reference to the parsed DOM tree of the BODY of the HTML, including the Links, Anchors, etc. arrays?

    Apparently they have even killed file:// to file:// HTTP requests in/by Chromium 108 :-(

    | Access to fetch at 'file:///…' from origin 'null' has been blocked by CORS
    | policy: Cross origin requests are only supported for protocol schemes:
    | http, data, isolated-app, chrome-extension, chrome, https,
    | chrome-untrusted.

    So AFAIK it is only possible now with a local Web server where the target
    file is also a resource accessible via HTTP:

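    // httpURI: URL of the target file as served by the local Web server
    fetch(httpURI)
      .then(response => response.text())
      .then(html => {
        // Parse the HTML source into a Document without loading it into the page
        var doc = (new DOMParser()).parseFromString(html, 'text/html');
        /* Example */
        console.log(doc.images);
      })
      .catch(error => console.error(error));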

    (WFM.)

    DOMParser has been available since the time of Firefox 1 (2004).
    If you do not have fetch(), you should write one using XMLHttpRequest.
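
    A minimal sketch of such a replacement, assuming Promise is available (an environment old enough to lack it would need plain callbacks instead). fetchText and loadDocument are hypothetical names, and the optional XHR parameter exists only so that the loader can be tried with a stand-in constructor.

```javascript
// Promise-based text loader built on XMLHttpRequest.
function fetchText(url, XHR) {
  var Ctor = XHR || XMLHttpRequest; // default to the browser's own constructor
  return new Promise(function (resolve, reject) {
    var xhr = new Ctor();
    xhr.open('GET', url, true);
    xhr.onload = function () {
      if (xhr.status >= 200 && xhr.status < 300) {
        resolve(xhr.responseText);
      } else {
        reject(new Error('HTTP ' + xhr.status + ' for ' + url));
      }
    };
    xhr.onerror = function () {
      reject(new Error('Network error for ' + url));
    };
    xhr.send();
  });
}

// Browser-only: load a page over HTTP and parse it into a Document.
function loadDocument(url) {
  return fetchText(url).then(function (html) {
    return new DOMParser().parseFromString(html, 'text/html');
  });
}
```

    loadDocument(httpURI).then(function (doc) { console.log(doc.links.length); }) would then be used in the same way as the fetch() version above.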

    <https://developer.mozilla.org/en-US/docs/Web/API/DOMParser>
    <https://developer.mozilla.org/en-US/docs/Web/API/fetch>
    <https://developer.mozilla.org/en-US/docs/Web/API/XMLHttpRequest>
    <http://PointedEars.de/es-matrix/#javascript>

    --
    PointedEars
    <https://github.com/PointedEars> | <http://PointedEars.de/wsvn/>
    Twitter: @PointedEars2
    Please do not cc me. /Bitte keine Kopien per E-Mail.

  • From John Stockton@21:1/5 to John Stockton on Sat Mar 4 13:36:07 2023
    On Thursday, 16 February 2023 at 23:25:39 UTC, John Stockton wrote:
    As feared, MS IE 11 is now not available in Windows 10. I do have a work-round - I have a Windows XP PC with internet connection removed and with several older browsers, and I would transfer files via a USB memory stick. But that's not nearly as convenient as calling IE 11 from the batch file that I am running.

    Is there a readily-available function which can be called from JavaScript running in a normal modern browser which, given as an argument the name (relative to the current directory, or maybe absolute) of an HTML file, will return a reference to the parsed DOM tree of the BODY of the HTML, including the Links, Anchors, etc. arrays?

    ... ... ...

    I've not yet managed to get IE mode working in MS Edge.



    However, in an ordinary Windows 10 command prompt batch file, the following line
    START iexplore %unto%\HOMEPAGE\linxchek.htm#LC?GoAt=%Who4%-SET.HTM
    still works as it did before

    So :-
    (1) I don't need to change anything at present (A),
    (2) Otherwise, I can use my adjacent off-net update-immune Windows XP PC reading directly from one of my regular back-up USB memories,
    (3) I would use a drop-in JavaScript function to replace the one that uses the Iframe,
    (4) Anything involving visiting GitHub, though perhaps in principle superior, would be too much bother,
    (5) I want it to be able to run on any PC merely by transferring a few files, without having to install anything - as is, it can run on any of my back-up memories using a host PC (with IEXPLORE on the Path),
    (6) I don't know what XENU can do, but it's not likely to do any of the other things that I do _while_ roaming the DOM tree,
    (7) I might be able to do more using VBScript with or instead of JavaScript - but I'm not fluent in VBScript - does it have a function to return the DOM tree of a named HTML file?
    (8) When I started writing LINXCHEK.HTM, I was, I think, using Firefox in Windows XP sp3. At that time I think that it would basically work on all of my browsers, except that some features did not work on all browsers. From time to time, it stopped running on various browsers; Firefox failed on reaching version 68.0,
    (9) I suspect that the only non-resident malware that IE11, when only reading local files, can be troubled by is Windows Update.

    (A) Well, I did have a simple tester page VALIDATE.HTM to call LINXCHEK.HTM with standard values for the GoAt parameter; but it was easy to make VALIDATE.BAT, which provides the same functionality. In each case, it is LINXCHEK which shows the results.

    Thanks for the efforts, most of which were comprehensible.

    --
    (c) John Stockton, near London, UK. Using Google Groups. |

  • From JJ@21:1/5 to John Stockton on Sun Mar 5 23:05:58 2023
    On Sat, 4 Mar 2023 13:36:07 -0800 (PST), John Stockton wrote:
    On Thursday, 16 February 2023 at 23:25:39 UTC, John Stockton wrote:
    As feared, MS IE 11 is now not available in Windows 10. I do have a work-round - I have a Windows XP PC with internet connection removed and with several older browsers, and I would transfer files via a USB memory stick. But that's not nearly as convenient as calling IE 11 from the batch file that I am running.

    Is there a readily-available function which can be called from JavaScript running in a normal modern browser which, given as an argument the name (relative to the current directory, or maybe absolute) of an HTML file, will return a reference to the parsed DOM tree of the BODY of the HTML, including the Links, Anchors, etc. arrays?

    ... ... ...

    I've not yet managed to get IE mode working in MS Edge.

    However, in an ordinary Windows 10 command prompt batch file, the following line
    START iexplore %unto%\HOMEPAGE\linxchek.htm#LC?GoAt=%Who4%-SET.HTM
    still works as it did before

    So :-
    (1) I don't need to change anything at present (A),
    (2) Otherwise, I can use my adjacent off-net update-immune Windows XP PC reading directly from one of my regular back-up USB memories,
    (3) I would use a drop-in JavaScript function to replace the one that uses the Iframe,
    (4) Anything involving visiting GitHub, though perhaps in principle superior, would be too much bother,
    (5) I want it to be able to run on any PC merely by transferring a few files, without having to install anything - as is, it can run on any of my back-up memories using a host PC (with IEXPLORE on the Path),
    (6) I don't know what XENU can do, but it's not likely to do any of the other things that I do _while_ roaming the DOM tree,
    (7) I might be able to do more using VBScript with or instead of JavaScript - but I'm not fluent in VBScript - does it have a function to return the DOM tree of a named HTML file?
    (8) When I started writing LINXCHEK.HTM, I was, I think, using Firefox in Windows XP sp3. At that time I think that it would basically work on all of my browsers, except that some features did not work on all browsers. From time to time, it stopped running on various browsers; Firefox failed on reaching version 68.0,
    (9) I suspect that the only non-resident malware that IE11, when only reading local files, can be troubled by is Windows Update.

    (A) Well, I did have a simple tester page VALIDATE.HTM to call LINXCHEK.HTM with standard values for the GoAt parameter; but it was easy to make VALIDATE.BAT, which provides the same functionality. In each case, it is LINXCHEK which shows the results.

    Thanks for the efforts, most of which were comprehensible.

    Even if MSIE is no longer made "available", that simply removes the MSIE
    host application, IEXPLORE.EXE. MSIE's web browser engine will never be
    removable, since it is used by some Windows applications and by various administrative applications, such as the Management Console for some Windows features.

    In this case, MSIE's embedded web browser component still exists and is usable. We can use HTML Application (MSHTA; *.hta) scripts to host the embedded
    web browser component as a simple web browser, or add more features to it to
    make it a full-featured web browser.

    MSHTA also won't be removed any time soon, since it's mainly used for administrative purposes where a GUI is needed. Just like Windows Script Host (WSH) with its VBScript and JScript.

  • From John Stockton@21:1/5 to All on Sun Mar 5 15:26:42 2023
    On Sunday, 5 March 2023 at 16:06:08 UTC, JJ wrote:

    Even if MSIE is no longer made "available", that simply removes the MSIE host application, IEXPLORE.EXE.

    There are 21 files named IEXPLORE.EXE on my main Win 10 PC's C: drive at present; several different sizes, with four at about 730kB and the rest at about 10kB. Since
    START iexplore LINXCHEK.HTM
    works, I suppose that at least one of them is on the Path.

    The MSIE's web browser engine will
    never be removable, since it's used by some Windows applications and various administrative applications such as Management Console for some Windows features.

    But it could be moved, renamed, and maybe improved.


    All is working, in Win XP and Win 10. There was a surprise - in Win XP, IE loaded LINXCHEK.HTM but that would not run. Firefox ESR 52.9.0 (32-bit) works there, *better* than MSIE. So, for Win XP, I can conditionally disable the annoying IE-bug-dodging code. If I recall correctly, JavaScript can distinguish IE from other browsers by testing for a subtle difference somewhere in the number-handling region.


    --
    (c) John Stockton, near London, UK. Using Google Groups. |

  • From Arno Welzel@21:1/5 to All on Thu Mar 9 03:06:19 2023
    Thomas 'PointedEars' Lahn, 2023-02-21 22:15:

    John Stockton wrote:

    Is there a readily-available function which can be called from JavaScript
    running in a normal modern browser which, given as an argument the name
    (relative to the current directory, or maybe absolute) of an HTML file,
    will return a reference to the parsed DOM tree of the BODY of the HTML,
    including the Links, Anchors, etc. arrays?

    Apparently they have even killed file:// to file:// HTTP requests in/by Chromium 108 :-(

    Which is a good thing. Scripts within a browser which may get loaded
    from untrusted sources should never be allowed to access the local
    filesystem.

    That's why you may end up with Node.js as the runtime environment, which fits
    the requirement of using scripts to access local resources much better
    than running a script in a browser.
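
    A minimal sketch of what reading a local page looks like under Node.js, using only the built-in fs module. It does not build a DOM tree: the regular-expression scan is a deliberately crude stand-in for a real parser such as jsdom, and it will be fooled by unusual markup. extractHrefs and hrefsInFile are hypothetical names, and UTF-8 file encoding is an assumption.

```javascript
const fs = require('node:fs');

// Rough scan for <a ... href=...> values. A regular expression is NOT an
// HTML parser; a library such as jsdom would give a real DOM instead.
function extractHrefs(html) {
  const re = /<a\b[^>]*?\bhref\s*=\s*(?:"([^"]*)"|'([^']*)'|([^\s>]+))/gi;
  return Array.from(html.matchAll(re), (m) => m[1] || m[2] || m[3] || '');
}

// Read a local HTML file directly; no same-origin policy applies here.
function hrefsInFile(fileName) {
  return extractHrefs(fs.readFileSync(fileName, 'utf8'));
}
```

    For example, hrefsInFile('INDEX.HTM') would list the link targets of a page in the current directory.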


    --
    Arno Welzel
    https://arnowelzel.de
