• Forensic recording of webpages

    From pensive hamster@21:1/5 to All on Tue Feb 13 05:56:52 2024
    Forensic recording of webpages

    Is it possible to record a webpage in such a way as to be able
    to prove that it is a genuine record of the webpage (on a particular
    date)?

    It is possible to take a screenshot / screengrab of a webpage, but,
    as I understand it, a screenshot is just a digital image, recording the
    pixels that were on the screen at the time. So it could be argued that
    such a screenshot could be edited, or even entirely forged, in a
    programme such as Photoshop, and therefore a screenshot couldn't
    be considered proof of anything. (Unless perhaps there is some hidden
    code in the screenshot image file.)

    I am thinking of the sort of situation whereby a company might advertise
    a product on their website, and state that it had certain capabilities. But
    if it turned out that the product didn't actually have the advertised capabilities,
    and if a disappointed purchaser wished to make a claim against the
    company, the company could surreptitiously amend their webpage to
    remove the questionable description of the product's capabilities, and so
    the disappointed purchaser would not be able to use a screenshot as
    evidence that the company had made any such description of the product's capabilities.

    Equally, a company could surreptitiously alter their terms & conditions, as published on their website. Or a news source could remove a potentially libelous statement from their website.

    I had a look at the Wayback Machine / The Internet Archive

    -------------------
    https://archive.org/legal
    'Information Requests
    'The Internet Archive's Policy for Responding to Information Requests

    'The following sets forth the Internet Archive's policy with regard to requests for documents or other records for use in legal proceedings. [goes on to specify payments required].'
    -------------------

    However, Wikipedia says:

    -------------------
    https://en.wikipedia.org/wiki/Wayback_Machine#Legal_status

    'In Europe, the Wayback Machine could be interpreted as violating
    copyright laws. Only the content creator can decide where their content
    is published or duplicated, so the Archive would have to delete pages
    from its system upon request of the creator.[84] The exclusion policies
    for the Wayback Machine may be found in the FAQ section of the site.[85]
    [I haven't actually been able to find the exclusion policies so far.]

    'Some cases have been brought against the Internet Archive specifically
    for its Wayback Machine archiving efforts.'
    -------------------

    So I am thinking that a record of a webpage from the Internet Archive,
    might not be useable proof that it is a genuine record of the webpage,
    at least not in the UK.

    So I am wondering if anyone has any insight about how to establish a
    provable record of a webpage?

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Alan J. Wylie@21:1/5 to pensive hamster on Tue Feb 13 18:24:11 2024
    pensive hamster <pensive_hamster@hotmail.co.uk> writes:

    Forensic recording of webpages

    Is it possible to record a webpage in such a way as to be able
    to prove that it is a genuine record of the webpage (on a particular
    date)?

    It is possible to take a screenshot / screengrab of a webpage, but,
    as I understand it, a screenshot is just a digital image, recording the pixels that were on the screen at the time.

    ...

    So I am wondering if anyone has any insight about how to establish a
    provable record of a webpage?

    Assuming you want a free service, lots of options here: https://freetsa.org/index_en.php

    In particular https://www.freetsa.org/index_en.php#screenshot

    See https://en.wikipedia.org/wiki/Trusted_timestamping
    and https://stackoverflow.com/questions/25052925/does-anyone-know-a-freetrial-timestamp-server-service


    Alternatively, in Firefox, "File", "Save Page As", "Web Page Complete"
    saves the page.

    Zip up the files/directory, then generate the SHA512 sum/hash of the zip
    file and post it here. Your post to which I am replying has this header, presumably added by the moderating NNTP server.

    Injection-Date: Tue, 13 Feb 2024 13:56:53 +0000
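
    The zip-and-hash step above can be sketched roughly as follows. This is a
    minimal illustration using only the Python standard library; the function
    names are made up for the example, and it assumes the page has already
    been saved to a local directory and that the zip file is written outside
    that directory.

```python
import hashlib
import zipfile
from pathlib import Path


def zip_directory(src_dir: Path, zip_path: Path) -> None:
    """Pack every file under src_dir into zip_path, in sorted order."""
    with zipfile.ZipFile(zip_path, "w", zipfile.ZIP_DEFLATED) as zf:
        for path in sorted(src_dir.rglob("*")):
            if path.is_file():
                # Store paths relative to the saved-page directory.
                zf.write(path, path.relative_to(src_dir).as_posix())


def sha512_of_file(path: Path, chunk_size: int = 1 << 20) -> str:
    """Return the hex SHA-512 digest of a file, reading it in chunks."""
    digest = hashlib.sha512()
    with open(path, "rb") as fh:
        for chunk in iter(lambda: fh.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()
```

    Calling zip_directory() on the saved page and then sha512_of_file() on the
    resulting zip gives the 128-character hex string to publish. Keep the zip
    file itself unchanged: the published hash only proves anything if the
    exact same bytes can be produced later.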
    --
    Alan J. Wylie https://www.wylie.me.uk/
    Dance like no-one's watching. / Encrypt like everyone is.
    Security is inversely proportional to convenience

  • From Theo@21:1/5 to pensive hamster on Tue Feb 13 16:00:10 2024
    pensive hamster <pensive_hamster@hotmail.co.uk> wrote:
    Forensic recording of webpages

    Is it possible to record a webpage in such a way as to be able
    to prove that it is a genuine record of the webpage (on a particular
    date)?

    Do you want to do it for a contemporaneous web page, or one at some date in
    the past?

    Web pages don't have any kind of authenticity mechanism; the HTTPS
    connection does (so if you recorded that you can prove it was sent from
    their server within certain time parameters) but once it lands on your
    machine you just have files, with no proof you didn't modify them.

    If you want to prove they hadn't been modified after the day of access, then
    the usual timestamping mechanisms apply (post them to yourself by sealed
    registered letter, or seal them in a vault, or take a cryptographic hash and
    publish it in an ad in The Times).

    'Blockchain' is another cryptographic way to prove the state of something at
    a given point in time: take a hash of the data, append that hash to the
    chain. Its position in the chain between dated transactions from other people
    proves that you were in possession of that data at that point in time.
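
    The chaining idea can be illustrated with a toy hash chain. This is only a
    sketch of the principle, not how any particular blockchain works: the
    function names and the entry layout are invented for the example, the
    timestamps are simply whatever the caller supplies, and a real public
    chain adds many independent participants so that no single party can
    rewrite the history.

```python
import hashlib
import json

GENESIS = "0" * 64  # placeholder "previous hash" for the first entry


def entry_hash(prev_hash: str, payload_hash: str, timestamp: str) -> str:
    """Hash one entry's fields together with the previous entry's hash."""
    record = json.dumps(
        {"prev": prev_hash, "payload": payload_hash, "time": timestamp},
        sort_keys=True,
    )
    return hashlib.sha256(record.encode("utf-8")).hexdigest()


def append_entry(chain: list, payload: bytes, timestamp: str) -> dict:
    """Append the hash of payload to the chain, linked to the last entry."""
    prev_hash = chain[-1]["hash"] if chain else GENESIS
    payload_hash = hashlib.sha256(payload).hexdigest()
    entry = {
        "prev": prev_hash,
        "payload": payload_hash,
        "time": timestamp,
        "hash": entry_hash(prev_hash, payload_hash, timestamp),
    }
    chain.append(entry)
    return entry


def verify_chain(chain: list) -> bool:
    """Check that every entry links to its predecessor and is unaltered."""
    prev_hash = GENESIS
    for entry in chain:
        if entry["prev"] != prev_hash:
            return False
        expected = entry_hash(entry["prev"], entry["payload"], entry["time"])
        if entry["hash"] != expected:
            return False
        prev_hash = entry["hash"]
    return True
```

    Because each entry's hash covers the previous entry's hash, changing an
    earlier entry breaks every later link, which is what pins an entry
    between its neighbours.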

    It is possible to take a screenshot / screengrab of a webpage, but,
    as I understand it, a screenshot is just a digital image, recording the pixels that were on the screen at the time. So it could be argued that
    such a screenshot could be edited, or even entirely forged, in a
    programme such as Photoshop, and therefore a screenshot couldn't
    be considered proof of anything. (Unless perhaps there is some hidden
    code in the screenshot image file.)

    The above applies here as well; take a screenshot, seal it in a bank vault,
    if you've never since opened the vault you can prove that what was in the
    vault was what went in on that date.

    I am thinking of the sort of situation whereby a company might advertise
    a product on their website, and state that it had certain capabilities. But if it turned out that the product didn't actually have the advertised capabilities,
    and if a disappointed purchaser wished to make a claim against the
    company, the company could surreptitiously amend their webpage to
    remove the questionable description of the product's capabilities, and so
    the disappointed purchaser would not be able to use a screenshot as
    evidence that the company had made any such description of the product's capabilities.

    If you want to go back in time and see what they were offering in the past,
    you need somebody who took a capture at that point in time and who will
    warrant that their capture is a true record. It seems that the Wayback
    Machine offers that service. It is possible the page was not captured at the
    time, or that the capture has since been deleted for reasons such as
    copyright claims, i.e. the absence of a record says nothing, but the presence
    of a record would be a true record. I don't see why this would disqualify
    them from providing evidence in a court: the court likely isn't concerned
    with them infringing the copyright of some third party (it may well be fair
    dealing anyway). The defendant could counter-sue, but good luck suing them in
    San Francisco.

    Theo

  • From Jethro_uk@21:1/5 to pensive hamster on Tue Feb 13 15:37:50 2024
    On Tue, 13 Feb 2024 05:56:52 -0800, pensive hamster wrote:

    Forensic recording of webpages

    Is it possible to record a webpage in such a way as to be able to prove
    that it is a genuine record of the webpage (on a particular date)?

    It is possible to take a screenshot / screengrab of a webpage, but,
    as I understand it, a screenshot is just a digital image, recording the pixels that were on the screen at the time. So it could be argued that
    such a screenshot could be edited, or even entirely forged, in a
    programme such as Photoshop, and therefore a screenshot couldn't be considered proof of anything. (Unless perhaps there is some hidden code
    in the screenshot image file.)

    I am thinking of the sort of situation whereby a company might advertise
    a product on their website, and state that it had certain capabilities.
    But if it turned out that the product didn't actually have the advertised capabilities,
    and if a disappointed purchaser wished to make a claim against the
    company, the company could surreptitiously amend their webpage to remove
    the questionable description of the product's capabilities, and so the disappointed purchaser would not be able to use a screenshot as evidence
    that the company had made any such description of the product's
    capabilities.

    Equally, a company could surreptitiously alter their terms & conditions,
    as published on their website. Or a news source could remove a
    potentially libelous statement from their website.

    I had a look at the Wayback Machine / The Internet Archive

    -------------------
    https://archive.org/legal 'Information Requests 'The Internet Archive's Policy for Responding to Information Requests

    'The following sets forth the Internet Archive's policy with regard to requests for documents or other records for use in legal proceedings.
    [goes on to specify payments required].'
    -------------------

    However, Wikipedia says:

    ------------------- https://en.wikipedia.org/wiki/Wayback_Machine#Legal_status

    'In Europe, the Wayback Machine could be interpreted as violating
    copyright laws. Only the content creator can decide where their content
    is published or duplicated, so the Archive would have to delete pages
    from its system upon request of the creator.[84] The exclusion policies
    for the Wayback Machine may be found in the FAQ section of the site.[85]
    [I haven't actually been able to find the exclusion policies so far.]

    'Some cases have been brought against the Internet Archive specifically
    for its Wayback Machine archiving efforts.'
    -------------------

    So I am thinking that a record of a webpage from the Internet Archive,
    might not be useable proof that it is a genuine record of the webpage,
    at least not in the UK.

    So I am wondering if anyone has any insight about how to establish a
    provable record of a webpage?

    In theory the technology is trivial - you just need a sufficiently large
    hash and reliable algorithm.

    I suspect the main problem will be getting a court to accept it.
    Especially if they want to play the "you need to prove it every single
    case" game.

    That said, the rise of "AI" may lead to better ways of signing digital
    content.

    Software packages have provided hashes to check they haven't been
    tampered with/are what is advertised for years now.
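
    The check that software downloads rely on is roughly the following: hash
    the bytes you have and compare with the digest published elsewhere. A
    minimal sketch using the Python standard library; the function name is
    invented for the example, and it assumes the published digest is a hex
    string obtained from a source you trust.

```python
import hashlib
import hmac


def matches_published_digest(
    data: bytes, published_hex: str, algorithm: str = "sha256"
) -> bool:
    """Return True if data hashes to the published hex digest."""
    actual = hashlib.new(algorithm, data).hexdigest()
    # Normalise case and whitespace, then compare in constant time.
    return hmac.compare_digest(actual, published_hex.strip().lower())
```

    A match shows the data is the same as whatever was hashed originally. It
    says nothing about who produced the original or when, which is why the
    digest has to be published or timestamped somewhere independent.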

  • From Roger Hayter@21:1/5 to pensive_hamster@hotmail.co.uk on Tue Feb 13 16:23:07 2024
    On 13 Feb 2024 at 13:56:52 GMT, "pensive hamster" <pensive_hamster@hotmail.co.uk> wrote:

    Forensic recording of webpages

    Is it possible to record a webpage in such a way as to be able
    to prove that it is a genuine record of the webpage (on a particular
    date)?

    It is possible to take a screenshot / screengrab of a webpage, but,
    as I understand it, a screenshot is just a digital image, recording the pixels that were on the screen at the time. So it could be argued that
    such a screenshot could be edited, or even entirely forged, in a
    programme such as Photoshop, and therefore a screenshot couldn't
    be considered proof of anything. (Unless perhaps there is some hidden
    code in the screenshot image file.)

    I am thinking of the sort of situation whereby a company might advertise
    a product on their website, and state that it had certain capabilities. But if it turned out that the product didn't actually have the advertised capabilities,
    and if a disappointed purchaser wished to make a claim against the
    company, the company could surreptitiously amend their webpage to
    remove the questionable description of the product's capabilities, and so
    the disappointed purchaser would not be able to use a screenshot as
    evidence that the company had made any such description of the product's capabilities.

    Equally, a company could surreptitiously alter their terms & conditions, as published on their website. Or a news source could remove a potentially libelous statement from their website.

    I had a look at the Wayback Machine / The Internet Archive

    -------------------
    https://archive.org/legal
    'Information Requests
    'The Internet Archive's Policy for Responding to Information Requests

    'The following sets forth the Internet Archive's policy with regard to requests
    for documents or other records for use in legal proceedings. [goes on to specify payments required].'
    -------------------

    However, Wikipedia says:

    ------------------- https://en.wikipedia.org/wiki/Wayback_Machine#Legal_status

    'In Europe, the Wayback Machine could be interpreted as violating
    copyright laws. Only the content creator can decide where their content
    is published or duplicated, so the Archive would have to delete pages
    from its system upon request of the creator.[84] The exclusion policies
    for the Wayback Machine may be found in the FAQ section of the site.[85]
    [I haven't actually been able to find the exclusion policies so far.]

    'Some cases have been brought against the Internet Archive specifically
    for its Wayback Machine archiving efforts.'
    -------------------

    So I am thinking that a record of a webpage from the Internet Archive,
    might not be useable proof that it is a genuine record of the webpage,
    at least not in the UK.

    So I am wondering if anyone has any insight about how to establish a
    provable record of a webpage?

    Take a screenshot in the presence of a disinterested (or legal professional) party with a reputation for probity and get him to sign it at the time, keep a copy of his own, and be able to act as a witness if necessary. I doubt if any kind of cryptography can help in the absence of a reputable human witness.

    --
    Roger Hayter

  • From Roger Hayter@21:1/5 to Alan J. Wylie on Tue Feb 13 22:13:05 2024
    On 13 Feb 2024 at 18:24:11 GMT, "Alan J. Wylie" <alan@wylie.me.uk> wrote:

    pensive hamster <pensive_hamster@hotmail.co.uk> writes:

    Forensic recording of webpages

    Is it possible to record a webpage in such a way as to be able
    to prove that it is a genuine record of the webpage (on a particular
    date)?

    It is possible to take a screenshot / screengrab of a webpage, but,
    as I understand it, a screenshot is just a digital image, recording the
    pixels that were on the screen at the time.

    ...

    So I am wondering if anyone has any insight about how to establish a
    provable record of a webpage?

    Assuming you want a free service, lots of options here: https://freetsa.org/index_en.php

    In particular https://www.freetsa.org/index_en.php#screenshot

    See https://en.wikipedia.org/wiki/Trusted_timestamping
    and https://stackoverflow.com/questions/25052925/does-anyone-know-a-freetrial-timestamp-server-service


    Alternatively, in Firefox, "File", "Save Page As", "Web Page Complete"
    saves the page.

    Zip up the files/directory, then generate the SHA512 sum/hash of the zip
    file and post it here. Your post to which I am replying has this header, presumably added by the moderating NNTP server.

    Injection-Date: Tue, 13 Feb 2024 13:56:53 +0000

    The flaw in all these methods is proving you did not tamper with the data on the day you downloaded it, before securely recording it.

    --
    Roger Hayter

  • From Owen Rees@21:1/5 to Theo on Tue Feb 13 22:49:31 2024
    Theo <theom+news@chiark.greenend.org.uk> wrote:
    pensive hamster <pensive_hamster@hotmail.co.uk> wrote:
    Forensic recording of webpages

    Is it possible to record a webpage in such a way as to be able
    to prove that it is a genuine record of the webpage (on a particular
    date)?

    Do you want to do it for a contemporaneous web page, or one at some date in the past?

    Web pages don't have any kind of authenticity mechanism; the HTTPS
    connection does (so if you recorded that you can prove it was sent from
    their server within certain time parameters) but once it lands on your machine you just have files, with no proof you didn't modify them.

    That is a very significant issue. Can you prove that the system you used to
    fetch the data had not been compromised? Can you prove that the browser had
    not been compromised and was truly rendering the pages it had fetched? Can
    you prove that what you saved is a true record of what the browser
    displayed? Can you prove that the saved data you are presenting is a true
    copy of what was saved, and when it was saved?

    Other posters have addressed some of these issues but depending on what you need to prove to whom in the face of what challenges, you may have a
    difficult task.

    Involving a trustworthy third party at the outset and having them do the recording, timestamping and signing may be the simplest approach.

    There are technological solutions to most of the issues I gave above but I
    am not sure if the necessary mechanisms have been implemented in practice rather than being theoretical ways in which the relevant devices could be
    used.

    Perhaps a way to start would be to consider who will be the judge and what
    level of confidence you need to reach to prove your case to them. I
    mentioned a trustworthy third party. It is not sufficient for you to trust
    them. The judge must trust them in the relevant matters if they are to be
    of any use.

  • From John Levine@21:1/5 to All on Wed Feb 14 01:22:27 2024
    According to TTman <kraken.sankey@gmail.com>:
    So I am wondering if anyone has any insight about how to establish a
    provable record of a webpage?

    Your screenshot will contain meta data and clearly shows origin date and
    if the image has been modified...

    So mess with the page, then take a screenshot. There's no way around
    the gap in the chain of custody. And I have bad news for you: it's no
    harder to edit the metadata than to edit anything else.
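
    As an illustration of how little effort that takes, the sketch below sets
    a file's modification time to an arbitrary date using only the Python
    standard library. It covers filesystem timestamps only, and the function
    name is invented for the example; metadata embedded inside an image file
    is a separate matter, but it can be rewritten in much the same way with
    suitable tools.

```python
import os
from datetime import datetime


def backdate_file(path: str, when: datetime) -> None:
    """Set the file's access and modification times to the given moment."""
    ts = when.timestamp()
    os.utime(path, (ts, ts))
```

    After this call, a directory listing shows the chosen date as if the file
    had been untouched since then, so a timestamp under the holder's own
    control proves nothing by itself.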

    Since courts are based on law, not software, they usually accept
    Internet Archive pages because the Archive has well documented
    practices, a good reputation, and no reason to mess with random
    sites' web pages.

    --
    Regards,
    John Levine, johnl@taugh.com, Primary Perpetrator of "The Internet for Dummies",
    Please consider the environment before reading this e-mail. https://jl.ly

  • From Theo@21:1/5 to Simon Parker on Wed Feb 14 12:42:44 2024
    Simon Parker <simonparkerulm@gmail.com> wrote:
    It depends on how "provable" you need it to be "establish[ed]" and how
    much you are willing to invest to 'establish' that 'proof'.

    We had a case where we needed to do precisely what you request here,
    (i.e. to adduce a certified copy of a given web-page taken at a fixed
    point in time), and we resorted to a third-party solution as we knew it
    would work and it had the backup of an organisation for whom this was
    their "day job" and upon whom we could depend, and call as a witness, if necessary.

    We used a company called "Foxton Forensics" and their product
    "PageRecon" [1].

    You can read about it here: https://www.foxtonforensics.com/blog/post/capturing-web-pages-as-evidence

    That looks like a good solution for capturing how a page is at the current time. Perhaps it's something browsers should do as a matter of course (it wouldn't be hard to add), but having an organisation willing to stand behind
    it and its output is very helpful.

    It might also be something the Internet Archive or similar could do when capturing pages for archival purposes, to be verified later. Although the assertion by the Archive may be sufficient without the crypto.

    (I do have some files that were on the Internet Archive in the early days
    but had become corrupted, but I guess that's been fixed long since)

    Theo

  • From Alan J. Wylie@21:1/5 to Roger Hayter on Wed Feb 14 19:03:36 2024
    Roger Hayter <roger@hayter.org> writes:

    On 13 Feb 2024 at 18:24:11 GMT, "Alan J. Wylie" <alan@wylie.me.uk> wrote:

    [much snippage]

    In particular https://www.freetsa.org/index_en.php#screenshot

    The flaw in all these methods is proving you did not tamper with the
    data on the day you downloaded it, before securely recording it.

    Not the one quoted above. If the hash anchor doesn't work, scroll down
    to the section "URL screenshot online". The third party visits the
    website, downloads it, converts it to a PDF and signs that.

    --
    Alan J. Wylie https://www.wylie.me.uk/

    Dance like no-one's watching. / Encrypt like everyone is.
    Security is inversely proportional to convenience

  • From Roger Hayter@21:1/5 to Alan J. Wylie on Wed Feb 14 22:49:54 2024
    On 14 Feb 2024 at 19:03:36 GMT, "Alan J. Wylie" <alan@wylie.me.uk> wrote:

    Roger Hayter <roger@hayter.org> writes:

    On 13 Feb 2024 at 18:24:11 GMT, "Alan J. Wylie" <alan@wylie.me.uk> wrote:

    [much snippage]

    In particular https://www.freetsa.org/index_en.php#screenshot

    The flaw in all these methods is proving you did not tamper with the
    data on the day you downloaded it, before securely recording it.

    Not the one quoted above. If the hash anchor doesn't work, scroll down
    to the section "URL screenshot online". The third party visits the
    website, downloads it, converts it to a PDF and signs that.

    Which is precisely the method I proposed. We are in fierce agreement.

    --
    Roger Hayter

  • From Smolley@21:1/5 to Alan J. Wylie on Thu Feb 15 12:20:04 2024
    On Wed, 14 Feb 2024 19:03:36 +0000, Alan J. Wylie wrote:

    Roger Hayter <roger@hayter.org> writes:

    On 13 Feb 2024 at 18:24:11 GMT, "Alan J. Wylie" <alan@wylie.me.uk> wrote:

    [much snippage]

    In particular https://www.freetsa.org/index_en.php#screenshot

    The flaw in all these methods is proving you did not tamper with the
    data on the day you downloaded it, before securely recording it.

    Not the one quoted above. If the hash anchor doesn't work, scroll down
    to the section "URL screenshot online". The third party visits the
    website, downloads it, converts it to a PDF and signs that.

    I do this as a matter of course.

  • From pensive hamster@21:1/5 to All on Wed Feb 21 06:13:46 2024
    Thanks to everyone who answered about forensic recording of webpages,
    I have taken note of all the suggestions. I don't need to do anything at present, but I now have plenty of information available for future reference. S.P.'s suggestion about "Foxton Forensics" looks like the most reliable,
    but I will investigate a number of the other suggestions as well. Many
    thanks to all.
