• Re: Lower case and diff two text files contents of email addresses

    From Graham J@21:1/5 to Maxmillian on Thu Mar 23 18:53:00 2023
    XPost: alt.msdos.batch, alt.comp.microsoft.windows

    Maxmillian wrote:
    I have two long lists of email addresses in Windows 10 as text files.

    How can I lowercase everything and then get a diff of what email
    addresses are in one text file but not in the other text file?




    Are the email addresses separated in any way, with commas, spaces,
    tabs, or semicolons?

    If so, import each list into a spreadsheet so that there is one email
    address per line. Sort the lines. Compare the two spreadsheets.

    Look up fc for file compare

    fc /?
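
    The same "one address per line, sort, compare" idea can also be
    sketched in a few lines of Python, for anyone without a spreadsheet
    to hand. This is only an illustration: the function names and the
    sample strings are made up, and it assumes the separators are
    commas, semicolons, or whitespace.

```python
import re


def split_addresses(text):
    """Split raw text on commas, semicolons, and whitespace; lowercase and sort."""
    parts = re.split(r"[,;\s]+", text)
    # A set drops repeats; empty strings from leading/trailing separators are skipped.
    return sorted({p.lower() for p in parts if p})


def only_in_first(first, second):
    """Return the sorted entries of `first` that do not appear in `second`."""
    return sorted(set(first) - set(second))


# Hypothetical sample data using different separators in each list.
list1 = split_addresses("fOo@computer.com, baR@computer.com;bAz@computer.com")
list2 = split_addresses("foO@computer.com\tBar@computer.com not@in.computer")

print(only_in_first(list1, list2))  # ['baz@computer.com']
print(only_in_first(list2, list1))  # ['not@in.computer']
```

    Once each list is written out one address per line, fc (or any
    other compare tool) can be used on the two sorted files as well.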


    --
    Graham J

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Maxmillian@21:1/5 to All on Thu Mar 23 18:31:27 2023
    XPost: alt.msdos.batch, alt.comp.microsoft.windows

    I have two long lists of email addresses in Windows 10 as text files.

    How can I lowercase everything and then get a diff of what email
    addresses are in one text file but not in the other text file?

  • From Paul@21:1/5 to Maxmillian on Thu Mar 23 15:31:20 2023
    XPost: alt.msdos.batch, alt.comp.microsoft.windows

    On 3/23/2023 2:31 PM, Maxmillian wrote:
    I have two long lists of email addresses in Windows 10 as text files.

    How can I lowercase everything and then get a diff of what email
    addresses are in one text file but not in the other text file?


    **************************** diffemail.awk **************************

    # Assumes file1.txt and file2.txt are in the current working directory
    #
    # gawk.exe -f diffemail.awk file2.txt

    BEGIN {
    while ( (getline < "file1.txt") > 0 ) { # load one file into memory
    # I am too lazy to pass this as param
    $0 = tolower($0)
    arr[$0]++ # The array index is the key, array content currently
    # is a don't care condition. You can detect duplicates
    # if you want.
    }
    close("file1.txt") # Polite are we...
    }

    { # program body: check whether each file2 entry is in file1. We are reading file2 now...
    $0 = tolower($0)
    if ($0 in arr) { # check if a single, incoming entry, is in arr[] or not
    print $0 " is in both files"
    } else {
    print $0 " is not in file1.txt"
    }
    }

    **************************** END diffemail.awk **************************

    file1.txt
    fOo@computer.com
    baR@computer.com
    bAz@computer.com

    file2.txt
    foO@computer.com
    Bar@computer.com
    Baz@computer.com
    not@in.computer

    Output

    PS D:\> .\gawk.exe -f diffemail.awk file2.txt
    foo@computer.com is in both files
    bar@computer.com is in both files
    baz@computer.com is in both files
    not@in.computer is not in file1.txt
    PS D:\>

    You can spice up the program with as much if-then-else
    as you care to. You can even store both files in memory
    if you want.
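
    For comparison, the same lookup-table approach can be sketched in
    Python. This is a rough equivalent rather than a translation: the
    function names are invented for the example, the sample lists stand
    in for file1.txt and file2.txt, and unlike the awk script it strips
    surrounding whitespace and skips empty lines.

```python
from collections import Counter


def load_lowercased(lines):
    """Count each non-empty line after stripping whitespace and lowercasing."""
    return Counter(line.strip().lower() for line in lines if line.strip())


def report(file1_lines, file2_lines):
    """Yield one message per file2 entry, mirroring the awk script's output."""
    seen = load_lowercased(file1_lines)  # counts above 1 would indicate duplicates
    for line in file2_lines:
        entry = line.strip().lower()
        if not entry:
            continue
        if entry in seen:
            yield f"{entry} is in both files"
        else:
            yield f"{entry} is not in file1.txt"


# Stand-ins for the contents of file1.txt and file2.txt.
file1 = ["fOo@computer.com", "baR@computer.com", "bAz@computer.com"]
file2 = ["foO@computer.com", "Bar@computer.com", "Baz@computer.com", "not@in.computer"]

for message in report(file1, file2):
    print(message)
```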

    *******

    The gawk.exe file is in the binaries ZIP file here:

    https://gnuwin32.sourceforge.net/packages/gawk.htm

    Binaries Zip 1,448,542 10 February 2008 f875bfac137f5d24b38dd9fdc9408b5a

    Name: gawk-3.1.6-1-bin.zip
    Size: 1448542 bytes (1414 KiB)
    SHA1: BDA507655EB3D15059D8A55A0DAF6D697A15F632

    Program uses Windows line endings, whereas the bash shell version
    would use Linux line endings.

    Program does not support unicode or the like. It is
    just for plain ASCII at the moment.

    It's not really a practical program, just a demo of
    how easy it is to whip something up.

    And every language... has something it is not good at.
    This language is not an exception to that.

    Paul

  • From DAN@21:1/5 to Maxmillian on Thu Mar 23 21:44:30 2023
    Maxmillian wrote:

    I have two long lists of email addresses in Windows 10 as text files.

    How can I lowercase everything and then get a diff of what email
    addresses are in one text file but not in the other text file?

    Look at WinMerge https://winmerge.org/

  • From Zaidy036@21:1/5 to Graham J on Thu Mar 23 16:42:55 2023
    XPost: alt.msdos.batch, alt.comp.microsoft.windows

    On 3/23/2023 2:53 PM, Graham J wrote:
    Maxmillian wrote:
    I have two long lists of email addresses in Windows 10 as text files.

    How can I lowercase everything and then get a diff of what email
    addresses are in one text file but not in the other text file?




    Are the email addresses separated in any way, with commas, spaces, tabs,
    or semicolons?

    If so, import each list into a spreadsheet so that there is one email
    address per line.  Sort the lines.  Compare the two spreadsheets.

    Look up fc for file compare

    fc /?


    ASAP Utilities (free for non-commercial use) has a function to mark duplicates.

  • From Herbert Kleebauer@21:1/5 to Maxmillian on Fri Mar 24 01:49:46 2023
    XPost: alt.msdos.batch, alt.comp.microsoft.windows

    23.03.2023 19:31, Maxmillian wrote:
    I have two long lists of email addresses in Windows 10 as text files.

    How can I lowercase everything and then get a diff of what email
    addresses are in one text file but not in the other text file?

    Because you posted in alt.msdos.batch, here is a batch solution:

    @echo off

    :: list all email addresses which are not in both
    :: input files (email1.txt, email2.txt)

    if [%1]==[sub] goto :sub
    sort email1.txt|find "@" >email1s.txt
    sort email2.txt|find "@" >email2s.txt
    cmd /c %0 sub
    del email1s.txt
    del email2s.txt
    goto :eof

    :sub
    setlocal EnableDelayedExpansion

    3<email1s.txt 4<email2s.txt (
    set line1a=&set /P line1a=<&3
    set line2a=&set /P line2a=<&4
    set line1b=&set /P line1b=<&3
    set line2b=&set /P line2b=<&4

    for /l %%i in (1,1,100000) do (
    if /I [!line1a!]==[!line2a!] (
    if [!line1a!]==[] exit
    set line1a=!line1b!
    set line2a=!line2b!
    set line1b=&set /P line1b=<&3
    set line2b=&set /P line2b=<&4
    ) else (
    if /I [!line1a!]==[!line2b!] (
    echo !line2a! in email2.txt but not in email1.txt
    set line2a=!line2b!
    set line2b=&set /P line2b=<&4
    ) else (
    if /I [!line1b!]==[!line2a!] (
    echo !line1a! in email1.txt but not in email2.txt
    set line1a=!line1b!
    set line1b=&set /P line1b=<&3
    ) else (
    echo !line1a! in email1.txt but not in email2.txt
    echo !line2a! in email2.txt but not in email1.txt
    set line1a=!line1b!
    set line2a=!line2b!
    set line1b=&set /P line1b=<&3
    set line2b=&set /P line2b=<&4
    )
    )
    )
    )
    )

  • From Andy Burnelli@21:1/5 to Herbert Kleebauer on Fri Mar 24 03:35:01 2023
    XPost: alt.msdos.batch, alt.comp.microsoft.windows

    Herbert Kleebauer wrote:

    23.03.2023 19:31, Maxmillian wrote:
    I have two long lists of email addresses in Windows 10 as text files.

    How can I lowercase everything and then get a diff of what email
    addresses are in one text file but not in the other text file?

    Because you posted in alt.msdos.batch, here is a batch solution:

    @echo off

    :: list all email addresses which are not in both
    :: input files (email1.txt, email2.txt)

    if [%1]==[sub] goto :sub
    sort email1.txt|find "@" >email1s.txt
    sort email2.txt|find "@" >email2s.txt
    cmd /c %0 sub
    del email1s.txt
    del email2s.txt
    goto :eof

    :sub
    setlocal EnableDelayedExpansion

    3<email1s.txt 4<email2s.txt (
    set line1a=&set /P line1a=<&3
    set line2a=&set /P line2a=<&4
    set line1b=&set /P line1b=<&3
    set line2b=&set /P line2b=<&4

    for /l %%i in (1,1,100000) do (
    if /I [!line1a!]==[!line2a!] (
    if [!line1a!]==[] exit
    set line1a=!line1b!
    set line2a=!line2b!
    set line1b=&set /P line1b=<&3
    set line2b=&set /P line2b=<&4
    ) else (
    if /I [!line1a!]==[!line2b!] (
    echo !line2a! in email2.txt but not in email1.txt
    set line2a=!line2b!
    set line2b=&set /P line2b=<&4
    ) else (
    if /I [!line1b!]==[!line2a!] (
    echo !line1a! in email1.txt but not in email2.txt
    set line1a=!line1b!
    set line1b=&set /P line1b=<&3
    ) else (
    echo !line1a! in email1.txt but not in email2.txt
    echo !line2a! in email2.txt but not in email1.txt
    set line1a=!line1b!
    set line2a=!line2b!
    set line1b=&set /P line1b=<&3
    set line2b=&set /P line2b=<&4
    )
    )
    )
    )
    )

    That is just sheer genius.

    You should win a Nobel prize for that, as a diff has been the bane of
    Windows users for years!

    It's going into my batch folder immediately!

  • From Herbert Kleebauer@21:1/5 to Andy Burnelli on Fri Mar 24 09:13:20 2023
    XPost: alt.msdos.batch, alt.comp.microsoft.windows

    On 24.03.2023 04:35, Andy Burnelli wrote:

    That is just sheer genius.

    For genius solutions, you should ask ChatGPT.
    From a discussion in de.comp.os.ms-windows.misc:

    set v=39/2023

    How to extract the two numbers into variables v1 and v2?

    The answer from ChatGPT:

    set v=39/2023
    set v1=%v:/=&rem.%
    set v2=%v:\=&rem.%
    echo "%v%", "%v1%", "%v2%"


    Ok, v2 is wrong, but that is the trivial part of the question.
    But v1 is really good!!!

    set v=39/2023
    set v1=%v:/=&rem.%
    set v2=%v:*/=%
    echo "%v%", "%v1%", "%v2%"

  • From Andy Burns@21:1/5 to Herbert Kleebauer on Fri Mar 24 09:06:54 2023
    XPost: alt.msdos.batch, alt.comp.microsoft.windows

    Herbert Kleebauer wrote:

    The answer from ChatGPT:

    set v=39/2023
    set v1=%v:/=&rem.%
    set v2=%v:\=&rem.%
    echo "%v%", "%v1%", "%v2%"

    Ok, v2 is wrong, but that is the trivial part of the question.
    But v1 is really good!!!

    Is it actually using undefined CMD behaviour?

    I've never seen any reference to using & in a variable substitution

    Or that substitutions are handled like a sub-command which you could
    stick a REM statement in the middle of to ignore what follows

  • From Andy Burns@21:1/5 to Herbert Kleebauer on Fri Mar 24 11:04:18 2023
    XPost: alt.msdos.batch, alt.comp.microsoft.windows

    Herbert Kleebauer wrote:

    Andy Burns wrote:

    Is it actually using undefined CMD behaviour?

    Normal behavior, "/" is replaced by "&rem.", so you get:

    set v1=39&rem.2023

    which is equivalent to the two lines:

    set v1=39
    rem.2023

    Still somewhat surprised that it works without using
    enabledelayedexpansion (or calling cmd.exe /v)

    I'd have thought after it parsed the original SET statement, it would
    execute it "as was" not executing the &REM from after the substitution
    had been done.

  • From Herbert Kleebauer@21:1/5 to Andy Burns on Fri Mar 24 11:17:47 2023
    XPost: alt.msdos.batch, alt.comp.microsoft.windows

    On 24.03.2023 10:06, Andy Burns wrote:
    Herbert Kleebauer wrote:

    The answer from ChatGPT:

    set v=39/2023
    set v1=%v:/=&rem.%

    Is it actually using undefined CMD behaviour?

    Normal behavior, "/" is replaced by "&rem.", so you get:

    set v1=39&rem.2023

    which is equivalent to the two lines:

    set v1=39
    rem.2023
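
    The order of operations described here (substitute first, then
    split the line into commands) can be illustrated with a toy model in
    Python. This is only a sketch of the idea, not a description of how
    cmd.exe is implemented: the function name is invented, only the
    %name:search=replace% form is handled, and quoting, the ^ escape,
    and cmd's case-insensitive matching are all ignored.

```python
def expand(line, variables):
    """Toy model: replace each %name:search=replace% with the edited value."""
    pieces = line.split("%")
    # After splitting on '%', the odd-numbered pieces are the variable expressions.
    for i in range(1, len(pieces), 2):
        name, _, rest = pieces[i].partition(":")
        search, _, repl = rest.partition("=")
        value = variables[name]
        if search.startswith("*"):
            # '*text' means: everything up to and including the first 'text'.
            needle = search[1:]
            pos = value.find(needle)
            if pos >= 0:
                value = repl + value[pos + len(needle):]
        elif search:
            value = value.replace(search, repl)
        pieces[i] = value
    return "".join(pieces)


env = {"v": "39/2023"}

# Substitution happens on the raw text; only afterwards is '&' treated
# as a command separator, so the REM swallows the rest of the line.
print(expand("set v1=%v:/=&rem.%", env).split("&"))  # ['set v1=39', 'rem.2023']
print(expand("set v2=%v:*/=%", env))                 # set v2=2023
```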

  • From Herbert Kleebauer@21:1/5 to Herbert Kleebauer on Fri Mar 24 14:20:44 2023
    XPost: alt.msdos.batch, alt.comp.microsoft.windows

    On 24.03.2023 01:49, Herbert Kleebauer wrote:
    23.03.2023 19:31, Maxmillian wrote:
    I have two long lists of email addresses in Windows 10 as text files.

    How can I lowercase everything and then get a diff of what email
    addresses are in one text file but not in the other text file?

    Because you posted in alt.msdos.batch, here is a batch solution:

    @echo off

    :: list all email addresses which are not in both
    :: input files (email1.txt, email2.txt)


    Sorry, this code doesn't work at all. It was too late yesterday,
    but I wanted to try the idea of reading several input files
    at the same time (it was presented many years ago in a.m.b.nt).
    Better to use a small C program.

  • From Zaidy036@21:1/5 to Herbert Kleebauer on Fri Mar 24 11:12:50 2023
    XPost: alt.msdos.batch, alt.comp.microsoft.windows

    On 3/24/2023 9:20 AM, Herbert Kleebauer wrote:
    On 24.03.2023 01:49, Herbert Kleebauer wrote:
       23.03.2023 19:31, Maxmillian wrote:
    I have two long lists of email addresses in Windows 10 as text files.

    How can I lowercase everything and then get a diff of what email
    addresses are in one text file but not in the other text file?

    Because you posted in alt.msdos.batch, here is a batch solution:

    @echo off

    :: list all email addresses which are not in both
    :: input files (email1.txt, email2.txt)


    Sorry, this code doesn't work at all. It was too late yesterday,
    but I wanted to try the idea of reading several input files
    at the same time (it was presented many years ago in a.m.b.nt).
    Better to use a small C program.


    If case comparison is a problem, paste the email file into Notepad or
    Notepad++, select all, right-click, and change everything to lower or
    upper case in one step. In future, use only one case, because email
    addresses do not care.

  • From Stan Brown@21:1/5 to Andy Burnelli on Fri Mar 24 08:26:43 2023
    On Fri, 24 Mar 2023 03:35:01 +0000, Andy Burnelli wrote:


    That is just sheer genius.

    You should win a Nobel prize for that, as a diff has been the bane of
    Windows users for years!

    It's going into my batch folder immediately!

    Reminds me of the dancing bear -- you don't judge the quality of the
    dancing, because the fact that it can be done at all is amazing.

    I often use the freeware CSDIFF from Component Software. As compared
    with a batch file, it has a graphical interface, various options for
    the comparison, and differences displayed in context in the results
    window.

    Component Software doesn't seem to exist any more, but Archive.org
    still has the download file:
    <https://archive.org/details/csdiff25_zip>
    It's 943 KB -- that's KB, not MB. No installer, just unzip it to use.

    There's also my own freeware CMP (donations gratefully accepted), at <https://brownmath.com/utils/cmp.htm>
    It's a command-line program, so no Windows and no Unicode. But I
    still find it useful for various tasks, including testing whether two
    directory trees contain the same versions of all files, and if not
    then which ones differ. Circling back to the original question, CMP
    has an option to ignore upper/lower case in the comparison.

    --
    Stan Brown, Tehachapi, California, USA https://BrownMath.com/
    Shikata ga nai...

  • From Paul@21:1/5 to Paul on Sun Mar 26 18:15:46 2023
    XPost: alt.comp.microsoft.windows

    On 3/23/2023 3:31 PM, Paul wrote:
    On 3/23/2023 2:31 PM, Maxmillian wrote:
    I have two long lists of email addresses in Windows 10 as text files.

    How can I lowercase everything and then get a diff of what email
    addresses are in one text file but not in the other text file?


    **************************** diffemail2.awk **************************
    # Assumes file1.txt and file2.txt are in the current working directory
    #
    # .\gawk.exe -f diffemail2.awk file1.txt file2.txt common-out.txt
    # ARGV[0] ARGV[1] ARGV[2] ARGV[3]
    # ARGC=4
    # normally, you would clear arguments before proceeding to read ( clear ARGV[3] )

    BEGIN {
    duplicates=0 # Your input is bad. Sanitize and retry. When bad, variable set to 1.
    lines1=0 # count number of lines in file1 (to tell main loop, which input file we're in)
    lines2=0 # malformed line number counter for file2
    pass2 =0 # count number of lines of file1.txt and file2.txt processed in main loop
    abend=1 # Set back to 0 if everything was OK
    if ( ARGC != 4 ) {
    print "Summary: gawk.exe -f diffemail2.awk file1.txt file2.txt common-out.txt"
    exit(1)
    }

    file1=ARGV[1]
    file2=ARGV[2]
    output=ARGV[3]
    # ARGV[1]="" # Zero out the args if you're not using them as input files
    # ARGV[2]="" # Zero out the args if you're not using them as input files
    ARGV[3]="" # Zero out the args if you're not using them as input files

    while ( (getline < file1 ) > 0 ) { # load one file into memory
    lines1++ # Figure out which file is being processed via line count
    $0 = tolower($0)
    if (length($0) > 0) { # don't process empty lines
    if (NF != 1) { # an email address has no spaces in it, so only 1 string is present
    printf("%s entry \"%s\" is malformed at line number %d\n", file1, $0, lines1)
    duplicates=1
    }
    a[$0]++ # The array index is the key; the stored count only matters for spotting duplicates
    if (a[$0] > 1) {
    printf("%s entry %s is a duplicate\n", file1, $0)
    duplicates=1
    }
    }
    }
    close(file1)

    while ( (getline < file2 ) > 0 ) { # load one file into memory
    lines2++ # for lines which are malformed
    $0 = tolower($0)
    if (length($0) > 0) {
    if (NF != 1) {
    printf("%s entry \"%s\" is malformed at line number %d\n", file2, $0, lines2)
    duplicates=1
    }
    b[$0]++ # The array index is the key; the stored count only matters for spotting duplicates
    if (b[$0] > 1) {
    printf("%s entry %s is a duplicate\n", file2, $0)
    duplicates=1
    }
    }
    }
    close(file2)
    if (duplicates == 1) {
    abend=1
    exit(1) # don't execute main body or generate common-out.txt
    } else { abend=0 }
    }

    { # read file1.txt and file2.txt as "regular" input passed on command line
    if (length($0) > 0) {
    $0 = tolower($0)
    if (pass2 < lines1) { # check whether entry from file1 is in file2
    if ( ($0 in b) == 0 ) { printf("%s entry %s is not in file %s\n", file1, $0, file2 ) }
    else { print $0 > output } # common-out.txt collects the file1 entries that are also in file2
    } else { # check entry from file2 is in file1
    if ( ($0 in a) == 0 ) { printf("%s entry %s is not in file %s\n", file2, $0, file1 ) }
    }
    }
    pass2++ # increment lines processed on second pass,
    # tells whether file1 or file2 is being processed, via the less-than check
    }

    END {
    close( output )

    print ""
    if (abend == 0) { # Indicate normal run, if there was no obvious error
    print "Check \"" output "\" for all email addresses common to both files"
    }
    }
    **************************** END diffemail2.awk **************************

    file1.txt
    abc@not.in.computer
    fOo@computer.com
    baR@computer.com
    bAz@computer.com
    abc@not.in.computer <=== remove duplicate entry, then retry
    Bad input <=== remove things which are not an email address, retry
    <=== empty lines are tolerated

    file2.txt
    not@in.computer
    foO@computer.com
    Bar@computer.com
    Baz@computer.com
    not@in.computer <=== remove duplicate entry, then retry
    bad input <=== remove things which are not an email address, retry

    Output from first run.

    .\gawk.exe -f diffemail2.awk file1.txt file2.txt common-out.txt

    file1.txt entry abc@not.in.computer is a duplicate
    file1.txt entry "bad input" is malformed at line number 6
    file2.txt entry not@in.computer is a duplicate
    file2.txt entry "bad input" is malformed at line number 6

    Output from second run.

    .\gawk.exe -f diffemail2.awk file1.txt file2.txt common-out.txt

    file1.txt entry abc@not.in.computer is not in file file2.txt
    file2.txt entry not@in.computer is not in file file1.txt

    Check "common-out.txt" for all email addresses common to both files

    The common-out.txt file looks like this:

    foo@computer.com
    bar@computer.com
    baz@computer.com
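
    For readers who do not have gawk installed, the two-stage logic
    above (reject malformed or duplicate input first, then report the
    differences and collect the common entries) can be sketched in
    Python. The function names are invented for the example, and the
    lists stand in for the cleaned-up file1.txt and file2.txt; unlike
    the awk script, the sketch also strips surrounding whitespace.

```python
def validate(name, lines):
    """Return (entries, problems): lowercased entries plus any complaints."""
    entries, problems, seen = [], [], set()
    for number, raw in enumerate(lines, start=1):
        entry = raw.strip().lower()
        if not entry:
            continue  # empty lines are tolerated
        if len(entry.split()) != 1:  # an address contains no spaces
            problems.append(f'{name} entry "{entry}" is malformed at line number {number}')
        if entry in seen:
            problems.append(f"{name} entry {entry} is a duplicate")
        seen.add(entry)
        entries.append(entry)
    return entries, problems


def compare(name1, lines1, name2, lines2):
    """Return (messages, common); common stays empty when validation fails."""
    a, bad_a = validate(name1, lines1)
    b, bad_b = validate(name2, lines2)
    if bad_a or bad_b:
        return bad_a + bad_b, []
    set_a, set_b = set(a), set(b)
    messages = [f"{name1} entry {e} is not in file {name2}" for e in a if e not in set_b]
    messages += [f"{name2} entry {e} is not in file {name1}" for e in b if e not in set_a]
    common = [e for e in a if e in set_b]
    return messages, common


# Stand-ins for the cleaned input files from the second run.
file1 = ["abc@not.in.computer", "fOo@computer.com", "baR@computer.com", "bAz@computer.com", ""]
file2 = ["not@in.computer", "foO@computer.com", "Bar@computer.com", "Baz@computer.com"]

messages, common = compare("file1.txt", file1, "file2.txt", file2)
print("\n".join(messages))
print(common)
```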

    Paul

  • From Herbert Kleebauer@21:1/5 to Herbert Kleebauer on Mon Mar 27 13:48:10 2023
    XPost: alt.msdos.batch, alt.comp.microsoft.windows

    On 24.03.2023 14:20, Herbert Kleebauer wrote:
    On 24.03.2023 01:49, Herbert Kleebauer wrote:
    23.03.2023 19:31, Maxmillian wrote:
    I have two long lists of email addresses in Windows 10 as text files.

    How can I lowercase everything and then get a diff of what email
    addresses are in one text file but not in the other text file?

    Because you posted in alt.msdos.batch, here is a batch solution:

    @echo off

    :: list all email addresses which are not in both
    :: input files (email1.txt, email2.txt)


    Sorry, this code doesn't work at all. It was too late yesterday,
    but I wanted to try the idea of reading several input files
    at the same time (it was presented many years ago in a.m.b.nt).
    Better to use a small C program.

    Because I don't like unfinished tasks, here is a version which
    should work:

    @echo off

    :: list all email addresses which are not in both
    :: input files (email1.txt, email2.txt)

    if [%1]==[sub] goto :sub
    sort email1.txt|find "@" >email1s.txt
    sort email2.txt|find "@" >email2s.txt
    cmd /c %0 sub
    del email1s.txt
    del email2s.txt
    goto :eof

    :sub
    setlocal EnableDelayedExpansion

    3<email1s.txt 4<email2s.txt (
    set line1=&set /P line1=<&3
    set line2=&set /P line2=<&4

    for /l %%i in (1,0,2) do (
    if /I [!line1!]==[!line2!] (
    if [!line1!]==[] exit
    set line1=&set /P line1=<&3
    set line2=&set /P line2=<&4
    ) else (
    if /I [!line1!] geq [!line2!] (
    echo !line2! in email2.txt but not in email1.txt
    set line2=&set /P line2=<&4
    ) else (
    echo !line1! in email1.txt but not in email2.txt
    set line1=&set /P line1=<&3
    )
    )
    )
    )
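
    The technique in the batch file above is a merge-style walk over
    two sorted lists: compare the current line of each, advance both on
    a match, otherwise report and advance whichever is smaller. A
    minimal Python sketch of that walk follows. The function name and
    sample data are invented, lowercasing stands in for the
    case-insensitive compare, and the sketch checks explicitly for one
    list running out before the other. As with any such walk, an
    address repeated within one file is reported as unmatched.

```python
def merge_diff(lines1, lines2):
    """Walk two sorted lists in step, yielding (entry, source) for unmatched entries."""
    # Keep only lines containing '@', as the batch file does with find "@".
    a = sorted(s.strip().lower() for s in lines1 if "@" in s)
    b = sorted(s.strip().lower() for s in lines2 if "@" in s)
    i = j = 0
    while i < len(a) or j < len(b):
        if i < len(a) and j < len(b) and a[i] == b[j]:
            i += 1  # same address in both lists: advance both
            j += 1
        elif j >= len(b) or (i < len(a) and a[i] < b[j]):
            yield a[i], "email1.txt"  # only in the first list
            i += 1
        else:
            yield b[j], "email2.txt"  # only in the second list
            j += 1


# Hypothetical stand-ins for email1.txt and email2.txt.
email1 = ["fOo@computer.com", "baR@computer.com", "abc@not.in.computer"]
email2 = ["Bar@computer.com", "foO@computer.com", "not@in.computer"]

for entry, source in merge_diff(email1, email2):
    print(f"{entry} is only in {source}")
```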
