• Email address pattern match in M code?

    From rtweed@21:1/5 to All on Mon Nov 9 10:50:21 2020
    Here's a challenge for folks, and something someone may already have written: how to do a reasonable (doesn't have to be 100% perfect) pattern match on a string to determine whether or not it's a properly structured email address. I DON'T mean whether the address is valid in the sense that it exists as a registered email address, just whether or not the string of characters is potentially OK as an email address.

    Rob

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From K.S. Bhaskar@21:1/5 to All on Mon Nov 9 11:45:43 2020
    Since top-level domains are constantly being added, instead of a hard-wired check, why not just call out to a service like https://validateemailaddress.org/ (or one of those listed at https://duckduckgo.com/?t=ffab&q=validate+e-mail+address&atb=v144-1&ia=web)?

    Regards
    – Bhaskar

  • From retired developer@21:1/5 to rtweed on Tue Nov 10 00:22:28 2020
    On 09.11.20 19:50, rtweed wrote:
    > How to do a reasonable (doesn't have to be 100% perfect) pattern match on a string to determine whether or not it's a properly-structured email address.

    Hi Rob,

    I found this in one of my old M utilities (it's in the Caché dialect,
    so you'll have to adapt it to your needs):


     // Check E-Mail address format
     // mx=1: check the MX record too
     //
    ChkEml(eml,mx=0) Public
     {
     s ok=0 i $l(eml,"@")=2, $zcvt(eml,"O","UTF8")=eml s eml=$zcvt(eml,"l"), ok=1
     s:ok ok=$tr($p(eml,"@"),"+-_","AAa")_"@"_$tr($p(eml,"@",2),"-","A")?1ln.an.(1"."1.an)1"@"1.(1ln.an1".")2.25a
     s:ok&mx ok=$$ChkMX($p(eml,"@",2),2)
     q ok
     }

     // Check existence of a MX-Record
     //
    ChkMX(dom,tim=2) Public
     {
     s dom=$zcvt(dom,"l"),$t=0 d OsCmd("nslookup -query=mx "_dom,.data,tim)
     f i=1:1:data s line=$tr($zcvt(data(i),"l"),$c(9)," ") i $p(line," ")=dom, line["mail exchanger" q
     q $t
     }

     // OS-Command
     //
    OsCmd(cmd,&rsp,tim=1) Public
     {
     k rsp s rsp=0, io=$i o cmd:"QRI":5 q:'$t
     while $t { u cmd r ans:tim u io s rsp($i(rsp))=ans }
     c cmd u:io]"" io
     }
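
    For readers not on Caché: the heart of ChkEml is the single $tr/? line. Below is a sketch that ports just that format check to standard M (for example GT.M/YottaDB), leaving out the MX lookup. Assumptions: chkEml and demo are hypothetical labels, the input is ASCII (the $zcvt(eml,"O","UTF8") test is dropped), and lower-casing is done with $TRANSLATE instead of $zcvt.

```mumps
chkEml(eml) ; 1 if eml passes the ChkEml format pattern, else 0
 ; hypothetical label; standard-M port of the format check only (no MX lookup)
 ; assumes ASCII input: the $zcvt(eml,"O","UTF8")=eml test is omitted
 n dom,loc
 i $l(eml,"@")'=2 q 0
 ; lower-case with $TRANSLATE instead of Cache's $zcvt(eml,"l")
 s eml=$tr(eml,"ABCDEFGHIJKLMNOPQRSTUVWXYZ","abcdefghijklmnopqrstuvwxyz")
 ; local part: + and - become "A" (a letter, but not lower-case, so neither
 ; can be the first character); _ becomes "a". Domain: - becomes "A".
 s loc=$tr($p(eml,"@"),"+-_","AAa"),dom=$tr($p(eml,"@",2),"-","A")
 ; M evaluates strictly left to right, so ? tests the whole concatenation
 q loc_"@"_dom?1ln.an.(1"."1.an)1"@"1.(1ln.an1".")2.25a
 ;
demo ; hypothetical usage label
 w $$chkEml("John.Doe@Gmail.com"),! ; writes 1
 w $$chkEml(".doe@gmail.com"),! ; writes 0
 w $$chkEml("john.doe@domainsample"),! ; writes 0
 w $$chkEml("john.do%e43@domainsample.com"),! ; writes 0
 q
```

    The $tr trick lets one pattern use only letter/digit codes: each "." in the local part must be followed by at least one letter or digit, each domain label must start with a lower-case letter or digit and end in ".", and the final label must be 2 to 25 letters. Note that this pattern rejects "%", so it would fail Rob's john.do%e43 correctTest further down the thread.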


    Regards,
    Julius

    --
    An old Windows has old security holes.
    A new Windows has new security holes.
    Another OS has other security holes.
    For safety you must care yourself.

  • From rtweed@21:1/5 to All on Tue Nov 10 02:19:15 2020
    For my purposes I just want a pattern matching algorithm to determine if an email address is correctly formatted according to the rules. I found this which outlines those rules:

    https://help.returnpath.com/hc/en-us/articles/220560587-What-are-the-rules-for-email-address-syntax-

    If these are correct, then it means a simple use of the ? pattern match is probably not going to be sufficient. Here's my somewhat verbose logic with some tests. Any suggestions/improvements/corrections??

    replace(string,substr,to) ;
     i $zv["GT.M" QUIT $$^%MPIECE(string,substr,to)
     QUIT $replace(string,substr,to)
     ;
    isValidEmail(email) ;
     ;
     n chk,domain,dupFound,i,name,specialChars
     ;
     ; just a single @ ?
     i $l(email,"@")'=2 QUIT 0
     ;
     s name=$p(email,"@",1)
     ; missing name?
     i name="" QUIT 0
     ; starts or ends with a letter or number?
     i $e(name,1)'?1AN QUIT 0
     i $e(name,$l(name))'?1AN QUIT 0
     ; duplicated special characters?
     s specialChars=".!#$%&'*+-/=?^_`{|"
     s dupFound=0
     ; argumentless DO needs two spaces before the next command
     f i=1:1:$l(specialChars) d  q:dupFound
     . n s,ss
     . s s=$e(specialChars,i)
     . s ss=s_s
     . s chk=$$replace(name,ss,s)
     . i chk'=name s dupFound=1
     i dupFound QUIT 0
     ;
     ; any other character than alphas, numbers or special characters?
     s chk=$tr(name,specialChars,"")
     i chk'?1AN.AN QUIT 0
     ;
     s domain=$p(email,"@",2)
     i $e(domain,1)'?1AN QUIT 0
     i $e(domain,$l(domain))'?1AN QUIT 0
     ; missing top-level domain?
     i $l(domain,".")<2 QUIT 0
     ; missing domain name
     i $p(domain,".",1)="" QUIT 0
     ; missing intermediate domain parts
     i $$replace(domain,"..",".")'=domain QUIT 0
     ; invalid characters in domain name?
     s chk=$tr(domain,".-","")
     i chk'?1AN.AN QUIT 0
     ; looks like it's formatted OK
     QUIT 1
     ;
    incorrectTest(email)
     i $$isValidEmail(email) d
     . w email_" was invalid but passed",!
     e d
     . w email_" was correctly rejected",!
     QUIT
     ;
    correctTest(email)
     i $$isValidEmail(email) d
     . w email_" passed OK",!
     e d
     . w email_" was OK but failed",!
     QUIT
     ;
    emailTests ;
     ;
     d correctTest("john.doe@gmail.com")
     d correctTest("john.doe43@domainsample.co.uk")
     d incorrectTest(".doe@gmail.com")
     d incorrectTest("@domainsample.com")
     d incorrectTest("johndoedomainsample.com")
     d incorrectTest("john.doe@.net")
     d incorrectTest("john.doe43@domainsample")
     d correctTest("john.do%e43@domainsample.com")
     d incorrectTest("john.do%%e43@domainsample.com")
     d incorrectTest("john.do%e43@domainsample..com")
     d incorrectTest("john.do%e43@domainsample.com.")
     d incorrectTest("john.do(e43@domainsample.com")
     d incorrectTest("john.doe43@domai%nsample.com")
     ;
     QUIT
     ;
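
    One possible simplification, offered as a sketch rather than a tested drop-in: the duplicate-character loop only needs to know whether s_s occurs anywhere in name, and the standard [ (contains) operator answers that directly, so neither $replace (Caché) nor $$^%MPIECE (GT.M) is needed for this test. hasDupSpecial and demo are hypothetical labels, not part of the original code.

```mumps
hasDupSpecial(name,specialChars) ; 1 if any special char appears twice in a row
 ; hypothetical helper; uses only the standard [ (contains) operator
 n i,s,found
 s found=0
 ; parentheses are required: M evaluates left to right, so name[s_s
 ; would mean (name[s)_s
 f i=1:1:$l(specialChars) s s=$e(specialChars,i) i name[(s_s) s found=1 q
 q found
 ;
demo ; hypothetical usage label, with the specialChars string from isValidEmail
 n sc
 s sc=".!#$%&'*+-/=?^_`{|"
 w $$hasDupSpecial("john.do%e43",sc),! ; writes 0
 w $$hasDupSpecial("john.do%%e43",sc),! ; writes 1
 w $$hasDupSpecial("john..doe",sc),! ; writes 1
 q
```

    Inside isValidEmail the whole loop could then become i $$hasDupSpecial(name,specialChars) QUIT 0, and the domain's ".." test could likewise be written i domain[".." QUIT 0, which would make the replace wrapper and its $zv check unnecessary.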

  • From rtweed@21:1/5 to All on Tue Nov 10 02:24:58 2020
    Looks like Google Groups mangles the test email addresses. Here they are with the @ replaced by _at_ to get around that:

    d correctTest("john.doe_at_gmail.com")
    d correctTest("john.doe43_at_domainsample.co.uk")
    d incorrectTest(".doe_at_gmail.com")
    d incorrectTest("_at_domainsample.com")
    d incorrectTest("johndoedomainsample.com")
    d incorrectTest("john.doe_at_.net")
    d incorrectTest("john.doe43_at_domainsample")
    d correctTest("john.do%e43_at_domainsample.com")
    d incorrectTest("john.do%%e43_at_domainsample.com")
    d incorrectTest("john.do%e43_at_domainsample..com")
    d incorrectTest("john.do%e43_at_domainsample.com.")
    d incorrectTest("john.do(e43_at_domainsample.com")
    d incorrectTest("john.doe43_at_domai%nsample.com")
