• Re: Parsing filenames

    From none) (albert@21:1/5 to dxforth@gmail.com on Sun Jan 28 14:49:03 2024
    In article <up5c0q$3s5s7$1@dont-email.me>, dxf <dxforth@gmail.com> wrote:
    This may interest CLI app writers. To make users' life easier I apply default
    extensions to source and destination filenames if either wasn't specified.

    \ Parse filenames
    :noname ( -- )
    argv 0= if help then \ no args
    s" SCR" +ext 2dup ifile !fname \ got 1st apply default src ext
    -path -ext \ keep body
    argv if \ got 2nd
    2dup -ext nip while \ has body
    2nip then s" LST" \ default dest ext
    else 1 /string then +ext \ trim '.' from user ext
    ofile !fname ; is parsefn


    \\ Samples

    foo. --> foo. foo.lst
    foo --> foo.scr foo.lst
    foo bar. --> foo.scr bar.
    foo bar --> foo.scr bar.lst
    foo .prn --> foo.scr foo.prn
    foo. . --> foo. foo. (same: caught later)


    ARGV ( -- a u -1 | 0 ) get next blank delimited argument
    +EXT ( a1 u1 a2 u2 -- a3 u3 ) append ext a2/u2 to fname if '.' not present
    -EXT ( a1 u1 -- a1 u2 ) discard extension from filename
    -PATH ( a1 u1 -- a2 u2 ) discard path from filename
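The three helpers are only described by their stack effects. Below is a minimal sketch in standard Forth, written to reproduce the sample table above; it is not dxf's code. It assumes that '/', '\' and ':' separate path components, that the extension starts at the last '.' of the name part, and that +EXT builds its result in a single hypothetical 256-character buffer FNBUF, which each call overwrites. PATH-SEP? is a hypothetical helper.

```forth
\ Hypothetical helpers; only the stack effects come from the post.
\ Assumed: '/', '\' and ':' separate path components, and +EXT builds
\ its result in one scratch buffer (no overflow check).
create fnbuf 256 chars allot

: path-sep? ( c -- f )
  dup [char] / =  over [char] \ = or  swap [char] : = or ;

: -path ( a1 u1 -- a2 u2 )          \ keep what follows the last separator
  dup begin dup while                ( a u i )
    2 pick over 1- chars + c@ path-sep? if /string exit then
    1-
  repeat drop ;

: -ext ( a1 u1 -- a1 u2 )           \ cut at the last '.' of the name part
  dup begin dup while                ( a u i )
    2 pick over 1- chars + c@        ( a u i ch )
    dup [char] . = if drop nip 1- exit then
    path-sep? if drop exit then      \ reached the path: no extension
    1-
  repeat drop ;

: +ext ( a1 u1 a2 u2 -- a3 u3 )     \ add "." a2/u2 unless a '.' is present
  2over dup >r -ext nip r> <> if 2drop exit then
  2swap tuck fnbuf swap cmove        ( a2 u2 u1 )     name copied
  [char] . over chars fnbuf + c!     ( a2 u2 u1 )     '.' appended
  1+ 2dup + >r                       ( a2 u2 u1+1 )   r: total length
  chars fnbuf + swap cmove           \ extension appended
  fnbuf r> ;
```

Under these assumptions, `foo` with default `SCR` becomes `foo.SCR`, `foo.` is left alone, and `.prn` has an empty body, which is what the WHILE test in PARSEFN checks.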


    Let's have ARG[] that isolates the nth argument and the usual
    $! $@ $/ $+! $\ $+C wordset

    `lina -c /tmp/aap.frt' now creates `/tmp/aap' as executable.
    1 ARG[] &. $\ 2DROP PAD $! ( or use it as is)

    `wina -c /tmp/aap.frt' now creates `/tmp/aap.EXE' as executable.
    1 ARG[] &. $\ 2DROP PAD $! ".EXE" PAD $+!

    No need to invent a special parser.
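The thread does not spell out $\ itself. In the two examples above, `&. $\ 2DROP` keeps "/tmp/aap". Based on that, the sketch below assumes that $\ splits at the *last* c, leaves the part after c on top, and returns a null string (address 0) for the front part when c is absent. That stack order is inferred from usage, not taken from ciforth's documentation.

```forth
\ $\ ( sc c -- sc1 sc2 ) sketch: split at the LAST c.
\ sc1 = part before c, sc2 = part after c (inferred from "&. $\ 2DROP").
\ If c is absent: sc1 is a null string (0 0), sc2 the original string.
: $\ ( a u c -- a1 u1 a2 u2 )
  >r dup                               ( a u i )   r: c
  begin dup while
    2 pick over 1- chars + c@ r@ = if  \ c sits at index i-1
      r> drop
      >r over r@ chars + over r@ -     ( a u a+i u-i )  r: i
      2swap drop r> 1- 2swap exit      ( a i-1 a+i u-i )
    then
    1-
  repeat
  drop r> drop 0 0 2swap ;
```

Under this assumption, a name without any '.' gives a null front part, so `&. $\ 2DROP` leaves address 0 rather than the name.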

    After 30 years it still hasn't sunk in how useful the universal parser $/ is.
    ( $\ works from the other side. )
    sc is a string constant, i.e. an (addr len) pair on the stack.

    $/ ( "dollar slash" )

    STACKEFFECT: sc c --- sc1 sc2

    DESCRIPTION:

    Find the first c in the string constant sc and split it at that address.
    Return the strings after and before c into sc1 and sc2 respectively.
    If the character is not present, sc1 is a null string (its address is zero) and sc2 is the original string.
    Both sc1 and sc2 may be empty strings (i.e. their count is zero),
    if c is the last or first character in sc, respectively.
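Below is a portable sketch of $/ that follows the specification above, for readers without ciforth. It assumes the null string is returned as address 0 with count 0. ciforth's actual implementation may differ.

```forth
\ Sketch of $/ ( sc c -- sc1 sc2 ) in standard Forth.
\ a1 u1 = part after c, a2 u2 = part before c (c itself is dropped).
\ Assumption: the null string is (0 0).
: $/ ( a u c -- a1 u1 a2 u2 )
  >r 2dup                            ( a u p n )   r: c
  begin dup while
    over c@ r@ = if                  \ c found at p
      r> drop
      rot over - >r                  ( a p n )     r: u-n
      1 /string rot r>               ( p+1 n-1 a u-n )
      exit
    then
    1 /string                        \ advance p
  repeat
  2drop r> drop 0 0 2swap ;          \ absent: null string, original
```

For a string consisting of just the delimiter, both results are empty but not null: the "after" part keeps a real address.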

    Groetjes Albert
    --
    Don't praise the day before the evening. One swallow doesn't make spring.
    You must not say "hey" before you have crossed the bridge. Don't sell the
    hide of the bear until you shot it. Better one bird in the hand than ten in
    the air. First gain is a cat purring. - the Wise from Antrim -

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From mhx@21:1/5 to none on Sun Jan 28 17:41:16 2024
    none wrote:
    [..]
    $/ ( "dollar slash" )

    STACKEFFECT: sc c --- sc1 sc2

    DESCRIPTION:

    Find the first c in the string constant sc and split it at that address. Return the strings after and before c into sc1 and sc2 respectively.
    If the character is not present sc1 is a null string (its address is zero) and
    sc2 is the original string.
    Both sc1 and sc2 may be empty strings (i.e. their count is zero),
    if c is the last or first character in sc .

    I use Wil Baden's Split-At-Char
    FORTH> locate Split-At-Char
    File: d:\dfwforth/include/miscutil.frt
    1323:
    1324: -- Wil Baden
    1325: -- Right string starts with delimiter
    1326>> : Split-At-Char ( addr1 n1 char -- addr1 n2 addr1+n2 n1-n2 )

    Note that the delimiter is NOT deleted: it is at the front of the right
    string. This is a nuisance sometimes (even more for Split-At-LastChar).
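Only the stack effect of Split-At-Char is quoted, so the body below is a guess in standard Forth, not Wil Baden's code. It keeps the delimiter at the front of the right string, as described. When the character is absent, it is assumed to return the whole string on the left and an empty right string at its end.

```forth
\ Guess at Split-At-Char ( addr1 n1 char -- addr1 n2 addr1+n2 n1-n2 ).
\ The right string starts with the delimiter; if the delimiter is
\ absent the right string is empty (assumption).
: split-at-char ( a1 n1 c -- a1 n2 p n )
  >r 2dup                          ( a1 n1 p n )   r: c
  begin dup while
    over c@ r@ <> while            \ stop at the first delimiter
    1 /string
  repeat then
  r> drop
  dup >r rot swap - swap r> ;      ( a1 n1-n p n )
```

Compared with $/, the caller has to `1 /STRING` the right part to get rid of the delimiter, which is the nuisance mentioned above.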

    Your method returns two empty strings for the cases where I would expect
    the original to be returned (sc1 when the delimiter is the last char,
    sc2 when the delimiter is the first char). Therefore a copy must be made
    before the split if more processing follows?

    -marcel

  • From minforth@21:1/5 to All on Sun Jan 28 18:35:45 2024
    The question is also whether one needs a cascading file search function
    that follows a PATH environment variable, or whether the current file
    path is just the current work directory cwd. As always, it depends.

    An automatic default suffix addition can be convenient, but can also
    lead to unforeseen errors. Chacun à son goût (to each his own) ...

  • From albert@spenarnc.xs4all.nl@21:1/5 to mhx on Sun Jan 28 20:29:37 2024
    In article <034a90a368ced16e52910f82e511b178@www.novabbs.com>,
    mhx <mhx@iae.nl> wrote:
    none wrote:
    [..]

    I use Wil Baden's Split-At-Char
    FORTH> locate Split-At-Char
    File: d:\dfwforth/include/miscutil.frt
    1323:
    1324: -- Wil Baden
    1325: -- Right string starts with delimiter
    1326>> : Split-At-Char ( addr1 n1 char -- addr1 n2 addr1+n2 n1-n2 )

    Note that the delimiter is NOT deleted: it is at the front of the right
    string. This is a nuisance sometimes (even more for Split-At-LastChar).

    Your method returns two empty strings for the cases where I would expect
    the original to be returned (sc1 when the delimiter is the last char,
    sc2 when the delimiter is the first char). Therefore a copy must be made
    before the split if more processing follows?

    You mean
    "." &. $/ &N EMIT TYPE &N EMIT TYPE &N EMIT
    NNN OK
    That is totally within specs.
    I can't remember or think of a case where I wanted the
    delimiter to stay in the strings.
    A most useful aspect of $/ is that it distinguishes between
    an empty string and a null string:

    "yield.frt" GET-FILE
    OK
    BEGIN ^J $/ OVER WHILE TYPE CR REPEAT
    \ For N and HINT return FACTOR >= hint, maybe n. NOT INLINE!
    : FACTOR BEGIN 2DUP /MOD SWAP 0= IF DROP NIP EXIT THEN
    OVER < IF DROP EXIT THEN 1+ 1 OR AGAIN ;

    \ For N return: "It IS prime" ( Cases 0 1 return FALSE)
    : PRIME? DUP 4 < IF 1 > ELSE DUP 2 FACTOR = THEN ;

    \ Generator: (next-prime) gives the primes in sequence.
    : (next-prime) CREATE 0 , DOES> BEGIN 1 OVER +! DUP @ PRIME? UNTIL @ ; (next-prime) next-prime
    : doit 10,000 1 DO next-prime I . . CR LOOP next-prime . ;

    "yield.frt" GET-FILE
    OK

    Empty lines are handled ok.
    `DUP WHILE' would stop at the first empty line.

    (a 0) is an empty string
    (0 n) is a null string
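The OVER WHILE loop above stops only on a null string, so empty pieces survive. The sketch below wraps that loop in a hypothetical word #PIECES that counts pieces instead of typing them. So that it stands alone, it carries its own copy of a portable $/, which assumes a null string is (0 0). The tests include repeated and doubled separators such as "a/pe/kool" and "//apekool".

```forth
\ Portable $/ ( a u c -- after before ); null string assumed (0 0).
: $/ ( a u c -- a1 u1 a2 u2 )
  >r 2dup                            ( a u p n )   r: c
  begin dup while
    over c@ r@ = if
      r> drop  rot over - >r  1 /string rot r>  exit
    then
    1 /string
  repeat
  2drop r> drop 0 0 2swap ;

\ Hypothetical #PIECES: the "BEGIN c $/ OVER WHILE ... REPEAT" loop,
\ counting pieces. Each $/ does exactly one separation.
: #pieces ( a u c -- n )
  >r 0 rot rot                       ( n a u )   r: c
  begin r@ $/ over while             ( n after before )
    2drop rot 1+ rot rot             ( n+1 after )
  repeat
  2drop 2drop r> drop ;
```

If you pass 10 (^J) as the separator, #PIECES counts lines the way the GET-FILE example types them, empty lines included. Replacing OVER by DUP in the loop would stop at the first empty piece instead.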


  • From mhx@21:1/5 to dxf on Mon Jan 29 08:47:10 2024
    dxf wrote:

    Using $\ to split the extension from the filename is not without quirks.
    If '.' is not present it thinks the whole filename is the extension.
    My parser also requires a '.' in the filename; however that's to prevent
    it auto-appending the default extension.

    A minor issue is that $\ "string-split" is the wrong name for the word
    as it does more than splitting: it also removes the separator.

    That aside, I was curious if removing the separator was more convenient
    than leaving it in. I examined my usage and in many cases there is a
    check for 0-length and in many others the separator is removed
    (problematic because a 0-string may result).

    Apart from the separator being first or last, there are also the cases
    that the separator appears more than once: "a/pe/kool" or is duplicated "//apekool" (where the separator is '/').

    -marcel

  • From albert@spenarnc.xs4all.nl@21:1/5 to mhx on Mon Jan 29 10:55:03 2024
    In article <42e88e6a6e9a549be0f92e2a1217d500@www.novabbs.com>,
    mhx <mhx@iae.nl> wrote:
    A minor issue is that $\ "string-split" is the wrong name for the word
    as it does more than splitting: it also removes the separator.

    Splitting hairs, are we? Tell that to Python/Lisp users.
    python
    Python 2.7.18 (default, Jul 1 2022, 12:27:04)
    ..
    >>> txt = "apple#banana#cherry#orange"
    >>> x = txt.split("#")
    >>> print(x)
    ['apple', 'banana', 'cherry', 'orange']

    Also I named $/ "string-slash"


    That aside, I was curious if removing the separator was more convenient
    than leaving it in. I examined my usage and in many cases there is a
    check for 0-length and in many others the separator is removed
    (problematic because a 0-string may result).
    Please stick to the terminology introduced. There is a distinction
    between a null string and an empty string.
    Either way the tests are easy:
    OVER IF : tests whether the result was a null string (probably you must terminate now)
    DUP IF : tests whether the result was an empty string, if that is a special case.


    Apart from the separator being first or last, there are also the cases
    that the separator appears more than once: "a/pe/kool" or is duplicated
    "//apekool" (where the separator is '/').
    A separator can separate empty strings. Read the specification carefully.
    An empty string can result if the first or last character is the separator.
    $/ does only one separation at a time.
    I hope you don't want to use '/' as a separator and an escape at the
    same time.


    Groetjes Albert
  • From albert@spenarnc.xs4all.nl@21:1/5 to dxforth@gmail.com on Mon Jan 29 10:59:38 2024
    In article <up6t35$4u0i$1@dont-email.me>, dxf <dxforth@gmail.com> wrote:
    On 29/01/2024 12:49 am, albert wrote:
    ...
    No need to invent a special parser.

    Here's a real world command-line. Please provide your code for
    processing it.

    BLK2TXT version 1.5

    Use: BLK2TXT [-opt] file[.SCR] [file[.LST]]

    Even this wasn't sufficiently convenient for me as an end-user,
    hence my post.

    I'm perfectly willing to write the code, as long as you specify
    what has to be done.

    Groetjes Albert
  • From albert@spenarnc.xs4all.nl@21:1/5 to dxforth@gmail.com on Wed Jan 31 12:05:01 2024
    In article <up9jrn$m02m$1@dont-email.me>, dxf <dxforth@gmail.com> wrote:
    On 29/01/2024 8:59 pm, albert@spenarnc.xs4all.nl wrote:
    ...
    I'm perfectly willing to write the code, as long as you specify
    what has to be done.

    What is unclear - an arbitrary number of switches typically followed
    by one or two filenames. It's the basis of many a CLI application.
    You say it's easily done using ARG[] and a few string operators. I
    don't really see how but if someone has practical working code that
    demonstrates, please do.

    The first thing to do is insert an intermediate abstraction layer,
    borrowed from C.
    This ensures that everything past the first layer is portable
    between Linux, MS-Windows or whatever OS you want to use.
    Note that splitting strings is handled in this layer.
    We must admit that the API design of C is brilliant;
    Forth essentially copied the whole file wordset from C.

    ( n) ARG[] : a string containing the nth argument
    SHIFT-ARGS : consume the first argument
    ARGC : the remaining number of arguments
    I have handled the conversion from AAP.FRT to AAP.EXE before.
    Once more, SRC>EXEC is favoured: the same API, but a different
    implementation for each OS.

    \--------------------- examples, not tested -------------
    \ input file follows
    : -i SHIFT-ARGS 1 ARG[] filein $! SHIFT-ARGS ;
    \ output file follows
    : -o SHIFT-ARGS 1 ARG[] fileout $! SHIFT-ARGS ;
    \ two files follow
    : -io SHIFT-ARGS 1 ARG[] filein $! SHIFT-ARGS
    1 ARG[] fileout $! SHIFT-ARGS ;
    : -n SHIFT-ARGS 1 ARG[] EVALUATE n ! SHIFT-ARGS ;
    : -h USAGE $@ TYPE BYE ;
    : -p SHIFT-ARGS 1 ARG[] SHIFT-ARGS
    BASE @ >R HEX EVALUATE R> BASE !
    port ! ;
    \ -N handles an attached or a separate argument
    \ This relies on the PREFIX word, present in ciforth.
    \ -N 12 / -N12
    : -N 1 ARG[]
    DUP 2 = IF
    2DROP SHIFT-ARGS 1 ARG[]
    ELSE
    2 /STRING
    THEN
    EVALUATE N ! SHIFT-ARGS
    ; PREFIX

    : doit BEGIN ARGC 1 > WHILE
    1 ARG[] OVER C@ &- = IF EVALUATE ELSE
    ABORT" invalid input"
    THEN
    REPEAT
    now-get-on-with-it
    ;
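The post describes the layer words but does not show them. To try the option loop without ciforth, the sketch below backs ARG[], ARGC and SHIFT-ARGS with a hypothetical in-memory argument vector, filled by ADD-ARG, in place of the OS's argc/argv. The following names are illustrative and not from the post: the option words -v and -o; the variables IFILE, OFILE and VERBOSE; and FILENAME, which takes the first non-option as the source and the next as the destination, in the spirit of the BLK2TXT usage line.

```forth
\ Hypothetical stand-in for the OS layer: an in-memory argument vector.
16 constant max-args
create args  max-args 2* cells allot   \ one (addr len) pair per argument
variable #args  0 #args !

: ARG[] ( n -- a u )  2* cells args + 2@ ;
: ARGC ( -- n )  #args @ ;
: SHIFT-ARGS ( -- )                    \ remove argument 1, keep 0
  ARGC 2 < if exit then
  2 2* cells args +  1 2* cells args +  ARGC 2 - 2* cells move
  -1 #args +! ;
: add-arg ( a u -- )                   \ copy the text, append (no bounds check)
  here over chars allot align  swap 2dup 2>r cmove 2r>
  ARGC 2* cells args + 2!  1 #args +! ;

\ Hypothetical option state and option words.
2variable ifile  0 0 ifile 2!
2variable ofile  0 0 ofile 2!
variable verbose  0 verbose !
: -v ( -- )  SHIFT-ARGS  true verbose ! ;
: -o ( -- )  SHIFT-ARGS  1 ARG[] ofile 2!  SHIFT-ARGS ;
: filename ( -- )                      \ 1st non-option: source, next: dest
  1 ARG[]  ifile 2@ nip if ofile else ifile then 2!  SHIFT-ARGS ;

\ The option loop: options are Forth words found by EVALUATE.
: parse-args ( -- )
  begin ARGC 1 > while
    1 ARG[] over c@ [char] - = if evaluate else 2drop filename then
  repeat ;
```

An unknown option such as -x makes EVALUATE fail with an undefined-word error, which plays the role of the ciforth error message quoted below.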

    Note that programs can be invoked like this (Linux example)

    twinprimecounting `10 12 **`
    counting twin primes under the boundary 1,000,000,000,000.

    Invalid options result in
    -o ? ciforth ERROR # 10 : NOT A WORD, NOR A NUMBER OR OTHER DENOTATION

    I do not suggest that your approach is not usable,
    merely that this is better.

    Groetjes Albert
  • From albert@spenarnc.xs4all.nl@21:1/5 to dxforth@gmail.com on Thu Feb 1 12:50:30 2024
    In article <updm0b$1iet4$1@dont-email.me>, dxf <dxforth@gmail.com> wrote:
    On 31/01/2024 10:05 pm, albert@spenarnc.xs4all.nl wrote:
    ...
    I do not suggest that your approach is not usable,
    merely that this is better.

    It's better because it uses the forth interpreter and evaluate? I don't
    know how C does it, but surely it's not that?

    As opposed to Forth, in C you are forced to write a mini interpreter
    for command-line arguments, especially where they are complicated.

    What do you make of this (GNU Image Manipulation Program):

    gimp [-h] [--help] [--help-all] [--help-gtk] [-v] [--version] [--license]
    [--verbose] [-n] [--new-instance] [-a] [--as-new] [-i]
    [--no-interface] [-d] [--no-data] [-f] [--no-fonts] [-s]
    [--no-splash] [--no-shm] [--no-cpu-accel] [--display display]
    [--session <name>] [-g] [--gimprc <gimprc>] [--system-gimprc <gimprc>]
    [--dump-gimprc] [--console-messages] [--debug-handlers]
    [--stack-trace-mode <mode>] [--pdb-compat-mode <mode>]
    [--batch-interpreter <procedure>] [-b] [--batch <command>] [filename]
    ...

    ciforth itself uses an even simpler approach:
    lina [ -aehmrv ]
    lina -c <source>
    lina -d [<source>]
    lina -f [ forth code ]
    lina -g <N> <binary-path>
    lina -i <binary-path> <library-path> [<shell-path>]
    lina -l <library> [ params ]
    lina -s <script> [ params ]

    Each letter 'x' loads a screen ^x and the options are separated out
    from the first line in COLD. Then the screen takes it from there.
    e.g. `` lina -c aap.frt '' compiles a program.

    Groetjes Albert