• Character weirdness in redirection

    From Stan Brown@21:1/5 to All on Thu Feb 16 17:52:38 2023
    Windows 10 Pro version 21H2 OS build 19044.2604

    I entered this command:
    for /d %X in (dv*) do echo %X
    and the result was
    Dvorák, Antonin (1841-1904)
    except with an upside-down caret over the r. That's identical to what
    File Explorer shows.

    Then I redirected the output to a batch file:
    for /d %X in (dv*) do echo cd %X >foo.bat
    I executed foo in the same command window and got "The system cannot
    find the path specified." I then entered
    type foo.bat
    and the response was
    cd Dvorák, Antonin (1841-1904)
    with _no_ accent mark on the r.

    chcp tells me that the active code page is 437, for what relevance
    that may have. I could understand if the r-with-upside-down-caret
    didn't display in the command window, since CP437 doesn't include it
    <https://en.wikipedia.org/wiki/Code_page_437>. But it did display
    there, and was changed to a regular r only in redirection.

    Can someone explain what's going on here? Thanks!

    --
    Stan Brown, Tehachapi, California, USA https://BrownMath.com/
    Shikata ga nai...

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From VanguardLH@21:1/5 to Stan Brown on Thu Feb 16 20:36:32 2023
    Did you format the partition using NTFS?

    https://learn.microsoft.com/en-us/windows/win32/intl/character-sets-used-in-file-names
    "NTFS stores file names in Unicode. In contrast, the older FAT12, FAT16,
    and FAT32 file systems use the OEM character set. For more information,
    see Code Pages."

    Another reason for not finding a path is that the path has spaces,
    but you didn't account for them in your variables. Obviously:

    Dvorák, Antonin (1841-1904)

    has 2 space characters (and a comma), so a command may take only the
    first token (Dvorák,) as the folder name. 'cd' works on just 1 folder
    at a time; you cannot use it to move to multiple folders concurrently
    or sequentially. (cmd's built-in 'cd' is lenient about unquoted spaces
    when command extensions are enabled, but quoting is the safe habit.)

    'echo' displays the entire string in the X variable, so you need to
    double-quote the variable in the 'echo' command, like:

    for /d %X in (dv*) do echo cd "%X" > foo.bat

    However, running the 'for' command at the prompt is not the same as
    running it inside a batch file. Inside a batch file you need to double
    the percent sign on the loop variable so that it survives the batch
    parser. So, inside of a .bat file, use:

    for /d %%X in (dv*) do echo cd "%%X" > foo.bat

    The batch parser collapses %% to a single %, so the 'for' command
    then sees %X, just as it would at the prompt.

    Also, while the for-loop stores each item in X in turn, environment
    variables are addressed by /enclosing/ them in percent signs. Try the
    following in a command shell:

    set x=123
    echo %x
    echo %x%

    The first doesn't expand, because a single leading percent sign does
    not designate an environment variable; enclosing the name, as in %x%,
    does. So, inside the .bat file, you might have to change to:

    for /d %%x in (dv*) do echo cd "%x%" > foo.bat
    or
    for /d %%x in (dv*) do echo cd "%%x%%" > foo.bat

    This is off the top of my head. I'd have to test to verify whether %%x
    is enough to identify the loop variable inside a batch file, or whether
    %x% is needed as for an environment variable. Possibly you have to
    change the 'cd' arg to "%%x%" so that the first percent escapes the
    second; then, when the line gets parsed and sent to the command
    interpreter, that will see it as "%x%".

    'for' generates a list. Each item in that list is copied into the loop
    variable in turn, and the do-statement gets executed once per item.
    Inside a batch file the parser consumes one percent sign of each
    doubled pair, which is why you have to double them wherever you want a
    single percent sign to survive parsing and reach the command
    interpreter.

    Basically: whether at the prompt or in a batch file, enclose strings
    in double-quotes to account for possible space characters (and yours
    has 2, or maybe more). At the command line you can use %X, but inside
    a batch file you need %%X, because there a percent sign acts as the
    escape character for the percent sign that follows it. After all, some
    character had to be chosen for that job.

  • From Stan Brown@21:1/5 to All on Thu Feb 16 21:48:46 2023
    Thanks, Vanguard, for taking as much trouble as you did. I'm terribly
    sorry -- I do know about quoting filenames or paths that contain
    spaces, but I had been messing around with this all afternoon,
    getting more and more frustrated, and when I finally decided to post,
    in trying to simplify things I got careless. I apologize!

    I don't actually use cmd.exe very much -- I've been using TCC or
    TCCLE from JPsoft for decades. But I posted all of my examples using
    cmd, not TCCLE, to make sure some quirk in TCCLE wasn't responsible
    for the error. And I copy/pasted from the cmd.exe window, not retyping
    anything.

    Yes, it's an NTFS partition (d:). I believe that the name as
    displayed in File Explorer is indeed Unicode, since the
    r-upside-down-caret is not among characters 0-255 of Windows code
    page 1252 or the older DOS code page 437. (I looked at a character
    table of each one and didn't see it. That doesn't necessarily mean
    it's not there, but I sure didn't see it.)
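
    The table lookup can also be done programmatically rather than by
    eye. A minimal Python sketch (Python is not used anywhere in this
    thread; it is only a convenient way to query the tables), assuming
    Python's built-in cp437 and cp1252 codecs match the Windows tables
    for these two letters. The helper name encodable is made up for
    illustration.

```python
import unicodedata


def encodable(ch: str, codec: str) -> bool:
    """Return True if the single character ch has a byte in the given code page."""
    try:
        ch.encode(codec)
        return True
    except UnicodeEncodeError:
        return False


# U+0159 is the r with the "upside-down caret"; U+00E1 is a with acute.
for ch in ("\u0159", "\u00e1"):
    name = unicodedata.name(ch)
    for codec in ("cp437", "cp1252"):
        print(f"U+{ord(ch):04X} {name}: in {codec}? {encodable(ch, codec)}")
```

    Under those assumptions the check reports that U+0159 (whose Unicode
    name is LATIN SMALL LETTER R WITH CARON) is in neither table, while
    the accented a is in both.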

    So let me do it right this time:

    Typing
    for /d %X in (dv*) do echo cd "%X"
    on the command line displays
    cd "Dvorák, Antonin (1841-1904)"
    but the /r/ has an upside-down caret on it.

    That's cd "%X" not cd "%X%" -- when I tried the latter on the command
    line I got
    cd "Dvorák, Antonin (1841-1904)%"
    i.e., an extra % after the directory name. I'm guessing that the
    variable named in the for command is special in this regard.

    for /d %X in (dv*) do echo cd "%X" >foo.bat
    displays
    echo cd "Dvorák, Antonin (1841-1904)" 1>foo.bat
    _with_ the upside-down-careted /r/
    and the command "type foo.bat" displays
    cd "Dvorák, Antonin (1841-1904)"
    with no caret on the /r/.

    Executing /foo/ displays
    cd "Dvorák, Antonin (1841-1904)"
    The system cannot find the path specified.
    (with no caret on the /r/).

    I do understand when I go to put the for command in a batch file, it
    will have to use "for %%x" etc., but I'm trying to solve this by
    divide and conquer.

    However, now that you've set me straight on 8.3 shortnames in my
    other thread, I think that's really the answer. Shortnames won't use
    special characters or contain spaces, so they shouldn't pose a
    problem for robocopy. And my car doesn't care how the folders on the
    USB stick are named, so I can just use the shortnames as the _only_
    folder names on my USB stick's exFAT volume. (The car doesn't
    recognize NTFS formatting, which is no surprise.)


  • From VanguardLH@21:1/5 to Stan Brown on Fri Feb 17 00:45:46 2023
    I simplified all that a lot further since I suspected there was a
    problem storing some accented or diacritical characters into an ASCII
    text file.

    echo "dvořák"
    "dvořák"

    echo "dvořák" > testme.bat
    type testme.bat
    "dvorák"

    The r-with-caron (ř) is not getting written to the file. Besides using
    the 'type' command to show what was inside the file, I used HexEdit to
    look at its bytes. What displays as:

    "dvorák"

    is stored as:

    22 64 76 6F 72 A0 6B 22   (hex)
    "  d  v  o  r  á  k  "

    So ř was replaced by a plain r (72) on the way into the file. While á
    is in code page 437 (as A0), ř is not. Changing the extension of the
    target file (the one you redirect the 'echo' output into) to .rtf does
    not alter that: the redirected output is still single-byte text in the
    active code page, not Unicode.
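
    The same byte sequence can be reproduced outside cmd. What follows is
    a rough Python sketch, not the actual Windows conversion: it assumes
    that dropping the combining mark from a letter that has no byte in
    code page 437 approximates the substitution observed here. The helper
    name best_fit_cp437 is made up for illustration.

```python
import unicodedata


def best_fit_cp437(text: str) -> bytes:
    """Encode text as CP437, falling back to the unaccented base letter.

    This only approximates what was observed in the thread; it does not
    use the real Windows best-fit tables.
    """
    out = bytearray()
    for ch in text:
        try:
            out += ch.encode("cp437")
        except UnicodeEncodeError:
            # Decompose (r-with-caron becomes r + combining caron) and keep
            # only the non-combining part.
            base = "".join(
                c for c in unicodedata.normalize("NFD", ch)
                if not unicodedata.combining(c)
            )
            out += base.encode("cp437", errors="replace")
    return bytes(out)


# "dvo\u0159\u00e1k" is the composer's name with r-caron and a-acute.
data = best_fit_cp437('"dvo\u0159\u00e1k"')
print(data.hex(" ").upper())  # 22 64 76 6F 72 A0 6B 22
```

    The a-acute survives as A0 because code page 437 has a slot for it;
    the r-caron has none, so only the plain r (72) is written.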

    Looks like the problem stems from redirecting stdout into a file.

    Ah, looks like I found the problem. The command shell defaults to the
    437 code page. You need to change to the 65001 code page for UTF-8
    support. Run:

    chcp 65001
    echo "dvořák" > testme.bat
    type testme.bat
    "dvořák"

    With a change in the code page for the command shell, you can get it
    to support UTF-8 when redirecting stdout to a file.
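
    To see what should end up in the file once the code page is 65001,
    here is a small Python sketch of the expected UTF-8 bytes and a
    round trip through a file. It assumes the redirected output is plain
    UTF-8 with no byte-order mark; the temporary directory stands in for
    wherever testme.bat actually lives.

```python
import os
import tempfile

# "dvo\u0159\u00e1k" is the name with r-caron (U+0159) and a-acute (U+00E1).
text = '"dvo\u0159\u00e1k"'
utf8 = text.encode("utf-8")

# Each accented letter takes two bytes in UTF-8: C5 99 and C3 A1.
print(utf8.hex(" ").upper())  # 22 64 76 6F C5 99 C3 A1 6B 22

with tempfile.TemporaryDirectory() as tmp:
    path = os.path.join(tmp, "testme.bat")
    with open(path, "wb") as f:
        f.write(utf8)
    # Reading the file back as UTF-8 recovers both accented letters.
    with open(path, encoding="utf-8") as f:
        round_trip = f.read()

print(round_trip == text)  # True
```

    A hex editor showing C5 99 where the 437 run showed 72 is a quick way
    to confirm that the caron made it into the file.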

    https://ss64.com/nt/chcp.html

    Doesn't list all code pages, but probably the most used, including
    UTF-8. You can find more code pages listed at:

    https://en.wikipedia.org/wiki/Code_page#IBM_code_pages

  • From Stan Brown@21:1/5 to VanguardLH on Fri Feb 17 10:07:00 2023
    On Fri, 17 Feb 2023 00:45:46 -0600, VanguardLH wrote:
    Ah, looks like I found the problem. The command shell defaults to the
    437 code page. You need to change to the 65001 code page for UTF-8
    support. Run:

    chcp 65001
    echo "dvořák" > testme.bat
    type testme.bat
    "dvořák"

    First, thank you again for your help in investigating this. I had
    codepage 1242 in effect, but hadn't thought to mention it because I
    thought (wrongly) that the codepage would affect only displays, not
    pass-throughs.

    Both in cmd.exe and with TCCLE, after chcp 65001 I entered the
    command
    dir dv*;do*;fu*|more
    and got Dobrzynski, Dvorák, and Fucik, with the acute accent on the n
    and the caron on the r and c. After redirecting those into a batch
    file, with quotes around them, and then editing the batch file as
    UTF-8, they echoed correctly and a CD command on Dvorák worked.

    Now I'm torn. (a) It would be a little effort to set up shortnames
    for the composer folders on the d: drive, but that's a one-time thing
    and any new composers will be set up automatically by Windows. In the
    past I've used shortnames to get around character-set problems, so I
    know that will work.
    (b) On the other hand, the codepage solution is less work to set up,
    but I haven't tested it end-to-end yet. For instance, I'll have to
    edit my data file in UTF-8 also, and I have to find a different grep
    to search for matches in the data file, since mine handles only 8-bit
    characters.
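
    If no UTF-8-aware grep is at hand, a few lines of script can fill the
    gap. A minimal Python sketch, assuming the data file is UTF-8 text
    with one entry per line; the function name grep_utf8 and the file
    contents in the usage part are hypothetical.

```python
import os
import re
import tempfile
from typing import Iterator, Tuple


def grep_utf8(path: str, pattern: str) -> Iterator[Tuple[int, str]]:
    """Yield (line number, line) for each line of a UTF-8 file matching pattern."""
    rx = re.compile(pattern, re.IGNORECASE)
    with open(path, encoding="utf-8") as f:
        for lineno, line in enumerate(f, start=1):
            if rx.search(line):
                yield lineno, line.rstrip("\n")


# Usage with a throwaway data file (names written with \u escapes).
with tempfile.TemporaryDirectory() as tmp:
    data_file = os.path.join(tmp, "composers.txt")
    with open(data_file, "w", encoding="utf-8") as f:
        f.write("Dobrzy\u0144ski, Ignacy\n")
        f.write("Dvo\u0159\u00e1k, Antonin (1841-1904)\n")
        f.write("Fu\u010d\u00edk, Julius\n")
    matches = list(grep_utf8(data_file, "dvo\u0159"))

print(len(matches), matches[0][0])  # 1 2
```

    The search pattern has to contain the accented letters themselves; a
    plain "dvor" would not match, since r and r-with-caron are different
    characters.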

    But now instead of a problem with no solution, I have one with two
    solutions, and that's great progress -- thanks to your help. I am
    very grateful for your time and trouble.

