# I do not understand why gsub() seems to quadruple scan escaped characters inside
# strings when the number of escaped characters is > 6. See below.
BEGIN{ # quick test of regexpr
# to match path = ...\foo\...
# (path ~ "\\\\foo\\\\") is required
# looking for "\\\\foo\\\\" string constant <==> regexp /\\foo\\/
# in every case below, gsub() returns 2 = number of substitutions (as expected)
x=";foo;"; n=gsub(/;/,"\\",x); printf("# returns %s\n",x)
x=";foo;"; n=gsub(/;/,"\\\\",x); printf("# returns %s\n",x)
x=";foo;"; n=gsub(/;/,"\\\\\\",x); printf("# returns %s\n",x)
x=";foo;"; n=gsub(/;/,"\\\\\\\\",x); printf("# returns %s\n",x)
x=";foo;"; n=gsub(/;/,"\\\\\\\\\\",x); printf("# returns %s\n",x)
x=";foo;"; n=gsub(/;/,"\\\\\\\\\\\\",x); printf("# returns %s\n",x)
}
# 2 \s returns 1 escaped: \foo\ as expected
# 4 \s returns 2 escaped: \\foo\\ as expected
# 6 \s returns 3 escaped: \\\foo\\\ as expected
# 8 \s returns 2 escaped: \\foo\\ NOT 4 \s
# 10 \s returns 3 escaped: \\\foo\\\ NOT 6 \s
# 12 \s returns 4 escaped: \\\\foo\\\\ ! finally get 4 \s!
# Can anyone explain why 8+ \s are different?
# Why gsub(/;/,"\\\\",x) == gsub(/;/,"\\\\\\\\",x)
Thanks, john
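
The claim in the comments above, that the string constant "\\\\foo\\\\" is equivalent to the regexp /\\foo\\/, can be checked on its own, without gsub(). The following is a minimal sketch, not part of the original post; sample_path and match_both are made-up names used only for illustration.

```sh
#!/bin/sh
# Sketch only: sample_path is a made-up value for illustration.
match_both() {
    awk 'BEGIN {
        sample_path = "C:\\foo\\bar"   # lexes to C:\foo\bar
        # Dynamic regexp: the string literal is lexed first (8 -> 4 backslashes),
        # then the regexp engine reads each remaining \\ as one literal backslash.
        by_string = (sample_path ~ "\\\\foo\\\\")
        # Regexp constant: lexed once, so 2 backslashes per literal backslash.
        by_regexp = (sample_path ~ /\\foo\\/)
        print by_string, by_regexp
    }'
}

match_both
```

Both tests should print 1, since matching involves only string lexing and regexp parsing. The differences discussed in this thread come from the extra processing that gsub() applies to its replacement argument.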
On 29.04.2021 03:37, J Naman wrote:
[...]
### gawk ###          ### nawk ###          ### mawk ###
---                   ---                   ---
Repl:  \              Repl:  \              Repl:  \
Where: \foo           Where: \foo           Where: \foo
---                   ---                   ---
Repl:  \\             Repl:  \\             Repl:  \\
Where: \\foo          Where: \\foo          Where: \foo
---                   ---                   ---
Repl:  \\\            Repl:  \\\            Repl:  \\\
Where: \\\foo         Where: \\\foo         Where: \\foo
---                   ---                   ---
Repl:  \\\\           Repl:  \\\\           Repl:  \\\\
Where: \\foo          Where: \\\\foo        Where: \\foo
---                   ---                   ---
Repl:  \\\\\          Repl:  \\\\\          Repl:  \\\\\
Where: \\\foo         Where: \\\\\foo       Where: \\\foo
---                   ---                   ---
Repl:  \\\\\\         Repl:  \\\\\\         Repl:  \\\\\\
Where: \\\\foo        Where: \\\\\\foo      Where: \\\foo
---                   ---                   ---
Three awks, three different results.
Note: 'Repl' shows the replacement text as gsub() receives it, i.e. after
awk's string-literal processing (the string literal in the source is twice
as long, e.g. "\\\\" -> \\ ).
Janis
Now, I have busybox awk, gawk, mawk, nawk, and NetBSD awk installed on my computer, and none of them gives that output.
On 4/28/2021 8:37 PM, J Naman wrote:
[...]
I'm _guessing_ it's because the string gets interpreted twice: once when
the awk interpreter reads the string literal, and again when gsub() uses
it as the replacement. Depending on how the "use" phase interprets pairs
of backslashes, the 2 passes could give:
    \\        -> read -> \    -> use -> \
    \\\\      -> read -> \\   -> use -> \ or \\
    \\\\\\    -> read -> \\\  -> use -> \\\
    \\\\\\\\  -> read -> \\\\ -> use -> \\ or \\\\
Now WHY any given awk, when using the string, would interpret 4
backslashes as 2 but not 2 backslashes as 1, I can't guess.
Regards,
Ed.
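
The two passes described above can be observed separately. The following is a rough sketch, not from the thread: two_passes and the AWK override variable are hypothetical names, and the replacement is stored in an array first, on the assumption that gsub() treats a variable the same way as an inline string constant. length() reports what survives the "read" pass, which should be the same in every awk; the substituted string reports what survives the "use" pass, which is the part that differs between implementations.

```sh
#!/bin/sh
# Sketch: separate the two passes for whichever awk is on PATH.
# AWK is a hypothetical override variable, e.g. AWK=mawk sh passes.sh
AWK=${AWK:-awk}

two_passes() {
    "$AWK" 'BEGIN {
        # The literals below hold 2, 4, 6 and 8 backslashes in the source.
        r[1] = "\\"; r[2] = "\\\\"; r[3] = "\\\\\\"; r[4] = "\\\\\\\\"
        for (i = 1; i <= 4; i++) {
            x = ";foo"
            gsub(/;/, r[i], x)
            # length(r[i]) is what survives pass 1 (lexing): i backslashes.
            # x shows what survives pass 2 (gsub); this part varies by awk.
            printf("source=%d  after-read=%d  after-use=%s\n", 2 * i, length(r[i]), x)
        }
    }'
}

two_passes
```

Running it as AWK=gawk, AWK=nawk or AWK=mawk (assuming those binaries are installed) should show identical after-read columns and differing after-use columns, which would place the discrepancy in the second pass rather than in the string lexer.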
On Thursday, April 29, 2021 at 8:33:21 PM UTC+3, Oğuz wrote:
Now, I have busybox awk, gawk, mawk, nawk, and NetBSD awk installed on my computer, and none of them gives that output.
No, sorry, actually mawk does.
$ mawk 'BEGIN { x = "y"; sub(/y/, "\\y\\\\y\\\\\\y\\\\\\\\y", x); print x }'
\y\y\\y\\y