I was looking at a question on stackoverflow about "MySQL history show a
lot of \040" https://stackoverflow.com/questions/67112091/mysql-history-show-a-lot-of-040/67248377#67248377
I know one can change the '\040' for a ' ' by doing:
~/tmp> sed -e 's/\\040/ /g' testfile
test test
But, why not do it using GAWK ?
~/tmp> cat testfile
test\040test
Attempt 1a:
~/tmp> gawk -F \040 'BEGIN{ OFS=" " }{ $1=$1; print $0 }' testfile
test\ test
Attempt 1b:
~/tmp> gawk -F \\040 'BEGIN{ OFS=" " }{ $1=$1; print $0 }' testfile
test\040test
Attempt 2:
~/tmp> echo test\040test | gawk -F \040 'BEGIN{ OFS=" " }{ $1=$1; print $0 }'
test test
Attempt 3:
~/tmp> gawk '{ gsub(/\\040/," "); }1' testfile
test test
In the first attempt (1a) I do see a backslash in the output, but if I
add another backslash (1b) the FS is no longer recognized.
If I change from reading an inputfile (1a) to reading standard input
(2), then the output changes....
My final attempt (3) shows a workaround not using FS
Question:
How can FS be set, so GAWK will split input on the four characters \040 ?
On 25-4-2021 12:00, Janis Papanagnou wrote:
> gawk -F '\\\\040' '...'
> Janis
wow, that helps....
~/tmp> cat testfile | gawk -F '\\\\040' '1'
test\040test
~/tmp> gawk -F '\\\\040' '1' testfile
test\040test
or not?
On 25.04.2021 11:28, Luuk wrote:
> But, why not do it using GAWK ?
Because it's overkill?
> How can FS be set, so GAWK will split input on the four characters \040 ?
gawk -F '\\\\040' '...'
Janis
On 25.04.2021 13:55, Luuk wrote:
On 25-4-2021 12:00, Janis Papanagnou wrote:
>> gawk -F '\\\\040' '...'
>> Janis
> wow, that helps....
> ~/tmp> cat testfile | gawk -F '\\\\040' '1'
> test\040test
> ~/tmp> gawk -F '\\\\040' '1' testfile
> test\040test
> or not?
No. You have to enter your awk code where I wrote '...'.
I omitted that because you just asked for "setting FS".
But note my comment about being [lexically] an overkill.
Your code also changes the input data format when doing
$1=$1, which is most likely undesired behavior.
You'd have to use gsub() or similar to avoid that.
Much easier with sed, since it's only a substitution.
Janis
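To make the '...' placeholder concrete, here is a minimal sketch that puts a small program behind the suggested -F value. It assumes a POSIX shell and recreates the one-line testfile from the thread in a temporary file; the AWK variable (gawk if installed, otherwise whatever awk is on the PATH) and the other variable names are illustrative only. The single quotes pass all four backslashes through the shell, awk's escape processing of the -F argument halves them, and the remaining \\040 is then used as a regular expression matching a literal backslash followed by 040. For comparison, the argument that reached gawk in attempt 1b was \040, which escape processing turns into a single space, so nothing in the line matches.

```shell
#!/bin/sh
# Prefer gawk as in the thread; fall back to awk (assumption, for portability).
AWK=$(command -v gawk || command -v awk)

tmp=$(mktemp)
trap 'rm -f "$tmp"' EXIT
printf '%s\n' 'test\040test' > "$tmp"   # same content as testfile

# Setting FS alone changes nothing in the output: '1' prints $0 as read.
unchanged=$("$AWK" -F '\\\\040' '1' "$tmp")

# The record is nevertheless split into two fields on the four characters \040 ...
nf=$("$AWK" -F '\\\\040' '{ print NF }' "$tmp")

# ... and assigning to $1 rebuilds $0 with OFS between the fields.
rebuilt=$("$AWK" -F '\\\\040' -v OFS=' ' '{ $1 = $1 } 1' "$tmp")

# What attempt 1b passed after shell unquoting: awk reads \040 as a space.
nf_1b=$("$AWK" -F '\040' '{ print NF }' "$tmp")

printf '%s\n' "$unchanged"   # test\040test
printf '%s\n' "$nf"          # 2
printf '%s\n' "$rebuilt"     # test test
printf '%s\n' "$nf_1b"       # 1
```

As noted above, this only illustrates how FS is set; for a plain substitution, gsub() or sed remains the more direct tool.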
On 4/25/2021 4:28 AM, Luuk wrote:
> Attempt 1a:
> ~/tmp> gawk -F \040 'BEGIN{ OFS=" " }{ $1=$1; print $0 }' testfile
> test\ test
I'm about 95% sure we previously had a discussion on SO about the
importance of quoting strings in shell.
Ed.
On 25-4-2021 14:14, Janis Papanagnou wrote:
> No. You have to enter your awk code where I wrote '...'.
> I omitted that because you just asked for "setting FS".
> But note my comment about being [lexically] an overkill.
> Your code also changes the input data format when doing
> $1=$1, which is most likely undesired behavior.
> You'd have to use gsub() or similar to avoid that.
> Much easier with sed, since it's only a substitution.
> Janis
OK, thanks,
I will ignore the comment "Much easier with sed" because:
- I want to do it with (g)AWK, just to learn about it
- I showed how I could do it with sed.
- I am still confused about the difference in output of my attempts (1a)
and (2), which I will repeat below:
Attempt 1a:
~/tmp> gawk -F \040 'BEGIN{ OFS=" " }{ $1=$1; print $0 }' testfile
test\ test
Attempt 2:
~/tmp> echo test\040test | gawk -F \040 'BEGIN{ OFS=" " }{ $1=$1; print $0 }'
test test
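The difference between (1a) and (2) can be reproduced in a way that suggests it comes from the shell rather than from gawk. This is a minimal sketch, assuming a POSIX shell. printf '%s\n' is used instead of echo only because it prints its argument without interpreting it; both attempts are fed through a pipe here, and the variable names are illustrative. An unquoted \040 reaches the command as 040, so in (1a) FS is 040 and the backslash in the file survives in front of the separator, while in (2) echo never receives a backslash in the first place.

```shell
#!/bin/sh
# Prefer gawk as in the thread; fall back to awk (assumption, for portability).
AWK=$(command -v gawk || command -v awk)

# What a command receives for each spelling of the argument.
unquoted=$(printf '%s\n' \040)     # the shell drops the backslash
doubled=$(printf '%s\n' \\040)     # the shell reduces \\ to one backslash
quoted=$(printf '%s\n' '\040')     # single quotes pass it through unchanged

# Attempt 1a, spelled the way gawk saw it: FS is 040, the data has a backslash.
out_1a=$(printf '%s\n' 'test\040test' | "$AWK" -F 040 -v OFS=' ' '{ $1 = $1; print }')

# Attempt 2: the unquoted echo argument had already lost its backslash.
out_2=$(printf '%s\n' test\040test | "$AWK" -F 040 -v OFS=' ' '{ $1 = $1; print }')

printf '%s\n' "$unquoted" "$doubled" "$quoted"   # 040, \040, \040 (one per line)
printf '%s\n' "$out_1a"                          # test\ test
printf '%s\n' "$out_2"                           # test test
```

Quoting the argument, as in -F '...' and echo 'test\040test', keeps the shell out of the picture, so only awk's own escape handling is left to reason about.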
On 25.04.2021 14:54, Luuk wrote:
> OK, thanks,
> I will ignore the comment "Much easier with sed" because:
> - I want to do it with (g)AWK, just to learn about it
Okay. First, you called your "Attempt 3" a "workaround".
Actually all your field-separator based approaches (1a/1b/2)
are hacks that work around what sed is doing; a substitution.
You can formulate that substitution also in awk - you actually
did already in your "Attempt 3". That could of course be done
a little bit terser, e.g. by
gawk 'gsub(/\\040/," ")+1'
Janis
> You can formulate that substitution also in awk - you actually
> did already in your "Attempt 3". That could of course be done
> a little bit terser, e.g. by
> gawk 'gsub(/\\040/," ")+1'
I had to look up what 'terser' means (I do not speak English very well)
terser = Brief and to the point
so, terser would also have to leave out the meaningless '+' sign
Because this does the same:
gawk 'gsub(/\\040/," ")1'
On 25.04.2021 21:01, Luuk wrote:
<snip>
>> You can formulate that substitution also in awk - you actually
>> did already in your "Attempt 3". That could of course be done
>> a little bit terser, e.g. by
>> gawk 'gsub(/\\040/," ")+1'
> I had to look up what 'terser' means (I do not speak English very well)
> terser = Brief and to the point
> so, terser would also have to leave out the meaningless '+' sign
It's not meaningless; it makes a possible 0 result always positive.
> Because this does the same:
> gawk 'gsub(/\\040/," ")1'
This saves one character (is lexically terser) but is more complex
than a simple increment due to the implicit type conversions that
are performed and the implicit string concatenation. - YMMV, but
there's anyway not much difference.
My preference [in awk] would anyway be "wasting" yet more characters
gawk 'gsub(/\\040/," ") || 1'
instead of relying on implicit conversions or on the +1, because I
think it's the clearest without an explicit action block.
On 4/25/2021 3:04 PM, Janis Papanagnou wrote:
> My preference [in awk] would anyway be "wasting" yet more characters
> gawk 'gsub(/\\040/," ") || 1'
> instead of relying on implicit conversions or on the +1, because I
> think it's the clearest without an explicit action block.
Several people on this forum discussed using the result of an action in
the condition context years ago (maybe 20 years ago?) and IIRC came to
the consensus that:
awk '{gsub(/\\040/," ")} 1'
was the clearest way to write such code.
Ed.
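The variants discussed in this sub-thread can be compared side by side. This is a small sketch, assuming a POSIX shell and a two-line sample input (one line containing \040, one without) that is not from the thread. The AWK variable, the run helper and the other names are illustrative. A bare gsub() used as a pattern prints only the lines where a substitution happened, which is what the +1, the concatenated 1 and the || 1 guard against. The action-block form sidesteps the question by separating the substitution from the always-true pattern.

```shell
#!/bin/sh
# Prefer gawk as in the thread; fall back to awk (assumption, for portability).
AWK=$(command -v gawk || command -v awk)

# Hypothetical sample input: the second line has nothing to replace.
input=$(printf '%s\n' 'a\040b' 'plain')

# Run one awk program (first argument) over the sample input.
run() { printf '%s\n' "$input" | "$AWK" "$1"; }

bare=$(run 'gsub(/\\040/," ")')          # pattern is the replacement count
plus=$(run 'gsub(/\\040/," ")+1')        # count + 1, never zero
concat=$(run 'gsub(/\\040/," ")1')       # count concatenated with 1: a non-empty string
either=$(run 'gsub(/\\040/," ") || 1')   # logical OR with a true constant
block=$(run '{ gsub(/\\040/," ") } 1')   # substitution in an action, then print

printf '[%s]\n' "$bare"   # only the line that was changed
printf '[%s]\n' "$plus"   # both lines; the remaining three variants match this
```

All four non-bare forms produce the same output here, so the choice between them is about readability, which is the point the posts above argue over.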