On 01/08/2021 17:02, Java Jive wrote:
snip <
WTF is Java.Jive@f1.n221.z2.fidonet.fi and why is he duplicating my
posts here?
I have an archive of scanned documents which I need to index. A typical sample output of ls is appended. I want to clean this up so that only
the first and last of each section are output, separated by a single
line containing just '...'. Can anyone suggest a way of doing this by
piping the output through awk or sed on the fly, rather than having to
write a program to post-process the index?
Desired:
Family History/Unknown/Unknown Person's Notebook:
Unknown Person's Notebook - 01.png
...
Unknown Person's Notebook - 33.png
Unknown Person's Notebook - End 0.png
...
Unknown Person's Notebook - End 5.png
Original output for ls -1pr <etc>
Family History/Unknown/Unknown Person's Notebook:
Unknown Person's Notebook - 01.png
Unknown Person's Notebook - 02.png
Unknown Person's Notebook - 03.png
Unknown Person's Notebook - 04.png
Unknown Person's Notebook - 05.png
Unknown Person's Notebook - 06.png
Java Jive <java@evij.com.invalid> writes:
On 01/08/2021 17:02, Java Jive wrote:
snip <
WTF is Java.Jive@f1.n221.z2.fidonet.fi and why is he duplicating my
posts here?
Someone is running a broken fido/usenet gateway.
Injection-Info: gioia.aioe.org; logging-data="10598"; posting-host="F7FIqN6dkowTZ1CLxZIWTQ.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
They’re injecting via aioe.org, so I guess complaints to there.
I have an archive of scanned documents which I need to index. A typical sample output of ls is appended. I want to clean this up so that only
the first and last of each section are output, separated by a single
line containing just '...'. Can anyone suggest a way of doing this by piping the output through awk or sed on the fly, rather than having to
write a program to post-process the index?
Desired:
Family History/Unknown/Unknown Person's Notebook:
Unknown Person's Notebook - 01.png
....
Unknown Person's Notebook - 33.png
Unknown Person's Notebook - End 0.png
....
Unknown Person's Notebook - End 5.png
Unknown Person's Notebook - Insert 00a.png
Unknown Person's Notebook - Insert 00b.png
Unknown Person's Notebook - Insert 01.png
Unknown Person's Notebook - Insert 02a - Sketch Of Monument, Dekklan, India.png
Unknown Person's Notebook - Insert 02b - Sketch Of Monument & Outcrop, Dekklan, India.png
Unknown Person's Notebook - Insert 03 - 17800208.png
Unknown Person's Notebook - Insert 04.png
Unknown Person's Notebook - Insert 05.png
Unknown Person's Notebook - Insert 06 - Sketch Of Crocodile.png
Unknown Person's Notebook - Insert 07a.png
Unknown Person's Notebook - Insert 07b.png
Unknown Person's Notebook - Insert 08 - Sketch Of Boat.png
Unknown Person's Notebook - Insert 09a - Sketch Of Building.png
Unknown Person's Notebook - Insert 09b - Fragment Of Writing.png
Unknown Person's Notebook - Insert 10.png
Unknown Person's Notebook - Insert 11a.png
Unknown Person's Notebook - Insert 11b - Fragment Of Writing.png
Unknown Person's Notebook - Insert 12.png
Unknown Person's Notebook - Insert 13 - Sketch Of Bird.png
Unknown Person's Notebook - Insert 14a - Sketch Of Ancient Ruins.png
Unknown Person's Notebook - Insert 14b - Sketch Of Ancient Building (partly completed).png
Unknown Person's Notebook - Insert 15a - ''La Poèsie didactique des Hébreu' - 1.png
....
Unknown Person's Notebook - Insert 15a - ''La Poèsie didactique des Hébreu' - 6.png
Unknown Person's Notebook - Insert 15b - Genealogy Of Job - 1.png
Unknown Person's Notebook - Insert 15b - Genealogy Of Job - 2.png
Unknown Person's Notebook.txt
Java Jive wrote:
I have an archive of scanned documents which I need to index. A typical
sample output of ls is appended. I want to clean this up so that only
the first and last of each section are output, separated by a single
line containing just '...'. Can anyone suggest a way of doing this by
piping the output through awk or sed on the fly, rather than having to
write a program to post-process the index?
Desired:
Family History/Unknown/Unknown Person's Notebook:
Unknown Person's Notebook - 01.png
...
Unknown Person's Notebook - 33.png
Unknown Person's Notebook - End 0.png
...
Unknown Person's Notebook - End 5.png
Awk (Gawk) has the ability to store things in arrays.
For example, in awk, I can reverse the order of lines in
a text file. A file with lines 1..10 can be emitted in
order 10..1. This requires an array in memory, which grows
as the file (or piped input) is read, and the array is then
dumped in the END block of the program.
In such a situation, a 10GB text file cannot be processed
by a 2GB RAM machine. "A person has to know their limits."
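The reversal described above can be sketched roughly as follows. This is only an illustration, not code from the thread: the function name reverse_lines is invented here, and it assumes a POSIX awk and an input small enough to fit in memory.

```shell
# reverse_lines: hypothetical helper name, not from the thread.
# Every input line is stored in an awk array indexed by its line number,
# then the END block prints the array from the last entry to the first.
# Memory use grows with the size of the input.
reverse_lines() {
    awk '{ line[NR] = $0 }
         END { for (i = NR; i >= 1; i--) print line[i] }'
}

# Example: pipe three lines through the function
printf '%s\n' one two three | reverse_lines
```

Nothing is printed until the whole input has been read, which is why the memory limit mentioned above applies.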
We might also have to decide what to do about
Unknown Person's Notebook - 1.png
...
Unknown Person's Notebook - 33.png
or the multiple iterator case (which is "easy" from
a sorting perspective, but how do we know which
iterator is the least significant one). Maybe the
controlling iterator is the one on the right.
Unknown 01 Person's Notebook - 01.png
Unknown 01 Person's Notebook - 02.png
Unknown 02 Person's Notebook - 01.png
Unknown 02 Person's Notebook - 02.png
Unknown 03 Person's Notebook - 01.png
Unknown 03 Person's Notebook - 02.png
output:
Unknown 01 Person's Notebook - 01.png Group 01
...
Unknown 03 Person's Notebook - 01.png
Unknown 01 Person's Notebook - 02.png Group 02
...
Unknown 03 Person's Notebook - 02.png
You could scan for digits from the right, and
assume the operator is logically minded. Or something.
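Scanning for digits from the right could look something like this. It is a rough sketch under assumptions: the name rightmost_number is invented here, the controlling iterator is taken to be the last run of digits on the line, and any non-digit tail such as ".png" is skipped first.

```shell
# rightmost_number: hypothetical helper name, not from the thread.
# For each input line, prints the rightmost run of digits, a tab, and
# the text that precedes that run. A line with no digits prints two
# empty fields.
rightmost_number() {
    awk '{
        s = $0
        end = length(s)
        # walk left past any non-digit tail such as .png
        while (end > 0 && substr(s, end, 1) !~ /[0-9]/) end--
        start = end
        # keep walking left while the characters are digits
        while (start > 0 && substr(s, start, 1) ~ /[0-9]/) start--
        printf "%s\t%s\n", substr(s, start + 1, end - start), substr(s, 1, start)
    }'
}

# Example: the iterator on the right is reported, not the "02" on the left
printf '%s\n' 'Unknown 02 Notebook - 01.png' | rightmost_number
```

An extension that itself contains digits (".mp3", say) would be mistaken for the iterator, so this only suits listings like the one in this thread.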
The version of Gawk I traditionally use only knows
ASCII. I don't know what the latest evolution is in terms
of, say, UTF-8. Part of the problem is the notion of
a character being one byte wide, and what the
Gawk program does when the characters are variable width.
One side effect is that the runtime could be considerably
slower. Or the memory array representation could be
"very inefficient" and four times larger than normal.
Sorta like how some image editing programs now use
absurdly wide internal representations.
The first part of any program is "a complete specification".
The effort to write the program goes up exponentially
if the program specification is "dribbling in". For example,
one of my attempts to tame some ls -R output ran into
character set problems. And my solution at the time was
to delete the offending files ("save as web page complete"
was the source of the bad file names).
Awk can store the entire input in memory, if you want it to.
*******
I'll offer these two.
find /media/FOREIGN -type d -exec ls -al -1 -d {} + > dirlist.txt
find /media/FOREIGN -type f -exec ls -al -1 {} + > filelist.txt
The "dirlist" is a succinct summary, with less detail
than you would like.
But it also didn't require writing a program.
Paul
On 2021-08-01, Paul <nospam@needed.invalid> wrote:
Java Jive wrote:
I have an archive of scanned documents which I need to index. A typical sample output of ls is appended. I want to clean this up so that only
the first and last of each section are output, separated by a single
line containing just '...'. Can anyone suggest a way of doing this by
piping the output through awk or sed on the fly, rather than having to
write a program to post-process the index?
Each section means what?
If you just want the first and last:
awk ' { T = $0; if (NR == 1) print T }
END { if (NR > 2) print "..."; if (NR > 1) print T } '
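If a "section" is taken to mean one directory block of ls -R style output, the same first-and-last idea can be applied per block. This is only a sketch of that reading: the name first_last_per_section is invented here, and it assumes a block starts at a header line ending in ":" and that blank lines separate blocks.

```shell
# first_last_per_section: hypothetical helper name, not from the thread.
# Prints header and blank lines unchanged. Within each block it prints the
# first entry, then "..." if more than two entries were seen, then the last.
first_last_per_section() {
    awk '
        function emit_tail() {
            if (n > 2) print "..."
            if (n > 1) print last
            n = 0
        }
        /:$/ || /^$/ { emit_tail(); print; next }   # header or blank line closes the block
        { if (++n == 1) print; last = $0 }           # first entry is printed at once
        END { emit_tail() }
    '
}

# Example with two blocks
printf '%s\n' 'dir:' a b c '' 'other:' x y | first_last_per_section
```

A file whose name happens to end in ":" would be treated as a header, so this is not a general parser for ls output.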
On 01/08/2021 18:02, Java Jive wrote:
I have an archive of scanned documents which I need to index. A
typical sample output of ls is appended. I want to clean this up so
that only the first and last of each section are output, separated by
a single line containing just '...'. Can anyone suggest a way of
doing this by piping the output through awk or sed on the fly, rather
than having to write a program to post-process the index?
Desired:
Family History/Unknown/Unknown Person's Notebook:
Unknown Person's Notebook - 01.png
....
Unknown Person's Notebook - 33.png
Unknown Person's Notebook - End 0.png
....
Unknown Person's Notebook - End 5.png
Unknown Person's Notebook - Insert 00a.png
Unknown Person's Notebook - Insert 00b.png
Unknown Person's Notebook - Insert 01.png
Unknown Person's Notebook - Insert 02a - Sketch Of Monument, Dekklan, India.png
Unknown Person's Notebook - Insert 02b - Sketch Of Monument & Outcrop, Dekklan, India.png
Unknown Person's Notebook - Insert 03 - 17800208.png
Unknown Person's Notebook - Insert 04.png
Unknown Person's Notebook - Insert 05.png
Unknown Person's Notebook - Insert 06 - Sketch Of Crocodile.png
Unknown Person's Notebook - Insert 07a.png
Unknown Person's Notebook - Insert 07b.png
Unknown Person's Notebook - Insert 08 - Sketch Of Boat.png
Unknown Person's Notebook - Insert 09a - Sketch Of Building.png
Unknown Person's Notebook - Insert 09b - Fragment Of Writing.png
Unknown Person's Notebook - Insert 10.png
Unknown Person's Notebook - Insert 11a.png
Unknown Person's Notebook - Insert 11b - Fragment Of Writing.png
Unknown Person's Notebook - Insert 12.png
Unknown Person's Notebook - Insert 13 - Sketch Of Bird.png
Unknown Person's Notebook - Insert 14a - Sketch Of Ancient Ruins.png
Unknown Person's Notebook - Insert 14b - Sketch Of Ancient Building (partly completed).png
Unknown Person's Notebook - Insert 15a - ''La Poèsie didactique des Hébreu' - 1.png
....
Unknown Person's Notebook - Insert 15a - ''La Poèsie didactique des Hébreu' - 6.png
Unknown Person's Notebook - Insert 15b - Genealogy Of Job - 1.png
Unknown Person's Notebook - Insert 15b - Genealogy Of Job - 2.png
Unknown Person's Notebook.txt
Thanks Grant & Paul. To clarify:
There's no point in putting '...' in between Genealogy Of Job - 1 & 2
because there's nothing missing and it would make the index longer, not shorter. The minimum series length that it's worthwhile for is 3.
There's only ever one iterator in operation at a time, and it's always
the last number in the filename.
So how would I truncate the current line in awk or sed ($0 in the
former) and hold it for comparison to the following lines until there's
a mismatch? I've used sed for very simple s/pattern/replace/ type
operations, but its inner workings are something of a mystery. I've
only ever done the simplest things in awk.
I can see exactly how I would write a shell program to do this with the
input read from a file dump of ls output, but I can't help feeling there
must be a better way of doing it on the fly.
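The "truncate, hold and compare" idea asked about here can be sketched in awk along these lines. It is a minimal illustration under the stated rules (one iterator, always the last number in the filename, "..." only for runs of three or more). The name collapse_series and all variable names are invented for the example, and it assumes the number sits directly before the extension.

```shell
# collapse_series: hypothetical helper name, not from the thread.
# For each line a comparison key is built by dropping the extension and then
# the trailing number. While the key matches the previous line the line is
# held back; on a mismatch the held line is emitted, preceded by "..." when
# the run had three or more members.
collapse_series() {
    awk '
        function emit_held() {
            if (count > 2) print "..."
            if (count > 1) print held
        }
        {
            key = $0
            sub(/\.[^.]*$/, "", key)              # drop the extension, if any
            numbered = sub(/[0-9]+$/, "", key)    # drop the trailing number; 1 if one was found
            if (numbered && prev_numbered && key == prev_key) {
                count++                           # same series: hold, print nothing yet
            } else {
                emit_held()                       # mismatch: finish the previous series
                print                             # first line of a possible new series
                count = 1
            }
            held = $0; prev_key = key; prev_numbered = numbered
        }
        END { emit_held() }
    '
}

# Example: ls -1 "some directory" | collapse_series
printf '%s\n' 'N - 01.png' 'N - 02.png' 'N - 03.png' 'N.txt' | collapse_series
```

Because only the previous line is held, this works on piped input of any length. The END block is needed so that a series at the very end of the listing is still closed off.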
Not thoroughly tested. Will show some awk syntax, no guarantees
it meets the specs :-)
********************************* redund.awk ******************************
# howtorun
#   gawk -f redund.awk inputfile.txt > outputfile.txt
#   ls-like-program-piped-to | gawk -f redund.awk

# I usually put data samples inline like this, so I can stare at
# them while writing snippets for stuff.
# Unknown Person's Notebook - 01.png
# Unknown Person's Notebook - 02.png
# Unknown Person's Notebook - 03.png
# Unknown Person's Notebook - End 0.png
# Unknown Person's Notebook - End 1.png
# Unknown Person's Notebook - End 2.png
# Unknown Person's Notebook - Insert 00a.png
# Unknown Person's Notebook - Insert 00b.png
# Unknown Person's Notebook - Insert 01.png
# Unknown Person's Notebook - Insert 02a - Sketch Of Monument, Dekklan, India, 30451.png
# 000001
# 000002
# 000003
# 000004.png.jpg     # not compressible with the others

# Test some commands first, copy stuff from Internet, etc
#
# gawk '{match($0,/[0-9]{6}/);print substr($0,RSTART,RLENGTH)}' Input_file
# gawk "{match($0,/[0-9]/);print substr($0,RSTART,RLENGTH)}"          Works for the first digit only
# gawk "{match($0,/[[:digit:]]+/);print substr($0,RSTART,RLENGTH)}"   Works to detect first instance
# gawk "{match($0,/[[:digit:]]+/,arr);print arr[1] " " arr[2]}"       Three-argument match() is a gawk extension, cannot use
# for(i=length($0);i>0;i--) x=x substr($0,i,1);                       A way to reverse a string, not needed

BEGIN {
    FS = "."        # peel off extension, if present, using $0 processing
    oldok = 0
    oldroot = ""    # not a problem with oldok false
    cntr = 0
}

{   # check the end of $1 for digits
    # By using a field separator of ".", we must disqualify NF>2 cases like 000004.png.jpg
    # split() can be used instead of FS and $0 processing, for more general programming solutions
    # Since the field separator is used very little in this program, it can be "wasted" like this.

    ok = (NF <= 2)  # boolean for string with compressible digits on end, initial determination
    # The regex is anchored with $ so only a number at the very end of $1 counts.
    # Side effect... match() sets RSTART and RLENGTH (RSTART is 0 when nothing matches).
    ok = (match($1, /[[:digit:]]+$/) > 0) && ok   # ok true is equal to 1, false is 0
    root = substr($1, 1, RSTART-1)                # Empty string for filename "000001"

    # print ok " " $1 " \"" root "\""             # the usual debug statement

    if (ok == 1 && oldok == 1 && root == oldroot) {
        cntr++              # same series, keep buffering
    } else {                # new assignment
        # Check processing of stuff in buffer
        if (oldok == 1) {
            if (cntr > 2) {
                print "..."
            }
            if (cntr > 1) {
                print oldstr
            }
            # cntr = 1 has already been printed
        }
        cntr = 1
        print $0            # opening stanza of a potential compression
    }

    # bookkeeping
    oldroot = root
    oldok = ok
    oldstr = $0

    # When I make doodles like the following in the source, it means I'm
    # struggling with the if-then-else order and making the code
    # as succinct as possible. This table started me out on the
    # wrong leg, and it took a second try to make a better if-then-else
    # root ok oldroot oldok cntr oldstr
    # xxx         yyy                       dump previous if oldok,cntr,oldstr, define cntr = 1, print opening line
    # xxx  1      xxx   1                   increment cntr
}

END {   # the last series is still in the buffer when input runs out
    if (oldok == 1) {
        if (cntr > 2) print "..."
        if (cntr > 1) print oldstr
    }
}
********************************* end redund.awk ******************************
On Tue, 03 Aug 2021 01:14:16 +0100, Java Jive wrote:
Years ago, when I was getting into awk, I found the O'Reilly book,
"sed & awk", subtitled "UNIX Power Tools" to be really helpful.
I think it explains how awk works, how to use it and shows what it can do better than anything else I've found. It contains a lot of non-trivial example code too.
That's not to knock the awk manpage, which is a good reference guide, specially for the various built-in functions, just that I think the book explains the way to structure awk scripts rather better than the manpage.
--
Martin | martin at
Gregorie | gregorie dot org
Martin Gregorie wrote:
On Tue, 03 Aug 2021 01:14:16 +0100, Java Jive wrote:
Years ago, when I was getting into awk, I found the O'Reilly book,
"sed & awk", subtitled "UNIX Power Tools" to be really helpful.
I think it explains how awk works, how to use it and shows what it can
do better than anything else I've found. It contains a lot of
non-trivial example code too.
That's not to knock the awk manpage, which is a good reference guide,
specially for the various built-in functions, just that I think the
book explains the way to structure awk scripts rather better than the
manpage.
For zero dollars, you can get Arnold Robbins "Gawk.pdf",
which is all the instruction manual you need.
It's far from a perfect language.
Sometimes it crushes the problem you're working on. And other times, it's
the problem (take sorting as an example of migraine-induction).
On 01/08/2021 18:26, Richard Kettlewell wrote:
Java Jive <java@evij.com.invalid> writes:
On 01/08/2021 17:02, Java Jive wrote:
snip <
WTF is Java.Jive@f1.n221.z2.fidonet.fi and why is he duplicating my
posts here?
Someone is running a broken fido/usenet gateway.
Injection-Info: gioia.aioe.org; logging-data="10598"; posting-host="F7FIqN6dkowTZ1CLxZIWTQ.user.gioia.aioe.org"; mail-complaints-to="abuse@aioe.org";
They’re injecting via aioe.org, so I guess complaints to there.
Done, thanks for your explanation, we'll have to wait and see what the
result of the complaint is.