Why doesn't grep count 2 commas?
echo 'Kích thước máy xay cỏ, giá máy thế nào , phụ tùng máy mua ở đâu' | grep -c ,
1
echo 'Kích thước máy xay cỏ, giá máy thế nào , phụ tùng máy mua ở đâu' | cut -d, -f1
Kích thước máy xay cỏ
echo 'Kích thước máy xay cỏ, giá máy thế nào , phụ tùng máy mua ở đâu' | cut -d, -f2
giá máy thế nào
echo 'Kích thước máy xay cỏ, giá máy thế nào , phụ tùng máy mua ở đâu' | cut -d, -f3
phụ tùng máy mua ở đâu
fxkl47BF@protonmail.com wrote:
why doesn't grep count 2 commas
echo 'Kích thước máy xay cỏ, giá máy thế nào , phụ tùng máy mua ở đâu' |
grep -c ,
1
An answer to this question
https://stackoverflow.com/questions/16679369/count-occurrences-of-a-char-in-a-string-using-bash
proposes
echo "referee" | tr -cd 'e' | wc -c
$ echo ',,,' | tr -cd ',' | wc -c
3
unicorn:~$ string="apple,banana,cherry,date"
unicorn:~$ commas=${string//[!,]/}
unicorn:~$ echo "${#commas}"
3
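For reuse, the same parameter expansion can be wrapped in a function. This is only a sketch: the name count_commas is made up for illustration and is not from the thread. Applied to the original poster's line it prints 2.

```shell
#!/usr/bin/env bash
# count_commas is a hypothetical helper name, not something from the thread.
count_commas() {
    local string=$1
    # Delete every character that is not a comma ...
    local commas=${string//[!,]/}
    # ... and report how many characters are left.
    printf '%s\n' "${#commas}"
}

count_commas 'Kích thước máy xay cỏ, giá máy thế nào , phụ tùng máy mua ở đâu'
```

Like the original, this relies on bash's ${var//pattern/} expansion, so it will not work in a plain POSIX sh.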
But at this point, we have to wonder what the *actual* goal is.
Hi,
Greg Wooledge wrote:
unicorn:~$ string="apple,banana,cherry,date"
unicorn:~$ commas=${string//[!,]/}
unicorn:~$ echo "${#commas}"
3
Always astonishing what a good bashism can do.
But at this point, we have to wonder what the *actual* goal is.
Up to now we only know about the astonishment of fxkl47BF@protonmail.com
that grep -c does not count characters.
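To spell that out: grep -c reports how many input lines contain a match, not how many matches there are. The sketch below contrasts the two, assuming a grep that supports -o (GNU grep, as shipped in Debian, does). The sample text and variable names are made up for illustration.

```shell
#!/usr/bin/env bash
# Sample text invented for this sketch: one line, two commas.
line='one, two , three'

# -c counts matching LINES, so one line with two commas still gives 1.
matching_lines=$(printf '%s\n' "$line" | grep -c ,)

# -o prints every match on its own line; counting those lines counts commas.
# The $(( )) wrapper strips any padding some wc implementations add.
commas=$(( $(printf '%s\n' "$line" | grep -o , | wc -l) ))

echo "matching lines: $matching_lines"
echo "commas: $commas"
```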
For a more complicated use case I would write a little C program where
I'd be in control of every single bit of throughput.
(Ok, C causes scars on the programmer's self-esteem. But what does not
kill me makes me just stronger. I'm a vim user.)
But at this point, we have to wonder what the *actual* goal is.
to exclude phrases with commas for separate examination
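If "phrases" here simply means lines of a file, that goal needs no counting at all. A minimal sketch, assuming one phrase per line. The function name and the three file arguments are hypothetical.

```shell
#!/usr/bin/env bash
# split_by_comma INPUT WITH WITHOUT   (hypothetical helper, for illustration)
# Writes the lines of INPUT that contain a comma to WITH and the rest to
# WITHOUT. Assumes one phrase per line.
split_by_comma() {
    local input=$1 with=$2 without=$3
    # grep exits with status 1 when nothing matches; that is not an error
    # here, so "|| true" swallows it (along with real errors: it's a sketch).
    grep -- ',' "$input" > "$with" || true
    grep -v -- ',' "$input" > "$without" || true
}
```

Usage would look like: split_by_comma phrases.txt with_commas.txt without_commas.txt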
On Fri, Jan 19, 2024 at 03:30:17PM +0000, fxkl47BF@protonmail.com wrote:
But at this point, we have to wonder what the *actual* goal is.
to exclude phrases with commas for separate examination
Parsing natural language text is going to be tricky. I can only talk
about English, and not about whatever language your text is actually
written in.
Let's look at a few example English sentences:
Good morning, John.
I went to the store with Mary, Paul, Susan and Ralph.
I won, and you lost.
The bear, who was hungry, looked for food.
Oh, that's interesting.
These are five different examples of comma usage in English. Do you
happen to know in advance that your text will *only* contain samples
that use the fourth style above? Let's assume this. Let's then form
a template:
STUFF, ASIDE, MORE STUFF, ASIDE, STILL MORE STUFF.
I.e. given a sentence which conforms to expectation, we should see
an even number of commas (is *THIS* why you were counting them??) and
we should extract the ASIDEs from in between the first and second, then
the third and fourth, and so on.
So... uh, I guess my next question is: are you *pre-filtering* the
sentences and keeping only the ones which have an even number of
commas? Or have you already *done* that, and now you're asking how
to extract the ASIDEs?
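If the answer is pre-filtering, a sketch along these lines might do it, assuming one sentence per line (which, as noted further down, is itself a big assumption). The function name even_comma_lines is made up. It keeps only lines with a non-zero, even number of commas.

```shell
#!/usr/bin/env bash
# even_comma_lines [FILE...]   (hypothetical helper name)
# Print only the lines that contain a non-zero, even number of commas.
# Reads standard input when no FILE is given.
even_comma_lines() {
    awk '{
        line = $0              # gsub() edits $0, so keep a copy first
        n = gsub(/,/, "")      # gsub() returns the number of replacements
        if (n > 0 && n % 2 == 0) print line
    }' "$@"
}
```

For example, printf '%s\n' 'a, b, c' 'a, b' 'none' | even_comma_lines prints only the first line.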
I really don't think I'd try this with shell scripts. The tools just
aren't designed for this. You really want tools that are custom built
for natural language processing, or a language that lets you run
through a large string character by character in a fast, efficient
way (C comes to mind) if you're trying to build your tools from the
ground up.
The "obvious" algorithm for extracting the ASIDEs would be to use a
simple finite state machine, and march through the sentence
character by character. When you encounter a comma, change state.
Otherwise, if you're in the "ASIDE" state, copy the character to your
output buffer. When you leave the "ASIDE" state, terminate the current
output buffer and move to the next one. That's how I'd do it in C.
Add whitespace trimming and so on.
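Purely to illustrate the state transitions described above, here is the same idea as a bash sketch rather than C. It is slow and not what is being recommended. The function name and the state names "stuff" and "aside" are made up, and it assumes the input is a single sentence following the STUFF, ASIDE, MORE STUFF template.

```shell
#!/usr/bin/env bash
# extract_asides SENTENCE   (hypothetical helper name)
# Print each ASIDE, i.e. the text between the 1st and 2nd comma, the 3rd
# and 4th, and so on, one per line with surrounding whitespace trimmed.
# An ASIDE that is never closed by a comma is silently dropped.
extract_asides() {
    local sentence=$1
    local state=stuff          # either "stuff" or "aside"
    local buf='' ch i
    for (( i = 0; i < ${#sentence}; i++ )); do
        ch=${sentence:i:1}
        if [[ $ch == ',' ]]; then
            if [[ $state == aside ]]; then
                # Leaving the ASIDE state: trim and emit the buffer.
                buf=${buf#"${buf%%[![:space:]]*}"}   # strip leading blanks
                buf=${buf%"${buf##*[![:space:]]}"}   # strip trailing blanks
                printf '%s\n' "$buf"
                buf=''
                state=stuff
            else
                state=aside
            fi
        elif [[ $state == aside ]]; then
            buf+=$ch           # inside an ASIDE: copy the character
        fi
    done
}

extract_asides 'The bear, who was hungry, looked for food.'
```

With the example sentence from earlier in this message, the call prints "who was hungry".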
Also note that breaking a piece of natural language text *into*
sentences in the first place is extraordinarily difficult. If you
haven't already got a way to do that, you're probably screwed.
Seriously, asking the debian-user list how to count the number of
commas in a text file is *not* a good sign if you're dealing with a
master's-degree-level problem in natural language analysis.
Greg Wooledge <greg@wooledge.org> wrote:
I won, and you lost
There shouldn't be a comma in that sentence, in English. There is in
the closely related expression "I won, you lost."
At the risk of being seen as old-fashioned, but as a user of both
languages, I think Perl is a much better choice than C for string
processing.
On Fri 19 Jan 2024 at 17:25:10 (+0000), debian-user@howorth.org.uk wrote:
Greg Wooledge <greg@wooledge.org> wrote:
I won, and you lost
There shouldn't be a comma in that sentence, in English. There is in
the closely related expression "I won, you lost."
That's rather proscriptive. "I won and you lost." and
"I won, and you lost." are two different sentences.
The first is a more neutral statement of fact. The
second carries an implication of triumphalism or
mockery: many speakers would expect a swoop upwards
in intonation on "won", a pause, and a steep drop
between "you" and "lost"; kinda like:
-⭜ ·¯⭝
if that works in your font.
What you lose (sorry) in "I won, you lost." is the
anacrusis, the ·, which many would pronounce "ən",
as in ənyeeoo. Without it, I'd be inclined to write
"I won. You lost." (similar intonation).
Disclaimer: my choice of intonation was to illustrate
one difference. There are many more ways of saying
all of those sentences.
Cheers,
David.
On Fri, Jan 19, 2024, 2:07 PM Thomas Schmitt <scdbackup@gmx.net> wrote:
.....
(Ok, C causes scars on the programmer's self esteem. But what does not
kill me makes me just stronger. I'm a vim user.)
OK I'll mention that to my psychiatrist :-)
But the C programmers I knew were either really nice guys if they wrote
C on unix, or real toads if they wrote C for DOS/Windows. YMMV
Have a nice day :)
Back in my Amiga days 25-35 years ago, I did my stuff in SAS-C or ARexx
Thomas
On 2024-01-19, David Wright <deblis@lionunicorn.co.uk> wrote:
On Fri 19 Jan 2024 at 17:25:10 (+0000), debian-user@howorth.org.uk wrote:
Greg Wooledge <greg@wooledge.org> wrote:
I won, and you lost
There shouldn't be a comma in that sentence, in English. There is in
the closely related expression "I won, you lost."
That's rather proscriptive. "I won and you lost." and
"I won, and you lost." are two different sentences.
AFAIK, "you lost" is an independent clause and should be separated from
the independent clause that precedes it with a comma before the
coordinating conjunction.
On Sat, Jan 20, 2024 at 05:09:58PM -0000, Curt wrote:
On 2024-01-19, David Wright <deblis@lionunicorn.co.uk> wrote:
On Fri 19 Jan 2024 at 17:25:10 (+0000), debian-user@howorth.org.uk wrote:
Greg Wooledge <greg@wooledge.org> wrote:
I won, and you lost
There shouldn't be a comma in that sentence, in English. There is in
the closely related expression "I won, you lost."
That's rather proscriptive. "I won and you lost." and
"I won, and you lost." are two different sentences.
AFAIK, "you lost" is an independent clause and should be separated from
the independent clause that precedes it with a comma before the
coordinating conjunction.
Regardless of which grammar rules are right, wrong, or optional, the point
of this is that parsing natural language text is *stupidly difficult*.
A person who has to ask why "grep -c" doesn't count the number of commas
in a single line of text probably isn't able to take on this quest.
Any serious inquiries about natural language parsing should be directed
to an appropriate artificial intelligence mailing list instead of this one.
On Friday 19 January 2024 09:48:01 pm Nicholas Geovanis wrote:
On Fri, Jan 19, 2024, 2:07 PM Thomas Schmitt <scdbackup@gmx.net> wrote:
.....
(Ok, C causes scars on the programmer's self esteem. But what does not
kill me makes me just stronger. I'm a vim user.)
OK I'll mention that to my psychiatrist :-)
But the C programmers I knew were either really nice guys if they wrote C
on unix, or real toads if they wrote C for DOS/Windows. YMMV
Where does that leave those of us that wrote c for CP/M? :-)
Roy J. Tellason writes:
Where does that leave those of us that wrote c for CP/M?
I wrote:
Or for MTS?
Gene writes:
That, i've not heard of John, please expand.
Michigan Terminal System. A multi-user OS running on the Amdahl
470V/6 at the University of Michigan.
Thanks John. I have heard of Amdahl, but it was decades ago.