In Vim I frequently jump from one string to the next identical string using the commands '*' (forward search'n'jump) and '#' (backward search'n'jump).
With Unicode characters that doesn't always seem to work (at least not
by default).
In the following (UTF-8 encoded) test sample there is one subset of
Omega words where * and # work correctly and one where they don't
(starting with the cursor on the first letter of any word):
Ωmega Ωmega Ωmega Ωmega Ωmega Ωmega Ωmega Ωmega
In comp.editors, Janis Papanagnou <janis_papanagnou+ng@hotmail.com> wrote:
Try to copy/paste the line into a Vim session, then move the cursor
onto the first character of the first word, then type * repeatedly.
Then do the same starting with the first character of the third word,
and observe the difference! - Tell me what you think about that.
This is like complaining that a search for "MISS" does not also match "МІЅЅ". They are completely different strings that just happen to look alike with certain font choices.
Some of those are "ohm sign", "Latin
small letter m", "Latin small letter e", "Latin small letter g", "Latin
small letter a" and the others are "Greek capital letter omega",
"Latin small letter m", "Latin small letter e", "Latin small letter g", "Latin small letter a".
Your "difference is only the encoding" fails to grasp that Unicode is semiotics aware, even if users might not be.
Elijah
------
https://www.unicode.org/reports/tr36/#visual_spoofing
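The character names listed above can be checked outside Vim. The following is a minimal Python sketch using only the standard `unicodedata` module; the helper name `describe` and the two constants are made up for illustration and are not part of the thread.

```python
import unicodedata

# The two look-alike characters discussed in the thread.
OHM_SIGN = "\u2126"
GREEK_OMEGA = "\u03a9"


def describe(ch: str) -> str:
    """Return the code point and official Unicode name of one character."""
    return f"U+{ord(ch):04X} {unicodedata.name(ch)}"


for ch in (OHM_SIGN, GREEK_OMEGA):
    # Show the code point, the name, and the UTF-8 bytes of each character.
    print(describe(ch), ch.encode("utf-8").hex())

# The two "Omega" words look the same but compare as different strings.
print(OHM_SIGN + "mega" == GREEK_OMEGA + "mega")
```

Inside Vim, placing the cursor on a character and typing `ga` reports its code point in a similar way.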
The difference is only the encoding of the first character of each
Ωmega word (U+03A9 versus U+2126). For words starting with Ω = U+03A9
it works, but not for words starting with Ω = U+2126.
Is there a way to fix this, or to get that behavior for all UTF-8
encoded words?
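One possible approach, if rewriting the text itself is acceptable, is to normalize it before editing. Unicode defines U+2126 (ohm sign) as canonically equivalent to U+03A9, so NFC normalization replaces the former with the latter. The sketch below assumes Python and its standard `unicodedata` module; the function name `fold_canonical_lookalikes` is hypothetical.

```python
import unicodedata


def fold_canonical_lookalikes(text: str) -> str:
    """Apply NFC normalization; this maps U+2126 (ohm sign) to U+03A9."""
    return unicodedata.normalize("NFC", text)


# One word starting with the ohm sign, one with Greek capital omega.
sample = "\u2126mega \u03a9mega"
fixed = fold_canonical_lookalikes(sample)

# After normalization both words start with the same code point.
print([f"U+{ord(word[0]):04X}" for word in fixed.split()])
```

This only covers characters that Unicode treats as canonically equivalent. Look-alikes from different scripts, such as the Cyrillic "МІЅЅ" example, are left unchanged, and NFC may also recompose other character sequences in the text.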
Case 2 (cursor starting at first character of the _first_ word):
Ωmega Ωmega Ωmega Ωmega Ωmega Ωmega Ωmega Ωmega
^ ^ ^ ^ first turn
^ ^ ^ ^ second turn
[snip]
In any case, it is clear that # and * treat alphabetic characters
like Greek capital *letter* omega differently from non-alphabetic symbol characters like ohm *sign*. If you move along the line with "w" to jump between "words", you can see the difference. The # and * searches use word boundaries, so the word definition matters a great deal there.
You are still looking at an ohm sign and thinking of a letter, which is
the trap of Unicode "look-alikes", not something Vim is doing wrong.
Is there, on the other hand, some sensible use-case for that
current [inconsistent] behavior (of ad hoc changing the pattern)?
It is a keyword search tool, not a random object search tool.
The word boundaries should be the indicator.
PS: Historically (IIRC), in Vi, there was just the # command
(but not the * which I saw later in Vim).
jump from a C function call backwards to find its declaration.
Application of Vi(m) broadened since then, and yet more useful
features and changes entered the Vim command base.
I do not believe you. For starters, nvi has a completely different
function bound to #, and nvi tries to be backwards compatible with vi.
It occurs to me that you may like the boundary free versions of * and #: prefix them with a g.
:noremap * g*
:noremap # g#