[There's a vast amount of work on edit distance. My guess is they
use something like Levenshtein, but rather than use a constant
distance of 1 between different letters, the distance varies depending
on how different the letters look. -John]
This clang blog specifically mentions Levenshtein,
http://blog.llvm.org/2010/04/amazing-feats-of-clang-error-recovery.html#spell_checker
and it looks like what people do is to go through the entire symbol
table and compute it against the individual erroneous identifier.
I thought that'd be a bit on the expensive side,
Dear c.compilers,
While experimenting with Rust, I came across this suggestion.
--> foo.rs:5:9
|
5 | return j; // the variable, not the type.
| ^ help: a local variable with a similar name exists: `i`
Here it is suggesting i where I typed j. This is the same problem as
spell checking identifiers with fuzzy matching, so apologies for a
potentially misleading subject.
So, without going through the source of rustc to find out, I'm curious
about what general techniques people use to make this work? In particular,
the Damerau–Levenshtein distance algorithm is not appropriate for dictionary lookups, as far as I know.
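For reference, here is a minimal sketch of the restricted form of the Damerau–Levenshtein distance (often called optimal string alignment), which counts insertions, deletions, substitutions, and swaps of two adjacent characters at cost 1 each. The function name `osa_distance` is made up for this illustration and is not taken from rustc or any other compiler. It only shows what the distance computes for a single pair of names; it says nothing about how well it suits lookups over a whole dictionary.

```python
def osa_distance(a: str, b: str) -> int:
    """Optimal string alignment distance between a and b.

    Allowed edits, each at cost 1: insert, delete, substitute,
    or swap two adjacent characters.
    """
    rows, cols = len(a) + 1, len(b) + 1
    d = [[0] * cols for _ in range(rows)]
    for i in range(rows):
        d[i][0] = i  # delete all of a[:i]
    for j in range(cols):
        d[0][j] = j  # insert all of b[:j]
    for i in range(1, rows):
        for j in range(1, cols):
            cost = 0 if a[i - 1] == b[j - 1] else 1
            d[i][j] = min(
                d[i - 1][j] + 1,         # deletion
                d[i][j - 1] + 1,         # insertion
                d[i - 1][j - 1] + cost,  # match or substitution
            )
            # Adjacent transposition, e.g. "un" <-> "nu".
            if (i > 1 and j > 1
                    and a[i - 1] == b[j - 2]
                    and a[i - 2] == b[j - 1]):
                d[i][j] = min(d[i][j], d[i - 2][j - 2] + 1)
    return d[len(a)][len(b)]
```

With this definition a swapped pair such as "count" and "conut" is one edit apart, whereas plain Levenshtein would count two.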
On 2020-06-23, Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.invalid> wrote:
5 | return j; // the variable, not the type.
| ^ help: a local variable with a similar name exists: `i`
I don't find these kinds of childish diagnostic messages useful at all.
They have started to appear in GCC also.
The good old "undeclared identifier `j`" requires no update, thanks.
More than you probably wanted to know: http://www.coding-guidelines.com/cbook/sent792.pdf
Kaz Kylheku <937-053-0959@kylheku.com> wrote:
On 2020-06-23, Johann 'Myrkraverk' Oskarsson <johann@myrkraverk.invalid> wrote:
5 | return j; // the variable, not the type.
| ^ help: a local variable with a similar name exists: `i`
I don't find these kinds of childish diagnostic messages useful at all.
They have started to appear in GCC also.
The good old "undeclared identifier `j`" requires no update, thanks.
At least gcc doesn't do the suggestion in that particular case:
foo.c: In function 'foo':
foo.c:8:7: error: 'j' undeclared (first use in this function)
8 | x[j] = a[i] > b[i];
| ^
foo.c:8:7: note: each undeclared identifier is reported only once for each function it appears in
On Tuesday, June 23, 2020 at 12:59:35 PM UTC-7, Johann 'Myrkraverk' Oskarsson wrote:
(snip)
This clang blog specifically mentions Levenshtein,
http://blog.llvm.org/2010/04/amazing-feats-of-clang-error-recovery.html#spell_checker
and it looks like what people do is to go through the entire symbol
table and compute it against the individual erroneous identifier.
I thought that'd be a bit on the expensive side,
With either constant weighting or character dependent weighting
it is easy to do with dynamic programming. The time is then O(m n)
where m and n are the two lengths.
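The dynamic programming approach mentioned above can be sketched as follows. The substitution cost is passed in as a function, so the same loop covers both constant weighting and character dependent weighting. The look-alike table and its 0.5 weight are invented for this illustration, as are the names `sub_cost` and `weighted_distance`; a real compiler would choose its own weights, if it uses any.

```python
from typing import Callable

# Hypothetical pairs of characters that look alike; purely illustrative.
LOOKALIKE = {frozenset("il"), frozenset("ij"), frozenset("O0"), frozenset("l1")}


def sub_cost(x: str, y: str) -> float:
    """Character dependent substitution cost (assumed weights)."""
    if x == y:
        return 0.0
    return 0.5 if frozenset((x, y)) in LOOKALIKE else 1.0


def weighted_distance(a: str, b: str,
                      cost: Callable[[str, str], float] = sub_cost) -> float:
    """Edit distance in O(m n) time, keeping only two rows of the table."""
    prev = [float(j) for j in range(len(b) + 1)]
    for i, ca in enumerate(a, 1):
        cur = [float(i)]
        for j, cb in enumerate(b, 1):
            cur.append(min(
                prev[j] + 1.0,               # delete ca
                cur[j - 1] + 1.0,            # insert cb
                prev[j - 1] + cost(ca, cb),  # match or substitute
            ))
        prev = cur
    return prev[-1]
```

Passing `cost=lambda x, y: 0.0 if x == y else 1.0` gives the ordinary constant-weight Levenshtein distance.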
It seems most obvious to check only variables that are in the appropriate
scope to be misspelled, but I suspect catching variables used out
of scope is also worth doing. Well, in the latter case, you could
hope that they at least spell them the same.
I think you should turn it off for one character names, though,
even though I suspect those are more likely. Too many false
positives!
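A rough sketch of the policy described above: search the visible scopes from the innermost outward, skip very short names, and only accept a candidate within a small distance. Everything here is an assumption made for illustration: the function names, the `min_len` parameter, and the cutoff of one third of the identifier's length are arbitrary and not taken from any compiler.

```python
def levenshtein(a: str, b: str) -> int:
    """Plain Levenshtein distance with unit costs."""
    prev = list(range(len(b) + 1))
    for i, ca in enumerate(a, 1):
        cur = [i]
        for j, cb in enumerate(b, 1):
            cur.append(min(prev[j] + 1, cur[j - 1] + 1,
                           prev[j - 1] + (ca != cb)))
        prev = cur
    return prev[-1]


def suggest(unknown, scopes, min_len=2):
    """Return the closest declared name, or None.

    scopes is a list of name collections, innermost scope first.
    Names shorter than min_len get no suggestion at all.
    """
    if len(unknown) < min_len:
        return None  # one-character names: too many false positives
    limit = max(1, len(unknown) // 3)  # assumed cutoff
    best, best_d = None, limit + 1
    for scope in scopes:
        for name in scope:
            # The distance is at least the difference in lengths,
            # so this skips the full computation for hopeless cases.
            if abs(len(name) - len(unknown)) > limit:
                continue
            d = levenshtein(unknown, name)
            if d < best_d:  # strict, so an inner scope wins ties
                best, best_d = name, d
    return best
```

The length check is one cheap way to avoid running the full distance computation against every entry in the symbol table.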
You might not want J1 and J_one in the same program.
Similarly, some ancient compiler (Euclid?) had case-insensitive lookup, but required the same capitalization everywhere.
Fortran has a bit of a similar issue with its C interoperability
feature.
Entities with C binding have global identifiers in Fortran. Fortran
is a case-insensitive language, so FooBar and foobar look the same
to Fortran, and you cannot have a C binding to both (but either
one would work).
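The kind of clash described above can be sketched as a check for names that become identical once case is ignored. This is only an illustration of the idea, not a statement of what any Fortran compiler does; `case_collisions` is a hypothetical helper name.

```python
from collections import defaultdict


def case_collisions(names):
    """Group distinct names that are equal when case is ignored.

    Returns only the groups with more than one spelling.
    """
    groups = defaultdict(list)
    for name in names:
        key = name.lower()
        if name not in groups[key]:  # ignore exact repeats
            groups[key].append(name)
    return [g for g in groups.values() if len(g) > 1]
```

For the example in the post, `case_collisions(["FooBar", "foobar", "baz"])` reports the pair FooBar and foobar and leaves baz alone.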