I would like a tool that tries to find as many syntax errors as possible
in a python file. I know there is the risk of false positives when a
tool tries to recover from a syntax error and proceeds, but I would
prefer that over the current python strategy of quitting after the first
syntax error. I just want a tool for syntax errors. No style
enforcements. Any recommendations? -- Antoon Pardon
--
https://mail.python.org/mailman/listinfo/python-list
My guess is that finding 100 errors might turn out to be misleading. If you fix just the first, many others would go away.
On 9/10/2022 at 17:49, Avi Gross wrote:
> My guess is that finding 100 errors might turn out to be misleading. If you
> fix just the first, many others would go away.

At this moment I would prefer a tool that reported 100 errors, which
would allow me to easily correct 10 real errors, over the python
strategy which quits after having found one syntax error.
On 2022-10-09 12:09:17 +0200, Antoon Pardon wrote:
> I would like a tool that tries to find as many syntax errors as possible
> in a python file. I know there is the risk of false positives when a tool
> tries to recover from a syntax error and proceeds, but I would prefer that
> over the current python strategy of quitting after the first syntax error.
> I just want a tool for syntax errors. No style enforcements. Any
> recommendations?
There seems to have been increased interest in good error recovery over
the last years. I thought I had bookmarked a bunch of projects, but the
only one I can find right now is Lezer (https://marijnhaverbeke.nl/blog/lezer.html) which is part of the
CodeMirror (https://codemirror.net/) editor. Python is listed as a
currently supported language, so you might want to check that out.
Disclaimer: I haven't used CodeMirror, so I can't say anything about
its quality. The blog entry about Lezer was interesting, though.
hp
On Sun, Oct 09, 2022 at 06:59:36PM +0200, Antoon Pardon wrote:
> On 9/10/2022 at 17:49, Avi Gross wrote:
> > My guess is that finding 100 errors might turn out to be misleading. If you
> > fix just the first, many others would go away.
>
> At this moment I would prefer a tool that reported 100 errors, which would
> allow me to easily correct 10 real errors, over the python strategy which
> quits after having found one syntax error.

But the point is: you can't (there is no way to) be sure the
9+ errors really are errors.

Unless you further constrict what sorts of errors you are
looking for and what margin of error or leeway for false
positives you want to allow.
https://stackoverflow.com/questions/4284313/how-can-i-check-the-syntax-of-python-script-without-executing-it
People seemed especially enthusiastic about the one-liner from jmd_dk.
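For context, the popular answers there amount to compiling the file without executing it. A minimal sketch of that approach (the helper name is mine, not from the Stack Overflow thread):

```python
def check_syntax(path):
    """Parse a file without executing it.  compile() raises SyntaxError
    at the first problem it finds, which is exactly the single-error
    behaviour this thread is discussing."""
    with open(path, encoding="utf-8") as f:
        source = f.read()
    try:
        compile(source, path, "exec")
    except SyntaxError as e:
        print(f"{e.filename}:{e.lineno}: {e.msg}")
        return False
    return True
```

Note that compile() still stops at the first SyntaxError, so by itself this doesn't meet Antoon's request.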
On 9/10/2022 at 19:23, Karsten Hilbert wrote:
> On Sun, Oct 09, 2022 at 06:59:36PM +0200, Antoon Pardon wrote:
> > At this moment I would prefer a tool that reported 100 errors, which would
> > allow me to easily correct 10 real errors, over the python strategy which
> > quits after having found one syntax error.
>
> But the point is: you can't (there is no way to) be sure the
> 9+ errors really are errors.
>
> Unless you further constrict what sorts of errors you are
> looking for and what margin of error or leeway for false
> positives you want to allow.
Look, when I was at university we had to program in Pascal, and the
compiler we used continued parsing until the end. Sure, there were
times when, after a number of reported errors, the number of false
positives became so high it was useless trying to find the remaining
true ones, but it was still more efficient to correct the obvious
ones than to correct only the first one.

I don't need to be sure. Even the occasional wrong correction is
probably still more efficient than quitting after the first syntax
error.
--
Antoon.
Antoon, it may also relate to an interpreter-versus-compiler issue.

Something like a C compiler does not do anything except write code in an
assembly language. It can choose to keep going after an error and resume
looking from a less stable place. Interpreters for Python have to catch
errors as they go and often run code in small batches; continuing to
evaluate after an error could cause weird effects. So what you want is
closer to a lint program that does not run code at all, or merely writes
pseudocode to a file to be run faster later.

I will say that often enough a program could report more possible
errors. Putting your code into multiple files and modules may mean you
could cleanly evaluate the code and return multiple errors from many
modules, as long as they are distinct. Finding all errors is not
possible if recovery from one is not guaranteed.

Is it that onerous to fix one thing and run it again? It was once, when
you handed in punch cards and waited a day on very busy machines.
But an error like setting the size of a fixed-length data structure to
the wrong size may result in oodles of errors about being out of range
that magically get fixed by one change. Sometimes too much info just
gives you a headache.
> Is it that onerous to fix one thing and run it again? It was once when
> you handed in punch cards and waited a day on very busy machines.

Yes, I find it onerous, especially since I have a pipeline with unit
tests and other tools that all have to redo their work each time a bug
is corrected.

I just want a parser that doesn't give up on encountering the first
syntax error. Maybe do some semantic checking, like checking the number
of parameters.
On 2022-10-09 12:59:09 -0400, Thomas Passin wrote:
> https://stackoverflow.com/questions/4284313/how-can-i-check-the-syntax-of-python-script-without-executing-it
>
> People seemed especially enthusiastic about the one-liner from jmd_dk.
I don't think that one-liner solves Antoon's requirement of continuing
after an error. It uses just the normal python parser so it has exactly
the same limitations.
Some of the mentioned tools may do what Antoon wants, though.
hp
Cameron,

Your suggestion makes me shudder!

Removing all earlier lines of code is often guaranteed to generate
errors, as variables you are using are not declared or initialized,
modules are not imported, and so on. Removing just the line or three
where the previous error happened would also have a good chance of
invalidating something.
On 09Oct2022 21:46, Antoon Pardon <antoon.pardon@vub.be> wrote:
> > Is it that onerous to fix one thing and run it again? It was once
> > when you handed in punch cards and waited a day on very busy
> > machines.
>
> Yes, I find it onerous, especially since I have a pipeline with unit
> tests and other tools that all have to redo their work each time a
> bug is corrected.

It is easy to get the syntax right before submitting to such a
pipeline. I usually run a linter on my code for serious commits, and
I've got a `lint1` alias which basically runs the short fast flavour of
that, which does a syntax check and the very fast, less thorough lint
phase.
> It is easy to get the syntax right before submitting to such a
> pipeline. I usually run a linter on my code for serious commits, and
> I've got a `lint1` alias which basically runs the short fast flavour
> of that.

If you have a linter that doesn't quit after the first syntax error,
please provide a link. I already tried pylint and it also quits after
the first syntax error.
Antoon,

There likely are such programs out there, but are there universal
agreements on how to figure out when a new safe zone of code starts
where error testing can begin?

For example, a file full of function definitions might find an error in
function 1 and try to find the end of that function and resume checking
the next function. But what if a function defines local functions
within it? What if the mistake in one line of code could still allow
checking the next line rather than skipping it all?

My guess is that finding 100 errors might turn out to be misleading. If
you fix just the first, many others would go away. If you spell a
variable name wrong when declaring it, a dozen uses of the right name
may cause errors. Should you fix the first or change all later ones?
How does one declare a variable in python? Sometimes it'd be nice to
be able to have declarations and any undeclared variable be flagged.
On Mon, 10 Oct 2022 at 06:50, Antoon Pardon <antoon.pardon@vub.be> wrote:
> I just want a parser that doesn't give up on encountering the first
> syntax error. Maybe do some semantic checking, like checking the
> number of parameters.

That doesn't make sense though.

It's one thing to keep going after finding a non-syntactic error, but
an error of syntax *by definition* makes parsing the rest of the file
dubious.
What would it even *mean* to not give up?
On 2022-10-10 09:23:27 +1100, Chris Angelico wrote:
> On Mon, 10 Oct 2022 at 06:50, Antoon Pardon <antoon.pardon@vub.be> wrote:
> > I just want a parser that doesn't give up on encountering the first
> > syntax error.
>
> That doesn't make sense though.

I think you disagree with most compiler authors here.

> It's one thing to keep going after finding a non-syntactic error, but
> an error of syntax *by definition* makes parsing the rest of the file
> dubious.

Dubious but still useful.

> What would it even *mean* to not give up?

Read the blog post on Lezer for some ideas:
https://marijnhaverbeke.nl/blog/lezer.html

This is in the context of an editor.
There's a huge difference between non-fatal errors and syntactic
errors. The OP wants the parser to magically skip over a fundamental
syntactic error and still parse everything else correctly. That's
never going to work perfectly, and the OP is surprised at this.
On 11Oct2022 08:02, Chris Angelico <rosuav@gmail.com> wrote:
> There's a huge difference between non-fatal errors and syntactic
> errors. The OP wants the parser to magically skip over a fundamental
> syntactic error and still parse everything else correctly. That's
> never going to work perfectly, and the OP is surprised at this.
The OP is not surprised by this, and explicitly expressed awareness that resuming a parse had potential for "misparsing" further code.
I remain of the opinion that one could resume a parse at the next
unindented line and get reasonable results a lot of the time.
In fact, I expect that one could resume tokenising at almost any line
which didn't seem to be inside a string and often get reasonable
results.
I grew up with C and Pascal compilers which would _happily_ produce many complaints, usually accurate, and all manner of syntactic errors. They
didn't stop at the first syntax error.
All you need in principle is a parser which goes "report syntax error
here, continue assuming <some state>". For Python that might mean
"pretend a missing final colon" or "close open brackets" etc, depending
on the context. If you make conservative implied corrections you can get
a reasonable continued parse, enough to find further syntax errors.
I remember the Pascal compiler in particular had a really good "you
missed a semicolon _back there_" mode which was almost always correct, a
nice boon when correcting mistakes.
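A toy version of the resume-at-the-next-unindented-line idea, using nothing but repeated calls to compile() (my sketch, not an existing tool; false positives are expected by design):

```python
def find_syntax_errors(source):
    """Collect several SyntaxErrors from one file by re-parsing: after
    an error, resume at the next non-blank line that starts in column 0
    (Cameron's suggestion).  Later errors may be spurious, by design."""
    lines = source.splitlines(keepends=True)
    errors = []
    start = 0
    while start < len(lines):
        try:
            compile("".join(lines[start:]), "<chunk>", "exec")
            break  # the rest of the file parsed cleanly
        except SyntaxError as e:
            bad = start + (e.lineno or 1)   # 1-based file line of the error
            errors.append((bad, e.msg))
            # resume at the next non-blank, unindented line after the error
            nxt = bad
            while nxt < len(lines) and (
                    not lines[nxt].strip()
                    or lines[nxt].startswith((" ", "\t"))):
                nxt += 1
            start = nxt
    return errors
```

On a file with a broken def and, further down, a broken assignment, this reports both, which is about as much as the "guess a restart point" strategy can promise.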
On 09/10/2022 10.49, Avi Gross wrote:
> My guess is that finding 100 errors might turn out to be misleading.
> If you fix just the first, many others would go away. If you spell a
> variable name wrong when declaring it, a dozen uses of the right name
> may cause errors. Should you fix the first or change all later ones?

> How does one declare a variable in python? Sometimes it'd be nice to
> be able to have declarations and any undeclared variable be flagged.
When I was writing F77 for a living, I'd (temporarily) put:
     IMPLICIT CHARACTER*3
at the beginning of a program or subroutine that I was modifying,
in order to have any typos flagged.
I'd love it if there was something similar that I could do in python.
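Python has no declarations to lean on, but a static pass over the AST can approximate the F77 trick by flagging names that are loaded but never bound, which is roughly what tools like pyflakes do properly. A toy sketch of my own, deliberately ignoring scoping rules:

```python
import ast
import builtins

def undefined_names(source):
    """Report names that are read but never bound anywhere in the file.
    Toy version: one flat namespace, no comprehension/scope handling."""
    tree = ast.parse(source)
    bound = set(dir(builtins))
    # first pass: collect everything that binds a name
    for node in ast.walk(tree):
        if isinstance(node, ast.Name) and isinstance(node.ctx, ast.Store):
            bound.add(node.id)
        elif isinstance(node, (ast.FunctionDef, ast.AsyncFunctionDef,
                               ast.ClassDef)):
            bound.add(node.name)
        elif isinstance(node, (ast.Import, ast.ImportFrom)):
            for alias in node.names:
                bound.add((alias.asname or alias.name).split(".")[0])
        elif isinstance(node, ast.arg):
            bound.add(node.arg)
    # second pass: any loaded name that was never bound gets flagged
    return sorted({node.id for node in ast.walk(tree)
                   if isinstance(node, ast.Name)
                   and isinstance(node.ctx, ast.Load)
                   and node.id not in bound})
```

So a misspelled use of a correctly assigned variable shows up immediately, much like the IMPLICIT trick flags typos in Fortran.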
If the above is:

    import grumpy as np

then what happens if the code tries to find a file named "grumpy"
somewhere and cannot locate it, and this is considered a syntax error
rather than a run-time error for whatever reason? Can you continue when
all kinds of functionality is missing and code asking to make a
np.array([1,2,3]) clearly fails?
> There's a huge difference between non-fatal errors and syntactic
> errors. The OP wants the parser to magically skip over a fundamental
> syntactic error and still parse everything else correctly. That's
> never going to work perfectly, and the OP is surprised at this.
With the internet today, we are used to expecting error correction to
come for free. Do you really need one of every 8 bits to be a parity
bit, which only catches maybe half of the errors...
If the above is:
Import grumpy as np
Then what happens if the code tries to find a file named "grumpy"
somewhere and cannot locate it and this is considered a syntax error
rather than a run-time error for whatever reason? Can you continue
when all kinds of functionality is missing and code asking to make a np.array([1,2,3]) clearly fails?
I stand corrected, Chris, and others, as I pay the sin tax.

Yes, there are many kinds of errors that logically fall into different
categories or phases of evaluation of a program. Some can be determined
by a more static analysis almost on a line-by-line (or "statement" or
"expression", ...) basis, others need to sort of simulate some things
and look back and forth to detect possible incompatibilities, and yet
others can only be detected at run time, with likely way more
categories depending on the language.

But when I run the Python interpreter on code, aren't many such phases
done interleaved and at once as various segments of code are parsed and
examined and perhaps compiled into block code and eventually executed?
>>> import tokenize, ast
>>> code = """def f():
...     print("Hello, world", 1>=2)
... """
>>> for t in tokenize.tokenize(iter(code.encode().split(b"\n")).__next__):
...     print(tokenize.tok_name[t.exact_type], t.string)
[token output elided]
>>> ast.dump(ast.parse(code))
"Module(body=[FunctionDef(name='f', args=arguments(posonlyargs=[], ...)"
>>> ast.dump(ast.parse("from __future__ import the_past"))
"Module(body=[ImportFrom(module='__future__',
names=[alias(name='the_past')], level=0)], type_ignores=[])"
>>> ast.dump(ast.parse("from __future__ import braces"))
"Module(body=[ImportFrom(module='__future__',
names=[alias(name='braces')], level=0)], type_ignores=[])"
>>> ast.dump(ast.parse("def f():\n\tdef g():\n\t\tnonlocal x\n"))
"Module(body=[FunctionDef(name='f', args=arguments(posonlyargs=[], ...,
body=[FunctionDef(name='g', ..., body=[Nonlocal(names=['x'])], ...)], ...)]"
>>> compile(ast.parse("from __future__ import braces"), "-", "exec")
Traceback (most recent call last):
  ...
So is the OP asking for something other than a Python Interpreter that normally halts after some kind of error? Tools like a linter may indeed fit that mold.
This may limit some of the objections of when an error makes it hard for the parser to find some recovery point to continue from as no code is being run and no harmful side effects happen by continuing just an analysis.
Time to go read some books about modern ways to evaluate a language based on more mathematical rules including more precisely what is syntax versus ...
Suggestions?
Antoon Pardon wrote:
> I would like a tool that tries to find as many syntax errors as
> possible in a python file.

I've been following the discussion from a distance, and the whole time
I'm puzzled as to when such a tool would be needed. How many syntax
errors can you realistically put into a single Python file before
compiling it for the first time?
Thanks for a rather detailed explanation of some of what we have been discussing, Chris. The overall outline is about what I assumed was there but some of the details were, to put it politely, fuzzy.
I see resemblances to something like how a web page is loaded and operated.
I mean very different but at some level not so much.
I mean a typical web page is read in as HTML with various keyword regions expected such as <BODY> ... </BODY> or <DIV ...> ... </DIV> with things
often cleanly nested in others. The browser makes nodes galore in some kind of tree format with an assortment of objects whose attributes or methods represent aspects of what it sees. The resulting treelike structure has
names like DOM.
To a certain approximation, this tree starts a certain way but is regularly being manipulated (or perhaps a copy is) as it regularly is looked at to see how to display it on the screen at the moment based on the current tree contents and another set of rules in Cascading Style Sheets.
These are not at all the same thing but share a certain set of ideas and methods and can be very powerful as things interact.
In effect the errors in the web situation have such analogies too as in what happens if a region of HTML is not well-formed or uses a keyword not recognized.
There was a guy around a few years ago who suggested he would create a
system where you could create a series of configuration files for ANY
language, and his system would then compile or run programs for each
and every such language. Was that on this forum? Whatever happened to
him?
On 10/11/2022 3:10 AM, avi.e.gross@gmail.com wrote:
> I see resemblances to something like how a web page is loaded and
> operated. I mean very different but at some level not so much.
>
> I mean a typical web page is read in as HTML with various keyword
> regions expected such as <BODY> ... </BODY> or <DIV ...> ... </DIV>
> with things often cleanly nested in others. The browser makes nodes
> galore in some kind of tree format with an assortment of objects whose
> attributes or methods represent aspects of what it sees. The resulting
> treelike structure has names like DOM.
To bring things back to the context of the original post, actual web
browsers are extremely tolerant of HTML syntax errors (including
incorrect nesting of tags) in the documents they receive. They usually recover silently from errors and are able to display the rest of the
page. Usually they manage this correctly.
On Wed, 12 Oct 2022 at 05:23, Thomas Passin <list1@tompassin.net> wrote:
> To bring things back to the context of the original post, actual web
> browsers are extremely tolerant of HTML syntax errors (including
> incorrect nesting of tags) in the documents they receive. They usually
> recover silently from errors and are able to display the rest of the
> page. Usually they manage this correctly.
Having had to debug tiny errors in HTML pages that resulted in
extremely weird behaviour, I'm not sure that I agree that they usually
manage correctly. Fundamentally, they guess, and guesswork is never
reliable.
The OP wants to get help with problems in
his files even if it isn't perfect, and I think that's reasonable to
wish for. The link to a post about the lezer parser in a recent message
on this thread is partly about how a real, practical parser can do some
error correction in mid-flight, for the purposes of a programming editor
(as opposed to one that has to build a correct program).
Personally, I'd most likely go for a decent programming editor that you
can set up to run a checker on your file, pyflakes for instance, from
time to time. You could run it when you save the file. Even if it only
showed one error at a time, it would make quick work of correcting
mistakes, and it wouldn't need to trigger the entire tool chain each
time.
On Tue, 11 Oct 2022 at 09:18, Cameron Simpson <cs@cskk.id.au> wrote:

Consider:

    if condition # no colon
        code
    else:
        code

To actually "restart" parsing, you have to make a guess of some sort.

> I grew up with C and Pascal compilers which would _happily_ produce
> many complaints, usually accurate, and all manner of syntactic errors.
> They didn't stop at the first syntax error.

Yes, because they work with a much simpler grammar.
On 2022-10-11 09:47:52 +1100, Chris Angelico wrote:
> On Tue, 11 Oct 2022 at 09:18, Cameron Simpson <cs@cskk.id.au> wrote:
> > Consider:
> >
> >     if condition  # no colon
> >         code
> >     else:
> >         code
> >
> > To actually "restart" parsing, you have to make a guess of some sort.
Right. At least one of the papers on parsing I read over the last few
years (yeah, I really should try to find them again) argued that the
vast majority of syntax errors are either a missing token, a superfluous
token, or a combination of the two. So one strategy with good results
is to heuristically try to insert or delete single tokens and check
which results in the longest distance to the next error.

Checking multiple possible fixes has its cost, especially since you have
to do that at every error. So you can argue that it is better for
productivity if you discover one error in 0.1 seconds than 10 errors in
5 seconds.
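That insert-or-delete strategy is easy to prototype. The sketch below is my own toy code, working at the character level rather than the token level (so far cruder than what the papers describe): it tries a handful of single-character repairs on the offending line and keeps whichever variant pushes the next error furthest down the file.

```python
import ast


def first_error_line(src):
    """Line number of the first syntax error, or None if src parses."""
    try:
        ast.parse(src)
        return None
    except SyntaxError as e:
        return e.lineno or 1


def repair_once(src, inserts=(":", ")", "]", "}")):
    """Try deleting the last character of the offending line, or appending
    one candidate token character; keep the variant whose next error is
    furthest away (a clean parse counts as 'past the end of the file')."""
    lineno = first_error_line(src)
    if lineno is None:
        return src
    lines = src.splitlines(keepends=True)
    i = min(lineno, len(lines)) - 1
    body = lines[i].rstrip("\n")
    variants = [body[:-1]] + [body + ch for ch in inserts]
    best, best_score = src, lineno
    for repaired_line in variants:
        candidate = "".join(lines[:i]) + repaired_line + "\n" + "".join(lines[i + 1:])
        score = first_error_line(candidate)
        score = float("inf") if score is None else score
        if score > best_score:
            best, best_score = candidate, score
    return best


broken = "if x > 0\n    y = 1\n"
fixed = repair_once(broken)   # appends the missing ':' and the file parses
```

Run in a loop until the file parses (or until progress stops), this collects one plausible diagnosis per iteration, and it also makes the cost argument concrete: every error triggers several full re-parses.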
> > I grew up with C and Pascal compilers which would _happily_ produce many
> > complaints, usually accurate, about all manner of syntactic errors. They
> > didn't stop at the first syntax error.
>
> Yes, because they work with a much simpler grammar.
I very much doubt that. Python doesn't have a particularly complicated
grammar, and C certainly doesn't have a particularly simple one.

The argument that it's impossible in Python (unlike any other language)
because Python is oh so special doesn't hold water.
On Thu, 13 Oct 2022 at 11:19, Peter J. Holzer <hjp-python@hjp.at> wrote:
> On 2022-10-11 09:47:52 +1100, Chris Angelico wrote:
> > On Tue, 11 Oct 2022 at 09:18, Cameron Simpson <cs@cskk.id.au> wrote:
> > > Consider:
> > >
> > >     if condition  # no colon
> > >         code
> > >     else:
> > >         code
> > >
> > > To actually "restart" parsing, you have to make a guess of some sort.
>
> Right. At least one of the papers on parsing I read over the last few
> years (yeah, I really should try to find them again) argued that the
> vast majority of syntax errors are either a missing token, a superfluous
> token, or a combination of the two. So one strategy with good results
> is to heuristically try to insert or delete single tokens and check
> which results in the longest distance to the next error.
>
> Checking multiple possible fixes has its cost, especially since you have
> to do that at every error. So you can argue that it is better for
> productivity if you discover one error in 0.1 seconds than 10 errors in
> 5 seconds.
Maybe; but what if you report 10 errors in 5 seconds, but 8 of them
are spurious? You've reported two useful errors in a sea of noise.
Even if it's the other way around (8 where you nailed it and correctly reported the error, 2 that are nonsense), is it actually helpful?
> > > I grew up with C and Pascal compilers which would _happily_ produce many
> > > complaints, usually accurate, about all manner of syntactic errors. They
> > > didn't stop at the first syntax error.
> >
> > Yes, because they work with a much simpler grammar.
>
> I very much doubt that. Python doesn't have a particularly complicated
> grammar, and C certainly doesn't have a particularly simple one.
>
> The argument that it's impossible in Python (unlike any other language)
> because Python is oh so special doesn't hold water.
Never said it's because Python is special; there are a LOT of
languages that are at least as complicated.
But I do think that Pascal, especially, has a significantly simpler
grammar than Python does.
> I would like a tool that tries to find as many syntax errors as possible
> in a python file. I know there is the risk of false positives when a
> tool tries to recover from a syntax error and proceeds but I would
> prefer that over the current python strategy of quitting after the first
> syntax error. I just want a tool for syntax errors. No style
> enforcements. Any recommendations? -- Antoon Pardon
Parso is a Python parser that supports error recovery and round-trip parsing for different Python versions. It is also able to list multiple syntax errors in your Python file.
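Parso's documented API for this is parso.load_grammar() followed by iter_errors() on the parsed tree. The same "keep going after an error" idea can also be faked with only the standard library, by blanking each offending line and re-parsing; the sketch below is a deliberately naive version of my own (names and structure are mine), complete with the false-positive risk the OP said he would accept:

```python
import ast


def list_syntax_errors(source, max_errors=20):
    """Collect several syntax errors by repeatedly parsing and replacing
    the offending line with 'pass' (preserving its indentation).

    Crude recovery: later "errors" may be artifacts of the blanking,
    which is exactly the false-positive trade-off discussed above.
    """
    lines = source.splitlines()
    errors = []
    for _ in range(max_errors):
        try:
            ast.parse("\n".join(lines))
            break
        except SyntaxError as e:
            errors.append((e.lineno, e.msg))
            if e.lineno is None or not 1 <= e.lineno <= len(lines):
                break
            old = lines[e.lineno - 1]
            indent = old[: len(old) - len(old.lstrip())]
            lines[e.lineno - 1] = indent + "pass"
    return errors


sample = "x = )\ny = 1\nz = ]\n"
found = list_syntax_errors(sample)   # both bad lines get reported
```

A real tool like parso does token-level recovery instead of line blanking, so it produces far fewer artifacts, but even this toy version reports both independent errors in the sample in one run.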