Hello all,
Now that our "validate" script uses the more modern tool onsgmls instead of nsgmls:
https://anonscm.debian.org/cgit/debwww/cron.git/tree/scripts/validate
I've come back to this bug.
The output of the validate script for the files containing "emojis" didn't change much:
    **** Errors validating /srv/www.debian.org/www/international/l10n/po/en_GB.it.html: ***
    Line 122, character 357: cannot convert character reference to number 128513 because character not in internal character set
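A side note on the number in that message: 128513 is a decimal Unicode code point, so the page presumably contains the numeric character reference &#128513;. The short Python sketch below (Python rather than the Perl of the validate script, purely for illustration; describe_code_point is a made-up helper name) shows that it corresponds to U+1F601, which lies outside the Basic Multilingual Plane. Whether that is what makes onsgmls reject it in fixed mode is an assumption, not something the error message confirms.

```python
def describe_code_point(code_point):
    """Return a few basic facts about a decimal Unicode code point."""
    char = chr(code_point)
    return {
        # Conventional U+XXXX notation for the code point.
        "notation": f"U+{code_point:04X}",
        # Code points above 0xFFFF are outside the Basic Multilingual Plane.
        "outside_bmp": code_point > 0xFFFF,
        # Number of bytes needed to store the character in UTF-8.
        "utf8_bytes": len(char.encode("utf-8")),
        # Number of 16-bit code units needed in UTF-16 (2 means a surrogate pair).
        "utf16_units": len(char.encode("utf-16-be")) // 2,
    }


# 128513 is the number reported in the onsgmls error message.
info = describe_code_point(128513)
for key, value in info.items():
    print(f"{key}: {value}")
```

So the character does not fit in a single 16-bit unit, which may be relevant to how the "internal character set" is chosen.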
I was a bit surprised that we are still getting these errors, because if I run the file through the online W3C validator
https://validator.w3.org/ or even run onsgmls manually on the machine that builds the website:
onsgmls -E0 -s /path/to/dtd /path/to/file
I don't get any errors in either case.
So I've looked at the "validate" script and played a bit with the options set there, and I'd like to bring lines 363-376 to your attention:
    # Determine whether we're dealing with HTML or XHTML and set the SP
    # environment accordingly.
    if ($xhtml{$htmlLevel}) {
        $ENV{'SGML_CATALOG_FILES'} = $xhtmlCatalog;
        $ENV{'SP_ENCODING'} = 'xml';
    } else {
        $ENV{'SGML_CATALOG_FILES'} = $htmlCatalog;
        if (defined $charset) {
            $ENV{'SP_ENCODING'} = $charset;
        } else {
            $ENV{'SP_ENCODING'} = "ISO-8859-1";
        }
    }
    $ENV{'SP_CHARSET_FIXED'} = 1;
If I comment out this last line (thus letting onsgmls run in non-fixed mode), I
get no errors validating the file.
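To compare the two modes outside the script, the environment block quoted above can be mirrored in a small helper. This is only a sketch in Python, not the Perl of the real script: build_sp_env and run_onsgmls are hypothetical names, the catalog path is whatever the script would use, and the onsgmls options are simply the ones from my manual command (-E0 -s).

```python
import os
import subprocess


def build_sp_env(is_xhtml, charset, catalog, charset_fixed, base_env=None):
    """Mirror the SP environment setup of the quoted block from the validate script.

    charset_fixed=False corresponds to commenting out the SP_CHARSET_FIXED line.
    """
    env = dict(os.environ if base_env is None else base_env)
    # Start from a clean state so an inherited value cannot leak in.
    env.pop("SP_CHARSET_FIXED", None)
    env["SGML_CATALOG_FILES"] = catalog
    if is_xhtml:
        env["SP_ENCODING"] = "xml"
    else:
        # Same fallback as the script when no charset was detected.
        env["SP_ENCODING"] = charset if charset is not None else "ISO-8859-1"
    if charset_fixed:
        env["SP_CHARSET_FIXED"] = "1"
    return env


def run_onsgmls(dtd, html_file, env):
    """Run onsgmls with the given environment and return the completed process."""
    return subprocess.run(
        ["onsgmls", "-E0", "-s", dtd, html_file],
        env=env,
        capture_output=True,
        text=True,
    )
```

Calling run_onsgmls("/path/to/dtd", "/path/to/file", build_sp_env(False, "utf-8", catalog, True)) and then the same with charset_fixed set to False should show, via the stderr attribute of each result, whether the error depends only on SP_CHARSET_FIXED.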
I've read the documentation about these options:
http://openjade.sourceforge.net/doc/charset.htm
but frankly I don't understand it very well.
I've run:
larjona@wolkenstein:~$ sudo -u debwww env | grep SP_
and it returns nothing, so I guess only the environment set in the "validate" script
is taken into account; if we don't set the variables there, the defaults apply.
I've modified and run a copy of the validate script, making it print some values
when checking a file, and the document type is correctly detected (HTML 4.01 Strict), as is the charset (utf-8).
I'm not sure whether I can safely comment out line 376
$ENV{'SP_CHARSET_FIXED'} = 1;
to avoid the errors, or even comment out the whole block, and trust onsgmls to do the right thing.
Can anybody with more experience in this help?
Thanks
--
Laura Arjona Reina
https://wiki.debian.org/LauraArjona