• document markup: HTML, LaTeX, etc.

  • From Ivan Shmakov@21:1/5 to All on Fri Nov 4 17:30:21 2016
    XPost: comp.misc

    Rich <rich@example.invalid>:
    Marko Rauhamaa <marko@pacujo.net> wrote:

    [I took the liberty of adding news:comp.infosystems.www.misc to
    Newsgroups:, as the discussion would seem quite on-topic there.]

    [...]

    * At the bottom is HTML. It is extremely crude in its expressive
    power. It lists a handful of element types, but documents need
    more. In fact, different web sites need different element types.

    Well, I wouldn't describe HTML's "expressive power" as crude;
    HTML's expressive power is quite strong. What is "crude" for
    HTML is the default visual presentation of that underlying
    expressive power: the default rendering of basic HTML is what
    looks crude.

    Which is quite understandable. First of all, unlike LaTeX,
    there're multiple /independent/ "implementations of HTML" in
    existence. The author of a LaTeX document can request that it
    be processed with "LaTeX 2\epsilon"; with HTML, one gets no
    such luxury^1.

    And it won't pay off to standardize any "fancy" rendering as
    part of a future HTML version, either: since authors would be
    unable to request a renderer that implements (at least) that
    specific version, they would have to supply their own "fancy"
    CSS anyway. And if they cannot benefit from a "fancy" built-in
    stylesheet, why introduce one in the first place? Keeping the
    built-in rendering plain is good for "forward compatibility",
    too.
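
    (To give an idea of the kind of author-supplied CSS meant here --
    a minimal sketch; the selectors are standard HTML elements, and
    the values are of course arbitrary:)

      /* Replace the plain built-in rendering of standard elements. */
      body       { max-width: 40em; margin: 0 auto;
                   font-family: Georgia, serif; line-height: 1.4; }
      h1         { font-variant: small-caps;
                   border-bottom: 1px solid #777; }
      blockquote { font-style: italic; color: #444; }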

    There's another reason, however. Again unlike LaTeX, HTML is
    expected to be rendered in a wide variety of forms: on computer
    screens of varying dimensions, on paper, via speech synthesis,
    and so on^2. Indeed, the standard could prescribe that a
    document's title be rendered in a "17pt Roman" font^3, but how
    useful would that be if I read that document on a
    character-cell terminal with Lynx^4? And trying to specify
    /all/ the possible ways to render the title doesn't look like
    a sensible approach, either.
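
    (It's CSS, again, that lets an author adapt the presentation to
    the medium, where that is desired at all; a minimal sketch,
    using only the standard 'screen' and 'print' media types:)

      h1 { font-size: 2em; }          /* the common case */
      @media print {
        h1 { font-size: 17pt; }       /* cf. the "17pt Roman" above */
      }
      @media screen and (max-width: 40em) {
        h1 { font-size: 1.4em; }      /* narrow screens */
      }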


    Notes

    ^1 "Best viewed with" fad of the bad old days aside.

    ^2 Not to mention the processing by various "robot" software.

    ^3 Default \title formatting per classes.dtx.

    ^4 As I often do.

    [...]

    --
    FSF associate member #7257 58F8 0F47 53F5 2EB2 F6A5 8916 3013 B6A0 230E 334A

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Ivan Shmakov@21:1/5 to All on Fri Nov 4 18:12:34 2016
    XPost: comp.misc

    Marko Rauhamaa <marko@pacujo.net> writes:

    [I took the liberty of adding news:comp.infosystems.www.misc to
    Newsgroups:, as the discussion would seem quite on-topic there.]

    [...]

    I mean HTML only has a handful of element types, regardless of
    visuals. Compare it with (La)TeX, which allows you to define new
    element types with semantics and all. In HTML, you do it by defining
    a div and a class name and externalizing the semantics outside HTML.

    There's no <aphorism>, <adage>, <saw>, <definition>, <theorem>,
    <conjecture>, <joke>, <guess>, <lie>, <rumor> etc...

    But of course you can define new HTML elements, much the same
    way you do with TeX-based systems. As a "prior art" example,
    the Wayback Machine routinely uses <wb_p /> in place of <p />
    (and more) for its markup on the archived pages^1.
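
    (A minimal sketch of the same idea; the element name below is
    borrowed from that very example, and the styling is made up:)

      <style>
        /* Unknown elements default to "inline"; make this one
           behave like a paragraph. */
        wb_p { display: block; margin: 1em 0; }
      </style>
      <wb_p>An "archived" paragraph, marked up with a
        non-standard element.</wb_p>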

    There are, of course, a couple of drawbacks to this approach --
    which, arguably, are not all that different to what one gets
    with TeX.

    First of all, such HTML has little chance of passing "validity"
    tests^2. That said, TeX-based systems do not introduce the
    concept of "validity" at all; the document is deemed "good" as
    long as it renders the way it's intended. Or at the least, I'm
    not aware of any "LaTeX validators" currently in wide use.

    Also, while CSS makes it possible to specify the rendering^3,
    the semantics remain undefined. On the TeX side, using
    \def \foo {\mathbf} doesn't convey semantics, either -- only
    the presentation. And presentation is what CSS (partially)
    covers anyway.

    And of course, CSS-wise, using such new elements is hardly any
    different to using the standard "blank" <div /> and <span />
    elements with a 'class' attribute. A more sensible approach is
    to use some standard element (thus "inheriting" its semantics
    for any third-party processors of said document) -- along with
    suitable 'class' and 'role' attributes, RDFa, etc.
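
    (A sketch only: the vocabulary URI and the 'class' value below
    are made up, while 'class', 'role' and the RDFa attributes
    themselves are standard:)

      <blockquote class="aphorism" role="note"
                  vocab="http://example.org/terms/" typeof="Aphorism">
        <p property="text">Brevity is the soul of wit.</p>
      </blockquote>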

    Now, as an aside, while I can imagine specific documents that
    would benefit from the elements above, I fail to see their
    utility to the Web at large. Should a search engine really
    treat <joke /> any differently to <theorem />, for instance?
    How would <saw /> be any different to <definition /> when
    interpreted by a Web user agent? (Other than in their
    presentation -- but we've got that covered with CSS, right?)
    If anything, it feels like over-engineering to me, alas.


    Ultimately, however, I'd like to note that the flexibility of
    TeX comes from it being a full-weight programming language --
    contrary to HTML and CSS, which are merely data languages.

    Then, it was already noted that the modern Web ecosystem employs
    both data languages, such as HTML and CSS (but also SVG, MathML,
    RDFa, various "microformats", etc.), -- and JavaScript for the
    programming language. And honestly, I'm not entirely sure that
    comparing a data language with a programming language quite
    makes sense. (So, if anything, shouldn't we rather be comparing
    TeX to JavaScript here? Instead of HTML and CSS, that is.)

    Hence, I claim that the power of TeX is also its weakness.
    Yes, one can implement a seemingly-declarative "markup" language
    in TeX (such as LaTeX), but will it be much different to
    implementing such a language in JavaScript? Yes, one can
    perform static analysis of a TeX document -- but will the
    results /always/ be more useful than performing that same static
    analysis on a pure JavaScript-based Web page^4? And no, one
    does not "process" a TeX document to get a PDF -- one has to
    "run" it instead. "Here be the halting problem."

    Somewhat less importantly, TeX code is even less isolated (by
    default) from the underlying system than JavaScript^5. One can
    easily \input /etc/passwd -- or write to any file the user is
    permitted to write to. And given that there're users who tend
    to be wary of running arbitrary JavaScript, how should they
    feel about running arbitrary TeX code?


    Now, to look at the bright side: HTML5 possesses decent
    "expressive power" and can be "specialized" as necessary by
    means of (generally ad-hoc) 'class' values and (more
    standardized) "microformats", RDFa, etc. The standardization
    of such elements as <article />, <nav /> and <time /> in HTML5
    allows for easier extraction of the "payload" content and
    metadata from compliant documents. The inclusion of the DOM
    interface specification makes it possible to provide a uniform
    interface to "HTML objects" across programming languages.

    There're some developments (such as RASH^6) aimed at making HTML
    a suitable format for authoring scientific papers in.

    An even older project, MathJax^7, allows one to include quality
    mathematics in HTML documents. It supports several formats for
    both "input" (MathML, TeX, AsciiMath) and "output" (HTML and
    CSS, SVG, MathML). The formulae are rendered on the user's
    side, which means that the user has a degree of control over
    the final presentation. When the math is written in TeX
    notation, the user of a browser that does not implement
    JavaScript, or has it disabled, sees the unprocessed TeX --
    which can be as readable as the author manages to write it.
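
    (A sketch of such a page; the script URL and the configuration
    name vary with the MathJax version and hosting, and are given
    here for illustration only:)

      <script async
        src="https://example.org/MathJax/MathJax.js?config=TeX-AMS_CHTML">
      </script>
      <p>Consider the integral
        \( \int_0^\infty e^{-x^2} \, dx = \sqrt{\pi} / 2 \), ...</p>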

    While the use of "client-side" JavaScript is questionable at
    times, its /omnipresence/ can be regarded as an opportunity.
    Frankly, I don't seem to recall there ever been a development
    environment covering the computers ranging from something one
    can carry on one's palm, to desktops, to supercomputers^8.


    Notes

    ^1 Presumably to avoid possible clashes with the archived pages' own
    styling.

    ^2 Unless, of course, the newly introduced elements become so
    commonly used by other parties as to warrant inclusion in
    whatever new HTML TR the W3C decides to publish.

    ^3 Most commonly the visual one but, while frequently
    overlooked, I'd like to note that CSS 2.1 also offers
    properties to describe the /aural/ presentation of the
    document -- think of the users of speech synthesizers, for
    instance. I'm unaware of any similar facility for TeX-based
    publishing systems.

    ^4 http://circuits.im/ comes to mind.

    ^5 The isolation that JavaScript implementations offer is also
    stronger than, say, the one implemented in Ghostscript
    (-dSAFER) for PostScript -- which happens to be another common
    "document programming language".

    ^6 http://rawgit.com/essepuntato/rash/master/documentation/
    RASH: Research Articles in Simplified HTML

    ^7 http://mathjax.org/

    ^8 Disclaimer: I do not advocate in favor of portable computers
    in general, and even less so for any and all devices running
    non-free software, or implementing cellular network protocols.
    Also, I really hope that one wouldn't actually use JavaScript
    for any "number crunching", but would rely on something like C
    instead. That said, should I ever have to choose between
    JavaScript and, say, Python -- I'd go with JavaScript, sure.

    [...]

    --
    FSF associate member #7257 58F8 0F47 53F5 2EB2 F6A5 8916 3013 B6A0 230E 334A

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Marko Rauhamaa@21:1/5 to All on Fri Nov 4 22:24:18 2016
    XPost: comp.misc

    Ivan Shmakov <ivan@siamics.net>:

    Ultimately, however, I'd like to note that the flexibility of TeX
    comes from it being a full-weight programming language -- contrary to
    HTML and CSS, which are merely data languages.

    Then, it was already noted that the modern Web ecosystem employs both
    data languages, such as HTML and CSS (but also SVG, MathML, RDFa,
    various "microformats", etc.), -- and JavaScript for the programming language. And honestly, I'm not entirely sure that comparing a data
    language with a programming language quite makes sense. (So, if
    anything, shouldn't we rather be comparing TeX to JavaScript here?
    Instead of HTML and CSS, that is.)

    The point is, is there a reason to bother with "data languages"
    when what you really need is a programming language? Data is
    code, and code is data.

    Note, for example, how iptables are giving way to BPF, which in
    turn are finding more and expanded uses.

    Also, note how PostScript handles rendering beautifully in the printing
    world and how elisp is used to "configure" emacs.

    Hence, I claim that the power of TeX is also its weakness. Yes, one
    can implement a seemingly-declarative "markup" language in TeX (such
    as LaTeX), but will it be much different to implementing such a
    language in JavaScript?

    Well, no, it won't. That's the point. All you need is <div> and JS, but
    that's also the minimum you need.

    Yes, one can perform static analysis of a TeX document -- but
    will the results /always/ be more useful than performing that
    same static
    analysis on a pure JavaScript-based Web page^4?

    What do you need static analysis for? Ok, Google needs to analyze, but
    leave that to them.

    Point is, formal semantics needs a full-fledged programming language.
    And by semantics, I'm not referring (mostly) to the visual layout but to
    the structure and function of the parts of the document/web site.

    And no, one does not "process" a TeX document to get a PDF -- one has
    to "run" it instead.

    Way to go!

    "Here be the halting problem."

    Mostly, there will be problems with security and DoS, but I suppose
    those can be managed. JavaScript and PostScript (among others) have had
    to go through those stages.


    Marko

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Doc O'Leary@21:1/5 to Marko Rauhamaa on Sat Nov 5 17:25:26 2016
    XPost: comp.misc

    For your reference, records indicate that
    Marko Rauhamaa <marko@pacujo.net> wrote:

    The point is, is there a reason to bother with "data languages"
    when what you really need is a programming language? Data is
    code, and code is data.

    No. Things scale in natural ways. Everything is data *first*,
    and only then can we look at it and see if the data could be
    code, and then *how* the data and code can best be presented
    together. When all you want to do is say “Hello, World!”, the
    more simply a system can do that, the better.

    Point is, formal semantics needs a full-fledged programming language.
    And by semantics, I'm not referring (mostly) to the visual layout but to
    the structure and function of the parts of the document/web site.

    It *may* require that for some uses, but it is a mistake to impose
    the most complex system possible at all scales. Too great a learning
    curve will drive new people away. Tools that are too specialized
    will get very few users.

    And also keep in mind that semantics *will* differ between the
    producer of a document and the consumer of the document. The more
    “code” you embed in a document, the more you force a particular
    meaning on its content. That is not always the right thing to do.

    --
    "Also . . . I can kill you with my brain."
    River Tam, Trash, Firefly

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)
  • From Ivan Shmakov@21:1/5 to All on Fri Nov 4 22:25:27 2016
    XPost: comp.misc

    Marko Rauhamaa <marko@pacujo.net> writes:
    Ivan Shmakov <ivan@siamics.net>:

    Ultimately, however, I'd like to note that the flexibility of TeX
    comes from it being a full-weight programming language -- contrary
    to HTML and CSS, which are merely data languages.

    Then, it was already noted that the modern Web ecosystem employs
    both data languages, such as HTML and CSS (but also SVG, MathML,
    RDFa, various "microformats", etc.), -- and JavaScript for the
    programming language. And honestly, I'm not entirely sure that
    comparing a data language with a programming language quite makes
    sense. (So, if anything, shouldn't we rather be comparing TeX to
    JavaScript here? Instead of HTML and CSS, that is.)

    The point is, is there a reason to bother with "data languages"
    when what you really need is a programming language? Data is
    code, and code is data.

    I beg to differ. Otherwise, I wouldn't be using Lynx (which
    only supports rendering HTML, but not running JavaScript -- or
    any other programming language for that matter; lynxcgi://
    aside) or that wonderful NoScript extension for Firefox.

    Note, for example, how iptables are giving way to BPF,

    Do they?

    which in turn are finding more and expanded uses.

    Also, note how PostScript handles rendering beautifully in the
    printing world

    I haven't used PostScript for years; I choose to rely on PDF.
    If anything, being a data language, it feels much "safer".

    and how elisp is used to "configure" emacs.

    Given that Emacs is first and foremost a programming language
    implementation, the term "configuration" feels about as
    applicable to it as it would be to GNU libc. That aside, it
    indeed only makes sense to customize a programming language
    using a program written in it.

    As an aside, would you fancy reading messages here on Usenet
    that happened to be actual code, instead of ASCII data?
    (And if so, what language would you prefer?)

    ... Not that it's unheard of. Back in the BBS days, demogroups
    and tech-savvy individuals were not necessarily satisfied with
    posting a plain text message, opting instead to wrap it into a
    binary for the platform of their choice. When run, the binary
    would render said message -- with music, video effects, and
    so on.

    Of course, that meant no luck for those who used different
    hardware, but then again: those would probably not be
    interested in the message in the first place.

    FTR, the relevant software was known as "noters". I was able
    to locate several mentions on the Web (say, ^1, ^2), but no
    "encyclopedic" description of the concept so far.

    Hence, I claim that the power of TeX is also its weakness. Yes, one
    can implement a seemingly-declarative "markup" language in TeX (such
    as LaTeX), but will it be much different to implementing such a
    language in JavaScript?

    Well, no, it won't. That's the point. All you need is <div> and JS,
    but that's also the minimum you need.

    That way, one's essentially using a format of one's own make,
    while also requiring that those who happen to be interested in
    the actual "payload" use software that is also of one's own
    make.

    Well, thanks, but no thanks; when the site tells me "best viewed
    with our custom browser" (or "only viewed", for that matter),
    it's one big red "STOP" sign to me. I prefer to stick to the
    software of /my/ choice -- not theirs.

    Yes, one can perform static analysis of a TeX document -- but will
    the results /always/ be more useful than performing that same static
    analysis on a pure JavaScript-based Web page?

    What do you need static analysis for?

    Why, I may want to extract the table of contents of a document,
    or make a list of the titles of some or all my documents, etc.

    Ok, Google needs to analyze, but leave that to them.

    Indeed, Web search engines benefit the most from the use of data
    languages on the modern Web. But in fact, anyone can join.
    (But if you manage to convince Google to run the code you've put
    on your site in order to index it -- I'd be very much interested
    in the details.)

    Also worth mentioning is that "data format conversion" becomes
    an ill-defined procedure once it's no longer "data" we speak
    of.

    Say, subject to formats' color depth limitations, it's always
    possible to convert a raster image from one lossless format to
    another (say, PNG to PNM to BMP to...) with no loss of data.

    Is it similarly possible to convert Forth into Perl? Or, more
    relevant to this discussion, LaTeX into JavaScript -- and back
    again?

    Point is, formal semantics needs a full-fledged programming language.
    And by semantics, I'm not referring (mostly) to the visual layout but
    to the structure and function of the parts of the document/web site.

    I do not see how a programming language could help with
    /conveying/ semantics -- as opposed to /implementing/ them.

    Suppose, for example, that we are to define the semantics of
    the C 'printf' function for a new C standard. What programming
    language do we use, and how do we do that?

    [...]

    "Here be the halting problem."

    Mostly, there will be problems with security and DoS, but I suppose
    those can be managed. JavaScript and PostScript (among others) have
    had to go through those stages.

    Off the top of my head, a bug in the Firefox implementation of
    JavaScript led to the fall of Silk Road (say, ^3). Also, it was
    shown that it's possible to exploit the "row hammer"^4 hardware
    vulnerability from JavaScript.

    Notes

    ^1 http://commodorefree.com/magazine/vol2/issue20.htm

    ^2 https://duckduckgo.com/html/?q=demomaker+"noter"

    ^3 https://daniweb.com/hardware-and-software/networking/news/460484/

    ^4 https://en.wikipedia.org/wiki/Row_hammer

    --
    FSF associate member #7257 58F8 0F47 53F5 2EB2 F6A5 8916 3013 B6A0 230E 334A

    --- SoupGate-Win32 v1.05
    * Origin: fsxNet Usenet Gateway (21:1/5)