• Figuring out grammars from examples

    From John R Levine@21:1/5 to All on Fri Apr 12 18:39:51 2024
    There's been a surprising amount of work on taking language strings and
    feeding them to a box that infers a grammar for them. One of the harder
    bits is figuring out nesting constructs, which in the past has often
    required hints.

    This paper uses Visibly Pushdown Grammars, a large but more tractable
    subset of context-free grammars.
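
    For readers who have not met them: a visibly pushdown language splits its
    alphabet into call, return, and internal symbols, so each input symbol by
    itself decides whether the stack is pushed, popped, or left alone. A toy
    Python sketch of that discipline (just the idea, nothing from the paper),
    using '<' and '>' as the sole call and return symbols:

        CALLS, RETURNS = {"<"}, {">"}      # call symbols push, return symbols pop

        def accepts(word):
            """Accept words whose <...> pairs are well nested; letters are internal."""
            stack = []
            for sym in word:
                if sym in CALLS:           # call symbol: always push
                    stack.append(sym)
                elif sym in RETURNS:       # return symbol: always pop
                    if not stack:
                        return False       # unmatched return
                    stack.pop()
                elif not sym.isalpha():    # internal symbols leave the stack alone
                    return False
            return not stack               # every call matched by a return

        assert accepts("<ab<cd>e>")
        assert not accepts("<a><")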

    V-Star: Learning Visibly Pushdown Grammars from Program Inputs
    Xiaodong Jia, Gang Tan

    Accurate description of program inputs remains a critical challenge in
    the field of programming languages. Active learning, as a
    well-established field, achieves exact learning for regular languages.
    We offer an innovative grammar inference tool, V-Star, based on the
    active learning of visibly pushdown automata. V-Star deduces nesting
    structures of program input languages from sample inputs, employing a
    novel inference mechanism based on nested patterns. This mechanism
    identifies token boundaries and converts languages such as XML
    documents into VPLs. We then adapted Angluin's L-Star, an exact
    learning algorithm, for VPA learning, which improves the precision of
    our tool. Our evaluation demonstrates that V-Star effectively and
    efficiently learns a variety of practical grammars, including
    S-Expressions, JSON, and XML, and outperforms other state-of-the-art
    tools.

    https://arxiv.org/abs/2404.04201v1
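
    For readers unfamiliar with Angluin-style active learning: the learner
    builds a hypothesis from membership queries and refines it whenever an
    equivalence check returns a counterexample. Here is a minimal sketch of
    that loop for the plain regular (DFA) case, with a bounded brute-force
    search standing in for the teacher's equivalence oracle; it is only
    illustrative and is not the paper's VPA-learning code:

        from itertools import product

        def lstar(is_member, alphabet, max_len=8):
            """Angluin-style L-Star sketch: learn a DFA for is_member()."""
            S, E = [""], [""]              # access prefixes / distinguishing suffixes

            def row(s):
                return tuple(is_member(s + e) for e in E)

            def close():
                """Ensure every one-letter extension of S has a row already in S."""
                changed = True
                while changed:
                    changed = False
                    rows = {row(s) for s in S}
                    for s, a in product(list(S), alphabet):
                        if row(s + a) not in rows:
                            S.append(s + a)        # new state discovered
                            changed = True
                            break

            def hypothesis():
                """States are rows; return (start, accept?, transition function)."""
                delta = {(row(s), a): row(s + a) for s in S for a in alphabet}
                return row(""), lambda q: q[0], lambda q, a: delta[(q, a)]

            def counterexample():
                """Brute-force equivalence check over all words up to max_len."""
                start, accept, step = hypothesis()
                for n in range(max_len + 1):
                    for w in map("".join, product(alphabet, repeat=n)):
                        q = start
                        for a in w:
                            q = step(q, a)
                        if accept(q) != is_member(w):
                            return w
                return None

            while True:
                close()
                cex = counterexample()
                if cex is None:
                    return hypothesis()
                # refine: add each non-empty suffix of the counterexample as a test column
                E.extend(cex[i:] for i in range(len(cex)) if cex[i:] not in E)

        # example: learn "even number of a's" over the alphabet {a, b}
        start, accept, step = lstar(lambda w: w.count("a") % 2 == 0, "ab")
        q = start
        for a in "abab":
            q = step(q, a)
        assert accept(q)                   # "abab" has an even number of a's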

    Regards,
    John Levine, johnl@taugh.com, Taughannock Networks, Trumansburg NY
    Please consider the environment before reading this e-mail. https://jl.ly

  • From Derek@21:1/5 to All on Sat Apr 13 12:06:49 2024
    John,

    > including S-Expressions, JSON, and XML, and outperforms other state-of-the-art tools.

    They did not compare their tool against LLMs, which these days
    outperform non-human approaches on language-related tasks.
    [I would like to see some actual data. In my experience, LLMs are
    impressive, confident, and frequently wrong. -John]

  • From Derek@21:1/5 to All on Mon Apr 15 02:17:04 2024
    John,

    > [I would like to see some actual data. In my experience, LLMs are
    > impressive, confident, and frequently wrong. -John]

    LLMs' performance on fact recall is poor.

    They seem to do much better than other tools when dealing with
    grammars. I could not find the example I was looking for, but here are
    two others:
    https://arxiv.org/abs/2305.19234
    https://szopa.medium.com/teaching-chatgpt-to-speak-my-sons-invented-language-9d109c0a0f05

    My own experience using local (i.e., very small) models is written up
    here:
    https://shape-of-code.com/2024/02/25/extracting-named-entities-from-a-change-log-using-an-llm/
