• Figuring out grammars from examples

    From John R Levine@21:1/5 to All on Fri Apr 12 18:39:51 2024
    There's been a surprising amount of work on taking language strings and
    feeding them to a box that infers a grammar for them. One of the harder
    bits is figuring out nesting constructs, which in the past has often
    required hints.

    This paper uses Visibly Pushdown Grammars, a large but more tractable
    subset of context-free grammars.
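
    For readers who have not met them: a visibly pushdown language splits its
    alphabet into call, return, and internal symbols, so each input symbol by
    itself decides whether the stack is pushed, popped, or left alone. A toy
    Python sketch of that discipline (just the idea, nothing from the paper),
    using '<' and '>' as the sole call and return symbols:

        CALLS, RETURNS = {"<"}, {">"}      # call symbols push, return symbols pop

        def accepts(word):
            """Accept words whose <...> pairs are well nested; letters are internal."""
            stack = []
            for sym in word:
                if sym in CALLS:           # call symbol: always push
                    stack.append(sym)
                elif sym in RETURNS:       # return symbol: always pop
                    if not stack:
                        return False       # unmatched return
                    stack.pop()
                elif not sym.isalpha():    # internal symbols leave the stack alone
                    return False
            return not stack               # every call matched by a return

        assert accepts("<ab<cd>e>")
        assert not accepts("<a><")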

    V-Star: Learning Visibly Pushdown Grammars from Program Inputs
    Xiaodong Jia, Gang Tan

    Accurate description of program inputs remains a critical challenge in
    the field of programming languages. Active learning, as a
    well-established field, achieves exact learning for regular languages.
    We offer an innovative grammar inference tool, V-Star, based on the
    active learning of visibly pushdown automata. V-Star deduces nesting
    structures of program input languages from sample inputs, employing a
    novel inference mechanism based on nested patterns. This mechanism
    identifies token boundaries and converts languages such as XML
    documents into VPLs. We then adapted Angluin's L-Star, an exact
    learning algorithm, for VPA learning, which improves the precision of
    our tool. Our evaluation demonstrates that V-Star effectively and
    efficiently learns a variety of practical grammars, including
    S-Expressions, JSON, and XML, and outperforms other state-of-the-art
    tools.

    https://arxiv.org/abs/2404.04201v1
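
    For readers unfamiliar with Angluin-style active learning: the learner
    builds a hypothesis from membership queries and refines it whenever an
    equivalence check returns a counterexample. Here is a minimal sketch of
    that loop for the plain regular (DFA) case, with a bounded brute-force
    search standing in for the teacher's equivalence oracle; it is only
    illustrative and is not the paper's VPA-learning code:

        from itertools import product

        def lstar(is_member, alphabet, max_len=8):
            """Angluin-style L-Star sketch: learn a DFA for is_member()."""
            S, E = [""], [""]              # access prefixes / distinguishing suffixes

            def row(s):
                return tuple(is_member(s + e) for e in E)

            def close():
                """Ensure every one-letter extension of S has a row already in S."""
                changed = True
                while changed:
                    changed = False
                    rows = {row(s) for s in S}
                    for s, a in product(list(S), alphabet):
                        if row(s + a) not in rows:
                            S.append(s + a)        # new state discovered
                            changed = True
                            break

            def hypothesis():
                """States are rows; return (start, accept?, transition function)."""
                delta = {(row(s), a): row(s + a) for s in S for a in alphabet}
                return row(""), lambda q: q[0], lambda q, a: delta[(q, a)]

            def counterexample():
                """Brute-force equivalence check over all words up to max_len."""
                start, accept, step = hypothesis()
                for n in range(max_len + 1):
                    for w in map("".join, product(alphabet, repeat=n)):
                        q = start
                        for a in w:
                            q = step(q, a)
                        if accept(q) != is_member(w):
                            return w
                return None

            while True:
                close()
                cex = counterexample()
                if cex is None:
                    return hypothesis()
                # refine: add each non-empty suffix of the counterexample as a test column
                E.extend(cex[i:] for i in range(len(cex)) if cex[i:] not in E)

        # example: learn "even number of a's" over the alphabet {a, b}
        start, accept, step = lstar(lambda w: w.count("a") % 2 == 0, "ab")
        q = start
        for a in "abab":
            q = step(q, a)
        assert accept(q)                   # "abab" has an even number of a's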

    Regards,
    John Levine, johnl@taugh.com, Taughannock Networks, Trumansburg NY
    Please consider the environment before reading this e-mail. https://jl.ly

  • From Derek@21:1/5 to All on Sat Apr 13 12:06:49 2024
    John,

    > including S-Expressions, JSON, and XML, and outperforms other state-of-the-art tools.

    They did not compare their tool against LLMs, which these days
    outperform non-human approaches on language-related tasks.
    [I would like to see some actual data. In my experience, LLMs are
    impressive, confident, and frequently wrong. -John]

  • From Derek@21:1/5 to All on Mon Apr 15 02:17:04 2024
    John,

    > [I would like to see some actual data. In my experience, LLMs are
    > impressive, confident, and frequently wrong. -John]

    LLMs' performance on fact recall is poor.

    They seem to do much better than other tools when dealing with
    grammars. I could not find the example I was looking for, but here are
    two others:
    https://arxiv.org/abs/2305.19234
    https://szopa.medium.com/teaching-chatgpt-to-speak-my-sons-invented-language-9d109c0a0f05

    My own experience using local (i.e., very small) models is written up
    here:
    https://shape-of-code.com/2024/02/25/extracting-named-entities-from-a-change-log-using-an-llm/
