Forum: >>> Magnum BBS <<<

ATTN: GAWK developers. I need help with writing an input filter extension

From Kenny McCormack@21:1/5 to All on Wed Jul 28 13:21:06 2021

First, note that I have already written one. It provides "readline"-like
capability to GAWK. It uses a package which is similar to, but different
from, "readline", so that you have a scrollback buffer when you are
entering lines at the terminal in GAWK. I wrote it several years ago and
use it extensively. So far, so good.

Basically, when that extension is called, it calls the "getline" function
in the other package, then copies the line from that function's buffer
into the buffer provided by GAWK. GAWK then picks it up and everything
works as expected.

But here's the thing. I want to write one now that will read the line
normally and then do something to the line before returning it to GAWK.
What I don't know how to do is to call GAWK's normal "getline" function
from my extension library. So, what I am thinking of is something like:

/* In my extension code; note that "fd" is passed in as a parameter */
normal_gawk_input(fd, buff);  /* hypothetical: the GAWK routine I'm asking about */
/* Now examine (and possibly change) buff */
/* ... */
/* And return to GAWK */
return awk_true;

Some notes:

0) My target is Linux. Don't care about any other OS or any other
"portability" or "standards" considerations.
1) One of the sample extensions, readfile, looks like it does something
similar to what I want. But it includes a function called
read_file_to_buffer(), that looks more than a little above my pay grade.
It seems like you shouldn't have to do that. I'd rather call
whatever code GAWK already uses to read the line.
2) I thought about using the Linux function getline(3). That would
work, except for one little problem. The problem is that getline
wants a FILE * object, but GAWK deals in "fd"s. You could use
fdopen(3) to convert, but that seems messy. It seems wasteful to
call fdopen() every time the input filter function is called, but I
don't see any entirely safe way to avoid doing that. It would be
nice if there were an "fd" version of getline(), but I don't know of
anything like that. (see footnote below at (*))
3) Alternatively, if there was some way to have GAWK read the input
line "normally" and then call my function before continuing (i.e.,
have the extension function be able to examine the line already
read), then that'd be good. But I don't think there is any
capability for that in GAWK, as of the current writing.

Finally, another question about these "input filter" functions in general.
The discussion so far has always been in terms of lines, i.e., the usual
line-oriented input model. What happens if RS is set to something other
than the default? Is the input filter function supposed to deal with that
itself, or does GAWK provide some kind of handling?

(*) Part of the problem is that it seems clear to me that fdopen(3)
allocates memory (presumably, using malloc() or similar) under the covers
for the FILE * object that it creates. There doesn't seem to be any clean
way to free() that allocated memory.

--
The randomly chosen signature file that would have appeared here is more than 4 lines long. As such, it violates one or more Usenet RFCs. In order to remain in compliance with said RFCs, the actual sig can be found at the following URL:
http://user.xmission.com/~gazelle/Sigs/DanaC

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Spiros Bousbouras@21:1/5 to Kenny McCormack on Wed Jul 28 17:43:27 2021

On Wed, 28 Jul 2021 13:21:06 -0000 (UTC)
gazelle@shell.xmission.com (Kenny McCormack) wrote:

> First, note that I have already written one. It provides "readline"-like
> capability to GAWK. It uses a package which is similar to, but different
> from, "readline", so that you have a scrollback buffer when you are
> entering lines at the terminal in GAWK. I wrote it several years ago and
> use it extensively. So far, so good.
>
> Basically, what that extension does is, when called, it calls the "getline"
> function in the other package, then copies the line read from the buffer of
> the "getline" function into the buffer provided by GAWK. GAWK then picks
> it up and everything works as expected.
>
> But here's the thing. I want to write one now that will read the line
> normally and then do something to the line before returning it to GAWK.
> What I don't know how to do is to call GAWK's normal "getline" function
> from my extension library. So, what I am thinking of is something like:
>
> /* In my extension code; note that "fd" is passed in as a parameter */
> normal_gawk_input(fd,buff);
> /* Now examine (and possibly change) buff */
> ...
> /* And return to GAWK */
> return awk_true;
>
> Some notes:

[...]

> 2) I thought about using the Linux function getline(3). That would
> work, except for one little problem. The problem is that getline
> wants a FILE * object, but GAWK deals in "fd"s. You could use
> fdopen(3) to convert, but that seems messy. It seems wasteful to
> call fdopen() every time the input filter function is called, but I
> don't see any entirely safe way to avoid doing that. It would be
> nice if there were an "fd" version of getline(), but I don't know of
> anything like that. (see footnote below at (*))

Isn't it trivial to write your own getline() with the interface you want?
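For what it's worth, here is roughly what such a home-grown, descriptor-based getline() could look like. This is a minimal sketch, not code from GAWK or libc: the names fd_getline() and ensure_capacity() are made up for illustration, and the one-byte-per-read(2) loop is chosen for simplicity rather than speed. Reading a single byte at a time means it never pulls data past the newline into a private buffer, so nothing gets stranded if something else reads the same fd afterwards.

```c
#include <errno.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Grow *buf to hold at least need bytes; returns 0 on success, -1 on failure. */
static int ensure_capacity(char **buf, size_t *cap, size_t need)
{
    size_t newcap;
    char *p;

    if (need <= *cap)
        return 0;
    newcap = *cap ? *cap : 128;
    while (newcap < need)
        newcap *= 2;
    p = realloc(*buf, newcap);       /* realloc(NULL, n) acts like malloc(n) */
    if (p == NULL)
        return -1;
    *buf = p;
    *cap = newcap;
    return 0;
}

/*
 * fd_getline: hypothetical getline(3) look-alike that reads from a file
 * descriptor instead of a FILE *. One read(2) call per byte, so it stops
 * exactly after the newline. Returns the number of bytes stored (newline
 * included, '\0' not counted), 0 at end of file, or -1 on error.
 */
ssize_t fd_getline(int fd, char **lineptr, size_t *n)
{
    size_t len = 0;

    if (*lineptr == NULL)
        *n = 0;                      /* like getline(3): allocate for the caller */
    for (;;) {
        char c;
        ssize_t r = read(fd, &c, 1);

        if (r < 0) {
            if (errno == EINTR)
                continue;            /* interrupted by a signal: retry */
            return -1;
        }
        if (r == 0)
            break;                   /* end of file */
        if (ensure_capacity(lineptr, n, len + 2) < 0)   /* room for c and '\0' */
            return -1;
        (*lineptr)[len++] = c;
        if (c == '\n')
            break;
    }
    if (ensure_capacity(lineptr, n, len + 1) < 0)
        return -1;
    (*lineptr)[len] = '\0';
    return (ssize_t)len;
}
```

Usage mirrors getline(3): start with `char *line = NULL; size_t cap = 0;`, loop on `fd_getline(fd, &line, &cap) > 0`, and free(line) at the end. One deliberate difference: this sketch returns 0 at end of file, where getline(3) returns -1.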

[...]

> (*) Part of the problem is that it seems clear to me that fdopen(3)
> allocates memory (presumably, using malloc() or similar) under the covers
> for the FILE * object that it creates. There doesn't seem to be any clean
> way to free() that allocated memory.

I don't know what your overall setup is and what function calls what when,
but you can specify your own buffer using setvbuf(). This way you can free
it whenever you want.
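A sketch of the setvbuf() idea, assuming the goal is to keep the stream's memory under your own control. Two caveats worth knowing: setvbuf() only lets you supply the I/O buffer, while the FILE object itself is still allocated by fdopen() and released only by fclose(); and fclose() also closes the underlying descriptor. The sketch therefore wraps a dup() of the descriptor, so fclose() closes the copy and leaves the original fd open. The struct owned_stream and both helper functions are hypothetical names, not part of any library.

```c
#define _POSIX_C_SOURCE 200809L  /* for fdopen(), getline(), dup() */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical wrapper: a stdio stream whose I/O buffer we allocated. */
struct owned_stream {
    FILE *fp;
    char *iobuf;
};

/* Wrap a duplicate of fd in a FILE * that uses a caller-allocated buffer. */
int owned_stream_open(struct owned_stream *s, int fd, size_t bufsize)
{
    int fd2 = dup(fd);           /* fclose() will close this copy, not fd */

    if (fd2 < 0)
        return -1;
    s->fp = fdopen(fd2, "r");
    if (s->fp == NULL) {
        close(fd2);
        return -1;
    }
    s->iobuf = malloc(bufsize);
    /* setvbuf() must come before any other operation on the stream. */
    if (s->iobuf == NULL || setvbuf(s->fp, s->iobuf, _IOFBF, bufsize) != 0) {
        fclose(s->fp);
        free(s->iobuf);
        return -1;
    }
    return 0;
}

/* fclose() frees the FILE object and closes the dup'd descriptor; only
   after that is it safe to free the buffer we handed to setvbuf(). */
void owned_stream_close(struct owned_stream *s)
{
    fclose(s->fp);
    free(s->iobuf);
    s->fp = NULL;
    s->iobuf = NULL;
}
```

Lines can then be read with plain getline(3) on `s.fp`. Keep in mind that stdio reads ahead, so bytes sitting in that buffer are no longer visible to anything else reading the original fd.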

--
vlaho.ninja/prog


From Bruce Horrocks@21:1/5 to Kenny McCormack on Wed Jul 28 23:43:25 2021

On 28/07/2021 14:21, Kenny McCormack wrote:

> 2) I thought about using the Linux function getline(3). That would
> work, except for one little problem. The problem is that getline
> wants a FILE * object, but GAWK deals in "fd"s. You could use
> fdopen(3) to convert, but that seems messy. It seems wasteful to
> call fdopen() every time the input filter function is called, but I
> don't see any entirely safe way to avoid doing that. It would be
> nice if there were an "fd" version of getline(), but I don't know of
> anything like that. (see footnote below at (*))

You don't need to call fdopen() every time, if I understand this page correctly: <https://www.gnu.org/software/gawk/manual/html_node/Input-Parsers.html>

I think you need only call it when your XXX_can_take_file() function is
invoked and save the obtained FILE value in a global static.

So that's once per file not once per record.
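Here is a standalone sketch of that once-per-file pattern: all of the fdopen() work happens in one setup call, and each record is then just a getline(3) on the saved stream. It is not a GAWK extension as written, and file_state_open(), file_state_next() and file_state_close() are hypothetical names. One assumption about wiring it into GAWK: the manual describes XXX_can_take_file() as a check that should not change any state, so the setup probably fits better in XXX_take_control_of(), with the cleanup in the iobuf's close_func and the state pointer kept in iobuf->opaque rather than in a global static.

```c
#define _POSIX_C_SOURCE 200809L  /* for fdopen(), getline(), dup() */
#include <stdio.h>
#include <stdlib.h>
#include <sys/types.h>
#include <unistd.h>

/* Hypothetical per-file state, created once and reused for every record. */
struct file_state {
    FILE *fp;        /* the one fdopen()ed stream for this file */
    char *line;      /* getline(3) buffer, reused across records */
    size_t cap;
};

/* Once per file: dup() so a later fclose() leaves the caller's fd open. */
struct file_state *file_state_open(int fd)
{
    struct file_state *st = calloc(1, sizeof *st);
    int fd2;

    if (st == NULL)
        return NULL;
    fd2 = dup(fd);
    if (fd2 < 0 || (st->fp = fdopen(fd2, "r")) == NULL) {
        if (fd2 >= 0)
            close(fd2);
        free(st);
        return NULL;
    }
    return st;
}

/* Once per record: no fdopen() here, just reuse the saved stream.
   Returns the record length, or -1 at end of file or on error. */
ssize_t file_state_next(struct file_state *st, char **out)
{
    ssize_t len = getline(&st->line, &st->cap, st->fp);

    *out = st->line;
    return len;
}

/* Once per file: release the FILE object, the line buffer and the state. */
void file_state_close(struct file_state *st)
{
    fclose(st->fp);
    free(st->line);
    free(st);
}
```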

--
Bruce Horrocks
Surrey, England


From Kenny McCormack@21:1/5 to 07.013@scorecrow.com on Thu Jul 29 00:53:11 2021

In article <0da59d24-2344-71d2-ba62-a548e64c0f7c@scorecrow.com>,
Bruce Horrocks <07.013@scorecrow.com> wrote:

> On 28/07/2021 14:21, Kenny McCormack wrote:
>
>> 2) I thought about using the Linux function getline(3). That would
>> work, except for one little problem. The problem is that getline
>> wants a FILE * object, but GAWK deals in "fd"s. You could use
>> fdopen(3) to convert, but that seems messy. It seems wasteful to
>> call fdopen() every time the input filter function is called, but I
>> don't see any entirely safe way to avoid doing that. It would be
>> nice if there were an "fd" version of getline(), but I don't know of
>> anything like that. (see footnote below at (*))
>
> You don't need to call fdopen() every time, if I understand this page
> correctly: <https://www.gnu.org/software/gawk/manual/html_node/Input-Parsers.html>
>
> I think you need only call it when your XXX_can_take_file() function is
> invoked and save the obtained FILE value in a global static.
>
> So that's once per file not once per record.

Thank you. That makes a lot of sense.

Now, as it happens, it turns out I made a boo-boo here. My underlying
assumption about what needed to be implemented was all wrong. Upon
digging into things a bit deeper, I realized that the function you write as
an Input Parser is not a replacement for some line-oriented function like
getline(3), but is, rather, supposed to be a "drop-in" for read(2). Note
that the default value for iobuf->read_func is "read". This is the thing
that you change to point to your new function.

That's why the new function that you are to define is declared as:

static ssize_t XXX_read(int fd, void *buf, size_t nbytes);

which is the same signature as read(2).

Once I realized this, everything became quite clear.
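It also, incidentally, answers my question about RS: GAWK still splits
whatever bytes the read function returns into records according to RS,
so the filter itself never has to deal with RS.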
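To make the read(2) point concrete, here is a minimal sketch of a filter with that signature. It calls read(2) exactly as GAWK would, then rewrites the bytes in place before returning them. Upper-casing is only a stand-in for "do something to the line", and the name upcase_read() is hypothetical. Because it works on raw bytes, it never looks for newlines or RS; record splitting happens afterwards, in GAWK. (The extension API also offers a record-level get_record hook as an alternative to read_func; this sketch sticks to the read_func route.)

```c
#include <ctype.h>
#include <sys/types.h>
#include <unistd.h>

/*
 * Hypothetical input filter with the same signature as read(2), matching
 *     static ssize_t XXX_read(int fd, void *buf, size_t nbytes);
 * It reads exactly as read(2) would, then rewrites the bytes in place
 * before GAWK sees them. In the default "C" locale only ASCII letters change.
 *
 * Assumed hookup (not shown): in the parser's take_control_of() callback,
 *     iobuf->read_func = upcase_read;
 */
static ssize_t upcase_read(int fd, void *buf, size_t nbytes)
{
    ssize_t n = read(fd, buf, nbytes);
    unsigned char *p = buf;

    /* n == 0 (EOF) or n < 0 (error): the loop body never runs. */
    for (ssize_t i = 0; i < n; i++)
        p[i] = (unsigned char)toupper(p[i]);

    /* Same return convention as read(2); errno is left as read() set it. */
    return n;
}
```

The catch with this model is that read(2) may hand back a chunk that ends in the middle of a line. A byte-by-byte mapping like this one does not care, but a filter whose change depends on neighbouring bytes (rewriting whole words, say) would have to carry state from one call to the next.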

Anyway, I was able to quickly write the new Input Parser that I had planned.
I will be posting a summary of that new functionality soon.

--
Politics is show business for ugly people.

Sports is politics for stupid people.

