I wrote a shell scraper for a news website, and one of its options is to
keep re-accessing the initial webpage in a while loop at regular
intervals, grab all news links, and scrape the text of the ones that
are new. This option keeps the script running for days straight. When
stdout is redirected to a *file*, it works as expected.
If instead we pipe the output to `less', there is a buffering stall.
Once the pipe buffer between the script and `less' is full, at roughly
8-64 KB depending on the system, the whole while loop in the script
hangs and only continues to run when we scroll down in `less'. If it
hangs for a few hours, the scraping tool only resumes scraping some
hours later, too, failing to scrape a lot of news from the initial
webpage during that time.
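The stall is easy to reproduce without the scraper; a trivial producer
loop (purely illustrative, not my actual script) behaves the same way:

$ i=0; while :; do echo "line $i"; i=$((i+1)); done | less

If you leave `less' sitting at the top of the output instead of
scrolling, the loop stops advancing after a few thousand lines: each
echo blocks in write(2) as soon as the kernel pipe buffer fills and
`less' has stopped reading ahead.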
frogger <somebody@invalid.com> writes:
> I wrote a shell scraper for a news website, and one of its options is
> to keep re-accessing the initial webpage in a while loop at regular
> intervals, grab all news links, and scrape the text of the ones that
> are new. This option keeps the script running for days straight. When
> stdout is redirected to a *file*, it works as expected.
> If instead we pipe the output to `less', there is a buffering stall.
> Once the pipe buffer between the script and `less' is full, at roughly
> 8-64 KB depending on the system, the whole while loop in the script
> hangs and only continues to run when we scroll down in `less'. If it
> hangs for a few hours, the scraping tool only resumes scraping some
> hours later, too, failing to scrape a lot of news from the initial
> webpage during that time.
I'm not 100% sure what you want, but does:
$ scraper >file & less file
and then using the F command in `less' do something like what you want?
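If your version of less supports it, the same idea works in one step by
starting in follow mode:

$ scraper >file & less +F file

Press Ctrl-C to drop out of follow mode and scroll back; press F to
resume following. Either way the script writes to the file at full
speed and never blocks waiting for the pager to catch up.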