Forum: >>> Magnum BBS <<<

Copying text from n2479.pdf

From Keith Thompson@21:1/5 to All on Fri Sep 25 11:13:20 2020

http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2479.pdf is a recent
draft of C2x.

When I copy text from n2479.pdf, I get things like this:

The
:
strdup function
::::::
creates
::
a
:::: copy::: of::: the:::::: string::::::: pointed:: to::: by:: s:: in:: a ::::: space::::::::: allocated :: as:: if::: by :a:::: call
::
to
:::::::
malloc.
:

(It varies slightly depending on which PDF viewer I use.)

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */

--- SoupGate-Win32 v1.05
* Origin: fsxNet Usenet Gateway (21:1/5)

From Pankaj Jangid@21:1/5 to Keith Thompson on Sat Sep 26 08:48:54 2020

On Fri, Sep 25 2020, Keith Thompson wrote:

> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2479.pdf is a recent
> draft of C2x.
>
> When I copy text from n2479.pdf, I get things like this:
>
> The
> :
> strdup function
> ::::::
> creates
> ::
> a
> :::: copy::: of::: the:::::: string::::::: pointed:: to::: by:: s::
> in:: a ::::: space::::::::: allocated :: as:: if::: by :a:::: call
> ::
> to
> :::::::
> malloc.
> :

It is because of those wavy underlines.


From Keith Thompson@21:1/5 to Pankaj Jangid on Fri Sep 25 23:05:57 2020

Pankaj Jangid <pankaj.jangid@gmail.com> writes:

> On Fri, Sep 25 2020, Keith Thompson wrote:
>
>> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2479.pdf is a recent
>> draft of C2x.
>>
>> When I copy text from n2479.pdf, I get things like this:
>>
>> The
>> :
>> strdup function
>> ::::::
>> creates
>> ::
>> a
>> :::: copy::: of::: the:::::: string::::::: pointed:: to::: by:: s::
>> in:: a ::::: space::::::::: allocated :: as:: if::: by :a:::: call
>> ::
>> to
>> :::::::
>> malloc.
>> :
>
> It is because of those wavy underlines.

Yes, that explains it, thanks. So I can copy-and-paste from N2478,
which doesn't have the wavy underlining:

The strndup function creates a string initialized with no more than
size initial characters of the array pointed to by s and up to the
first null character, whichever comes first, in a space allocated as
if by a call to malloc .

--
Keith Thompson (The_Other_Keith) Keith.S.Thompson+u@gmail.com
Working, but not speaking, for Philips Healthcare
void Void(void) { Void(); } /* The recursive call of the void */


From Paul@21:1/5 to Keith Thompson on Sat Sep 26 07:51:52 2020

Keith Thompson wrote:

> http://www.open-std.org/jtc1/sc22/wg14/www/docs/n2479.pdf is a recent
> draft of C2x.
>
> When I copy text from n2479.pdf, I get things like this:
>
> The
> :
> strdup function
> ::::::
> creates
> ::
> a
> :::: copy::: of::: the:::::: string::::::: pointed:: to::: by:: s:: in:: a ::::: space::::::::: allocated :: as:: if::: by :a:::: call
> ::
> to
> :::::::
> malloc.
> :
>
> (It varies slightly depending on which PDF viewer I use.)

PDF files can be read into Office Word, but this only works
when the author has generated a dual-representation type of
PDF which holds info Office can use.

LibreOffice Draw can read in PDF, but not likely with any
good purpose in mind. Don't try it on this document!!!
Use it on a single page test PDF just to see how it works.

So far, nothing I have handy here looks immediately useful
in the "pure GUI power tool" department.

*******

I tried this.

mutool convert -F pdf -O decompress,clean -o n2479_out.pdf n2479.pdf # a mess

The underline effect seems to be drawn with a font that has a single
character (a sinewave) in it. In the document, where it underlines the word
"underlining", the stanza looks like this: ten sinewaves underneath an
eleven-character word.

/F3 5.9776 Tf
1 0 0 1 230.857 349.568 Tm
[<0001000100010001000100010001000100010001>] TJ
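
The glyph count in that stanza can be checked mechanically. A minimal Python sketch, assuming font F3 uses two-byte glyph codes (four hex digits per glyph); the repeated 0001 pattern suggests that, but nothing in the dump confirms it. The function name is made up for illustration.

```python
import re

def glyph_codes(tj_operand, bytes_per_glyph=2):
    """Split the hex strings inside a TJ array into integer glyph codes.

    Assumes every glyph code is bytes_per_glyph bytes wide. That width is a
    guess based on the repeated 0001 pattern, not something read from the font.
    """
    codes = []
    step = 2 * bytes_per_glyph  # hex digits per glyph code
    # Hex strings in PDF content streams are written between < and >.
    for hex_string in re.findall(r"<([0-9A-Fa-f\s]*)>", tj_operand):
        digits = re.sub(r"\s+", "", hex_string)
        if len(digits) % 2:
            digits += "0"  # an odd final digit is treated as if followed by 0
        for i in range(0, len(digits), step):
            codes.append(int(digits[i:i + step], 16))
    return codes

stanza = "[<0001000100010001000100010001000100010001>] TJ"
codes = glyph_codes(stanza)
print(len(codes), sorted(set(codes)))
```

If the two-byte assumption holds, this should report ten codes, all with the same value, which matches the ten sinewaves under "underlining".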

If converted to PostScript, the underline method looks like this.

.895628 .7673 0 0 cmyk
VWZQUL+LASY6*1 [5.9776 0 0 -5.9776 0 0 ]msf
320.52 467.331 mo
(::::::)
[4.98111 4.98114 4.98111 4.98111 4.98114 0 ]xsh

Neither method was of sufficient quality to be part of a workflow.
The document does not convert cleanly enough for this.

*******

Converted to HTML, there were no complaints about font conversion.
Loading the HTML into a browser sorta works OK. The spaghetti below
shows what the HTML section with the "underlining" text looks like.
The color is blue #0000ff.

mutool convert -F html -o n2479.html n2479.pdf



text that has been deleted and
 <=== to be
:::::::::: <=== removed
underlining
text that has been added. Pages that contain changes


The first sed expression below removed some of them, until I found a
">:: ::: :<" one, with spaces inside the run of colons. The second
expression, which also matches spaces, may have got rid of more of them.
All I'm doing is removing the strings of colons between tags and
replacing them with a blank >< pair, an empty text string, rather than
editing the whole element in front of it.

sed 's/>:*</></g' n2479.html > n2479sed.html

sed 's/>[: ]*</></g' n2479.html > n2479sed.html
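
For anyone who would rather script this than run sed, here is a rough Python equivalent of the second expression. It has one deliberate difference, which is an assumption about what is wanted rather than anything taken from the commands above: it requires at least one colon between the tags. The `[: ]*` pattern also matches a lone space between two elements, and deleting that space can glue neighbouring words together. The function name is made up.

```python
import re

# A run of colons and spaces between two tags, containing at least one colon.
COLON_RUN = re.compile(r">[: ]*:[: ]*<")

def strip_colon_runs(html):
    """Replace each colon-only text node with an empty one (>:::< becomes ><)."""
    return COLON_RUN.sub("><", html)

# Small made-up sample; the real input would be the contents of n2479.html.
sample = "<span>::::::</span> <span>:: ::: :</span><span>a:b</span>"
print(strip_colon_runs(sample))
```

To process the real file, read n2479.html into a string, pass it through strip_colon_runs, and write the result to n2479sed.html.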

That's as far as I got.

I still haven't found a good HTML-to-text converter.
I'd like to preserve some of the positioning so the
file is human-readable.
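
One stdlib-only possibility is Python's html.parser. A minimal sketch, assuming the converted file puts each run of text in its own <p> element (not verified against mutool's actual output). It emits one line per paragraph and skips paragraphs that are nothing but colons and spaces, but it makes no attempt to keep horizontal positioning. The class and function names are made up.

```python
from html.parser import HTMLParser

class ParagraphText(HTMLParser):
    """Collect the text of each <p> element as one output line."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self.lines = []      # finished lines
        self._parts = None   # text pieces of the <p> being read, or None

    def handle_starttag(self, tag, attrs):
        if tag == "p":
            self._parts = []

    def handle_data(self, data):
        if self._parts is not None:
            self._parts.append(data)

    def handle_endtag(self, tag):
        if tag == "p" and self._parts is not None:
            # Join the pieces and collapse runs of whitespace.
            line = " ".join("".join(self._parts).split())
            # Drop empty lines and lines made only of colons and spaces.
            if line.strip(": "):
                self.lines.append(line)
            self._parts = None

def html_to_text(html):
    parser = ParagraphText()
    parser.feed(html)
    parser.close()
    return "\n".join(parser.lines)

# Made-up sample in the assumed one-<p>-per-line layout.
sample = ("<p><span>::::::</span></p>"
          "<p><span>text that has</span> <span>been added.</span></p>")
print(html_to_text(sample))
```

This does nothing about the deleted-versus-added coloring, so old and new wording would both still end up in the text.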

The colored text still has to be corrected. The HTML version
did not preserve the strikethru effect, and if the file is
converted to text, both old and new strings will be
included. And not all red text is strikethru text, so
finding red coloring and removing strings likely won't
work right either.

You can copy/paste out of Firefox after using

firefox n2479sed.html

That should be workable for small samples.

Paul

