Using PDF reDirect
Topic: Text on some PDFs is not highlightable
sludge7051-x
Newbie
Joined: 13 Mar 14
Location: United States
Posts: 9
Posted: 17 Mar 14 at 1:30PM
I'm on Win 8, using the latest version of Firefox (27.0.1).

I have been using PDF reDirect v2.5.2 to print web pages to PDF.

I later like to use Natural Reader to read the PDF. This is a text-to-speech program . . . http://www.naturalreaders.com/download.php

The problem is that the text is not always usable when I print through Firefox (it always is when I print through IE 10, though).

When not usable, it's one of two things: 1.) dragging with the mouse, you can see the selection rectangle trying to lasso the text, but it fails, and highlights nothing . . . 2.) you can highlight the text, but there is weird spacing in between the letters and words, and when Natural Reader tries to read, it is garbled.

I get a good PDF with Firefox on some pages, like this page, the Lincoln Trillion bill . . . http://store.livingwaters.com/index.php?page=shop.product_details&flypage=flypage.tpl&product_id=477&category_id=8&option=com_virtuemart&Itemid=199&lang=en

But not on this page, How to Create a Macro in Excel 2010 . . . http://www.ehow.com/how_8037720_create-macro-excel-2010.html

I tried this Add-on: Print pages to Pdf 0.1.9.3 . . . and it was not able to make a good PDF of the Create a Macro page, either.

I went into Firefox's about:config . . . and set javascript.enabled to false . . . but that didn't do anything. Disabling javascript is what I normally do on a page like Snopes, where Natural Reader cannot read the text on the web page.  It's protected from scraping by javascript.

I think there's something up with the web page coding, but I don't know what, or why Firefox can't deal with it.

If I print the Create a Macro page through IE 10, though, it will make a good pdf. IE always makes a good PDF.  I was previously always using IE to make PDFs because . . .

I tried using Firefox to make PDFs a year or so ago, with the above results (but never got a good pdf). I thought something might have changed by now, so I tried it. It looked like it was fixed, but then I got a bad pdf of the Create a Macro page.  Any idea what's going on?  TY
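
The second symptom described above (extra spaces between the letters) can be spotted automatically in text copied out of a PDF. The following is only a rough sketch: the function names and the 0.5 threshold are assumptions made for illustration, not part of PDF reDirect, Firefox, or Natural Reader.

```python
def letter_spacing_ratio(text: str) -> float:
    """Return the fraction of whitespace-separated tokens that are a single letter."""
    tokens = text.split()
    if not tokens:
        return 0.0
    single_letters = sum(1 for tok in tokens if len(tok) == 1 and tok.isalpha())
    return single_letters / len(tokens)


def looks_letter_spaced(text: str, threshold: float = 0.5) -> bool:
    """Guess whether text was extracted with spaces between individual letters.

    The 0.5 threshold is an arbitrary assumption; ordinary English prose has
    only a few one-letter words ("a", "I"), so its ratio stays well below it.
    """
    return letter_spacing_ratio(text) >= threshold


if __name__ == "__main__":
    print(looks_letter_spaced("H o w  t o  C r e a t e  a  M a c r o"))  # True
    print(looks_letter_spaced("How to Create a Macro in Excel 2010"))    # False
```

Paste a paragraph copied from the PDF into `looks_letter_spaced`; a `True` result suggests text-to-speech will read it letter by letter.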

yorkshire_lad
Newbie
Joined: 29 Jan 10
Posts: 29
Posted: 18 Mar 14 at 4:45AM
I'm sure Michel will be along shortly to comment. However (as an ordinary user), I have experienced similar behaviour with different browsers.  IE usually produces pdfs that contain text (that can be manipulated as text) whereas Firefox sometimes seems to print as a graphic.  So it's down to the way the browser prints, not PDF reDirect.  I've never found a solution, and if I really want to create a pdf as text, I'll use IE (I mostly use FF).
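
A crude way to tell whether a PDF was "printed as a graphic" is to look for font resources in the file, since selectable text in a PDF always refers to a font. The sketch below is a heuristic under stated assumptions: it scans the raw bytes for `/Font`, which works only when the page dictionaries are stored uncompressed (newer PDFs may hide them inside compressed object streams), and it says nothing about whether the text that is present is spaced correctly. The function names are made up for this example.

```python
def has_font_resource(pdf_bytes: bytes) -> bool:
    """Heuristic: True if the raw PDF bytes mention a /Font entry.

    Assumption: the file keeps its dictionaries uncompressed. A PDF that
    stores them in compressed object streams can give a false "no fonts".
    """
    return b"/Font" in pdf_bytes


def probably_image_only(path: str) -> bool:
    """Return True if the PDF at `path` appears to contain no fonts at all."""
    with open(path, "rb") as fh:
        data = fh.read()
    if not data.startswith(b"%PDF-"):
        raise ValueError(f"{path} does not look like a PDF file")
    return not has_font_resource(data)
```

If `probably_image_only("page.pdf")` returns `True` (the file name is a placeholder), the text most likely went into the PDF as a picture, which matches the "lasso selects nothing" symptom.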

sludge7051-x
Posted: 18 Mar 14 at 7:48AM
Hello,

Yes, Firefox seems to print some pages as a graphic . . . I wonder what it's doing when it prints with weird spacing?  eBay statements come out like that.  In that case, I notice they use a tiny font; it seems to depend on whether the page uses proportional spacing or not.

It seems like EXP must use something related to printing, that is within each browser, and that's why results vary.  I wonder what it is.

I have posted this on the Firefox forum also.  Their suggestion is to just use one of the current print-to-PDF add-ons.  I tried a couple, with no change:

Print to PDF - Text in the PDF is not always good for Text-to-Speech | Firefox Support Forum . . . https://support.mozilla.org/en-US/questions/989439

sludge7051-x
Posted: 18 Mar 14 at 10:23AM

CleanPrint makes a good PDF of the "Create a Macro" page:

http://www.formatdynamics.com/bookmarklets/

I think it goes to a server to do it, though, so it's not going through Firefox?

Michel_K17
Moderator Group
Forum Administrator
Joined: 25 Jan 03
Posts: 1673
Posted: 18 Mar 14 at 9:53PM
Hi there,

Unfortunately, it is not PDF reDirect. All that PDF reDirect does is create a PDF file of the printout as specified by the browser.

If browser X says to print it one way or another, then PDF reDirect will follow the instructions.

Cheers!

Michel
Michel Korwin-Szymanowski
EXP Systems LLC

sludge7051-x
Posted: 19 Mar 14 at 7:48AM
I think I figured out what's going on with EXP, and other print-to-PDF programs, where the text is weird . . . it's javascript.

* * * * * * * * * * * * * * * * * * * * * * * *

Using Firefox . . . NoScript is installed / Allow Scripts Globally (dangerous)

* * * * * * * * * * * * * * * * * * * * * * * *

Go to:
http://nypost.com/2012/08/18/feds-move-to-strike-lewd-details-from-homeland-security-sexual-discrimination-lawsuit/

The text on this page is highlightable, and text-to-speech can read it.
If you make a PDF of it with EXP, though, you can't highlight any of the text.  It's like it's an image.

* * * * * * * * * * * * * * * * * * * * * * * *

If I then do this:
NoScript installed / Forbid Scripts Globally . . . now you get a good PDF

* * * * * * * * * * * * * * * * * * * * * * * *

View Page Info / there are many occurrences of the word "javascript"

. . .

NoScript / Allow Scripts Globally (dangerous) . . . and now you're back to getting a PDF where you can't even lasso the text

* * * * * * * * * * * * * * * * * * * * * * * *

about:config / javascript.enabled . . . double-click to make it false
go back to that page, and refresh . . . and now you can get a good PDF

* * * * * * * * * * * * * * * * * * * * * * * *

It looks like this is due to javascript protecting the text.

Is there a way for EXP to automatically disable javascript before it prints a PDF?  IE must be doing this itself.

* * * * * * * * * * * * * * * * * * * * * * * *

Another example, but a little different . . .

Make sure this is back to enabled . . .
about:config / javascript.enabled . . . double-click to make it "true"

On this page:
http://www.enviroreporter.com/2014/03/china-syndrome-town/

NoScript installed / Allow Scripts Globally (dangerous) . . . and you can't even highlight any of the text . . . but the PDF comes out fine

NoScript installed / Forbid Scripts Globally . . . now the text on the web page is highlightable, and can be read by text-to-speech
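
The about:config toggle used above can also be set from a `user.js` file in the Firefox profile folder, which Firefox reads at startup. The following is a minimal sketch, not a tested tool: the function name is made up for this example, and the profile folder path is something you must supply yourself.

```python
from pathlib import Path

PREF_NAME = "javascript.enabled"


def set_javascript_pref(profile_dir: str, enabled: bool) -> Path:
    """Write javascript.enabled into user.js inside a Firefox profile folder.

    profile_dir is an assumption: pass the path of your own profile folder.
    Any existing line for the same preference is replaced; other lines are kept.
    """
    user_js = Path(profile_dir) / "user.js"
    value = "true" if enabled else "false"
    new_line = f'user_pref("{PREF_NAME}", {value});'

    if user_js.exists():
        old_lines = user_js.read_text(encoding="utf-8").splitlines()
    else:
        old_lines = []

    # Drop any earlier setting of the same preference, then append the new one.
    kept = [ln for ln in old_lines if f'"{PREF_NAME}"' not in ln]
    kept.append(new_line)
    user_js.write_text("\n".join(kept) + "\n", encoding="utf-8")
    return user_js
```

Call `set_javascript_pref(profile_folder, False)` before printing and restart Firefox, since `user.js` is only read at startup; call it again with `True` (and restart) to turn javascript back on. For a quick one-off, NoScript or about:config is less trouble.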

yorkshire_lad
Posted: 19 Mar 14 at 11:31AM
I like that tip: I may try it when I next need to avoid a graphic print. TYVM for posting the feedback.

sludge7051-x
Posted: 19 Mar 14 at 2:44PM
np