Problems with PDF

webfork
Posts: 10818
Joined: Wed Apr 11, 2007 8:06 pm
Location: US, Texas

#1 Post by webfork »

While I've become very good at dealing with difficult PDF files over the years, this article offers a very good breakdown of why the original design created so many problems:

What's so hard about PDF text extraction? https://filingdb.com/b/pdf-text-extraction (from: https://news.ycombinator.com/item?id=22473263)

In addition to notes about invisible text, added spaces for text separation, and off-page text (because, you know, why not), the author points out that the format was always intended as an OUTPUT format, never as an input format.
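To make the invisible and off-page text problems concrete, here is a minimal Python sketch of the kind of filtering an extractor has to do. It assumes the text has already been parsed into positioned spans; the `Span` class, its `visible` flag, and the `keep_readable` function are hypothetical names for illustration, not part of any real PDF library.

```python
from dataclasses import dataclass


@dataclass
class Span:
    """A run of text with its position on the page (hypothetical structure)."""
    text: str
    x: float              # horizontal position in page units
    y: float              # vertical position in page units
    visible: bool = True  # False for text drawn in an invisible render mode


def keep_readable(spans, page_width, page_height):
    """Keep only spans that are visible and positioned inside the page."""
    return [
        s for s in spans
        if s.visible
        and 0 <= s.x <= page_width
        and 0 <= s.y <= page_height
    ]


spans = [
    Span("Hello", 72, 700),
    Span("hidden layer", 72, 680, visible=False),  # invisible text
    Span("stray", -500, 700),                      # drawn off the page
]
readable = keep_readable(spans, page_width=612, page_height=792)
print([s.text for s in readable])
```

A real extractor would also have to account for clipping and crop boxes, so treat this as an illustration of the idea only.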

Setting aside the issues mentioned above, along with a host of security issues, poor support for mobile users, compression, and compatibility, it's frustrating to say that in my world PDF is still the source. I still look for information on company intranets in PDF format first, because PDFs tend to be more polished, not vague or in-progress junk files. It is still my destination format, even though (as the article points out) it's far from ideal.

Some suggested freeware workarounds for problems the article details:

vevy
Posts: 795
Joined: Tue Sep 10, 2019 11:17 am

#2 Post by vevy »

Did you try Xpdf tools?

Code:

pdftotext.exe -table input.pdf output.txt
pdftotext.exe -layout input.pdf

(The output file name is optional.)

or

Code:

pdftohtml.exe input.pdf OutputDir
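
For a whole folder of PDFs, the pdftotext commands above can be wrapped in a short script. This is a rough Python sketch, assuming `pdftotext.exe` from Xpdf is on the PATH; the function names `build_pdftotext_cmd` and `convert_folder` are made up for this example, and only the `-layout` and `-table` switches shown above are used.

```python
import subprocess
from pathlib import Path


def build_pdftotext_cmd(pdf, mode="layout", exe="pdftotext.exe"):
    """Return the argument list for one pdftotext call.

    mode is "layout" or "table", matching the two switches shown above.
    """
    if mode not in ("layout", "table"):
        raise ValueError(f"unsupported mode: {mode}")
    pdf = Path(pdf)
    # Name the output file explicitly so it is obvious where the text ends up.
    return [exe, f"-{mode}", str(pdf), str(pdf.with_suffix(".txt"))]


def convert_folder(folder, mode="layout", exe="pdftotext.exe"):
    """Run pdftotext on every .pdf in folder; return the PDFs that failed."""
    failed = []
    for pdf in sorted(Path(folder).glob("*.pdf")):
        result = subprocess.run(build_pdftotext_cmd(pdf, mode, exe))
        if result.returncode != 0:
            failed.append(pdf)
    return failed
```

Calling `convert_folder("scans", mode="table")` would write one `.txt` file next to each PDF in a folder named `scans` (a placeholder name) and return the list of files pdftotext could not convert.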

#3 Post by webfork »

vevy wrote: Sun Sep 27, 2020 2:57 pm Did you try Xpdf tools?
They've been discussed in part on the site and yes, I was very impressed.

---

Some related threads

How to pull text out of a PDF and have it wrap correctly
viewtopic.php?p=91372#p91372

Combine batch of images into a PDF
viewtopic.php?p=97239#p97239

#4 Post by vevy »

webfork wrote: Sun Sep 27, 2020 10:44 pm
vevy wrote: Sun Sep 27, 2020 2:57 pm Did you try Xpdf tools?
They've been discussed in part on the site and yes, I was very impressed.
These are different tools than the one in the link, and they are still maintained. :wink: YMMV with complex PDFs, though.
Although, as lintalist noted here (from your second link above), they are related.
