mutool for PDF image extraction


#1 Post by Cornflower » Thu Feb 18, 2021 6:20 am

Just a short note: I revisited the sparse documentation for MuPDF (GUI) and mutool (CLI).

mutool is a stand-alone CLI with a LOT of options. Just about the easiest is extract, which pulls out embedded images and font resources.

I pulled out all the images (including both larger and smaller versions where the PDF had both) and non-text pages very quickly with:


mutool extract filespec.pdf

In the current directory it created, at a rate of some 5-6 pages per second, image-xxxx files in jpg, png, and pam formats, plus the fonts. (pam is Portable Arbitrary Map, a Netpbm 2D bitmap format; I have no idea how to use it.)
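
Since everything lands in the current directory, it may be worth running the command from a scratch folder. A minimal sketch, assuming a POSIX shell and mutool on the PATH; the function name extract_to_dir and the output folder name are made up for illustration and are not part of mutool:

```shell
#!/bin/sh
# extract_to_dir PDF OUTDIR (hypothetical helper, not a mutool feature):
# run "mutool extract" from inside OUTDIR so the extracted image and font
# files do not clutter the directory you started in.
extract_to_dir() {
    pdf=$1
    outdir=$2
    # Turn a relative PDF path into an absolute one before changing directory.
    case $pdf in
        /*) abs=$pdf ;;
        *)  abs=$PWD/$pdf ;;
    esac
    mkdir -p "$outdir" || return 1
    # The subshell keeps the caller's working directory unchanged.
    ( cd "$outdir" && mutool extract "$abs" )
}
```

Something like extract_to_dir filespec.pdf filespec-extracted should then leave all the extracted files under filespec-extracted.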

I tested a book OCR'd with PDF-XChange Editor, and it still extracted the scanned page images, just as it did before OCR. On a document whose text was already selectable without OCR, it picked out only the images and fonts.

Definitely a process I will use again.


#2 Post by Midas » Thu Feb 18, 2021 6:48 am

Great info, Cornflower. Thanks. 8)

Just to help newcomers, mutool comes in the same package as MuPDF, downloadable from https://mupdf.com/downloads/ (or see viewtopic.php?t=8455).

Latest release is also the same for both: v1.18.0.
