mutool for PDF image extraction

Posted: Thu Feb 18, 2021 6:20 am
by Cornflower
Just a short note: I revisited the sparse documentation for MuPDF (GUI) and mutool (CLI).

mutool is a stand-alone CLI with a LOT of options. Just about the easiest is extract (images and resources).

I pulled out all the images (including both larger and smaller versions where the PDF had both) and non-text pages very quickly with

Code:

mutool extract filespec.pdf

In the current directory it created (at a rate of some 5-6 pages per second) image-xxxx files in .jpg, .png, and .pam formats, plus the embedded fonts. (PAM is Portable Arbitrary Map, a Netpbm bitmap format; I have no idea how to use it.)
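
Since everything lands in the current directory, it can help to run the command from inside a scratch folder. A minimal Python sketch of that idea follows; it assumes mutool is on the PATH, and the helper names and the "extracted" folder are made up for illustration, not part of mutool.

```python
import os
import subprocess


def extract_command(pdf_path):
    """Build the argument list for `mutool extract` with an absolute PDF path."""
    return ["mutool", "extract", os.path.abspath(pdf_path)]


def run_extract(pdf_path, out_dir):
    """Run mutool from inside out_dir so the extracted files land there."""
    os.makedirs(out_dir, exist_ok=True)
    # mutool extract writes to the current directory, so set cwd to out_dir.
    subprocess.run(extract_command(pdf_path), cwd=out_dir, check=True)
    return sorted(os.listdir(out_dir))


if __name__ == "__main__":
    # "filespec.pdf" and "extracted" are placeholder names.
    if os.path.exists("filespec.pdf"):
        for name in run_extract("filespec.pdf", "extracted"):
            print(name)
```

The PDF path is made absolute before the call because the command runs with a different working directory.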

I tested a book OCR'd with PDF-XChange Editor, and it extracted the OCR'd pages anyway, just as it had pre-OCR. On a document that did not need OCR to select text, it only picked out images and fonts.
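
To compare what different documents yield, a quick tally of the output by file extension is enough. This is only a sketch: the function name and the "extracted" folder are assumptions for illustration, and it simply counts whatever files are in the folder.

```python
from collections import Counter
from pathlib import Path


def tally_by_extension(directory):
    """Count the files in a directory by lowercase extension (e.g. '.jpg')."""
    return Counter(
        p.suffix.lower() for p in Path(directory).iterdir() if p.is_file()
    )


if __name__ == "__main__":
    # "extracted" is a placeholder for wherever mutool extract was run.
    target = Path("extracted")
    if target.is_dir():
        for ext, count in sorted(tally_by_extension(target).items()):
            print(f"{ext or '(no extension)'}: {count}")
```

Running it once per document gives a rough side-by-side of how many images and fonts each one produced.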

Definitely a process I will use again.

Re: mutool for PDF image extraction

Posted: Thu Feb 18, 2021 6:48 am
by Midas
Great info, Cornflower. Thanks. 8)

Just to help newcomers, mutool comes in the same package as MuPDF, downloadable from https://mupdf.com/downloads/ (or see viewtopic.php?t=8455).

Latest release is also the same for both: v1.18.0.