mutool is a stand-alone CLI with a lot of options. Just about the easiest one to use is the extract command (images and resources).
I pulled out all the images (including both the larger and smaller versions where the PDF had both) and the non-text pages very quickly with:
Code:
mutool extract filespec.pdf
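One thing to watch: as far as I know, mutool extract writes its output files into the current working directory, so a big book can leave a lot of files sitting next to the PDF. Below is a minimal sketch of a wrapper that extracts into a separate folder instead, assuming a POSIX shell. The function name extract_pdf_assets and the default folder name "extracted" are made up for illustration and are not part of mutool.

```shell
#!/bin/sh
# Hypothetical helper (not part of mutool): run "mutool extract" inside a
# dedicated output folder so the extracted files do not clutter the PDF's folder.
extract_pdf_assets() {
    pdf="$1"
    outdir="${2:-extracted}"   # assumed default folder name

    # Bail out early if mutool is missing or the input file does not exist.
    command -v mutool >/dev/null 2>&1 || { echo "mutool not found on PATH" >&2; return 1; }
    [ -f "$pdf" ] || { echo "no such file: $pdf" >&2; return 1; }

    # Resolve the PDF to an absolute path before changing directory.
    pdf_abs="$(cd "$(dirname "$pdf")" && pwd)/$(basename "$pdf")"

    mkdir -p "$outdir" || return 1

    # Subshell: the caller's working directory is left unchanged.
    ( cd "$outdir" && mutool extract "$pdf_abs" )
}

# Example call:
#   extract_pdf_assets filespec.pdf filespec_assets
```

The subshell keeps the cd local to the extraction, so it is safe to call this several times in a row from the same prompt or from a loop over many PDFs.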
I tested a book that had been OCR'd with PDF X-Change Editor, and it still extracted the OCR'd pages, just as it did pre-OCR. On a document that did not need OCR to select text, it only picked out images and fonts.
Definitely a process I will use again.