Image object not recognized #4578

flange-ipb · 2025-06-27T11:03:15Z

flange-ipb
Jun 27, 2025

Description of the bug

I'm extracting images from scientific papers. For this PDF I'm having trouble extracting Fig. 3 on page 10: this image object is not included in Page.get_images().

I have the same issue in pypdf, see pypdf#3335.

How to reproduce the bug

Run the extract-images scripts on the given PDF file.

PyMuPDF version

1.26.1

Operating system

Linux

Python version

3.13

Answered by JorjMcKie

Jun 27, 2025

JorjMcKie · 2025-06-27T11:32:04Z

JorjMcKie
Jun 27, 2025
Maintainer

This is not a bug. There is exactly one image on the page (above "Fig. 2"), and it is correctly recognized.
You probably believe that "Fig. 3" is also an image, but it is not: it is a vector graphic.
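
To check this distinction on your own file, a minimal sketch along the following lines may help. It counts what Page.get_images() and Page.get_drawings() report for one page. The helper names summarize_page and summarize_pdf_page are made up for this illustration and are not part of PyMuPDF.

```python
def summarize_page(page):
    """Return counts of raster images and vector paths on a PyMuPDF page."""
    images = page.get_images(full=True)  # embedded raster images only
    drawings = page.get_drawings()       # vector paths: lines, curves, rectangles
    return {"images": len(images), "vector_paths": len(drawings)}


def summarize_pdf_page(path, page_index):
    """Open a PDF and summarize one page (page_index is 0-based)."""
    # Imported here so summarize_page() itself has no hard dependency.
    import pymupdf

    with pymupdf.open(path) as doc:
        return summarize_page(doc[page_index])
```

Page 10 of the PDF corresponds to page_index 9, so a call could look like summarize_pdf_page("paper.pdf", 9), where "paper.pdf" is a placeholder file name. A figure that shows up only in the vector_paths count is not an image object and will never appear in Page.get_images().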

0 replies

JorjMcKie · 2025-06-27T12:17:22Z

JorjMcKie
Jun 27, 2025
Maintainer

BTW, what is your underlying problem? Do you want to extract anything that looks like an image, no matter what it technically is?

1 reply

flange-ipb Jul 1, 2025
Author

Nice, thanks for the quick analysis.
What I'm designing right now is an agentic AI application that extracts structured data from such publications. One component passes images to an optical chemical structure recognition (OCSR) step, a task too specific to be handled by multimodal or vision models.
Meanwhile, I've also played with pymupdf4llm's to_markdown() function and I'm very happy with the results. In particular, it places the image next to the caption, which cannot be achieved so easily with Page.get_images().

JorjMcKie · 2025-06-27T12:35:29Z

JorjMcKie
Jun 27, 2025
Maintainer

In contrast to many other packages, PyMuPDF can:

Extract vector graphics
Cluster single vectors which are geometrically connected
Render parts ("clips") of document pages to images of the desired format

Taking the above together, you can find the rectangle that covers Fig. 3 and render that area as an image.
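
As a rough sketch of that approach (not a definitive recipe): cluster the page's vector drawings with Page.cluster_drawings(), locate the caption text with Page.search_for(), pick the cluster sitting directly above the caption, and render it with Page.get_pixmap(clip=...). The helper names figure_rect_above and render_figure are invented for this example, and the assumption that the caption sits directly below its figure may not hold for every layout.

```python
def figure_rect_above(caption, candidates):
    """Pick the candidate rectangle sitting closest above a caption.

    Rectangles are (x0, y0, x1, y1) tuples with y growing downward,
    as in PyMuPDF. Returns None if no candidate lies above the caption.
    """
    cap_x0, cap_y0, cap_x1, _ = caption
    best, best_gap = None, None
    for rect in candidates:
        x0, _, x1, y1 = rect
        overlaps = x0 < cap_x1 and x1 > cap_x0  # shares horizontal extent
        gap = cap_y0 - y1                       # vertical distance to caption
        if overlaps and gap >= 0 and (best_gap is None or gap < best_gap):
            best, best_gap = rect, gap
    return best


def render_figure(path, page_index, caption_text, out_path, dpi=200):
    """Render the vector graphic above caption_text to an image file.

    Returns True on success, False if caption or graphic is not found.
    Assumption: the caption is placed directly below the figure.
    """
    import pymupdf  # imported here so the geometry helper works standalone

    with pymupdf.open(path) as doc:
        page = doc[page_index]
        hits = page.search_for(caption_text)  # rectangles of matching text
        if not hits:
            return False
        clusters = page.cluster_drawings()    # rectangles of connected vectors
        rect = figure_rect_above(tuple(hits[0]), [tuple(r) for r in clusters])
        if rect is None:
            return False
        pix = page.get_pixmap(clip=pymupdf.Rect(rect), dpi=dpi)
        pix.save(out_path)                    # format follows the file extension
        return True
```

A call could look like render_figure("paper.pdf", 9, "Fig. 3", "fig3.png"), with "paper.pdf" as a placeholder file name. Text labels belonging to the figure are not vector drawings, so the chosen rectangle may need some padding if labels near the edge are cut off.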

1 reply

JorjMcKie Jun 27, 2025
Maintainer

Let me invite you to our new Forum, where you can discuss with others (and PyMuPDF's maintainer team) under category "How To".
https://forum.pymupdf.com/c/how-to/6
