Status: Closed
Labels: Fixed in next release, fix developed, release schedule to be determined
Description of the bug
I have a PDF containing images in the CMYK colorspace. I want to convert the raw image bytes (e.g. from extract_image() or get_text("dict")) to an RGB image.
For images that use the 'DCTDecode' filter, the colorspace conversion does not appear to work when the Pixmap is created from the raw image bytes. If the Pixmap is loaded directly from the xref, it works.
The document images look like this:
```
>>> doc.get_page_images(0)
[(44, 0, 1350, 1125, 8, 'DeviceCMYK', '', 'X10', 'DCTDecode'), (46, 45, 1221, 1357, 8, 'DeviceCMYK', '', 'X11', 'FlateDecode'), (52, 51, 500, 500, 8, 'DeviceCMYK', '', 'X7', 'FlateDecode'), (53, 0, 1650, 1275, 8, 'DeviceCMYK', '', 'X9', 'FlateDecode'), (48, 0, 1024, 683, 8, 'DeviceCMYK', '', 'X4', 'FlateDecode')]
>>> doc.get_page_images(1)
[(7, 0, 2848, 4288, 8, 'DeviceCMYK', '', 'X15', 'DCTDecode')]
```
See the sample code below.
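To pick out the images that fall into the failing case (DeviceCMYK stored as DCTDecode), the tuples listed above can be filtered on their colorspace entry (index 5) and filter entry (index 8). This is a minimal sketch, assuming the default `get_page_images()` tuple layout `(xref, smask, width, height, bpc, colorspace, alt. colorspace, name, filter)`; the helper name `cmyk_jpeg_xrefs` is hypothetical and not part of PyMuPDF.

```python
def cmyk_jpeg_xrefs(image_list):
    """Return the xrefs of DeviceCMYK images stored with the DCTDecode filter.

    image_list: tuples as returned by doc.get_page_images(pno), laid out as
    (xref, smask, width, height, bpc, colorspace, alt_colorspace, name, filter).
    cmyk_jpeg_xrefs is a hypothetical helper, not a PyMuPDF function.
    """
    return [
        item[0]  # the xref
        for item in image_list
        if item[5] == "DeviceCMYK" and item[8] == "DCTDecode"
    ]


# Tuples copied from the get_page_images() output above
page0 = [
    (44, 0, 1350, 1125, 8, "DeviceCMYK", "", "X10", "DCTDecode"),
    (46, 45, 1221, 1357, 8, "DeviceCMYK", "", "X11", "FlateDecode"),
    (52, 51, 500, 500, 8, "DeviceCMYK", "", "X7", "FlateDecode"),
    (53, 0, 1650, 1275, 8, "DeviceCMYK", "", "X9", "FlateDecode"),
    (48, 0, 1024, 683, 8, "DeviceCMYK", "", "X4", "FlateDecode"),
]
page1 = [(7, 0, 2848, 4288, 8, "DeviceCMYK", "", "X15", "DCTDecode")]

print(cmyk_jpeg_xrefs(page0))  # [44]
print(cmyk_jpeg_xrefs(page1))  # [7]
```

With a real document this would be called as `cmyk_jpeg_xrefs(doc.get_page_images(0))`; per this report, these are the xrefs for which `pymupdf.Pixmap(doc, xref)` gives correct colors while the raw `extract_image()` bytes do not.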
Sample PDF: Seven Deadly Sins Program-1.pdf
Correct image (using Pixmap(doc, xref)): [image]

Incorrect image (using the bytes from extract_image(xref)["image"]): [image]
How to reproduce the bug
Here is the code I used to generate the two images:
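```python
import pymupdf

doc = pymupdf.open("Seven Deadly Sins Program-1.pdf")
images = doc.get_page_images(0)
xref = images[0][0]  # xref 44: DeviceCMYK, DCTDecode

# Path 1: load the image via its xref, then convert to RGB (colors are correct)
pix = pymupdf.Pixmap(doc, xref)
pix = pymupdf.Pixmap(pymupdf.csRGB, pix)
pix.save("temp.jpeg")

# Path 2: load the raw bytes from extract_image(), then convert to RGB (colors are wrong)
img = doc.extract_image(xref)
pix2 = pymupdf.Pixmap(img["image"])
pix2 = pymupdf.Pixmap(pymupdf.csRGB, pix2)
pix2.save("temp2.jpeg")
```
PyMuPDF version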
1.25.1
Operating system
Windows
Python version
3.11