Adding Document rewrite images method #4538

Merged
merged 2 commits into from
Jun 5, 2025
76 changes: 75 additions & 1 deletion docs/document.rst
@@ -96,6 +96,7 @@ For details on **embedded files** refer to Appendix 3.
:meth:`Document.pdf_catalog` PDF only: :data:`xref` of catalog (root)
:meth:`Document.pdf_trailer` PDF only: trailer source
:meth:`Document.prev_location` return (chapter, pno) of preceding page
:meth:`Document.rewrite_images` PDF only: rewrite / extra compression for images
:meth:`Document.recolor` PDF only: execute :meth:`Page.recolor` for all pages
:meth:`Document.reload_page` PDF only: provide a new copy of a page
:meth:`Document.resolve_names` PDF only: Convert destination names into a Python dict
@@ -592,9 +593,82 @@ For details on **embedded files** refer to Appendix 3.
To maintain a consistent API, for document types not supporting a chapter structure (like PDFs), :attr:`Document.chapter_count` is 1, and pages can also be loaded via tuples *(0, pno)*. See this [#f3]_ footnote for comments on performance improvements.


.. method:: rewrite_images(dpi_threshold=None, dpi_target=0, quality=0, lossy=True, lossless=True, bitonal=True, color=True, gray=True, set_to_gray=False, options=None)

PDF only: Walk through all images and rewrite them according to the specified parameters. This is useful for reducing file size, changing image formats, or converting color spaces.

Typical use is extra image compression to significantly reduce the file size of the PDF. When quality and the DPI parameters are set to positive values and the defaults are accepted for the rest, the following happens:

* Lossy and lossless images will be rewritten as JPEG images (FZ_RECOMPRESS_JPEG) as far as technically possible.

* Bitonal (monochrome) images will be rewritten in FAX format (FZ_RECOMPRESS_FAX).

* Subsampling method is **FZ_SUBSAMPLE_AVERAGE** (see below).
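
The defaults listed above can be summarized as a small lookup. The following is an illustrative sketch only, not code from this PR: the helper ``default_recompress_methods`` is hypothetical and returns plain strings in place of the real MuPDF constants. The mapping itself follows the method body in ``src/__init__.py``.

```python
def default_recompress_methods(lossy=True, lossless=True, bitonal=True,
                               color=True, gray=True):
    """Return {option attribute: constant name} for the selected image classes.

    Hypothetical helper for illustration. It only mirrors which image classes
    the boolean parameters of rewrite_images() switch on; classes left out
    keep the do-nothing default FZ_RECOMPRESS_NEVER.
    """
    methods = {}
    if bitonal:
        methods["bitonal_image_recompress_method"] = "FZ_RECOMPRESS_FAX"
    for colorspace, enabled in (("color", color), ("gray", gray)):
        if not enabled:
            continue
        if lossless:
            methods[f"{colorspace}_lossless_image_recompress_method"] = "FZ_RECOMPRESS_JPEG"
        if lossy:
            methods[f"{colorspace}_lossy_image_recompress_method"] = "FZ_RECOMPRESS_JPEG"
    return methods


if __name__ == "__main__":
    # Example: leave grayscale images untouched.
    for name, method in default_recompress_methods(gray=False).items():
        print(name, "->", method)
```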

:arg int dpi_target: target DPI value for the resampled images. Ignored if `dpi_threshold` is `None`, otherwise must be less than `dpi_threshold` and positive.

:arg int dpi_threshold: If None (the default) no resampling takes place. Otherwise images with a DPI value larger than this will be resampled to `dpi_target` (which must be less than `dpi_threshold`).

:arg int quality: desired target JPEG quality, a value between 0 and 100. 0 means no quality change, 100 means best quality.

:arg bool lossy: include lossy image types (e.g. JPEG).

:arg bool lossless: include lossless image types (e.g. PNG).

:arg bool bitonal: include black-and-white images (e.g. FAX).

:arg bool color: include colored images.

:arg bool gray: include grayscale images.

:arg bool set_to_gray: if True, the PDF will be converted to grayscale by executing :meth:`Document.recolor` before all image processing. Please note that this will also change text and vector graphics to grayscale -- not just the images.

:arg dict options: This parameter is intended for expert users. If it is provided, all other parameters except ``set_to_gray`` are ignored. It must be an object created via ``options = pymupdf.mupdf.PdfImageRewriterOptions()``. Attributes of this object can then be set to achieve fine-grained control. The adjustable attributes of the ``options`` object and their default (do nothing) values are listed below.

::
options.bitonal_image_recompress_method = FZ_RECOMPRESS_NEVER
options.bitonal_image_recompress_quality = None
options.bitonal_image_subsample_method = FZ_SUBSAMPLE_AVERAGE
options.bitonal_image_subsample_threshold = 0
options.bitonal_image_subsample_to = 0
options.color_lossless_image_recompress_method = FZ_RECOMPRESS_NEVER
options.color_lossless_image_recompress_quality = None
options.color_lossless_image_subsample_method = FZ_SUBSAMPLE_AVERAGE
options.color_lossless_image_subsample_threshold = 0
options.color_lossless_image_subsample_to = 0
options.color_lossy_image_recompress_method = FZ_RECOMPRESS_NEVER
options.color_lossy_image_recompress_quality = None
options.color_lossy_image_subsample_method = FZ_SUBSAMPLE_AVERAGE
options.color_lossy_image_subsample_threshold = 0
options.color_lossy_image_subsample_to = 0
options.gray_lossless_image_recompress_method = FZ_RECOMPRESS_NEVER
options.gray_lossless_image_recompress_quality = None
options.gray_lossless_image_subsample_method = FZ_SUBSAMPLE_AVERAGE
options.gray_lossless_image_subsample_threshold = 0
options.gray_lossless_image_subsample_to = 0
options.gray_lossy_image_recompress_method = FZ_RECOMPRESS_NEVER
options.gray_lossy_image_recompress_quality = None
options.gray_lossy_image_subsample_method = FZ_SUBSAMPLE_AVERAGE
options.gray_lossy_image_subsample_threshold = 0
options.gray_lossy_image_subsample_to = 0
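
The 25 attribute names above follow a regular pattern: five image classes combined with five settings. As a rough sketch (the names ``IMAGE_CLASSES``, ``SETTINGS`` and ``option_attribute_names`` are made up for this illustration and are not part of PyMuPDF), they can be generated like this:

```python
from itertools import product

# Image classes and per-class settings, read off the attribute list above.
IMAGE_CLASSES = ("bitonal", "color_lossless", "color_lossy",
                 "gray_lossless", "gray_lossy")
SETTINGS = ("recompress_method", "recompress_quality", "subsample_method",
            "subsample_threshold", "subsample_to")


def option_attribute_names():
    """Return all '<class>_image_<setting>' attribute names in listing order."""
    return [f"{cls}_image_{setting}"
            for cls, setting in product(IMAGE_CLASSES, SETTINGS)]


if __name__ == "__main__":
    for name in option_attribute_names():
        print(name)
```

Generating the names this way can help avoid typos when many attributes of an ``options`` object are set in a loop.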

The ``*_recompress_method`` attributes may be one of the values **FZ_RECOMPRESS_NEVER (0), FZ_RECOMPRESS_SAME (1), FZ_RECOMPRESS_LOSSLESS (2), FZ_RECOMPRESS_JPEG (3), FZ_RECOMPRESS_J2K (4), FZ_RECOMPRESS_FAX (5)**. Value FZ_RECOMPRESS_NEVER will skip this image type altogether and FZ_RECOMPRESS_SAME will not change the type. The other values will execute type conversions (as far as technically possible).

The ``*_quality`` values are either ``None`` or strings representing integers from "0" to "100".

The ``*_subsample_method`` attributes are either **FZ_SUBSAMPLE_AVERAGE (0)** or **FZ_SUBSAMPLE_BICUBIC (1)** and refer to how a pixel value is derived from its neighboring pixels during subsampling. For some background see `this Wikipedia article about bicubic interpolation <https://en.wikipedia.org/wiki/Bicubic_interpolation>`_.
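
For readers who want the numeric values at a glance, the constants can be pictured as two enumerations. This is a sketch under the assumption that the integer values are exactly those quoted in the text above; the classes ``RecompressMethod`` and ``SubsampleMethod`` are hypothetical, and real code should use the ``pymupdf.FZ_*`` constants rather than hard-coded numbers.

```python
from enum import IntEnum


class RecompressMethod(IntEnum):
    """Illustrative mirror of the FZ_RECOMPRESS_* values quoted in the docs."""
    NEVER = 0     # skip this image class altogether
    SAME = 1      # recompress, but keep the image type
    LOSSLESS = 2
    JPEG = 3
    J2K = 4
    FAX = 5


class SubsampleMethod(IntEnum):
    """Illustrative mirror of the FZ_SUBSAMPLE_* values quoted in the docs."""
    AVERAGE = 0
    BICUBIC = 1


def describe(value):
    """Translate a numeric recompress method into its constant name."""
    return f"FZ_RECOMPRESS_{RecompressMethod(value).name}"


if __name__ == "__main__":
    print(describe(3))
```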

The ``*_subsample_threshold`` attributes exclude images with a lower DPI from subsampling. Participating images are subsampled to the DPI given by the corresponding ``*_subsample_to`` value. A value of 0 means that no subsampling takes place.

The ``*_subsample_threshold`` values should be chosen notably larger than the ``*_subsample_to`` values, so that the size savings justify the quality loss that every subsampling inevitably incurs.

An example of a good choice is ``threshold=100`` and ``to=72``.
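
To get a feeling for what a threshold / target pair means, the following sketch estimates how many pixels remain after subsampling. It is a back-of-the-envelope model, not MuPDF's actual logic: the function ``subsample_plan`` is hypothetical, it assumes uniform scaling in both dimensions, and it assumes that an image exactly at the threshold is left alone (the parameter description says images with a DPI *larger* than the threshold are resampled).

```python
def subsample_plan(image_dpi, threshold, to):
    """Return (will_subsample, remaining_pixel_fraction) for one image.

    Assumptions for this sketch: a threshold or target of 0 disables
    subsampling, and only images with a DPI above the threshold take part.
    """
    if threshold <= 0 or to <= 0 or image_dpi <= threshold:
        return False, 1.0
    scale = to / image_dpi          # linear scale factor per dimension
    return True, scale * scale      # pixel count shrinks with the square


if __name__ == "__main__":
    for dpi in (90, 144, 300):
        print(dpi, subsample_plan(dpi, threshold=100, to=72))
```

With ``threshold=100`` and ``to=72``, a 144 DPI image keeps a quarter of its pixels, while a 90 DPI image is not touched.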


.. method:: recolor(components=1)

PDF only: Change the color component counts for all object types text, image and vector graphics for all pages.
PDF only: Change the color component counts for all object types text, images and vector graphics for all pages.

:arg int components: desired color space indicated by the number of color components: 1 = DeviceGRAY, 3 = DeviceRGB, 4 = DeviceCMYK.

99 changes: 99 additions & 0 deletions src/__init__.py
@@ -5334,6 +5334,93 @@ def resolve_link(self, uri=None, chapters=0):
        pno = mupdf.fz_page_number_from_location(self.this, loc)
        return pno, xp, yp

    def rewrite_images(
        self,
        dpi_threshold=None,
        dpi_target=0,
        quality=0,
        lossy=True,
        lossless=True,
        bitonal=True,
        color=True,
        gray=True,
        set_to_gray=False,
        options=None,
    ):
        """Rewrite images in a PDF document.

        The typical use case is to reduce the size of the PDF by recompressing
        images. Default parameters will convert images to JPEG (bitonal
        images to FAX) where possible, using the specified resolutions and
        quality. Exclude undesired images by setting parameters to False.

        Args:
            dpi_threshold: look at images with a larger DPI only.
            dpi_target: change eligible images to this DPI.
            quality: Quality of the recompressed images (0-100).
            lossy: process lossy image types (e.g. JPEG).
            lossless: process lossless image types (e.g. PNG).
            bitonal: process black-and-white images (e.g. FAX).
            color: process colored images.
            gray: process gray images.
            set_to_gray: whether to change the PDF to gray at process start.
            options: (PdfImageRewriterOptions) Custom options for image
                rewriting (optional). Expert use only. If provided, other
                parameters are ignored, except set_to_gray.
        """
        quality_str = str(quality)
        if not dpi_threshold:
            dpi_threshold = dpi_target = 0
        if dpi_target > 0 and dpi_target >= dpi_threshold:
            # f-string is required for the "{name=}" placeholders to expand
            raise ValueError(f"{dpi_target=} must be less than {dpi_threshold=}")
        template_opts = mupdf.PdfImageRewriterOptions()
        dir1 = set(dir(template_opts))  # for checking that only existing options are set
        if not options:
            opts = mupdf.PdfImageRewriterOptions()
            if bitonal:
                opts.bitonal_image_recompress_method = mupdf.FZ_RECOMPRESS_FAX
                opts.bitonal_image_subsample_method = mupdf.FZ_SUBSAMPLE_AVERAGE
                opts.bitonal_image_subsample_to = dpi_target
                opts.bitonal_image_recompress_quality = quality_str
                opts.bitonal_image_subsample_threshold = dpi_threshold
            if color:
                if lossless:
                    opts.color_lossless_image_recompress_method = mupdf.FZ_RECOMPRESS_JPEG
                    opts.color_lossless_image_subsample_method = mupdf.FZ_SUBSAMPLE_AVERAGE
                    opts.color_lossless_image_subsample_to = dpi_target
                    opts.color_lossless_image_subsample_threshold = dpi_threshold
                    opts.color_lossless_image_recompress_quality = quality_str
                if lossy:
                    opts.color_lossy_image_recompress_method = mupdf.FZ_RECOMPRESS_JPEG
                    opts.color_lossy_image_subsample_method = mupdf.FZ_SUBSAMPLE_AVERAGE
                    opts.color_lossy_image_subsample_threshold = dpi_threshold
                    opts.color_lossy_image_subsample_to = dpi_target
                    opts.color_lossy_image_recompress_quality = quality_str
            if gray:
                if lossless:
                    opts.gray_lossless_image_recompress_method = mupdf.FZ_RECOMPRESS_JPEG
                    opts.gray_lossless_image_subsample_method = mupdf.FZ_SUBSAMPLE_AVERAGE
                    opts.gray_lossless_image_subsample_to = dpi_target
                    opts.gray_lossless_image_subsample_threshold = dpi_threshold
                    opts.gray_lossless_image_recompress_quality = quality_str
                if lossy:
                    opts.gray_lossy_image_recompress_method = mupdf.FZ_RECOMPRESS_JPEG
                    opts.gray_lossy_image_subsample_method = mupdf.FZ_SUBSAMPLE_AVERAGE
                    opts.gray_lossy_image_subsample_threshold = dpi_threshold
                    opts.gray_lossy_image_subsample_to = dpi_target
                    opts.gray_lossy_image_recompress_quality = quality_str
        else:
            opts = options

        dir2 = set(dir(opts))  # checking that only possible options were used
        invalid_options = dir2 - dir1
        if invalid_options:
            raise ValueError(f"Invalid options: {invalid_options}")

        if set_to_gray:
            self.recolor(1)
        pdf = _as_pdf_document(self)
        mupdf.pdf_rewrite_images(pdf, opts)
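
The method guards against misspelled option attributes by comparing ``dir()`` of the supplied object with ``dir()`` of a freshly created one. The sketch below reproduces that idea in isolation; ``OptionsStandIn`` and ``check_options`` are hypothetical stand-ins written for this illustration, not PyMuPDF or MuPDF APIs. Note that the approach only works for objects that silently accept new attributes on assignment.

```python
class OptionsStandIn:
    """Hypothetical stand-in for an options object with two known attributes."""

    def __init__(self):
        self.color_lossy_image_recompress_quality = None
        self.color_lossy_image_subsample_to = 0


def check_options(opts, template=None):
    """Raise ValueError if `opts` carries attributes the template lacks."""
    if template is None:
        template = type(opts)()  # pristine object of the same type
    unknown = set(dir(opts)) - set(dir(template))
    if unknown:
        raise ValueError(f"Invalid options: {sorted(unknown)}")


good = OptionsStandIn()
good.color_lossy_image_subsample_to = 72
check_options(good)  # existing attribute: passes silently

bad = OptionsStandIn()
bad.color_lossy_image_subsample_too = 72  # misspelled: creates a new attribute
try:
    check_options(bad)
except ValueError as exc:
    print(exc)
```

Without such a check, the misspelled assignment would succeed and the intended setting would simply have no effect.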

    def recolor(self, components=1):
        """Change the color component count on all pages.

@@ -12833,6 +12920,18 @@ def width(self):
JM_mupdf_show_warnings = 0


# ------------------------------------------------------------------------------
# Image recompression constants
# ------------------------------------------------------------------------------
FZ_RECOMPRESS_NEVER = mupdf.FZ_RECOMPRESS_NEVER
FZ_RECOMPRESS_SAME = mupdf.FZ_RECOMPRESS_SAME
FZ_RECOMPRESS_LOSSLESS = mupdf.FZ_RECOMPRESS_LOSSLESS
FZ_RECOMPRESS_JPEG = mupdf.FZ_RECOMPRESS_JPEG
FZ_RECOMPRESS_J2K = mupdf.FZ_RECOMPRESS_J2K
FZ_RECOMPRESS_FAX = mupdf.FZ_RECOMPRESS_FAX
FZ_SUBSAMPLE_AVERAGE = mupdf.FZ_SUBSAMPLE_AVERAGE
FZ_SUBSAMPLE_BICUBIC = mupdf.FZ_SUBSAMPLE_BICUBIC

# ------------------------------------------------------------------------------
# Various PDF Optional Content Flags
# ------------------------------------------------------------------------------
Binary file added tests/resources/test-rewrite-images.pdf
Binary file not shown.
15 changes: 15 additions & 0 deletions tests/test_rewrite_images.py
@@ -0,0 +1,15 @@
import pymupdf
import os

scriptdir = os.path.dirname(__file__)


def test_rewrite_images():
    """Example for decreasing file size by more than 30%."""
    filename = os.path.join(scriptdir, "resources", "test-rewrite-images.pdf")
    doc = pymupdf.open(filename)
    size0 = os.path.getsize(doc.name)
    doc.rewrite_images(dpi_threshold=100, dpi_target=72, quality=33)
    data = doc.tobytes(garbage=3, deflate=True)
    size1 = len(data)
    assert (1 - (size1 / size0)) > 0.3
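
Outside a test suite, the same steps can be wrapped in a small helper that reports the achieved saving. This is a minimal sketch, assuming a PyMuPDF version that already contains ``Document.rewrite_images``; the function names ``size_reduction`` and ``rewrite_and_measure`` are invented for this example, and the default parameter values are simply those used in the test above.

```python
import os


def size_reduction(size_before, size_after):
    """Fraction of the original size that was saved (0.3 means 30% smaller)."""
    if size_before <= 0:
        raise ValueError("size_before must be positive")
    return 1 - size_after / size_before


def rewrite_and_measure(path, dpi_threshold=100, dpi_target=72, quality=33):
    """Recompress the images of the PDF at `path` and return the size reduction.

    The file on disk is not modified; the rewritten PDF only exists in memory.
    """
    import pymupdf  # imported lazily so size_reduction() works without PyMuPDF

    doc = pymupdf.open(path)
    try:
        doc.rewrite_images(
            dpi_threshold=dpi_threshold, dpi_target=dpi_target, quality=quality
        )
        data = doc.tobytes(garbage=3, deflate=True)
    finally:
        doc.close()
    return size_reduction(os.path.getsize(path), len(data))
```

Calling ``rewrite_and_measure("input.pdf")`` on a hypothetical file would return a value such as ``0.3`` for a 30% saving; the actual figure depends entirely on the images in the document.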