This commit does not belong to any branch on this repository, and may belong to a fork outside of the repository.

Commit 893eae6

Committed Jun 4, 2025
Adding Document rewrite images method
1 parent c469893 commit 893eae6

4 files changed: +191 -1 lines changed


‎docs/document.rst

Lines changed: 77 additions & 1 deletion
@@ -96,6 +96,7 @@ For details on **embedded files** refer to Appendix 3.
 :meth:`Document.pdf_catalog` PDF only: :data:`xref` of catalog (root)
 :meth:`Document.pdf_trailer` PDF only: trailer source
 :meth:`Document.prev_location` return (chapter, pno) of preceding page
+:meth:`Document.rewrite_images` PDF only: rewrite / extra compression for images
 :meth:`Document.recolor` PDF only: execute :meth:`Page.recolor` for all pages
 :meth:`Document.reload_page` PDF only: provide a new copy of a page
 :meth:`Document.resolve_names` PDF only: Convert destination names into a Python dict
@@ -592,9 +593,84 @@ For details on **embedded files** refer to Appendix 3.
    To maintain a consistent API, for document types not supporting a chapter structure (like PDFs), :attr:`Document.chapter_count` is 1, and pages can also be loaded via tuples *(0, pno)*. See this [#f3]_ footnote for comments on performance improvements.


+.. method:: rewrite_images(dpi_threshold=None, dpi_target=0, quality=0, lossy=True, lossless=True, bitonal=True, color=True, gray=True, set_to_gray=False, options=None)
+
+   PDF only: Walk through all images and rewrite them according to the specified parameters. This is useful for reducing file size, changing image formats, or converting color spaces.
+
+   The typical use is extra image compression to significantly reduce the file size of the PDF. When ``quality`` and the DPI parameters are set to positive values and the defaults are accepted for the rest, the following happens:
+
+   * Lossy and lossless images will be rewritten as JPEG images (FZ_RECOMPRESS_JPEG) as far as technically possible.
+
+   * Bitonal (monochrome) images will be rewritten in FAX format (FZ_RECOMPRESS_FAX).
+
+   * The subsampling method is **FZ_SUBSAMPLE_AVERAGE** (see below).
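As a minimal sketch of this typical usage: the helper below assumes a PyMuPDF build that already contains `Document.rewrite_images`. The name `shrink_pdf` is hypothetical, and its default values are simply the ones used in this commit's test.

```python
import os


def shrink_pdf(src_path, dst_path, dpi_threshold=100, dpi_target=72, quality=33):
    """Recompress the images of a PDF and return (old_size, new_size) in bytes."""
    # Imported lazily; requires a PyMuPDF version that provides rewrite_images.
    import pymupdf

    doc = pymupdf.open(src_path)
    try:
        # Positive quality and DPI values, defaults for everything else.
        doc.rewrite_images(
            dpi_threshold=dpi_threshold,
            dpi_target=dpi_target,
            quality=quality,
        )
        # dst_path should differ from src_path; garbage/deflate as in the test.
        doc.save(dst_path, garbage=3, deflate=True)
    finally:
        doc.close()
    return os.path.getsize(src_path), os.path.getsize(dst_path)
```

How much is saved depends entirely on the images in the input file, so the returned sizes are worth checking before the original is replaced.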
+
+   :arg int dpi_threshold: If ``None`` (the default), no resampling takes place. Otherwise images with a DPI value larger than this will be resampled to `dpi_target` (which must be less than `dpi_threshold`).
+
+   :arg int dpi_target: target DPI value for the resampled images. Ignored if `dpi_threshold` is `None`, otherwise must be less than `dpi_threshold`.
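The interplay of the two DPI parameters can be sketched in plain Python. The function name `normalize_dpi` is hypothetical; the logic follows the check at the start of `rewrite_images` in `src/__init__.py`, with an error raised when `dpi_target` is not less than `dpi_threshold`.

```python
def normalize_dpi(dpi_threshold=None, dpi_target=0):
    """Return the (threshold, target) pair that would be used for subsampling."""
    if not dpi_threshold:
        # None (or 0) switches resampling off, and dpi_target is ignored.
        return 0, 0
    if dpi_target > 0 and not dpi_threshold > dpi_target:
        raise ValueError("dpi_target must be less than dpi_threshold")
    return dpi_threshold, dpi_target


print(normalize_dpi())          # no threshold given, so no resampling
print(normalize_dpi(100, 72))   # images above 100 DPI are resampled to 72 DPI
```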
+
+   :arg int quality: desired target JPEG quality, a value between 0 and 100. 0 means no quality change, 100 means best quality.
+
+   :arg bool lossy: include lossy image types (e.g. JPEG).
+
+   :arg bool lossless: include lossless image types (e.g. PNG).
+
+   :arg bool bitonal: include black-and-white images (e.g. FAX).
+
+   :arg bool color: include colored images.
+
+   :arg bool gray: include grayscale images.
+
+   :arg bool set_to_gray: if True, the PDF will be converted to grayscale by executing :meth:`Document.recolor` before all image processing. Please note that this will also change text and vector graphics to grayscale -- not just the images.
+
+   :arg PdfImageRewriterOptions options: This parameter is intended for expert users. If given, all other parameters except ``set_to_gray`` are ignored. It must be an object created via ``options = pymupdf.mupdf.PdfImageRewriterOptions()``; attributes of this object can then be set to achieve fine-grained control. The adjustable attributes of the ``options`` object and their default (do nothing) values are as follows.
+
+   ::
+
+      options.bitonal_image_recompress_method = FZ_RECOMPRESS_NEVER
+      options.bitonal_image_recompress_quality = None
+      options.bitonal_image_subsample_method = FZ_SUBSAMPLE_AVERAGE
+      options.bitonal_image_subsample_threshold = 0
+      options.bitonal_image_subsample_to = 0
+      options.color_lossless_image_recompress_method = FZ_RECOMPRESS_NEVER
+      options.color_lossless_image_recompress_quality = None
+      options.color_lossless_image_subsample_method = FZ_SUBSAMPLE_AVERAGE
+      options.color_lossless_image_subsample_threshold = 0
+      options.color_lossless_image_subsample_to = 0
+      options.color_lossy_image_recompress_method = FZ_RECOMPRESS_NEVER
+      options.color_lossy_image_recompress_quality = None
+      options.color_lossy_image_subsample_method = FZ_SUBSAMPLE_AVERAGE
+      options.color_lossy_image_subsample_threshold = 0
+      options.color_lossy_image_subsample_to = 0
+      options.gray_lossless_image_recompress_method = FZ_RECOMPRESS_NEVER
+      options.gray_lossless_image_recompress_quality = None
+      options.gray_lossless_image_subsample_method = FZ_SUBSAMPLE_AVERAGE
+      options.gray_lossless_image_subsample_threshold = 0
+      options.gray_lossless_image_subsample_to = 0
+      options.gray_lossy_image_recompress_method = FZ_RECOMPRESS_NEVER
+      options.gray_lossy_image_recompress_quality = None
+      options.gray_lossy_image_subsample_method = FZ_SUBSAMPLE_AVERAGE
+      options.gray_lossy_image_subsample_threshold = 0
+      options.gray_lossy_image_subsample_to = 0
+
+   The ``*_recompress_method`` attributes may be one of the values **FZ_RECOMPRESS_NEVER (0), FZ_RECOMPRESS_SAME (1), FZ_RECOMPRESS_LOSSLESS (2), FZ_RECOMPRESS_JPEG (3), FZ_RECOMPRESS_J2K (4), FZ_RECOMPRESS_FAX (5)**. Value FZ_RECOMPRESS_NEVER will skip this image type altogether and FZ_RECOMPRESS_SAME will not change the type. The other values will execute type conversions (as far as technically possible).
+
+   The ``*_quality`` values are strings of integers from "0" to "100" or ``None``.
+
+   The ``*_subsample_method`` attributes are either **FZ_SUBSAMPLE_AVERAGE (0)** or **FZ_SUBSAMPLE_BICUBIC (1)** and refer to how a pixel value is derived from its neighboring pixels during subsampling. For some background see `this Wikipedia article about bicubic interpolation <https://en.wikipedia.org/wiki/Bicubic_interpolation>`_.
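To give a feeling for what "deriving a pixel from its neighbors" means, here is a plain-Python sketch of subsampling by averaging. It is not MuPDF's implementation, only an illustration of the general idea; `subsample_average` and the tiny grayscale image are made up for this example.

```python
def subsample_average(pixels, factor):
    """Downsample a grayscale image (list of rows) by averaging factor x factor blocks."""
    height, width = len(pixels), len(pixels[0])
    out = []
    for top in range(0, height - factor + 1, factor):
        row = []
        for left in range(0, width - factor + 1, factor):
            # Collect all source pixels that collapse into one target pixel.
            block = [
                pixels[y][x]
                for y in range(top, top + factor)
                for x in range(left, left + factor)
            ]
            row.append(sum(block) // len(block))
        out.append(row)
    return out


image = [
    [0, 0, 100, 100],
    [0, 0, 100, 100],
    [50, 150, 200, 200],
    [50, 150, 200, 200],
]
small = subsample_average(image, 2)
print(small)  # [[0, 100], [100, 200]]
```

Bicubic interpolation weights a larger neighborhood with a cubic kernel instead of taking a flat mean, which usually preserves edges better at a higher computational cost.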
+
+   The ``*_subsample_threshold`` attributes exclude images with a lower DPI from subsampling. Participating images will be subsampled to the DPI given by the corresponding ``*_subsample_to`` value. A value of 0 means that no subsampling will take place.
+
+   The ``*_subsample_threshold`` values should be chosen notably larger than the ``*_subsample_to`` values to ensure that there are enough size savings. After all, every subsampling inevitably incurs quality losses.
+
+   An example of a good choice is ``threshold=100`` and ``to=72``.
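The five attributes per image class follow a regular naming pattern, so they can be set with a small helper. This is a sketch only: `configure_image_class` is a hypothetical name, the numeric constants are copied from the text above, and a `SimpleNamespace` stands in for the real options object so the snippet runs without PyMuPDF.

```python
from types import SimpleNamespace

# Numeric values as listed in the documentation text above.
FZ_RECOMPRESS_JPEG = 3
FZ_SUBSAMPLE_BICUBIC = 1


def configure_image_class(opts, prefix, method, quality, threshold, to, subsample):
    """Set the five attributes of one image class, e.g. prefix='color_lossy'."""
    setattr(opts, f"{prefix}_image_recompress_method", method)
    # Quality values are strings of integers, per the documentation.
    setattr(opts, f"{prefix}_image_recompress_quality", str(quality))
    setattr(opts, f"{prefix}_image_subsample_method", subsample)
    setattr(opts, f"{prefix}_image_subsample_threshold", threshold)
    setattr(opts, f"{prefix}_image_subsample_to", to)
    return opts


# Stand-in object (assumption). In real use this would be
# pymupdf.mupdf.PdfImageRewriterOptions(), passed as rewrite_images(options=opts).
opts = SimpleNamespace()
configure_image_class(
    opts, "color_lossy", FZ_RECOMPRESS_JPEG, 50, 100, 72, FZ_SUBSAMPLE_BICUBIC
)
print(opts.color_lossy_image_subsample_threshold, opts.color_lossy_image_subsample_to)
```

Image classes that are never configured keep their "do nothing" defaults, so only the classes named explicitly are touched.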
+
+
 .. method:: recolor(components=1)

-   PDF only: Change the color component counts for all object types text, image and vector graphics for all pages.
+   PDF only: Change the color component counts for all object types text, images and vector graphics for all pages.

    :arg int components: desired color space indicated by the number of color components: 1 = DeviceGRAY, 3 = DeviceRGB, 4 = DeviceCMYK.

‎src/__init__.py

Lines changed: 99 additions & 0 deletions
@@ -5334,6 +5334,93 @@ def resolve_link(self, uri=None, chapters=0):
         pno = mupdf.fz_page_number_from_location(self.this, loc)
         return pno, xp, yp

+    def rewrite_images(
+        self,
+        dpi_threshold=None,
+        dpi_target=0,
+        quality=0,
+        lossy=True,
+        lossless=True,
+        bitonal=True,
+        color=True,
+        gray=True,
+        set_to_gray=False,
+        options=None,
+    ):
+        """Rewrite images in a PDF document.
+
+        The typical use case is to reduce the size of the PDF by recompressing
+        images. Default parameters will convert all images to JPEG where
+        possible, using the specified resolutions and quality. Exclude
+        undesired images by setting parameters to False.
+        Args:
+            dpi_threshold: look at images with a larger DPI only.
+            dpi_target: change eligible images to this DPI.
+            quality: Quality of the recompressed images (0-100).
+            lossy: process lossy image types (e.g. JPEG).
+            lossless: process lossless image types (e.g. PNG).
+            bitonal: process black-and-white images (e.g. FAX).
+            color: process colored images.
+            gray: process gray images.
+            set_to_gray: whether to change the PDF to gray at process start.
+            options: (PdfImageRewriterOptions) Custom options for image
+                rewriting (optional). Expert use only. If provided, other
+                parameters are ignored, except set_to_gray.
+        """
+        quality_str = str(quality)
+        if not dpi_threshold:
+            dpi_threshold = dpi_target = 0
+        if dpi_target > 0 and not dpi_threshold > dpi_target:
+            raise ValueError("dpi_target must be less than dpi_threshold")
+        template_opts = mupdf.PdfImageRewriterOptions()
+        dir1 = set(dir(template_opts))  # for checking that only existing options are set
+        if not options:
+            opts = mupdf.PdfImageRewriterOptions()
+            if bitonal:
+                opts.bitonal_image_recompress_method = mupdf.FZ_RECOMPRESS_FAX
+                opts.bitonal_image_subsample_method = mupdf.FZ_SUBSAMPLE_AVERAGE
+                opts.bitonal_image_subsample_to = dpi_target
+                opts.bitonal_image_recompress_quality = quality_str
+                opts.bitonal_image_subsample_threshold = dpi_threshold
+            if color:
+                if lossless:
+                    opts.color_lossless_image_recompress_method = mupdf.FZ_RECOMPRESS_JPEG
+                    opts.color_lossless_image_subsample_method = mupdf.FZ_SUBSAMPLE_AVERAGE
+                    opts.color_lossless_image_subsample_to = dpi_target
+                    opts.color_lossless_image_subsample_threshold = dpi_threshold
+                    opts.color_lossless_image_recompress_quality = quality_str
+                if lossy:
+                    opts.color_lossy_image_recompress_method = mupdf.FZ_RECOMPRESS_JPEG
+                    opts.color_lossy_image_subsample_method = mupdf.FZ_SUBSAMPLE_AVERAGE
+                    opts.color_lossy_image_subsample_threshold = dpi_threshold
+                    opts.color_lossy_image_subsample_to = dpi_target
+                    opts.color_lossy_image_recompress_quality = quality_str
+            if gray:
+                if lossless:
+                    opts.gray_lossless_image_recompress_method = mupdf.FZ_RECOMPRESS_JPEG
+                    opts.gray_lossless_image_subsample_method = mupdf.FZ_SUBSAMPLE_AVERAGE
+                    opts.gray_lossless_image_subsample_to = dpi_target
+                    opts.gray_lossless_image_subsample_threshold = dpi_threshold
+                    opts.gray_lossless_image_recompress_quality = quality_str
+                if lossy:
+                    opts.gray_lossy_image_recompress_method = mupdf.FZ_RECOMPRESS_JPEG
+                    opts.gray_lossy_image_subsample_method = mupdf.FZ_SUBSAMPLE_AVERAGE
+                    opts.gray_lossy_image_subsample_threshold = dpi_threshold
+                    opts.gray_lossy_image_subsample_to = dpi_target
+                    opts.gray_lossy_image_recompress_quality = quality_str
+        else:
+            opts = options
+
+        dir2 = set(dir(opts))  # checking that only possible options were used
+        invalid_options = dir2 - dir1
+        if invalid_options:
+            raise ValueError(f"Invalid options: {invalid_options}")
+
+        if set_to_gray:
+            self.recolor(1)
+        pdf = _as_pdf_document(self)
+        mupdf.pdf_rewrite_images(pdf, opts)
+
     def recolor(self, components=1):
         """Change the color component count on all pages.

@@ -13448,6 +13535,18 @@ def width(self):
 JM_mupdf_show_warnings = 0


+# ------------------------------------------------------------------------------
+# Image recompression constants
+# ------------------------------------------------------------------------------
+FZ_RECOMPRESS_NEVER = mupdf.FZ_RECOMPRESS_NEVER
+FZ_RECOMPRESS_SAME = mupdf.FZ_RECOMPRESS_SAME
+FZ_RECOMPRESS_LOSSLESS = mupdf.FZ_RECOMPRESS_LOSSLESS
+FZ_RECOMPRESS_JPEG = mupdf.FZ_RECOMPRESS_JPEG
+FZ_RECOMPRESS_J2K = mupdf.FZ_RECOMPRESS_J2K
+FZ_RECOMPRESS_FAX = mupdf.FZ_RECOMPRESS_FAX
+FZ_SUBSAMPLE_AVERAGE = mupdf.FZ_SUBSAMPLE_AVERAGE
+FZ_SUBSAMPLE_BICUBIC = mupdf.FZ_SUBSAMPLE_BICUBIC
+
 # ------------------------------------------------------------------------------
 # Various PDF Optional Content Flags
 # ------------------------------------------------------------------------------
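The `dir1` / `dir2` comparison near the end of `rewrite_images` guards against misspelled option names by comparing the attribute names of the supplied object with those of a fresh one. A stand-alone sketch of that technique follows; the `Options` class and `unknown_attributes` are hypothetical stand-ins, not part of PyMuPDF.

```python
class Options:
    """Stand-in for an options object with a fixed set of known attributes."""

    def __init__(self):
        self.color_lossy_image_subsample_to = 0


def unknown_attributes(opts, template):
    """Return names present on opts but not on a fresh template object."""
    return set(dir(opts)) - set(dir(template))


opts = Options()
opts.color_lossy_image_subsample_to = 72    # known attribute: accepted
opts.colour_lossy_image_subsample_to = 72   # misspelled name: detected
extra = unknown_attributes(opts, Options())
print(extra)  # {'colour_lossy_image_subsample_to'}
```

The check only works for objects that accept arbitrary new attributes in the first place. Without it, a typo would silently create an attribute that the rewriter never reads.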
645 KB
Binary file not shown.

‎tests/test_rewrite_images.py

Lines changed: 15 additions & 0 deletions
@@ -0,0 +1,15 @@
+import pymupdf
+import os
+
+scriptdir = os.path.dirname(__file__)
+
+
+def test_rewrite_images():
+    """Example for decreasing file size by more than 30%."""
+    filename = os.path.join(scriptdir, "resources", "test-rewrite-images.pdf")
+    doc = pymupdf.open(filename)
+    size0 = os.path.getsize(doc.name)
+    doc.rewrite_images(dpi_threshold=100, dpi_target=72, quality=33)
+    data = doc.tobytes(garbage=3, deflate=True)
+    size1 = len(data)
+    assert (1 - (size1 / size0)) > 0.3
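The assertion in this test expresses "more than 30% smaller" as a fraction of bytes saved. A small sketch of that arithmetic, with the hypothetical helper `size_reduction` and made-up byte counts (not measurements from the test PDF):

```python
def size_reduction(size_before, size_after):
    """Return the fraction of bytes saved, e.g. 0.35 for a 35% smaller file."""
    if size_before <= 0:
        raise ValueError("size_before must be positive")
    return 1 - size_after / size_before


# Illustrative numbers only: 1000 bytes shrinking to 650 bytes.
saved = size_reduction(1000, 650)
print(f"{saved:.0%} saved")
```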
