docs/document.rst (75 additions, 1 deletion)
@@ -96,6 +96,7 @@ For details on **embedded files** refer to Appendix 3.
96
96
:meth:`Document.pdf_catalog` PDF only: :data:`xref` of catalog (root)
97
97
:meth:`Document.pdf_trailer` PDF only: trailer source
98
98
:meth:`Document.prev_location` return (chapter, pno) of preceding page
99
+
:meth:`Document.rewrite_images` PDF only: rewrite / extra compression for images
99
100
:meth:`Document.recolor` PDF only: execute :meth:`Page.recolor` for all pages
100
101
:meth:`Document.reload_page` PDF only: provide a new copy of a page
101
102
:meth:`Document.resolve_names` PDF only: Convert destination names into a Python dict
@@ -592,9 +593,82 @@ For details on **embedded files** refer to Appendix 3.
592
593
To maintain a consistent API, for document types not supporting a chapter structure (like PDFs), :attr:`Document.chapter_count` is 1, and pages can also be loaded via tuples *(0, pno)*. See this [#f3]_ footnote for comments on performance improvements.
PDF only: Walk through all images and rewrite them according to the specified parameters. This is useful for reducing file size, changing image formats, or converting color spaces.
599
+
600
+
A typical use is additional image compression to significantly reduce the file size of the PDF. When ``quality`` and the DPI parameters are set to positive values and the defaults are accepted for the rest, the following happens:
601
+
602
+
* Lossy and lossless images will be rewritten as JPEG images (as far as technically possible).
603
+
604
+
* Bitonal (monochrome) images will be rewritten in FAX format (``/Filter /CCITTFaxDecode``).
605
+
606
+
* Subsampling method is **"average"** (see below).
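The typical usage described above could be sketched as follows. This is a minimal illustration, not a definitive recipe: the helper names ``rewrite_kwargs`` and ``compress_images`` are hypothetical, the values 100 / 72 / 60 are arbitrary examples, and the checks only mirror the rules stated for ``dpi_target``, ``dpi_threshold`` and ``quality`` in the parameter descriptions. The save options are likewise an assumption about what is useful after rewriting.

```python
def rewrite_kwargs(dpi_threshold=0, dpi_target=0, quality=0):
    """Check the documented parameter rules and return keyword arguments.

    Hypothetical helper: it only restates the constraints from the
    parameter descriptions and does not touch any PDF.
    """
    if not 0 <= quality <= 100:
        raise ValueError("quality must be between 0 and 100")
    if dpi_target < 0 or dpi_threshold < 0:
        raise ValueError("DPI values must not be negative")
    # A positive dpi_target requires a larger dpi_threshold; this also
    # covers the rule that dpi_threshold == 0 implies dpi_target == 0.
    if dpi_target > 0 and dpi_threshold <= dpi_target:
        raise ValueError("dpi_threshold must be larger than dpi_target")
    return {
        "dpi_threshold": dpi_threshold,
        "dpi_target": dpi_target,
        "quality": quality,
    }


def compress_images(src, dst, **kwargs):
    """Rewrite the images of the PDF at `src` and save the result to `dst`."""
    import pymupdf  # imported lazily so the checks above work without PyMuPDF

    doc = pymupdf.open(src)
    try:
        doc.rewrite_images(**rewrite_kwargs(**kwargs))
        # Assumption: garbage collection and deflate help realize the savings.
        doc.save(dst, garbage=3, deflate=True)
    finally:
        doc.close()
```

A call such as ``compress_images("input.pdf", "output.pdf", dpi_threshold=100, dpi_target=72, quality=60)`` would then resample images above 100 DPI down to 72 DPI; the file names are placeholders.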
607
+
608
+
:arg int dpi_target: target DPI value for the rewritten images. Default is 0, which means that no resampling will take place. If positive, ``dpi_threshold`` must be larger than this value.
609
+
610
+
:arg int dpi_threshold: only images with a DPI value larger than this will be rewritten. Default is 0, in which case ``dpi_target`` must also be 0.
611
+
612
+
:arg int quality: target quality. This is a value between 0 and 100. For lossy images, this is the JPEG quality. For PNG images, this is translated to the compression level (0 = no compression, 100 = maximum compression). Default is 0, which means that no quality changes will take place.
613
+
614
+
:arg bool lossy: include lossy image types (e.g. JPEG).
615
+
616
+
:arg bool lossless: include lossless image types (e.g. PNG).
617
+
618
+
:arg bool bitonal: include black-and-white images (e.g. FAX).
619
+
620
+
:arg bool color: include colored images.
621
+
622
+
:arg bool gray: include grayscale images.
623
+
624
+
:arg bool set_to_gray: if True, the PDF will be converted to grayscale by executing :meth:`Document.recolor` before all image processing. Please note that this will also change text and vector graphics to grayscale -- not just the images.
625
+
626
+
:arg dict options: This parameter is intended for expert users. If it is provided, all other parameters except ``set_to_gray`` are ignored. It must be an object created via ``options = pymupdf.mupdf.PdfImageRewriterOptions()``. Attributes of this object can then be set to achieve fine-grained control. The following are the adjustable attributes of the ``options`` object and their default (do nothing) values.
The ``*_recompress_method`` attributes may be one of the values **0 (never), 1 (same), 2 (lossless), 3 (JPEG), 4 (J2K), 5 (FAX)**. Value 0 will skip this image type altogether and 1 will not change the type. The other values will execute type conversions (as far as technically possible).
657
+
658
+
The ``*_quality`` values are strings of integers from "0" to "100" or ``None``.
659
+
660
+
The ``*_subsample_method`` attributes are either **0 (average)** or **1 (bicubic interpolation)** and refer to how a pixel value is derived from its neighboring pixels during subsampling. For some background, see `here <https://en.wikipedia.org/wiki/Bicubic_interpolation>`_.
661
+
662
+
The ``*_subsample_threshold`` attributes exclude images with a lower DPI from subsampling. Participating images will be subsampled to the DPI values given by the ``*_subsample_to`` attributes. Values of 0 mean that no subsampling will take place.
663
+
664
+
The ``*_subsample_threshold`` values should be chosen notably larger than the ``*_subsample_to`` values so that the size savings are worthwhile: every subsampling inevitably incurs quality losses.
665
+
666
+
An example of a good choice is ``threshold=100`` and ``to=72``.
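The value rules above can be illustrated with a small sketch that validates settings before copying them onto an options object. Several things here are assumptions made for illustration: the helper ``apply_settings`` is hypothetical, the attribute prefix ``color_lossy_image`` is assumed and should be checked against the attribute list of ``PdfImageRewriterOptions``, and a ``SimpleNamespace`` stands in for the real object so the example runs without PyMuPDF.

```python
from types import SimpleNamespace

# Numeric codes as described in the text above.
RECOMPRESS = {"never": 0, "same": 1, "lossless": 2, "jpeg": 3, "j2k": 4, "fax": 5}
SUBSAMPLE = {"average": 0, "bicubic": 1}


def _check(name, value):
    """Validate one setting by its documented suffix (hypothetical helper)."""
    if name.endswith("_recompress_method") and value not in RECOMPRESS.values():
        raise ValueError(f"{name}: expected a value from 0 to 5")
    if name.endswith("_subsample_method") and value not in SUBSAMPLE.values():
        raise ValueError(f"{name}: expected 0 (average) or 1 (bicubic)")
    if name.endswith("_quality") and value is not None:
        # Quality values are strings of integers from "0" to "100", or None.
        ok = isinstance(value, str) and value.isdigit() and int(value) <= 100
        if not ok:
            raise ValueError(f'{name}: expected a string from "0" to "100" or None')


def apply_settings(options, settings):
    """Validate all settings first, then set them as attributes on `options`."""
    for name, value in settings.items():
        _check(name, value)
    for name, value in settings.items():
        setattr(options, name, value)
    return options


# Stand-in for pymupdf.mupdf.PdfImageRewriterOptions(); names are assumed.
demo = apply_settings(
    SimpleNamespace(),
    {
        "color_lossy_image_recompress_method": RECOMPRESS["jpeg"],
        "color_lossy_image_recompress_quality": "60",
        "color_lossy_image_subsample_method": SUBSAMPLE["average"],
        "color_lossy_image_subsample_threshold": 100,
        "color_lossy_image_subsample_to": 72,
    },
)
```

With PyMuPDF installed, the same dictionary could presumably be applied to an object created via ``pymupdf.mupdf.PdfImageRewriterOptions()`` and passed as ``doc.rewrite_images(options=options)``, provided the attribute names match those of the real object.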
667
+
668
+
595
669
.. method:: recolor(components=1)
596
670
597
-
PDF only: Change the color component counts for all object types text, image and vector graphics for all pages.
671
+
PDF only: Change the color component counts for all object types text, images and vector graphics for all pages.
598
672
599
673
:arg int components: desired color space indicated by the number of color components: 1 = DeviceGRAY, 3 = DeviceRGB, 4 = DeviceCMYK.