Issue search results

184 results

pymupdf/RAG
Some content will be identified twice

As the title suggests. The identified content is shown in the following figure: [Image: https://github.com/user-attachments/assets/1403f9fb-571c-4d5f-94f8-3505e4e5f6bc] ...

not a bug

neverlatetolearn0

Opened
5 days ago

#298

pymupdf/RAG
[Bug] A specific diagram recognized as significant is not extracted as images by pymupdf4llm.to_markdown

The following diagram is not extracted as an image by pymupdf4llm.to_markdown("uart.pdf", write_images=True), although it should be. [Image: https://github.com/user-attachments/assets/f39dd327-6d4c-484d-84ed-939e72ee25ce] ...

bug

fix developed

xcpky

Opened
10 days ago

#296
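
One way to check whether a diagram was picked up is to count the image references in the returned Markdown. The sketch below assumes pymupdf4llm is installed and that image references appear in the standard `![alt](path)` form; `count_image_refs` and `extract_and_count` are hypothetical helper names, not part of the library.

```python
import re

# Matches standard Markdown image references such as ![alt](path/to/img.png).
IMAGE_REF = re.compile(r"!\[[^\]]*\]\([^)]+\)")


def count_image_refs(markdown: str) -> int:
    """Return the number of Markdown image references in the text."""
    return len(IMAGE_REF.findall(markdown))


def extract_and_count(pdf_path: str) -> int:
    """Convert a PDF with write_images=True and count the image references.

    Hypothetical helper: pymupdf4llm is imported lazily so the counting
    logic can be used on its own, without the library installed.
    """
    import pymupdf4llm

    md_text = pymupdf4llm.to_markdown(pdf_path, write_images=True)
    return count_image_refs(md_text)
```

A count of zero for a page that visibly contains a diagram would be consistent with the behavior reported here, although it does not by itself explain the cause.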

pymupdf/RAG
⚠️ NOTICE: Please use forum.pymupdf.com for any issues

We highly recommend posting bugs, issues, feature requests and discussions on our forum at forum.pymupdf.com.

jamie-lemon

Opened
13 days ago

#295

pymupdf/RAG
Unable to extract images from Page

I have a PDF version of a picture (manually filled out form). When using pymupdf4llm.to_markdown(doc, page_chunks=True), the page image is not detected. I believe this has to do with the size of the image ...

bug

fix developed

jmoreno11

Opened
13 days ago

#294

pymupdf/RAG
Extract table in markdown format, if table is borderless

The to_markdown function extracts a table in Markdown format perfectly if the table in the PDF has borders, like this: [Image: https://github.com/user-attachments/assets/a730d77f-87c9-4912-9456-cbfde65af19f] ...

wontfix

Aryabhattacharjee

Opened
14 days ago

#293

pymupdf/RAG
pymupdf4llm.to_markdown() returns empty output in 0.0.25 (worked in 0.0.17)

Summary: I'm encountering an issue where pymupdf4llm.to_markdown() returns an empty string for a specific PDF in version 0.0.25 (and also 0.0.24). However, the same file works correctly in version 0.0.17. ...

azhurb

Opened
on Jun 18

#289
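
When reporting a regression like this, it helps to confirm which release is actually installed. The following is a minimal sketch using only the standard library; `parse_version`, `installed_version`, and `is_newer_than` are hypothetical helpers, and the parser assumes plain dotted numeric versions such as 0.0.25.

```python
from importlib import metadata
from typing import Optional, Tuple


def parse_version(version: str) -> Tuple[int, ...]:
    """Turn a simple dotted version such as '0.0.25' into (0, 0, 25)."""
    return tuple(int(part) for part in version.split("."))


def installed_version(package: str = "pymupdf4llm") -> Optional[str]:
    """Return the installed version string, or None if the package is absent."""
    try:
        return metadata.version(package)
    except metadata.PackageNotFoundError:
        return None


def is_newer_than(version: str, baseline: str) -> bool:
    """Compare two dotted numeric versions component by component."""
    return parse_version(version) > parse_version(baseline)
```

For example, `is_newer_than("0.0.25", "0.0.17")` is true, so a script could warn when the installed release is newer than the last one known to work for a given file.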

pymupdf/RAG
Instead of printing progress in to_markdown(), pass the progress

Instead of printing progress in to_markdown(), pass the progress through a generator or a similar mechanism so that UI-based applications can use it.

enhancement

devilsaint99

Opened
on Jun 16

#288
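
The generator idea in this request could look roughly like the sketch below. It is not how to_markdown() works; `convert_with_progress` and `fake_convert` are hypothetical names, and the per-page converter is a stand-in supplied by the caller.

```python
from typing import Callable, Iterable, Iterator, List, Tuple


def convert_with_progress(
    pages: Iterable[int],
    convert_page: Callable[[int], str],
    total: int,
) -> Iterator[Tuple[int, int, str]]:
    """Yield (pages_done, total, page_markdown) after each page is converted."""
    for done, page_number in enumerate(pages, start=1):
        yield done, total, convert_page(page_number)


# Stand-in converter for illustration; a real caller would pass its own function.
def fake_convert(page_number: int) -> str:
    return f"# Page {page_number}\n"


chunks: List[str] = []
progress_log: List[float] = []
for done, total, text in convert_with_progress(range(3), fake_convert, total=3):
    progress_log.append(done / total)  # a UI could update a progress bar here
    chunks.append(text)
```

Because the caller drives the loop, a UI can update its own widget after each page instead of relying on console output.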

pymupdf/RAG
code=4: no font file for digest

Hello, there is an exception when trying to extract the PDF text. It seems that some fonts are missing. The exception occurs in versions 20 to 25, but not in versions around 14. Here's my PDF: output_first_20_pages.pdf ...

upstream

yumingmin88

Opened
on Jun 16

#287

pymupdf/RAG
pymupdf.mupdf.FzErrorFormat: code=7: compression bomb detected

Hello @JorjMcKie, I tried to extract Markdown text from the given file and it gave me this error. Could you please help identify and fix the issue? Here is my code: import pymupdf4llm md_text ...

wontfix

urvisism

Opened
on Jun 11

#283

pymupdf/RAG
Content Duplication with the latest version

Hi, with pymupdf 1.26.0 and pymupdf4llm 0.0.24, I recently found that the text can sometimes be duplicated, and the duplication seems to occur several times in a row. Don't know if it is something ...

fix developed

IronK77

Opened
on Jun 10

#282
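
Until a fixed release is available, one possible workaround is to post-process the Markdown and drop lines that immediately repeat the previous line. This is only a sketch under that assumption: `collapse_repeated_lines` is a hypothetical helper, it handles only exact, directly adjacent repeats, and it would also remove legitimately repeated lines.

```python
from typing import List, Optional


def collapse_repeated_lines(markdown: str) -> str:
    """Drop non-blank lines that exactly repeat the line before them."""
    result: List[str] = []
    previous: Optional[str] = None
    for line in markdown.splitlines():
        if line.strip() and line == previous:
            continue  # skip an immediate repeat of the same non-blank line
        result.append(line)
        previous = line
    return "\n".join(result)
```

Blank lines are left untouched so that paragraph spacing in the output is preserved.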

issues Search Results · repo:pymupdf/RAG language:Python
