issues Search Results · repo:pymupdf/RAG language:Python

184 results

As the title suggests. The identified content is shown in the following figure: [Image: https://github.com/user-attachments/assets/1403f9fb-571c-4d5f-94f8-3505e4e5f6bc] ...
not a bug
  • neverlatetolearn0
  • 3
  • Opened 5 days ago
  • #298

The following diagram is not extracted as an image by pymupdf4llm.to_markdown("uart.pdf", write_images=True), although it should be. [Image: https://github.com/user-attachments/assets/f39dd327-6d4c-484d-84ed-939e72ee25ce] ...
bug
fix developed
  • xcpky
  • 4
  • Opened 10 days ago
  • #296

We highly recommend posting bugs, issues, feature requests and discussions on our forum: forum.pymupdf.com
  • jamie-lemon
  • Opened 13 days ago
  • #295

I have a PDF version of a picture (manually filled out form). When using pymupdf4llm.to_markdown(doc, page_chunks=True), the page image is not detected. I believe this has to do with the size of the image ...
bug
fix developed
  • jmoreno11
  • 7
  • Opened 13 days ago
  • #294

The to_markdown function extracts a table in Markdown format perfectly if the table in the PDF has borders, like this: [Image: https://github.com/user-attachments/assets/a730d77f-87c9-4912-9456-cbfde65af19f] ...
wontfix
  • Aryabhattacharjee
  • 2
  • Opened 14 days ago
  • #293

Summary: I'm encountering an issue where pymupdf4llm.to_markdown() returns an empty string for a specific PDF in version 0.0.25 (and also 0.0.24). However, the same file works correctly in version 0.0.17. ...
  • azhurb
  • 4
  • Opened on Jun 18
  • #289

Instead of printing progress in to_markdown(), report the progress through a generator or a similar mechanism so that UI-based applications can use it.
enhancement
  • devilsaint99
  • 1
  • Opened on Jun 16
  • #288

Hello, there is an exception when trying to extract the PDF text. It seems that some fonts are missing. The exception occurs in versions 20 to 25, but not in versions 14 or so. Here's my PDF: output_first_20_pages.pdf ...
upstream
  • yumingmin88
  • 1
  • Opened on Jun 16
  • #287

Hello @JorjMcKie, I tried to extract Markdown text from the given file and got the given error. Could you please help identify and fix the issue? Here is my code: import pymupdf4llm md_text ...
wontfix
  • urvisism
  • 3
  • Opened on Jun 11
  • #283

Hi, with pymupdf 1.26.0 and pymupdf4llm 0.0.24, I recently found that text can sometimes be duplicated, and the duplication seems to occur several times in a row. Don't know if it is something ...
fix developed
  • IronK77
  • 2
  • Opened on Jun 10
  • #282