PDF Text Extractor
No credit card required
PDF Text Extractor lets you extract text from PDF files. It also supports chunking the extracted text to prepare the data for use with large language models.
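The listing does not say how the chunking works, so the following is only a minimal sketch of one common approach: fixed-size character chunks with an overlap, so that text cut at a chunk boundary also appears at the start of the next chunk. The function name `chunk_text` and the parameters `chunk_size` and `chunk_overlap` are hypothetical and are not taken from the actor's input schema.

```python
def chunk_text(text: str, chunk_size: int = 1000, chunk_overlap: int = 100) -> list[str]:
    """Split text into character chunks of at most chunk_size characters.

    Hypothetical helper: consecutive chunks share chunk_overlap characters.
    """
    if chunk_size <= 0:
        raise ValueError("chunk_size must be positive")
    if not 0 <= chunk_overlap < chunk_size:
        raise ValueError("chunk_overlap must be in the range [0, chunk_size)")

    step = chunk_size - chunk_overlap  # how far each new chunk starts after the previous one
    chunks = []
    for start in range(0, len(text), step):
        chunks.append(text[start:start + chunk_size])
        if start + chunk_size >= len(text):
            break  # this chunk already reaches the end of the text
    return chunks


print(chunk_text("abcdefghij", chunk_size=4, chunk_overlap=1))
# → ['abcd', 'defg', 'ghij']
```

Character-based splitting keeps the sketch dependency-free. A real pipeline might instead split on paragraphs or count model tokens, since LLM context limits are measured in tokens rather than characters.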
When crawling, I receive the following error:
2024-02-14T06:07:31.285Z ACTOR: Pulling Docker image of build 5lFVfc3pf7JN70PcE from repository.
2024-02-14T06:07:35.343Z ACTOR: Creating Docker container.
2024-02-14T06:07:35.641Z ACTOR: Starting Docker container.
2024-02-14T06:07:37.360Z INFO Initializing actor...
2024-02-14T06:07:37.363Z INFO System info ({"apify_sdk_version": "1.1.5", "apify_client_version": "1.4.1", "python_version": "3.11.7", "os": "linux"})
2024-02-14T06:07:37.628Z --- Logging error ---
2024-02-14T06:07:37.629Z Traceback (most recent call last):
2024-02-14T06:07:37.631Z File "/usr/src/app/src/main.py", line 15, in main
2024-02-14T06:07:37.633Z pdf_document = pdfium.PdfDocument(io.BytesIO(pdf.content))
2024-02-14T06:07:37.634Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-02-14T06:07:37.636Z File "/usr/local/lib/python3.11/site-packages/pypdfium2/_helpers/document.py", line 77, in __init__
2024-02-14T06:07:37.638Z self.raw, to_hold, to_close = _open_pdf(self._input, self._password, self._autoclose)
2024-02-14T06:07:37.639Z ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
2024-02-14T06:07:37.641Z File "/usr/local/lib/python3.11/site-packages/pypdfium2/_helpers/document.py", line 744, in _open_pdf
2024-02-14T06:07:37.642Z raise PdfiumError(f"Failed to load document (PDFium: {pdfium_i.ErrorToStr.get(err_code)}).")
2024-02-14T06:07:37.644Z pypdfium2._helpers.misc.Pdfi... [trimmed]
Hi, sadly it seems that PDFium has problems parsing the PDF file you provided. Can you try some other files to see if it is caused by that specific file?
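One quick way to narrow this down before trying other files is to check what the downloaded bytes actually are. In the traceback, `pdf.content` is passed straight to `pdfium.PdfDocument`, so if the URL returned an HTML page or an empty body instead of a PDF, PDFium would fail to load it. The sketch below is a hypothetical diagnostic, not part of the actor: `looks_like_pdf` and `describe_payload` are illustrative names. It looks for the `%PDF-` header in the first 1,024 bytes rather than only at offset 0, since some files carry a few leading bytes before the header. A file that passes this check can still be corrupt or encrypted and fail in PDFium.

```python
def looks_like_pdf(data: bytes, search_window: int = 1024) -> bool:
    """Return True if the '%PDF-' header marker appears near the start of data."""
    return b"%PDF-" in data[:search_window]


def describe_payload(data: bytes) -> str:
    """Give a rough, human-readable guess at what a downloaded body contains."""
    if not data:
        return "empty response body"
    if looks_like_pdf(data):
        return "PDF header found"
    # Servers often answer blocked or missing files with an HTML page.
    head = data[:64].lstrip().lower()
    if head.startswith(b"<!doctype html") or head.startswith(b"<html"):
        return "HTML page, not a PDF"
    return "unknown format (no %PDF- header)"


print(describe_payload(b"<!DOCTYPE html><html><body>Access denied</body></html>"))
# → HTML page, not a PDF
```

Running a check like this on `pdf.content` before calling `pdfium.PdfDocument` would show whether the problem is the download rather than the PDF parser.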
- 38 monthly users
- 17 stars
- 100.0% runs succeeded
- 2.4 days response time
- Created in Oct 2023
- Modified about 2 months ago