GPT Scraper

Pay $9.00 for 1,000 pages

drobnikj/gpt-scraper

Extract data from any website and feed it into GPT via the OpenAI API. Use ChatGPT to proofread content, analyze sentiment, summarize reviews, extract contact details, and much more.

Prompt gets filled with base64 representation of an image

Open

Pepa J (JoseJet) opened this issue
14 days ago

For this input, the Actor returned the wrong answer. When we checked the generated Markdown prompt in the key-value store, it was flooded with the inline base64 representation of one of the images on the website.

When we remove the img and link tags, everything works fine. I do not think that including the inline base64-encoded source of an image in the prompt should be the default behavior.

{
  "dynamicContentWaitSecs": 10,
  "instructions": "Gets the amount of results on the page and return it as single number in JSON format: \njobAmount",
  "pageFormatInRequest": "Markdown",
  "proxyConfiguration": {
    "useApifyProxy": true
  },
  "removeElementsCssSelector": "script, style, noscript, path, svg, xlink",
  "removeLinkUrls": true,
  "saveSnapshots": true,
  "schema": {
    "type": "object",
    "properties": {
      "jobAmount": {
        "type": "number",
        "description": "results"
      }
    },
    "required": [
      "jobAmount"
    ]
  },
  "startUrls": [
    {
      "url": "https://workforcenow.adp.com/mascsr/default/mdf/recruitment/recruitment.html?cid=e4f6ff38-1bcd-40e3-b778-ec98a30f2192&ccId=19000101_000001&type=MP&lang=en_US&selectedMenuKey=CurrentOpenings",
      "method": "GET"
    }
  ],
  "useStructureOutput": false,
  "includeUrlGlobs": [],
  "excludeUrlGlobs": [],
  "maxCrawlingDepth": 99999999,
  "maxPagesPerCrawl": 10,
  "initialCookies": [],
  "temperature": "0",
  "topP": "1",
  "frequencyPenalty": "0",
  "presencePenalty": "0"
}
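
The workaround described above (removing the img and link tags) could be applied to the input roughly as follows. This is only a sketch: `with_extra_selectors` is a hypothetical helper, not part of the Actor or any Apify client, and `actor_input` is a trimmed copy of the input above. It assumes the selector string is a plain comma-separated list with no commas nested inside a single selector.

```python
import json

# Trimmed copy of the input from this report; the remaining fields are unchanged.
actor_input = {
    "pageFormatInRequest": "Markdown",
    "removeElementsCssSelector": "script, style, noscript, path, svg, xlink",
    "removeLinkUrls": True,
}


def with_extra_selectors(run_input: dict, extra: list[str]) -> dict:
    """Return a copy of run_input with extra CSS selectors appended.

    Hypothetical helper: splits on commas, so it is only safe for simple
    selector lists like the one in this report.
    """
    raw = run_input.get("removeElementsCssSelector", "")
    current = [s.strip() for s in raw.split(",") if s.strip()]
    merged = current + [s for s in extra if s not in current]
    return {**run_input, "removeElementsCssSelector": ", ".join(merged)}


# Add the two tags whose removal made the run work for us.
patched = with_extra_selectors(actor_input, ["img", "link"])
print(json.dumps(patched, indent=2))
```

The original dictionary is left untouched, so the patched input can be compared against the failing one when re-running.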
lukas.prusa

Thanks for reporting this!

Yeah, this should definitely not be the default. Inline base64 image data should be stripped from the page content before it is sent to GPT. We will investigate and fix this :)
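
To illustrate the kind of processing meant here, below is a minimal sketch of stripping inline `data:` images from page content before building the prompt. It is not the Actor's actual fix: the function name `strip_inline_images` and both regular expressions are assumptions made for this example, and a real implementation would more likely drop these nodes while parsing the HTML.

```python
import re

# Markdown image whose target is a data: URI,
# e.g. ![logo](data:image/png;base64,AAAA...)
_MD_DATA_IMAGE = re.compile(r"!\[([^\]]*)\]\(\s*data:[^)]*\)", re.IGNORECASE)

# HTML <img> tag whose src attribute is a quoted data: URI.
_HTML_DATA_IMAGE = re.compile(
    r"""<img\b[^>]*\bsrc\s*=\s*(['"])data:[^'"]*\1[^>]*>""",
    re.IGNORECASE,
)


def strip_inline_images(content: str) -> str:
    """Remove inline base64 images from Markdown or HTML text.

    Markdown images are replaced by their alt text; HTML <img> tags with a
    data: source are removed entirely. Images with ordinary URLs are kept.
    """
    content = _MD_DATA_IMAGE.sub(lambda m: m.group(1), content)
    return _HTML_DATA_IMAGE.sub("", content)


page = "Results: 42 ![logo](data:image/png;base64,iVBORw0KGgo=) open positions"
print(strip_inline_images(page))
```

Keeping the alt text preserves any label the image carried while dropping the payload that was flooding the prompt.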

Developer
Maintained by Apify

Actor Metrics

  • 150 monthly users

  • 69 stars

  • >99% runs succeeded

  • 2.6 days response time

  • Created in Mar 2023

  • Modified 8 days ago