GPT Scraper

  • drobnikj/gpt-scraper
  • Modified
  • Users 2k
  • Runs 120.9k
  • Created by Jakub Drobník

Extract data from any website and feed it into GPT via the OpenAI API. Use ChatGPT to proofread content, analyze sentiment, summarize reviews, extract contact details, and much more.


To run the code examples, you need an Apify account. Replace <YOUR_API_TOKEN> in the code with your API token. For a more detailed explanation, see the section on running Actors via the API in the Apify Docs.

# Set API token
API_TOKEN=<YOUR_API_TOKEN>

# Prepare Actor input
cat > input.json <<'EOF'
{
  "startUrls": [
    {
      "url": "https://news.ycombinator.com/"
    }
  ],
  "globs": [],
  "linkSelector": "a[href]",
  "instructions": "Get the post with the most points from the page and return it as JSON in the format:\npostTitle\npostUrl\npointsCount",
  "targetSelector": "",
  "schema": {
    "type": "object",
    "properties": {
      "title": {
        "type": "string",
        "description": "Page title"
      },
      "description": {
        "type": "string",
        "description": "Page description"
      }
    },
    "required": [
      "title",
      "description"
    ]
  },
  "proxyConfiguration": {
    "useApifyProxy": true
  }
}
EOF

# Run the Actor
curl "https://api.apify.com/v2/acts/drobnikj~gpt-scraper/runs?token=$API_TOKEN" \
  -X POST \
  -d @input.json \
  -H 'Content-Type: application/json'
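
The request above only starts a run and returns a run object; it does not return the scraped results themselves. As a rough sketch, assuming the response carries the ID of the run's default dataset at data.defaultDatasetId and that items can be listed from the /v2/datasets/<id>/items endpoint (both worth confirming in the Apify API reference), a follow-up step might look like the following. The helper names extract_dataset_id and dataset_items_url are made up for illustration, and python3 is assumed to be available for JSON parsing.

```shell
# Hypothetical helper: read a run object (JSON) from stdin and print the
# ID of its default dataset. Assumes the ID lives at data.defaultDatasetId.
extract_dataset_id() {
  python3 -c 'import json, sys; print(json.load(sys.stdin)["data"]["defaultDatasetId"])'
}

# Hypothetical helper: build the URL for listing a dataset's items.
# $1 = dataset ID, $2 = API token
dataset_items_url() {
  printf 'https://api.apify.com/v2/datasets/%s/items?token=%s\n' "$1" "$2"
}

# Usage (commented out so that sourcing this snippet makes no network calls):
# RUN_JSON=$(curl -s "https://api.apify.com/v2/acts/drobnikj~gpt-scraper/runs?token=$API_TOKEN" \
#   -X POST -d @input.json -H 'Content-Type: application/json')
# DATASET_ID=$(printf '%s' "$RUN_JSON" | extract_dataset_id)
# curl -s "$(dataset_items_url "$DATASET_ID" "$API_TOKEN")"
```

Items are written as the run progresses, so fetching the dataset immediately after starting the run may return an empty or partial list. Waiting for the run to finish before reading the dataset is left out of this sketch.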