Website Content Crawler

apify/website-content-crawler

Automatically crawl and extract text content from websites with documentation, knowledge bases, help centers, or blogs. This Actor is designed to provide data to feed, fine-tune, or train large language models such as ChatGPT or LLaMA.

Below is a list of the relevant HTTP API endpoints for calling the Actor. To use them, you need an Apify account. Replace <YOUR_API_TOKEN> in the URLs with your API token, which you can find under Settings > Integrations in Apify Console. For details, see the API reference.

Run Actor synchronously and get dataset items

Runs this Actor and waits for it to finish. The POST payload, including its Content-Type header, is passed as INPUT to the Actor (usually application/json). The HTTP response contains the Actor's dataset items; their format is determined by the dataset items' format parameter.

POST
https://api.apify.com/v2/acts/apify~website-content-crawler/run-sync-get-dataset-items?token=<YOUR_API_TOKEN>

Hint: This endpoint can be used with both POST and GET request methods, but only the POST method allows you to pass input.
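
As a rough illustration of the call described above, here is a Python sketch that uses only the standard library. The helper names build_sync_items_request and run_and_get_items are hypothetical, and the startUrls input field and the format=json query parameter are assumptions that this page does not state; check the Actor's input schema and the API reference before relying on them.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2/acts/apify~website-content-crawler"


def build_sync_items_request(token, actor_input, item_format="json"):
    """Build (but do not send) a POST request for run-sync-get-dataset-items."""
    # Assumption: "format" selects the dataset items' format (here JSON).
    query = urllib.parse.urlencode({"token": token, "format": item_format})
    url = f"{API_BASE}/run-sync-get-dataset-items?{query}"
    # The JSON body plus its Content-Type header becomes the Actor's INPUT.
    body = json.dumps(actor_input).encode("utf-8")
    return urllib.request.Request(
        url,
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_and_get_items(token, actor_input):
    """Send the request, wait for the run to finish, and decode the items."""
    request = build_sync_items_request(token, actor_input)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

A call might look like run_and_get_items("<YOUR_API_TOKEN>", {"startUrls": [{"url": "https://example.com"}]}), where the input shape is again an assumption. Building the request separately from sending it makes the URL and headers easy to inspect without starting a run.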

Run Actor synchronously

Runs this Actor and waits for it to finish. The POST payload, including its Content-Type header, is passed as INPUT to the Actor (usually application/json), and the OUTPUT is returned in the HTTP response. The Actor is started with the default options; you can override them using various URL query parameters. Note that long HTTP connections might break.

POST
https://api.apify.com/v2/acts/apify~website-content-crawler/run-sync?token=<YOUR_API_TOKEN>

Hint: This endpoint can be used with both POST and GET request methods, but only the POST method allows you to pass input.
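
The sketch below shows one possible way to call this endpoint from Python while overriding default options through URL query parameters and guarding against a broken long-lived connection. The function names are hypothetical, and the timeout and memory parameter names used in the usage note are assumptions that should be checked against the API reference.

```python
import json
import urllib.parse
import urllib.request

RUN_SYNC_URL = "https://api.apify.com/v2/acts/apify~website-content-crawler/run-sync"


def build_run_sync_request(token, actor_input, **overrides):
    """Build a POST request; keyword arguments become extra query parameters."""
    params = {"token": token}
    # Overrides left as None are skipped so that the default options apply.
    params.update({k: v for k, v in overrides.items() if v is not None})
    url = f"{RUN_SYNC_URL}?{urllib.parse.urlencode(params)}"
    return urllib.request.Request(
        url,
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_sync(token, actor_input, client_timeout_secs=300, **overrides):
    """Return the raw response body, or None if the request failed."""
    request = build_run_sync_request(token, actor_input, **overrides)
    try:
        with urllib.request.urlopen(request, timeout=client_timeout_secs) as response:
            return response.read()
    except OSError:
        # Covers HTTP errors, dropped connections, and client-side timeouts.
        return None
```

For example, run_sync(token, actor_input, timeout=120, memory=4096) would append timeout=120&memory=4096 to the URL, assuming those are valid option names. Because a long connection might break, a caller that gets None back may prefer the asynchronous Run Actor endpoint instead.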

Run Actor

Runs this Actor. The POST payload including its Content-Type header is passed as INPUT to the Actor (typically application/json). The Actor is started with the default options; you can override them using various URL query parameters.

POST
https://api.apify.com/v2/acts/apify~website-content-crawler/runs?token=<YOUR_API_TOKEN>

Hint: By adding the method=POST query parameter, this API endpoint can be called using a GET request and thus used in third-party webhooks.
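
To make the two calling styles above concrete, the following Python sketch builds both the regular POST request and the GET-friendly URL that carries method=POST. The helper names are hypothetical, and the shape of the JSON returned by the API is not described on this page, so start_run simply returns whatever JSON comes back.

```python
import json
import urllib.parse
import urllib.request

RUNS_URL = "https://api.apify.com/v2/acts/apify~website-content-crawler/runs"


def build_start_run_request(token, actor_input):
    """Build a POST request that starts a run without waiting for it."""
    query = urllib.parse.urlencode({"token": token})
    return urllib.request.Request(
        f"{RUNS_URL}?{query}",
        data=json.dumps(actor_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def build_webhook_url(token):
    """Build a URL that starts a run when fetched with GET (method=POST hint)."""
    query = urllib.parse.urlencode({"token": token, "method": "POST"})
    return f"{RUNS_URL}?{query}"


def start_run(token, actor_input):
    """Send the POST request and return the decoded JSON response."""
    request = build_start_run_request(token, actor_input)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

The URL from build_webhook_url can be pasted into a third-party service that only issues GET requests; a run started that way receives no input payload from the request body.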

Get Actor

Returns settings of this Actor in JSON format.

GET
https://api.apify.com/v2/acts/apify~website-content-crawler?token=<YOUR_API_TOKEN>

Get list of Actor versions

Returns a list of versions of this Actor in JSON format.

GET
https://api.apify.com/v2/acts/apify~website-content-crawler/versions?token=<YOUR_API_TOKEN>

Get list of Actor webhooks

Returns a list of webhooks of this Actor in JSON format.

GET
https://api.apify.com/v2/acts/apify~website-content-crawler/webhooks?token=<YOUR_API_TOKEN>
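
The three read-only GET endpoints above (Actor settings, versions, and webhooks) differ only in their path suffix, so a single small helper can cover them. This is a minimal sketch using the Python standard library; the names RESOURCES, build_info_url, and fetch_info are hypothetical.

```python
import json
import urllib.parse
import urllib.request

ACTOR_URL = "https://api.apify.com/v2/acts/apify~website-content-crawler"

# Path suffixes for the three read-only endpoints.
RESOURCES = {"actor": "", "versions": "/versions", "webhooks": "/webhooks"}


def build_info_url(resource, token):
    """Return the GET URL for one of the read-only endpoints."""
    if resource not in RESOURCES:
        raise ValueError(f"unknown resource: {resource!r}")
    query = urllib.parse.urlencode({"token": token})
    return f"{ACTOR_URL}{RESOURCES[resource]}?{query}"


def fetch_info(resource, token):
    """Fetch the endpoint and return its decoded JSON response."""
    with urllib.request.urlopen(build_info_url(resource, token)) as response:
        return json.load(response)
```

For instance, fetch_info("versions", token) requests the list of Actor versions in JSON format.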

Developer
Maintained by Apify

Actor metrics
  • 1.8k monthly users
  • 97.9% runs succeeded
  • 2.8 days response time
  • Modified 16 days ago
