Website Content Crawler

Pricing

Pay per usage

apify/website-content-crawler

Developed by

Apify

Maintained by Apify

Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.

4.6 (38)

Pricing

Pay per usage

1.1k

Monthly users

5.9k

Runs succeeded

>99%

Response time

2.3 days

Last modified

6 days ago

Page Title

Open
CtrlAltElite opened this issue 23 days ago

Our company would like to know whether this Actor could be modified to capture the HTML page title of each URL it crawls and provide it as a new field alongside the existing ones (e.g., url, text).

For example, for a URL like https://www.tps.org/board_of_education, we would like the Actor to include the page's title ("Board of Education - Toledo Public Schools") as a new field among the fields available to download.

jiri.spilka

Hi, thank you for using Website Content Crawler!

I'm sorry, but I don't quite understand your feature request.
When Website Content Crawler scrapes a URL, it saves detailed information, including the page title. You can see an example here: Run.

The page title is stored in metadata.title.
If you meant something else, could you provide an example run and specify what exactly is missing in the output?
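As a rough illustration of where the title sits in a dataset item, here is a minimal Python sketch. The `sample_item` dict and the `page_title` helper are hypothetical, written only for this example; the sketch assumes an item with `url`, `text`, and a nested `metadata` object, and reuses the URL and title from the request above.

```python
from typing import Any, Optional


def page_title(item: dict[str, Any]) -> Optional[str]:
    """Return the nested metadata.title value, or None if it is absent."""
    # Treat a missing or null "metadata" entry as an empty object.
    metadata = item.get("metadata") or {}
    return metadata.get("title")


# Hypothetical dataset item; real items may contain more fields.
sample_item = {
    "url": "https://www.tps.org/board_of_education",
    "text": "(page text omitted)",
    "metadata": {
        "title": "Board of Education - Toledo Public Schools",
    },
}

print(page_title(sample_item))
```

The helper returns `None` rather than raising when an item has no `metadata` object, which keeps a loop over many items from stopping on a single incomplete record.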

Thank you, Jiri

CtrlAltElite

20 days ago

Hi Jiri,

The specific request is to add "title" to the fields available to download when choosing "export data set". See the attached image.

An example run would be this: https://console.apify.com/organization/3vNBAWdW4tPWMWAre/actors/runs/jHxCHJiLQS1CNhWvd#output

Then choose "Export Result" for data set "rz64spECyO0DiArCW".

jakub.kopecky

Hi,

Because metadata is an object that contains the title field, the title does not appear as a field of its own; it is nested inside metadata. When you download the CSV export file, the page title is available in the metadata/title column.
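To make the metadata/title naming concrete, the sketch below flattens a nested item into slash-separated column names and writes it as CSV. It is only an illustration of the idea, not the platform's actual export code; the `flatten` and `to_csv` helpers and the sample item are assumptions made for this example.

```python
import csv
import io
from typing import Any


def flatten(item: dict[str, Any], parent: str = "", sep: str = "/") -> dict[str, Any]:
    """Flatten nested dicts into one level, joining keys with `sep`."""
    flat: dict[str, Any] = {}
    for key, value in item.items():
        column = f"{parent}{sep}{key}" if parent else key
        if isinstance(value, dict):
            # Recurse so that {"metadata": {"title": ...}} becomes "metadata/title".
            flat.update(flatten(value, column, sep))
        else:
            flat[column] = value
    return flat


def to_csv(items: list[dict[str, Any]]) -> str:
    """Render items as CSV text with one column per flattened key."""
    rows = [flatten(item) for item in items]
    columns = sorted({column for row in rows for column in row})
    buffer = io.StringIO()
    writer = csv.DictWriter(buffer, fieldnames=columns)
    writer.writeheader()
    writer.writerows(rows)  # Keys missing from a row are written as empty cells.
    return buffer.getvalue()


# Hypothetical item that mirrors the example from this thread.
items = [
    {
        "url": "https://www.tps.org/board_of_education",
        "metadata": {"title": "Board of Education - Toledo Public Schools"},
    }
]

print(to_csv(items))
```

In this sketch the header row contains `metadata/title` and `url`, so the title is reachable as a regular column even though it is not a top-level field of the item.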

Let me know if that solves your issue.

Jakub

Pricing

Pricing model

Pay per usage

This Actor is paid per platform usage: the Actor itself is free to use, and you pay only for the Apify platform usage.