
Blog / Dated Content Crawler

Crawl an entire blog or knowledge base, or filter to just the new content. Supports relevant AI queries by filtering pages by date.
Actor Metrics
9 Monthly users
5.0 / 5 (2)
3 bookmarks
97% runs succeeded
Created in Feb 2025
Modified 17 hours ago
This actor lets you crawl blogs and other dated-content websites. You can filter the content by its publish date and keep only the content that is newer than a date you select. This is very useful for AI applications, because it avoids training your model or feeding your LLM with old, outdated, or irrelevant data.
It is also useful for any application where you want to download data from websites such as documentation, help articles, or your knowledge base.
How it works
- Enter the URL(s) (startUrls) of the pages / site you want to crawl.
- Optional: Enter a start date (startDate) or, more likely, a "Relative" start date (relativeStartDate) to filter the content by. "Relative" means that you can enter a date like "1 month" or "2 years" and the crawler will calculate the date relative to the current date each time it runs.
- Run the crawler.
- The crawler will retrieve only the pages that are newer than the start date (startDate) you entered, or will retrieve all the pages if you don't enter a start date.
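The date logic described in the steps above can be sketched roughly as follows. This is only an illustration, not the actor's actual implementation: the function names, the accepted units (day, week, month, year), the day-clamping rule for short months, and the "published" field on each page are all assumptions made for the example.

```python
import calendar
import re
from datetime import date, timedelta


def resolve_relative_start_date(relative: str, today: date) -> date:
    """Turn a string such as "1 month" or "2 years" into an absolute date.

    Hypothetical helper: the actor's real parsing rules are not documented here.
    """
    match = re.fullmatch(r"\s*(\d+)\s*(day|week|month|year)s?\s*", relative.lower())
    if match is None:
        raise ValueError(f"Unsupported relative date: {relative!r}")
    amount, unit = int(match.group(1)), match.group(2)

    if unit == "day":
        return today - timedelta(days=amount)
    if unit == "week":
        return today - timedelta(weeks=amount)

    # Months and years: step back whole calendar months, then clamp the day
    # so that e.g. 31 March minus 1 month lands on the last day of February.
    months = amount if unit == "month" else amount * 12
    total_months = today.year * 12 + (today.month - 1) - months
    year, month_index = divmod(total_months, 12)
    month = month_index + 1
    day = min(today.day, calendar.monthrange(year, month)[1])
    return date(year, month, day)


def keep_new_pages(pages, start_date):
    """Keep pages published after start_date; keep everything if it is None."""
    if start_date is None:
        return list(pages)
    return [page for page in pages if page["published"] > start_date]


if __name__ == "__main__":
    # Example data, invented for illustration only.
    pages = [
        {"url": "https://example.com/blog/old-post", "published": date(2024, 1, 10)},
        {"url": "https://example.com/blog/new-post", "published": date(2025, 3, 20)},
    ]
    cutoff = resolve_relative_start_date("1 month", today=date(2025, 3, 31))
    for page in keep_new_pages(pages, cutoff):
        print(page["url"])
```

Because the relative value is resolved against the current date on every run, a scheduled crawl with a setting like "1 month" keeps a rolling window instead of a fixed cutoff.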