Website Content Crawler
No credit card required
Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗 LangChain, LlamaIndex, and the wider LLM ecosystem.
Hello, Two Issues:
- First: The actor is not parsing the rest of the website, only the first URL. The run ID is: k0Zi66wkCr4NdGuBf. Task: custombizio/hvac-innogreen-solutions. Container URL: https://kyz0to5t3klc.runs.apify.net/
- Second: My Apify usage is showing $87 / $200, but it says I have used up all my prepaid usage and that anything over the plan limit will be charged as overage. That doesn't make any sense, since I put in extra money to use Apify.
Here is the task I am trying to run: custombizio/hvac-innogreen-solutions
I haven't been able to scrape this site. It is only scraping 1 page: https://natural-resources.canada.ca/energy-efficiency/homes/canada-greener-homes-initiative/canada-greener-homes-grant/canada-greener-homes-grant/23441
Please give me some advice. Thanks, Syed Ali Custombizio
Hi Syed, thank you for trying Website Content Crawler.
Regarding scraping (Run ID: k0Zi66wkCr4NdGuBf): The Actor will only crawl sub-pages of the specified startURLs. For example, if you specify http://example.com/blog, it will only crawl pages like http://example.com/blog/1 or http://example.com/blog/2, but not http://example.com/new.
You need to provide a startURL that is generic enough to cover the desired subpages. Or you need to play with the inputGlob.
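To make the rule above concrete, here is a minimal sketch of a scope check. It assumes "under the start URL" means the same scheme and host plus a path that extends the start path segment by segment; the Actor's real matching may differ in details such as trailing slashes or query strings. The function name `is_under_start_url` is hypothetical and not part of the Actor.

```python
from urllib.parse import urlsplit


def is_under_start_url(start_url: str, candidate: str) -> bool:
    """Return True if candidate is the start URL itself or sits below its path.

    Assumption: scope means same scheme and host, with a segment-wise path prefix.
    """
    start, cand = urlsplit(start_url), urlsplit(candidate)
    if (start.scheme, start.netloc.lower()) != (cand.scheme, cand.netloc.lower()):
        return False
    start_path = start.path.rstrip("/")
    cand_path = cand.path.rstrip("/")
    return cand_path == start_path or cand_path.startswith(start_path + "/")


start = "http://example.com/blog"
for url in (
    "http://example.com/blog/1",
    "http://example.com/blog/2",
    "http://example.com/new",
):
    print(url, is_under_start_url(start, url))
```

With this check, the two blog pages are in scope and http://example.com/new is not, which matches the example in the reply.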
Let me know what you’re trying to achieve, and I’ll help you set it up.
For the second issue: I’m not sure I fully understand. What do you mean by "put extra money in"? Here’s how you can check current usage: Go to console.apify → Billing → Subscription. There is a column labeled "Next Invoice." Click on View Breakdown, where it explains that your usage includes prepaid amounts from your subscription plan and the redeemed coupon. However, you’ve used the platform beyond the subscription plan plus the redeemed coupon, and the additional usage will be added to your invoice.
That is what I am saying. It is NOT parsing the subpages of this web URL. Here is the Run id: k0Zi66wkCr4NdGuBf And the URL: https://natural-resources.canada.ca/energy-efficiency/homes/canada-greener-homes-initiative/oil-heat-pump-affordability-program/24775
Run id: yV8jm4qofNKMNJKbO URL: https://natural-resources.canada.ca/energy-efficiency/homes/canada-greener-homes-initiative/canada-greener-homes-grant/canada-greener-homes-grant/23441
Second part is: Apify is saying the limit will reset on Dec. 8. I put extra money in my account ($200) so I don't have to worry about limits and can continue to parse.
I have added $200 to my account, so how is it possible that I have used beyond the plan? It doesn't make sense to add extra money and not be able to use it.
Hi, I’m sorry for the misunderstanding.
If you want to crawl everything under energy-efficiency, you need to start the crawl using https://natural-resources.canada.ca/energy-efficiency so that other pages will be included as well. Please see my example run, which successfully crawled 904 pages.
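As a rough illustration of the advice above, the sketch below trims a very specific page URL down to a broader section URL and places it in a run input. The helper `broaden_start_url` is hypothetical, and the `startUrls` shape (a list of objects with a `url` key) is an assumption about the Actor's input; please check the Actor's input schema before relying on it.

```python
from urllib.parse import urlsplit, urlunsplit


def broaden_start_url(url: str, keep_segments: int) -> str:
    """Keep only the first keep_segments path segments of url (hypothetical helper)."""
    parts = urlsplit(url)
    segments = [s for s in parts.path.split("/") if s]
    path = "/" + "/".join(segments[:keep_segments]) if keep_segments > 0 else ""
    # Drop the query string and fragment so the result is a clean section URL.
    return urlunsplit((parts.scheme, parts.netloc, path, "", ""))


specific = (
    "https://natural-resources.canada.ca/energy-efficiency/homes/"
    "canada-greener-homes-initiative/oil-heat-pump-affordability-program/24775"
)
broad = broaden_start_url(specific, keep_segments=1)

# Assumed input shape; verify the field name against the Actor's input schema.
run_input = {"startUrls": [{"url": broad}]}
print(run_input)
```

How many segments to keep is a judgment call: a broader start URL covers more pages but also means a longer and more expensive crawl.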
Please refer to the documentation regarding the crawling details:
The actor crawls the start URLs, finds links to other pages, and recursively crawls those pages, too, as long as their URL is under the start URL.
If your startUrl is very specific, such as https://natural-resources.canada.ca/energy-efficiency/homes/canada-greener-homes-initiative/oil-heat-pump-affordability-program/24775, it will only scrape this URL (and any other pages that start with the same URL, i.e. are under this URL). As far as I can tell, there is no such URL on that particular page.
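The behavior described above can be sketched as a breadth-first crawl over a small in-memory link graph. This is only an illustration under stated assumptions: the `LINKS` graph and the `site.example` URLs are hypothetical, no network requests are made, and the simple string-prefix `in_scope` check stands in for whatever matching the Actor actually performs.

```python
from collections import deque


def in_scope(start_url: str, url: str) -> bool:
    """Assumed scope rule: the URL equals the start URL or lies below it."""
    base = start_url.rstrip("/")
    return url == base or url.startswith(base + "/")


def crawl(start_url: str, links: dict) -> list:
    """Follow links recursively, enqueueing only pages under start_url."""
    seen = {start_url}
    queue = deque([start_url])
    while queue:
        page = queue.popleft()
        for target in links.get(page, []):
            if target not in seen and in_scope(start_url, target):
                seen.add(target)
                queue.append(target)
    return sorted(seen)


# Hypothetical link graph: page URL -> URLs linked from that page.
LINKS = {
    "https://site.example/docs": ["https://site.example/docs/a", "https://site.example/about"],
    "https://site.example/docs/a": ["https://site.example/docs/a/1", "https://site.example/docs"],
    "https://site.example/docs/a/1": ["https://site.example/docs/b"],
    "https://site.example/docs/b": [],
}

print(crawl("https://site.example/docs", LINKS))      # broad start URL: four pages
print(crawl("https://site.example/docs/a/1", LINKS))  # specific start URL: one page
```

The second call returns only the start page, because the one link found there points outside the start URL's path. That is the same situation as the single-page runs in this thread.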
Regarding the payment issues, I can't see your payments. I’ll ask customer support to reach out to you about this.
I hope you were able to resolve the payment issue with our customer support. I will go ahead and close this issue for now.
If you have any other technical requests, please don’t hesitate to ask!
Actor Metrics
3.9k monthly users
718 stars
>99% runs succeeded
2.2 days response time
Created in Mar 2023
Modified 15 hours ago