Puppeteer Scraper
Crawls websites with headless Chrome and the Puppeteer library, driven by user-provided server-side Node.js code. This crawler is an alternative to apify/web-scraper that gives you finer control over the process. It supports both recursive crawling and lists of URLs, as well as logging in to websites.
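For context, a run of this actor is driven by a user-supplied page function that receives the Puppeteer page and the current request. The sketch below is a minimal, hypothetical example of such a function; the returned field names are illustrative and not taken from this thread:

```javascript
// Hypothetical pageFunction for a Puppeteer-based scraper.
// `context` carries the Puppeteer `page` object and the current `request`.
async function pageFunction(context) {
    const { page, request } = context;

    // Read the document title via Puppeteer's standard page.title() API.
    const title = await page.title();

    // Return one result object per page; the scraper stores it in the dataset.
    return {
        url: request.url,
        title,
    };
}
```

Because the function only depends on the `context` object it is handed, it can be exercised locally with a mocked `page` before being pasted into the actor's input.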
I set a request queue name, and now all URLs are saved in that request queue. How can I exclude the start URLs from the request queue?
Hey there! I don't quite understand what you are trying to achieve, could you please elaborate?
When I set a request queue name, all URLs get saved in that request queue. When I run a new task, it shows that the URLs have already been processed, so I can't get the new URLs and new data. All the new URLs (detail page URLs) are on the start page, so how can I exclude the start URLs from the request queue?
I think for your use case you should just leave the request queue name empty. That way each run will use the default request queue, which is empty at the beginning of every run.
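The difference matters because a request queue deduplicates by URL: once a URL is marked as handled in a persistent named queue, re-adding it in a later run is a no-op, so the crawler finds nothing to do. The following is a rough in-memory sketch of that behavior, not the actual Apify implementation:

```javascript
// Simplified model of request-queue deduplication (illustrative only).
class RequestQueue {
    constructor() {
        this.handled = new Set(); // URLs already processed in earlier runs
        this.pending = [];        // URLs waiting to be crawled
    }

    addRequest(url) {
        // A URL that was already handled is silently skipped, which is
        // why a reused named queue yields no new work on the next run.
        if (this.handled.has(url)) return false;
        this.pending.push(url);
        return true;
    }

    markHandled(url) {
        this.pending = this.pending.filter((u) => u !== url);
        this.handled.add(url);
    }
}

// A named queue persists between runs: re-adding the start URL is ignored.
const named = new RequestQueue();
named.addRequest('https://example.com/start');
named.markHandled('https://example.com/start');
console.log(named.addRequest('https://example.com/start')); // false: already handled

// Leaving the queue name empty gives each run a fresh default queue.
const fresh = new RequestQueue();
console.log(fresh.addRequest('https://example.com/start')); // true: queue starts empty
```

This is why an empty queue name fixes the "URL has been processed" symptom: the state that marked the start URLs as handled is simply not carried over.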
Now I leave the request queue name empty, but when I run again, it shows the error: "All requests from the queue have been processed, the crawler will shut down." I can confirm there are new URLs on the start page.
I see your latest runs are getting some results. Did you manage to find the problem yourself?
- 287 monthly users
- 99.8% runs succeeded
- 15 days response time
- Created in Apr 2019
- Modified about 1 month ago