Playwright Scraper

apify/playwright-scraper

Crawls websites with headless Chromium, Chrome, or Firefox and the Playwright library, running server-side Node.js code that you provide. Supports both recursive crawling and a fixed list of URLs, and can log in to a website.

The Python example below runs the Actor and fetches its results using the Apify API client for Python. To run it, you need an Apify account. Replace <YOUR_API_TOKEN> in the code with your API token, which you can find under Settings > Integrations in Apify Console.
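
As an alternative to pasting the token into the source file, it can be read from the environment at run time. The following is a minimal sketch, not part of the official example: the helper name `get_apify_token` and the variable name `APIFY_TOKEN` are assumptions chosen for illustration.

```python
import os


def get_apify_token(env_var: str = "APIFY_TOKEN") -> str:
    """Return the API token stored in an environment variable.

    Both this helper and the default variable name are hypothetical;
    use whatever name your own setup defines.
    """
    token = os.environ.get(env_var, "").strip()
    # Reject a missing value and the unreplaced placeholder alike.
    if not token or token == "<YOUR_API_TOKEN>":
        raise RuntimeError(
            f"Set the {env_var} environment variable to your Apify API token"
        )
    return token


# Usage with the client from the main example (needs the apify-client package):
#   client = ApifyClient(get_apify_token())
```

This keeps the token out of version control; the rest of the example stays unchanged.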

from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "startUrls": [{ "url": "https://crawlee.dev" }],
    "globs": [{ "glob": "https://crawlee.dev/*/*" }],
    "pseudoUrls": [],
    "excludes": [{ "glob": "/**/*.{png,jpg,jpeg,pdf}" }],
    "linkSelector": "a",
    "pageFunction": """async function pageFunction(context) {
    const { page, request, log } = context;
    const title = await page.title();
    log.info(`URL: ${request.url} TITLE: ${title}`);
    return {
        url: request.url,
        title
    };
}""",
    "proxyConfiguration": { "useApifyProxy": True },
    "initialCookies": [],
    "launcher": "chromium",
    "waitUntil": "networkidle",
    "preNavigationHooks": """// We need to return array of (possibly async) functions here.
// The functions accept two arguments: the \"crawlingContext\" object
// and \"gotoOptions\".
[
    async (crawlingContext, gotoOptions) => {
        const { page } = crawlingContext;
        // ...
    },
]""",
    "postNavigationHooks": """// We need to return array of (possibly async) functions here.
// The functions accept a single argument: the \"crawlingContext\" object.
[
    async (crawlingContext) => {
        const { page } = crawlingContext;
        // ...
    },
]""",
    "customData": {},
}

# Run the Actor and wait for it to finish
run = client.actor("apify/playwright-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start
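
The example above only prints each dataset item. If the results need to be kept, one option is to write them to a CSV file. The sketch below is an illustration, not part of the Actor's documentation: the helper `write_items_to_csv` is hypothetical, and the `url` and `title` columns assume the items have the shape returned by the `pageFunction` shown earlier.

```python
import csv
from typing import Iterable, Sequence


def write_items_to_csv(
    items: Iterable[dict],
    path: str,
    fieldnames: Sequence[str] = ("url", "title"),
) -> int:
    """Write dataset items to a CSV file and return the number of rows written.

    Hypothetical helper: keys not listed in `fieldnames` are ignored,
    and missing keys are written as empty cells.
    """
    count = 0
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=list(fieldnames), extrasaction="ignore")
        writer.writeheader()
        for item in items:
            writer.writerow(item)
            count += 1
    return count


# With the client from the example above, the call might look like:
#   items = client.dataset(run["defaultDatasetId"]).iterate_items()
#   write_items_to_csv(items, "results.csv")
```

Because the helper accepts any iterable of dictionaries, it can consume the iterator from `iterate_items()` directly without first loading every item into memory.
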
Developer
Maintained by Apify
Actor metrics
  • 49 monthly users
  • 12 stars
  • 99.5% runs succeeded
  • 18 hours response time
  • Created in Aug 2022
  • Modified about 1 month ago