Vanilla JS Scraper

mstephen190/vanilla-js-scraper

Scrape the web using familiar JavaScript methods! Crawls websites using raw HTTP requests, parses the HTML with the JSDOM package, and extracts data from the pages using Node.js code. Supports both recursive crawling and lists of URLs. This Actor is a non-jQuery alternative to CheerioScraper.
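
Because pages are parsed with JSDOM, the pageFunction you supply works with the standard DOM API (document, querySelector, querySelectorAll, and so on) rather than a jQuery-style $ function. As a rough sketch, a pageFunction that collects the text and href of every link on a page could look like the following; the selector and the returned field names are illustrative, not part of the Actor's schema:

# A hypothetical pageFunction, passed to the Actor as a string in the
# "pageFunction" input field. Selectors and returned fields are illustrative only.
page_function = """async function pageFunction(context) {
    const { document, request } = context;

    // Standard DOM methods provided by JSDOM -- no jQuery needed.
    const links = Array.from(document.querySelectorAll('a[href]')).map((a) => ({
        text: a.textContent.trim(),
        href: a.getAttribute('href'),
    }));

    return { url: request.url, links };
}"""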

The code examples below show how to run the Actor and get its results. To run the code, you need an Apify account. Replace <YOUR_API_TOKEN> in the code with your API token, which you can find under Settings > Integrations in Apify Console. Learn more.

from apify_client import ApifyClient

# Initialize the ApifyClient with your Apify API token
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = {
    "requests": [{ "url": "https://apify.com" }],
    "pseudoUrls": [{ "purl": "https://apify.com[(/[\\w-]+)?]" }],
    "linkSelector": "a[href]",
    "pageFunction": """async function pageFunction(context) {
    const { window, document, crawler, enqueueRequest, request, response, userData, json, body, kvStore, customData } = context;

    const title = document.querySelector('title').textContent

    const responseHeaders = response.headers

    return {
        title,
        responseHeaders
    };
}""",
    "preNavigationHooks": """// We need to return an array of (possibly async) functions here.
// The functions accept two arguments: the \"crawlingContext\" object
// and \"requestAsBrowserOptions\", which are passed to the `requestAsBrowser()`
// function the crawler calls to navigate.
[
    async (crawlingContext, requestAsBrowserOptions) => {
        // ...
    }
]""",
    "postNavigationHooks": """// We need to return an array of (possibly async) functions here.
// The functions accept a single argument: the \"crawlingContext\" object.
[
    async (crawlingContext) => {
        // ...
    },
]""",
    "proxy": { "useApifyProxy": True },
    "additionalMimeTypes": [],
    "customData": {},
}

# Run the Actor and wait for it to finish
run = client.actor("mstephen190/vanilla-js-scraper").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
print("💾 Check your data here: https://console.apify.com/storage/datasets/" + run["defaultDatasetId"])
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)

# 📚 Want to learn more 📖? Go to → https://docs.apify.com/api/client/python/docs/quick-start
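
The same call() pattern also covers the plain list-of-URLs mode mentioned in the description. Below is a minimal sketch, assuming that leaving out linkSelector and pseudoUrls keeps the crawler from enqueueing further pages; check the Actor's input schema for the exact defaults.

# Hypothetical input for scraping a fixed list of URLs without recursive crawling.
run_input = {
    "requests": [
        { "url": "https://apify.com" },
        { "url": "https://apify.com/store" },
    ],
    "pageFunction": """async function pageFunction(context) {
    const { document, request } = context;
    // Return just the page title for each listed URL.
    return { url: request.url, title: document.querySelector('title').textContent };
}""",
    "proxy": { "useApifyProxy": True },
}

run = client.actor("mstephen190/vanilla-js-scraper").call(run_input=run_input)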
Maintained by Community
Actor metrics
  • 11 monthly users
  • 2 stars
  • 99.5% runs succeeded
  • Created in Mar 2022
  • Modified 10 months ago