Under maintenance

Pricing

Pay per usage

My Actorrr

Rating: 0.0 (0)

Developer: Catalina

Actor stats

Bookmarked

Total users

Monthly active users

Last modified: a year ago

.actor/Dockerfile

# Specify the base Docker image. You can read more about
# the available images at https://docs.apify.com/sdk/js/docs/guides/docker-images
# You can also use any other image from Docker Hub.
FROM apify/actor-node:20

# Check preinstalled packages
RUN npm ls crawlee apify puppeteer playwright

# Copy just package.json and package-lock.json
# to speed up the build using Docker layer cache.
COPY package*.json ./

# Install NPM packages, skipping optional and development dependencies to
# keep the image small. Avoid excessive logging, but print the dependency
# tree for debugging.
RUN npm --quiet set progress=false \
    && npm install --omit=dev --omit=optional \
    && echo "Installed NPM packages:" \
    && (npm list --omit=dev --all || true) \
    && echo "Node.js version:" \
    && node --version \
    && echo "NPM version:" \
    && npm --version \
    && rm -r ~/.npm

# Next, copy the remaining files and directories with the source code.
# Because this happens after the NPM install, rebuilds after most source
# file changes reuse the cached dependency layer and are fast.
COPY . ./


# Run the image.
CMD npm start --silent

.actor/actor.json

{
    "actorSpecification": 1,
    "name": "my-actor",
    "title": "Scrape a single page in JavaScript",
    "description": "Scrape data from a single page with the provided URL.",
    "version": "0.0",
    "meta": {
        "templateId": "js-start"
    },
    "input": "./input_schema.json",
    "dockerfile": "./Dockerfile"
}

.actor/input_schema.json

{
    "title": "Scrape data from a web page",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "url": {
            "title": "URL of the page",
            "type": "string",
            "description": "The URL of the website you want to get data from.",
            "editor": "textfield",
            "prefill": "https://www.apify.com/"
        }
    },
    "required": ["url"]
}
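The schema marks `url` as a required string, but `Actor.getInput()` can still resolve to `null` when a run starts without input. The sketch below shows one way to enforce the same rule in code before fetching. `validateInput` is a hypothetical helper, not part of the template or the Apify SDK, and the extra http(s) check is an assumption beyond what the schema states.

```javascript
// Hypothetical helper: mirrors the input schema's rule that "url" is a
// required string, and additionally checks that it parses as an http(s) URL.
function validateInput(input) {
    // Actor.getInput() can resolve to null when no input was provided.
    if (input === null || typeof input !== 'object') {
        throw new Error('Actor input is missing.');
    }
    const { url } = input;
    if (typeof url !== 'string' || url.trim() === '') {
        throw new Error('Input field "url" is required and must be a non-empty string.');
    }
    let parsed;
    try {
        // The global URL constructor throws a TypeError on malformed URLs.
        parsed = new URL(url.trim());
    } catch {
        throw new Error(`Input field "url" is not a valid URL: ${url}`);
    }
    if (parsed.protocol !== 'http:' && parsed.protocol !== 'https:') {
        throw new Error(`Only http(s) URLs are supported, got: ${parsed.protocol}`);
    }
    // Return the normalized form, e.g. with a trailing slash added to a bare host.
    return parsed.href;
}

console.log(validateInput({ url: 'https://www.apify.com/' }));
```

In the Actor this would be called as `const url = validateInput(await Actor.getInput());` so that a bad input fails the run with a readable message instead of a destructuring error.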

.dockerignore

# configurations
.idea

# crawlee and apify storage folders
apify_storage
crawlee_storage
storage

# installed files
node_modules

# git folder
.git

.gitignore

# This file tells Git which files shouldn't be added to source control
.DS_Store
.idea
dist
node_modules
apify_storage
storage/*
!storage/key_value_stores
storage/key_value_stores/*
!storage/key_value_stores/default
storage/key_value_stores/default/*
!storage/key_value_stores/default/INPUT.json
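The negated patterns above ignore everything under `storage/` except `storage/key_value_stores/default/INPUT.json`, which is the input record a local run (for example with the Apify CLI's `apify run`, using the default `./storage` directory) reads through `Actor.getInput()`. The sketch below writes that file from a script; `writeLocalInput` is a hypothetical helper, and the demo writes to a temporary directory rather than the project's real `storage/` folder.

```javascript
import { mkdirSync, mkdtempSync, readFileSync, writeFileSync } from 'node:fs';
import { tmpdir } from 'node:os';
import { join } from 'node:path';

// Hypothetical helper: writes the default key-value store's INPUT record,
// i.e. the one file that .gitignore keeps under version control.
function writeLocalInput(input, storageDir = 'storage') {
    const dir = join(storageDir, 'key_value_stores', 'default');
    // Create the nested store directories if they do not exist yet.
    mkdirSync(dir, { recursive: true });
    const file = join(dir, 'INPUT.json');
    // Four-space indentation matches the JSON files in this repository.
    writeFileSync(file, `${JSON.stringify(input, null, 4)}\n`);
    return file;
}

// Usage: write into a throwaway directory instead of the project's ./storage.
const demoDir = mkdtempSync(join(tmpdir(), 'actor-storage-'));
const written = writeLocalInput({ url: 'https://www.apify.com/' }, demoDir);
console.log(`Wrote ${written}`);
console.log(JSON.parse(readFileSync(written, 'utf8')));
```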

package.json

{
    "name": "js-scrape-single-page",
    "version": "0.0.1",
    "type": "module",
    "description": "This is an example of an Apify actor.",
    "engines": {
        "node": ">=18.0.0"
    },
    "dependencies": {
        "apify": "^3.2.6",
        "axios": "^1.5.0",
        "cheerio": "^1.0.0-rc.12"
    },
    "scripts": {
        "start": "node ./src/main.js",
        "test": "echo \"Error: oops, the actor has no tests yet, sad!\" && exit 1"
    },
    "author": "It's not you it's me",
    "license": "ISC"
}
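The `engines` field declares `"node": ">=18.0.0"`, while the Dockerfile builds on a Node.js 20 image. npm treats `engines` as advisory and only warns on a mismatch unless `engine-strict` is enabled, so a script that must not run on an older runtime can check for itself. The guard below is a hypothetical sketch, not something the template includes.

```javascript
// Hypothetical startup guard, not part of the template. The "engines" field in
// package.json is advisory: npm warns on a mismatch unless engine-strict is set.
function assertNodeVersion(minMajor) {
    // process.versions.node looks like "20.11.1"; take the major component.
    const major = Number(process.versions.node.split('.')[0]);
    if (major < minMajor) {
        throw new Error(`Node.js >= ${minMajor} is required, found ${process.versions.node}.`);
    }
    return major;
}

// Matches "node": ">=18.0.0" from package.json.
console.log(`Node.js major version: ${assertNodeVersion(18)}`);
```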

src/main.js

// Axios - Promise-based HTTP client for the browser and Node.js (read more at https://axios-http.com/docs/intro).
import axios from 'axios';
// Cheerio - fast, flexible and elegant library for parsing and manipulating HTML and XML (read more at https://cheerio.js.org/).
import * as cheerio from 'cheerio';
// Apify SDK - toolkit for building Apify Actors (read more at https://docs.apify.com/sdk/js/).
import { Actor } from 'apify';
// This is an ESM project, so relative imports must include the file extension.
// Read more here: https://nodejs.org/docs/latest-v18.x/api/esm.html#mandatory-file-extensions
// import { router } from './routes.js';

// The init() call configures the Actor for its environment. It's recommended to start every Actor with init().
await Actor.init();

// The structure of the input is defined in input_schema.json.
// getInput() resolves to null when no input was provided, so guard before using it.
const input = await Actor.getInput();
const url = input?.url;
if (!url) {
    throw new Error('Missing required input field "url".');
}

// Fetch the HTML content of the page.
const response = await axios.get(url);

// Parse the downloaded HTML with Cheerio to enable data extraction.
const $ = cheerio.load(response.data);

// Extract all headings from the page (tag name and text).
const headings = [];
$("h1, h2, h3, h4, h5, h6").each((_, element) => {
    const headingObject = {
        // Normalize the tag name to lower case, e.g. "h2".
        level: $(element).prop("tagName").toLowerCase(),
        // .text() keeps whitespace from the HTML source, so trim the ends.
        text: $(element).text().trim(),
    };
    console.log("Extracted heading", headingObject);
    headings.push(headingObject);
});

// Save the headings to the Dataset, a table-like storage.
await Actor.pushData(headings);

// Gracefully exit the Actor process. It's recommended to end every Actor with exit().
await Actor.exit();
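Cheerio's `.text()` returns the raw text content of an element, so a heading that spans several lines in the HTML source, or wraps nested tags, can come back with newlines and runs of spaces in the middle. The sketch below shows a whitespace-collapsing normalization for each record before it is pushed to the Dataset; `normalizeHeading` is a hypothetical helper, and collapsing internal whitespace is an assumption about the desired output, not something the template does.

```javascript
// Hypothetical post-processing helper: builds a heading record from a tag name
// and raw element text, collapsing every whitespace run into a single space.
function normalizeHeading(tagName, rawText) {
    return {
        level: tagName.toLowerCase(),
        text: rawText.replace(/\s+/g, ' ').trim(),
    };
}

const record = normalizeHeading('H2', '\n    Pay per\n    usage  ');
console.log(record); // { level: 'h2', text: 'Pay per usage' }
```

Inside the `.each()` callback this would replace the inline object, e.g. `normalizeHeading($(element).prop("tagName"), $(element).text())`.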
