GC7 01 SimpleRequest

This Actor has been deprecated by its developer and is no longer available.

gc7/gc7-01-simplerequest

Simple Request (page title from URL) - Retrieve the page title from each URL in a list and display the results in a table sorted by URL.
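
In essence, the whole Actor boils down to the short sketch below: crawl each URL with a CheerioCrawler, grab the <title>, and print a sorted table. The full src/main.mjs at the end of this listing adds a request queue, log-level control, and URL formatting on top of this; the sketch only uses the crawlee dependency already declared in package.json.

import { CheerioCrawler } from 'crawlee';

const titles = [];
const crawler = new CheerioCrawler({
    // `$` is the Cheerio handle over the parsed HTML of the fetched page.
    async requestHandler({ $, request }) {
        titles.push({ url: request.url, title: $('title').text() });
    },
});

// Crawl a single example URL and print the collected titles, sorted by URL.
await crawler.run(['https://crawlee.dev']);
console.table(titles.sort((a, b) => a.url.localeCompare(b.url)));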

.dockerignore

# configurations
.idea
.vscode

# crawlee and apify storage folders
apify_storage
crawlee_storage
storage

# installed files
node_modules

# git folder
.git

.editorconfig

root = true

[*]
indent_style = space
indent_size = 4
charset = utf-8
trim_trailing_whitespace = true
insert_final_newline = true
end_of_line = lf

.gitignore

# This file tells Git which files shouldn't be added to source control

.DS_Store
.idea
dist
node_modules
apify_storage
storage

# Added by Apify CLI
.venv

package.json

{
    "name": "simple-crawlee",
    "version": "0.0.1",
    "type": "module",
    "description": "An example project for learning how to build an Apify Actor.",
    "dependencies": {
        "crawlee": "^3.10.0"
    },
    "scripts": {
        "start": "node src/main.mjs",
        "test": "echo \"Error: oops, the actor has no tests yet, sad!\" && exit 1"
    },
    "author": "GC7",
    "license": "ISC"
}

.actor/actor.json

{
    "actorSpecification": 1,
    "name": "gc7-01-simplerequest",
    "title": "Project Crawler JavaScript",
    "description": "Crawlee project in JavaScript.",
    "version": "0.1",
    "meta": {
        "templateId": "js-crawlee-puppeteer-chrome"
    },
    "input": "./input_schema.json",
    "dockerfile": "./Dockerfile"
}

.actor/Dockerfile

# Specify the base Docker image. You can read more about
# the available images at https://crawlee.dev/docs/guides/docker-images
# You can also use any other image from Docker Hub.
# FROM apify/actor-node-puppeteer-chrome:18
FROM apify/actor-node-playwright:20

# Copy just package.json and package-lock.json
# to speed up the build using Docker layer cache.
COPY --chown=myuser package*.json ./

# Install NPM packages, skip optional and development dependencies to
# keep the image small. Avoid logging too much and print the dependency
# tree for debugging.
RUN npm --quiet set progress=false \
    && npm install --omit=dev --omit=optional \
    && echo "Installed NPM packages:" \
    && (npm list --omit=dev --all || true) \
    && echo "Node.js version:" \
    && node --version \
    && echo "NPM version:" \
    && npm --version \
    && rm -r ~/.npm

# Next, copy the remaining files and directories with the source code.
# Since we do this after NPM install, quick build will be really fast
# for most source file changes.
COPY --chown=myuser . ./

# Run the image. If you know you won't need headful browsers,
# you can remove the XVFB start script for a micro perf gain.
CMD ./start_xvfb_and_run_cmd.sh && npm start --silent

.actor/input_schema.json

{
    "title": "PlaywrightCrawler Template",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "startUrls": {
            "title": "Start URLs",
            "type": "array",
            "description": "URLs to start with.",
            "editor": "requestListSources",
            "prefill": [
                {
                    "url": "https://apify.com"
                }
            ]
        }
    }
}
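
Note that the schema above defines a startUrls input, but src/main.mjs (shown further below) hardcodes its URL list and never reads the Actor input. A minimal sketch of how the two could be wired together, assuming the apify SDK package were added as a dependency (it is not in the current package.json), might look like this:

import { Actor } from 'apify';
import { CheerioCrawler } from 'crawlee';

await Actor.init();

// Read the `startUrls` field defined in .actor/input_schema.json.
const { startUrls = [] } = (await Actor.getInput()) ?? {};

const crawler = new CheerioCrawler({
    async requestHandler({ $, request }) {
        // Store each page title in the run's default dataset.
        await Actor.pushData({ url: request.url, title: $('title').text() });
    },
});

// Entries from the "requestListSources" editor are objects like { "url": "..." }.
await crawler.run(startUrls.map(({ url }) => ({ url })));

await Actor.exit();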

.actor/welcome.md

## Hello, coder!

Thank you for using this Actor, GC7 01 (SimpleRequest).

I will let you know when the next one comes out.

Enjoy it in the meantime.

See you soon,

---

GC7

.vscode/settings.json

{
    "cSpell.words": ["apify", "crawlee"]
}

src/main.mjs

import { RequestQueue, CheerioCrawler, log } from 'crawlee';

// Comment out the line below to see INFO logs (the WARNING level hides them).
log.setLevel(log.LEVELS.WARNING);

// Set to 0 for a textual result instead of a table.
const tableResult = 1;

// Create a request queue and add the URLs to it.
const requestQueue = await RequestQueue.open();
const urls = ['https://crawlee.dev', 'https://google.com', 'https://c57.fr'];

for (const url of urls) {
    await requestQueue.addRequest({ url });
}

// Create the crawler with the queue and a request handler to process each page.
let websites = [];
const crawler = new CheerioCrawler({
    requestQueue,
    // The `$` argument is the Cheerio object containing the parsed HTML of the website.
    async requestHandler({ $, request }) {
        // Extract the <title> text with Cheerio. See the Cheerio documentation for API details.
        const title = $('title').text();
        if (!tableResult) {
            log.info('Result:');
            console.log(`The title of "${request.url}" is: ${title}.`);
        }
        websites.push({ url: request.url, title });
    },
});

// Start the crawler (this also enqueues https://apify.com) and wait for it to finish.
await crawler.run(['https://apify.com']);

// Show the result as a table, sorted by URL, with the 'https://' prefix removed.
websites = websites
    .map((website) => ({ ...website, url: website.url.slice(8) }))
    .sort((a, b) => a.url.localeCompare(b.url));
if (tableResult) console.table(websites);
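
The crawler above only prints its results with console.log or console.table. If the collected records should also be persisted (into the local storage folder during development, or into the run's dataset on the Apify platform), crawlee's Dataset class, already part of the crawlee dependency listed in package.json, could be used. A hedged sketch: extend the existing import at the top of src/main.mjs and push the records once the table has been printed.

import { RequestQueue, CheerioCrawler, Dataset, log } from 'crawlee';

// ...at the end of src/main.mjs:
// Push the collected { url, title } records to the default dataset so they
// survive the run instead of only being printed to the console.
await Dataset.pushData(websites);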