GC7 01 SimpleRequest

Deprecated

Pricing: Pay per usage

Developed by GC7

Maintained by Community
Simple Request title from URL: retrieves the page title from each URL in a list and displays the result in a table sorted by URL.

Rating: 0.0 (0)

Total users: 1

Monthly users: 1

Last modified: a year ago
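
At its core, the actor is a crawlee CheerioCrawler that fetches each page and reads its <title> element; the full source is listed under src/main.mjs below. A minimal sketch of that idea, with an illustrative URL:

import { CheerioCrawler } from 'crawlee';

const crawler = new CheerioCrawler({
    // `$` is the Cheerio object holding the parsed HTML of the fetched page.
    async requestHandler({ $, request }) {
        console.log(`${request.url} -> ${$('title').text()}`);
    },
});

await crawler.run(['https://crawlee.dev']);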

.dockerignore

# configurations
.idea
.vscode
# crawlee and apify storage folders
apify_storage
crawlee_storage
storage
# installed files
node_modules
# git folder
.git

.editorconfig

root = true
[*]
indent_style = space
indent_size = 4
charset = utf-8
trim_trailing_whitespace = true
insert_final_newline = true
end_of_line = lf

.gitignore

# This file tells Git which files shouldn't be added to source control
.DS_Store
.idea
dist
node_modules
apify_storage
storage
# Added by Apify CLI
.venv

package.json

{
    "name": "simple-crawlee",
    "version": "0.0.1",
    "type": "module",
    "description": "An example project for learning Apify actors.",
    "dependencies": {
        "crawlee": "^3.10.0"
    },
    "scripts": {
        "start": "node src/main.mjs",
        "test": "echo \"Error: oops, the actor has no tests yet, sad!\" && exit 1"
    },
    "author": "GC7",
    "license": "ISC"
}

.actor/actor.json

{
    "actorSpecification": 1,
    "name": "GC7-01-SimpleRequest",
    "title": "Project Crawler JavaScript",
    "description": "Crawlee project in JavaScript.",
    "version": "0.1",
    "meta": {
        "templateId": "js-crawlee-puppeteer-chrome"
    },
    "input": "./input_schema.json",
    "dockerfile": "./Dockerfile"
}

.actor/Dockerfile

# Specify the base Docker image. You can read more about
# the available images at https://crawlee.dev/docs/guides/docker-images
# You can also use any other image from Docker Hub.
# FROM apify/actor-node-puppeteer-chrome:18
FROM apify/actor-node-playwright:20

# Copy just package.json and package-lock.json
# to speed up the build using the Docker layer cache.
COPY --chown=myuser package*.json ./

# Install NPM packages, skipping optional and development dependencies to
# keep the image small. Avoid logging too much, and print the dependency
# tree for debugging.
RUN npm --quiet set progress=false \
    && npm install --omit=dev --omit=optional \
    && echo "Installed NPM packages:" \
    && (npm list --omit=dev --all || true) \
    && echo "Node.js version:" \
    && node --version \
    && echo "NPM version:" \
    && npm --version \
    && rm -r ~/.npm

# Next, copy the remaining files and directories with the source code.
# Since we do this after the NPM install, quick builds will be really fast
# for most source file changes.
COPY --chown=myuser . ./

# Run the image. If you know you won't need headful browsers,
# you can remove the XVFB start script for a micro perf gain.
CMD ./start_xvfb_and_run_cmd.sh && npm start --silent

.actor/input_schema.json

{
    "title": "PlaywrightCrawler Template",
    "type": "object",
    "schemaVersion": 1,
    "properties": {
        "startUrls": {
            "title": "Start URLs",
            "type": "array",
            "description": "URLs to start with.",
            "editor": "requestListSources",
            "prefill": [
                {
                    "url": "https://apify.com"
                }
            ]
        }
    }
}
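
The schema declares a startUrls input, although the src/main.mjs listed below hardcodes its URL list rather than reading the input. A hedged sketch of how the input could be consumed with the Apify SDK (this assumes the apify package, which is not in the package.json above):

import { Actor } from 'apify';

await Actor.init();
// Per the schema, startUrls arrives as an array of { url } objects.
const { startUrls = [] } = (await Actor.getInput()) ?? {};
const urls = startUrls.map((source) => source.url);
console.log('Start URLs:', urls);
await Actor.exit();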

.actor/welcome.md

## Hello, coder!

Thank you for using this actor, GC7 01 (SimpleRequest).

I will let you know when the next one comes out.

Enjoy it in the meantime.

See you soon,

---

GC7

.vscode/settings.json

{
    "cSpell.words": ["apify", "crawlee"]
}

src/main.mjs

import { RequestQueue, CheerioCrawler, log } from 'crawlee';

// Comment out the line below to see INFO logs.
log.setLevel(log.LEVELS.WARNING);

// Set to 0 for textual output instead of a table.
const tableResult = 1;

// Create a request queue and add the URLs to it.
const requestQueue = await RequestQueue.open();
const urls = ['https://crawlee.dev', 'https://google.com', 'https://c57.fr'];

for (const url of urls) {
    await requestQueue.addRequest({ url });
}

// Create the crawler with the queue and a request handler to process each page.
let websites = [];
const crawler = new CheerioCrawler({
    requestQueue,
    // The `$` argument is the Cheerio object containing the parsed HTML of the page.
    async requestHandler({ $, request }) {
        // Extract the <title> text with Cheerio. See the Cheerio documentation for API docs.
        const title = $('title').text();
        if (!tableResult) {
            log.info('Result:');
            console.log(`The title of "${request.url}" is: ${title}.`);
        }
        websites.push({ url: request.url, title });
    },
});

// Start the crawler (this adds one more URL to the queue) and wait for it to finish.
await crawler.run(['https://apify.com']);

// Show the result as a table, sorted by URL, without the 'https://' prefix.
websites = websites
    .map((website) => ({ ...website, url: website.url.slice(8) }))
    .sort((a, b) => a.url.localeCompare(b.url));
if (tableResult) console.table(websites);
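
On the Apify platform, results usually go to the default dataset rather than only the console. A minimal sketch of that step, using crawlee's Dataset helper (an addition, not part of the original actor):

import { Dataset } from 'crawlee';

// Persist the collected { url, title } records to the default dataset.
await Dataset.pushData(websites);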