
Vanilla JS Scraper
Scrape the web using familiar JavaScript methods! This Actor crawls websites with raw HTTP requests, parses the HTML using the JSDOM package, and extracts data from the pages with plain Node.js code. It supports both recursive crawling and static lists of URLs, making it a non-jQuery alternative to Cheerio Scraper.
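To make the "familiar JavaScript methods" concrete, here is a minimal sketch of a page function. It assumes the function receives a context object exposing a JSDOM document and the current request, mirroring the context shape Cheerio Scraper uses; check the Actor's README for the exact signature.

// A minimal page function sketch. The exact context shape (document, request)
// is an assumption mirroring Cheerio Scraper; see the Actor README.
async function pageFunction(context) {
    const { document, request } = context;

    // Standard DOM methods work because the HTML is parsed with JSDOM.
    const titleEl = document.querySelector('title');
    const headings = Array.from(document.querySelectorAll('h1, h2'))
        .map((el) => el.textContent.trim());

    // The returned object is pushed to the run's dataset.
    return {
        url: request.url,
        title: titleEl ? titleEl.textContent : null,
        headings: headings,
    };
}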
Actor Metrics
10 monthly users
No reviews yet
4 bookmarks
>99% runs succeeded
Created in Mar 2022
Modified a year ago
You can access the Vanilla JS Scraper programmatically from your own applications by using the Apify API. You can also choose your preferred programming language below. To use the Apify API, you'll need an Apify account and your API token, which you can find under Integrations settings in Apify Console.
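For example, the run-sync-get-dataset-items endpoint defined below can be called from Node.js 18+ with the built-in fetch. The endpoint path and the required requests and pageFunction fields come straight from the OpenAPI definition; the token and the page-function body are placeholders.

// Run the Actor synchronously and fetch its dataset items (Node.js 18+, ESM).
// Replace <YOUR_API_TOKEN> with the token from Apify Console > Integrations.
const token = '<YOUR_API_TOKEN>';
const url = `https://api.apify.com/v2/acts/mstephen190~vanilla-js-scraper/run-sync-get-dataset-items?token=${token}`;

const input = {
  // Both fields are required by the input schema below.
  requests: [{ url: 'https://example.com' }],
  // The page function is sent as a string; context.document is an assumed
  // JSDOM document, per the Actor's description.
  pageFunction: `async function pageFunction(context) {
    const el = context.document.querySelector('title');
    return { url: context.request.url, title: el ? el.textContent : null };
  }`,
};

const response = await fetch(url, {
  method: 'POST',
  headers: { 'Content-Type': 'application/json' },
  body: JSON.stringify(input),
});
console.log(await response.json());

The full OpenAPI definition follows: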
{
  "openapi": "3.0.1",
  "info": {
    "version": "0.0",
    "x-build-id": "KNr2xqkLyGdERqbgE"
  },
  "servers": [
    {
      "url": "https://api.apify.com/v2"
    }
  ],
  "paths": {
    "/acts/mstephen190~vanilla-js-scraper/run-sync-get-dataset-items": {
      "post": {
        "operationId": "run-sync-get-dataset-items-mstephen190-vanilla-js-scraper",
        "x-openai-isConsequential": false,
        "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
        "tags": [
          "Run Actor"
        ],
        "requestBody": {
          "required": true,
          "content": {
            "application/json": {
              "schema": {
                "$ref": "#/components/schemas/inputSchema"
              }
            }
          }
        },
        "parameters": [
          {
            "name": "token",
            "in": "query",
            "required": true,
            "schema": {
              "type": "string"
            },
            "description": "Enter your Apify token here"
          }
        ],
        "responses": {
          "200": {
            "description": "OK"
          }
        }
      }
    },
    "/acts/mstephen190~vanilla-js-scraper/runs": {
      "post": {
        "operationId": "runs-sync-mstephen190-vanilla-js-scraper",
        "x-openai-isConsequential": false,
        "summary": "Executes an Actor and returns information about the initiated run in response.",
        "tags": [
          "Run Actor"
        ],
        "requestBody": {
          "required": true,
          "content": {
            "application/json": {
              "schema": {
                "$ref": "#/components/schemas/inputSchema"
              }
            }
          }
        },
        "parameters": [
          {
            "name": "token",
            "in": "query",
            "required": true,
            "schema": {
              "type": "string"
            },
            "description": "Enter your Apify token here"
          }
        ],
        "responses": {
          "200": {
            "description": "OK",
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/runsResponseSchema"
                }
              }
            }
          }
        }
      }
    },
    "/acts/mstephen190~vanilla-js-scraper/run-sync": {
      "post": {
        "operationId": "run-sync-mstephen190-vanilla-js-scraper",
        "x-openai-isConsequential": false,
        "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
        "tags": [
          "Run Actor"
        ],
        "requestBody": {
          "required": true,
          "content": {
            "application/json": {
              "schema": {
                "$ref": "#/components/schemas/inputSchema"
              }
            }
          }
        },
        "parameters": [
          {
            "name": "token",
            "in": "query",
            "required": true,
            "schema": {
              "type": "string"
            },
            "description": "Enter your Apify token here"
          }
        ],
        "responses": {
          "200": {
            "description": "OK"
          }
        }
      }
    }
  },
  "components": {
    "schemas": {
      "inputSchema": {
        "type": "object",
        "required": [
          "requests",
          "pageFunction"
        ],
        "properties": {
          "requests": {
            "title": "Requests",
            "type": "array",
            "description": "A static list of URLs to scrape. <br><br>For details, see the <a href='https://apify.com/apify/cheerio-scraper#start-urls' target='_blank' rel='noopener'>Start URLs</a> section in the README.",
            "items": {
              "type": "object",
              "required": [
                "url"
              ],
              "properties": {
                "url": {
                  "type": "string",
                  "title": "URL of a web page",
                  "format": "uri"
                }
              }
            }
          },
          "pseudoUrls": {
            "title": "Pseudo-URLs",
            "type": "array",
            "description": "Specifies what kind of URLs found by the <b>Link selector</b> should be added to the request queue. A pseudo-URL is a URL with <b>regular expressions</b> enclosed in <code>[]</code> brackets, e.g. <code>http://www.example.com/[.*]</code>. <br><br>If <b>Pseudo-URLs</b> are omitted, the actor enqueues all links matched by the <b>Link selector</b>.<br><br>For details, see <a href='https://apify.com/apify/cheerio-scraper#pseudo-urls' target='_blank' rel='noopener'>Pseudo-URLs</a> in README.",
            "default": [],
            "items": {
              "type": "object",
              "required": [
                "purl"
              ],
              "properties": {
                "purl": {
                  "type": "string",
                  "title": "Pseudo-URL of a web page"
                }
              }
            }
          },
          "linkSelector": {
            "title": "Link selector",
            "type": "string",
            "description": "A CSS selector stating which links on the page (<code><a></code> elements with <code>href</code> attribute) shall be followed and added to the request queue. To filter the links added to the queue, use the <b>Pseudo-URLs</b> field.<br><br>If the <b>Link selector</b> is empty, the page links are ignored.<br><br>For details, see the <a href='https://apify.com/apify/cheerio-scraper#link-selector' target='_blank' rel='noopener'>Link selector</a> in README."
          },
          "pageFunction": {
            "title": "Page function",
            "type": "string",
            "description": "A JavaScript function that is executed for every page loaded server-side in Node.js 12. Use it to scrape data from the page, perform actions or add new URLs to the request queue.<br><br>For details, see <a href='https://apify.com/apify/cheerio-scraper#page-function' target='_blank' rel='noopener'>Page function</a> in README."
          },
          "preNavigationHooks": {
            "title": "Pre-navigation hooks",
            "type": "string",
            "description": "Async functions that are sequentially evaluated before the navigation. Good for setting additional cookies or browser properties before navigation. The function accepts two parameters, `crawlingContext` and `requestAsBrowserOptions`, which are passed to the `requestAsBrowser()` function the crawler calls to navigate."
          },
          "postNavigationHooks": {
            "title": "Post-navigation hooks",
            "type": "string",
            "description": "Async functions that are sequentially evaluated after the navigation. Good for checking if the navigation was successful. The function accepts `crawlingContext` as the only parameter."
          },
          "proxy": {
            "title": "Proxy configuration",
            "type": "object",
            "description": "Specifies proxy servers that will be used by the scraper in order to hide its origin.<br><br>For details, see <a href='https://apify.com/apify/cheerio-scraper#proxy-configuration' target='_blank' rel='noopener'>Proxy configuration</a> in README.",
            "default": {
              "useApifyProxy": false
            }
          },
          "debug": {
            "title": "Debug log",
            "type": "boolean",
            "description": "Include debug messages in the log?",
            "default": false
          },
          "maxConcurrency": {
            "title": "Max concurrency",
            "minimum": 1,
            "type": "integer",
            "description": "Specifies the maximum number of pages that can be processed by the scraper in parallel. The scraper automatically increases and decreases concurrency based on available system resources. This option enables you to set an upper limit, for example to reduce the load on a target web server.",
            "default": 50
          },
          "maxRequestRetries": {
            "title": "Max request retries",
            "minimum": 0,
            "type": "integer",
            "description": "The maximum number of times the scraper will retry to load each web page on error, in case of a page load error or an exception thrown by the <b>Page function</b>.<br><br>If set to <code>0</code>, the page will be considered failed right after the first error.",
            "default": 3
          },
          "pageLoadTimeoutSecs": {
            "title": "Page load timeout",
            "minimum": 1,
            "type": "integer",
            "description": "The maximum amount of time the scraper will wait for a web page to load, in seconds. If the web page does not load in this timeframe, it is considered to have failed and will be retried (subject to <b>Max page retries</b>), similarly as with other page load errors.",
            "default": 60
          },
          "pageFunctionTimeoutSecs": {
            "title": "Page function timeout",
            "minimum": 1,
            "type": "integer",
            "description": "The maximum amount of time the scraper will wait for the <b>Page function</b> to execute, in seconds. It is always a good idea to set this limit, to ensure that unexpected behavior in page function will not get the scraper stuck.",
            "default": 60
          },
          "ignoreSslErrors": {
            "title": "Ignore SSL errors",
            "type": "boolean",
            "description": "If enabled, the scraper will ignore SSL/TLS certificate errors. Use at your own risk.",
            "default": false
          },
          "additionalMimeTypes": {
            "title": "Additional MIME types",
            "type": "array",
            "description": "A JSON array specifying additional MIME content types of web pages to support. By default, Cheerio Scraper supports the <code>text/html</code> and <code>application/xhtml+xml</code> content types, and skips all other resources. For details, see <a href='https://apify.com/apify/cheerio-scraper#content-types' target='_blank' rel='noopener'>Content types</a> in README.",
            "default": [],
            "items": {
              "type": "string"
            }
          },
          "datasetName": {
            "title": "Dataset name",
            "type": "string",
            "description": "Name or ID of the dataset that will be used for storing results. If left empty, the default dataset of the run will be used."
          },
          "keyValueStoreName": {
            "title": "Key-value store name",
            "type": "string",
            "description": "Name or ID of the key-value store that will be used for storing records. If left empty, the default key-value store of the run will be used."
          },
          "customData": {
            "title": "Custom data",
            "type": "object",
            "description": "A custom JSON object that is passed to the <b>Page function</b> as <code>context.customData</code>. This setting is useful when invoking the scraper via API, in order to pass some arbitrary parameters to your code.",
            "default": {}
          }
        }
      },
      "runsResponseSchema": {
        "type": "object",
        "properties": {
          "data": {
            "type": "object",
            "properties": {
              "id": {
                "type": "string"
              },
              "actId": {
                "type": "string"
              },
              "userId": {
                "type": "string"
              },
              "startedAt": {
                "type": "string",
                "format": "date-time",
                "example": "2025-01-08T00:00:00.000Z"
              },
              "finishedAt": {
                "type": "string",
                "format": "date-time",
                "example": "2025-01-08T00:00:00.000Z"
              },
              "status": {
                "type": "string",
                "example": "READY"
              },
              "meta": {
                "type": "object",
                "properties": {
                  "origin": {
                    "type": "string",
                    "example": "API"
                  },
                  "userAgent": {
                    "type": "string"
                  }
                }
              },
              "stats": {
                "type": "object",
                "properties": {
                  "inputBodyLen": {
                    "type": "integer",
                    "example": 2000
                  },
                  "rebootCount": {
                    "type": "integer",
                    "example": 0
                  },
                  "restartCount": {
                    "type": "integer",
                    "example": 0
                  },
                  "resurrectCount": {
                    "type": "integer",
                    "example": 0
                  },
                  "computeUnits": {
                    "type": "integer",
                    "example": 0
                  }
                }
              },
              "options": {
                "type": "object",
                "properties": {
                  "build": {
                    "type": "string",
                    "example": "latest"
                  },
                  "timeoutSecs": {
                    "type": "integer",
                    "example": 300
                  },
                  "memoryMbytes": {
                    "type": "integer",
                    "example": 1024
                  },
                  "diskMbytes": {
                    "type": "integer",
                    "example": 2048
                  }
                }
              },
              "buildId": {
                "type": "string"
              },
              "defaultKeyValueStoreId": {
                "type": "string"
              },
              "defaultDatasetId": {
                "type": "string"
              },
              "defaultRequestQueueId": {
                "type": "string"
              },
              "buildNumber": {
                "type": "string",
                "example": "1.0.0"
              },
              "containerUrl": {
                "type": "string"
              },
              "usage": {
                "type": "object",
                "properties": {
                  "ACTOR_COMPUTE_UNITS": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATASET_READS": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATASET_WRITES": {
                    "type": "integer",
                    "example": 0
                  },
                  "KEY_VALUE_STORE_READS": {
                    "type": "integer",
                    "example": 0
                  },
                  "KEY_VALUE_STORE_WRITES": {
                    "type": "integer",
                    "example": 1
                  },
                  "KEY_VALUE_STORE_LISTS": {
                    "type": "integer",
                    "example": 0
                  },
                  "REQUEST_QUEUE_READS": {
                    "type": "integer",
                    "example": 0
                  },
                  "REQUEST_QUEUE_WRITES": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATA_TRANSFER_INTERNAL_GBYTES": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATA_TRANSFER_EXTERNAL_GBYTES": {
                    "type": "integer",
                    "example": 0
                  },
                  "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                    "type": "integer",
                    "example": 0
                  },
                  "PROXY_SERPS": {
                    "type": "integer",
                    "example": 0
                  }
                }
              },
              "usageTotalUsd": {
                "type": "number",
                "example": 0.00005
              },
              "usageUsd": {
                "type": "object",
                "properties": {
                  "ACTOR_COMPUTE_UNITS": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATASET_READS": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATASET_WRITES": {
                    "type": "integer",
                    "example": 0
                  },
                  "KEY_VALUE_STORE_READS": {
                    "type": "integer",
                    "example": 0
                  },
                  "KEY_VALUE_STORE_WRITES": {
                    "type": "number",
                    "example": 0.00005
                  },
                  "KEY_VALUE_STORE_LISTS": {
                    "type": "integer",
                    "example": 0
                  },
                  "REQUEST_QUEUE_READS": {
                    "type": "integer",
                    "example": 0
                  },
                  "REQUEST_QUEUE_WRITES": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATA_TRANSFER_INTERNAL_GBYTES": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATA_TRANSFER_EXTERNAL_GBYTES": {
                    "type": "integer",
                    "example": 0
                  },
                  "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                    "type": "integer",
                    "example": 0
                  },
                  "PROXY_SERPS": {
                    "type": "integer",
                    "example": 0
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
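To tie the inputSchema above together, a recursive crawl might be configured with an input along these lines. The field names and defaults come from the schema; the URL, selector, pseudo-URL, and page-function body are illustrative only.

// Hypothetical input for a recursive crawl, built from the inputSchema above.
const input = {
  requests: [{ url: 'https://example.com' }],         // required: start URLs
  linkSelector: 'a[href]',                            // follow links on each page...
  pseudoUrls: [{ purl: 'https://example.com/[.*]' }], // ...but only same-site ones
  pageFunction: 'async function pageFunction(context) { /* extract and return data here */ }', // required
  proxy: { useApifyProxy: false },
  maxConcurrency: 10,
  maxRequestRetries: 3,
};

Per the schema, omitting linkSelector disables link following entirely, while omitting pseudoUrls enqueues every link the selector matches.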
Vanilla JS Scraper OpenAPI definition
OpenAPI is a standard for designing and describing RESTful APIs, allowing developers to define API structure, endpoints, and data formats in a machine-readable way. It simplifies API development, integration, and documentation.
OpenAPI is particularly effective with AI agents and GPTs because it standardizes how these systems interact with APIs, enabling reliable integrations and efficient communication.
By defining machine-readable API specifications, OpenAPI allows AI models like GPTs to understand and use varied data sources, improving accuracy. This accelerates development, reduces errors, and provides context-aware responses, making OpenAPI a core component for AI applications.
You can download the OpenAPI definitions for Vanilla JS Scraper from the options below:
If you’d like to learn more about how OpenAPI powers GPTs, read our blog post.