
Page Scraping Analyzer
Analyzes a webpage to determine the best way to scrape its data. Provide a URL and the data points to find, and get back a detailed dashboard showing how the data can be scraped. Works with initial and rendered HTML, JavaScript variables, and dynamically loaded data.
You can access the Page Scraping Analyzer programmatically from your own applications by using the Apify API. To use the Apify API, you’ll need an Apify account and your API token, found in Integrations settings in Apify Console.
{
  "openapi": "3.0.1",
  "info": {
    "version": "0.0",
    "x-build-id": "KEWgs2hPsnc3CZ3Fe"
  },
  "servers": [
    {
      "url": "https://api.apify.com/v2"
    }
  ],
  "paths": {
    "/acts/apify~page-analyzer/run-sync-get-dataset-items": {
      "post": {
        "operationId": "run-sync-get-dataset-items-apify-page-analyzer",
        "x-openai-isConsequential": false,
        "summary": "Executes an Actor, waits for its completion, and returns Actor's dataset items in response.",
        "tags": [
          "Run Actor"
        ],
        "requestBody": {
          "required": true,
          "content": {
            "application/json": {
              "schema": {
                "$ref": "#/components/schemas/inputSchema"
              }
            }
          }
        },
        "parameters": [
          {
            "name": "token",
            "in": "query",
            "required": true,
            "schema": {
              "type": "string"
            },
            "description": "Enter your Apify token here"
          }
        ],
        "responses": {
          "200": {
            "description": "OK"
          }
        }
      }
    },
    "/acts/apify~page-analyzer/runs": {
      "post": {
        "operationId": "runs-sync-apify-page-analyzer",
        "x-openai-isConsequential": false,
        "summary": "Executes an Actor and returns information about the initiated run in response.",
        "tags": [
          "Run Actor"
        ],
        "requestBody": {
          "required": true,
          "content": {
            "application/json": {
              "schema": {
                "$ref": "#/components/schemas/inputSchema"
              }
            }
          }
        },
        "parameters": [
          {
            "name": "token",
            "in": "query",
            "required": true,
            "schema": {
              "type": "string"
            },
            "description": "Enter your Apify token here"
          }
        ],
        "responses": {
          "200": {
            "description": "OK",
            "content": {
              "application/json": {
                "schema": {
                  "$ref": "#/components/schemas/runsResponseSchema"
                }
              }
            }
          }
        }
      }
    },
    "/acts/apify~page-analyzer/run-sync": {
      "post": {
        "operationId": "run-sync-apify-page-analyzer",
        "x-openai-isConsequential": false,
        "summary": "Executes an Actor, waits for completion, and returns the OUTPUT from Key-value store in response.",
        "tags": [
          "Run Actor"
        ],
        "requestBody": {
          "required": true,
          "content": {
            "application/json": {
              "schema": {
                "$ref": "#/components/schemas/inputSchema"
              }
            }
          }
        },
        "parameters": [
          {
            "name": "token",
            "in": "query",
            "required": true,
            "schema": {
              "type": "string"
            },
            "description": "Enter your Apify token here"
          }
        ],
        "responses": {
          "200": {
            "description": "OK"
          }
        }
      }
    }
  },
  "components": {
    "schemas": {
      "inputSchema": {
        "type": "object",
        "required": [
          "url",
          "keywords"
        ],
        "properties": {
          "url": {
            "title": "Page URL",
            "type": "string",
            "description": "URL of a website to by analyzed. Use a page URL that you would want to scrape. You can run this actor again for different page type (e.g. one run for search results page, another run for product detail page).",
            "default": "http://books.toscrape.com/catalogue/a-light-in-the-attic_1000/index.html"
          },
          "keywords": {
            "title": "Keywords",
            "type": "array",
            "description": "Keywords to be searched for during analysis",
            "default": [
              "A Light in the Attic",
              "51.77",
              "In stock",
              "22 available",
              "a897fe39b1053632",
              "It's hard to imagine a world without A Light in the Attic. This now-classic collection of poetry and drawings from Shel Silverstein celebrates its 20th anniversary with this special edition. Silverstein's humorous and creative verse can amuse the dowdiest of readers. Lemon-faced adults and fidgety kids sit still and read these rhythmic words and laugh and smile and love th It's hard to imagine a world without A Light in the Attic. This now-classic collection of poetry and drawings from Shel Silverstein celebrates its 20th anniversary with this special edition. Silverstein's humorous and creative verse can amuse the dowdiest of readers. Lemon-faced adults and fidgety kids sit still and read these rhythmic words and laugh and smile and love that Silverstein. Need proof of his genius? RockabyeRockabye baby, in the treetopDon't you know a treetopIs no safe place to rock?And who put you up there,And your cradle, too?Baby, I think someone down here'sGot it in for you. Shel, you never sounded so good. ...more"
            ],
            "items": {
              "type": "string"
            }
          },
          "proxyConfig": {
            "title": "Proxy configuration",
            "type": "object",
            "description": "You should use proxies for scraping. Use datacenter proxies but fallback to residential ones if blocked too much.",
            "default": {
              "useApifyProxy": true
            }
          }
        }
      },
      "runsResponseSchema": {
        "type": "object",
        "properties": {
          "data": {
            "type": "object",
            "properties": {
              "id": {
                "type": "string"
              },
              "actId": {
                "type": "string"
              },
              "userId": {
                "type": "string"
              },
              "startedAt": {
                "type": "string",
                "format": "date-time",
                "example": "2025-01-08T00:00:00.000Z"
              },
              "finishedAt": {
                "type": "string",
                "format": "date-time",
                "example": "2025-01-08T00:00:00.000Z"
              },
              "status": {
                "type": "string",
                "example": "READY"
              },
              "meta": {
                "type": "object",
                "properties": {
                  "origin": {
                    "type": "string",
                    "example": "API"
                  },
                  "userAgent": {
                    "type": "string"
                  }
                }
              },
              "stats": {
                "type": "object",
                "properties": {
                  "inputBodyLen": {
                    "type": "integer",
                    "example": 2000
                  },
                  "rebootCount": {
                    "type": "integer",
                    "example": 0
                  },
                  "restartCount": {
                    "type": "integer",
                    "example": 0
                  },
                  "resurrectCount": {
                    "type": "integer",
                    "example": 0
                  },
                  "computeUnits": {
                    "type": "integer",
                    "example": 0
                  }
                }
              },
              "options": {
                "type": "object",
                "properties": {
                  "build": {
                    "type": "string",
                    "example": "latest"
                  },
                  "timeoutSecs": {
                    "type": "integer",
                    "example": 300
                  },
                  "memoryMbytes": {
                    "type": "integer",
                    "example": 1024
                  },
                  "diskMbytes": {
                    "type": "integer",
                    "example": 2048
                  }
                }
              },
              "buildId": {
                "type": "string"
              },
              "defaultKeyValueStoreId": {
                "type": "string"
              },
              "defaultDatasetId": {
                "type": "string"
              },
              "defaultRequestQueueId": {
                "type": "string"
              },
              "buildNumber": {
                "type": "string",
                "example": "1.0.0"
              },
              "containerUrl": {
                "type": "string"
              },
              "usage": {
                "type": "object",
                "properties": {
                  "ACTOR_COMPUTE_UNITS": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATASET_READS": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATASET_WRITES": {
                    "type": "integer",
                    "example": 0
                  },
                  "KEY_VALUE_STORE_READS": {
                    "type": "integer",
                    "example": 0
                  },
                  "KEY_VALUE_STORE_WRITES": {
                    "type": "integer",
                    "example": 1
                  },
                  "KEY_VALUE_STORE_LISTS": {
                    "type": "integer",
                    "example": 0
                  },
                  "REQUEST_QUEUE_READS": {
                    "type": "integer",
                    "example": 0
                  },
                  "REQUEST_QUEUE_WRITES": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATA_TRANSFER_INTERNAL_GBYTES": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATA_TRANSFER_EXTERNAL_GBYTES": {
                    "type": "integer",
                    "example": 0
                  },
                  "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                    "type": "integer",
                    "example": 0
                  },
                  "PROXY_SERPS": {
                    "type": "integer",
                    "example": 0
                  }
                }
              },
              "usageTotalUsd": {
                "type": "number",
                "example": 0.00005
              },
              "usageUsd": {
                "type": "object",
                "properties": {
                  "ACTOR_COMPUTE_UNITS": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATASET_READS": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATASET_WRITES": {
                    "type": "integer",
                    "example": 0
                  },
                  "KEY_VALUE_STORE_READS": {
                    "type": "integer",
                    "example": 0
                  },
                  "KEY_VALUE_STORE_WRITES": {
                    "type": "number",
                    "example": 0.00005
                  },
                  "KEY_VALUE_STORE_LISTS": {
                    "type": "integer",
                    "example": 0
                  },
                  "REQUEST_QUEUE_READS": {
                    "type": "integer",
                    "example": 0
                  },
                  "REQUEST_QUEUE_WRITES": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATA_TRANSFER_INTERNAL_GBYTES": {
                    "type": "integer",
                    "example": 0
                  },
                  "DATA_TRANSFER_EXTERNAL_GBYTES": {
                    "type": "integer",
                    "example": 0
                  },
                  "PROXY_RESIDENTIAL_TRANSFER_GBYTES": {
                    "type": "integer",
                    "example": 0
                  },
                  "PROXY_SERPS": {
                    "type": "integer",
                    "example": 0
                  }
                }
              }
            }
          }
        }
      }
    }
  }
}
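As a rough illustration of the definition above, the following Python sketch builds a POST request for one of the three run endpoints using only the standard library. The server URL, path, `token` query parameter, and the required `url` and `keywords` input fields come from the definition; the helper names `build_run_request` and `run_actor` are hypothetical, and the assumption that the response body is JSON is not stated in the definition, which only documents a `200` "OK" response for the synchronous endpoints.

```python
import json
import urllib.parse
import urllib.request

API_BASE = "https://api.apify.com/v2"      # "servers" entry in the definition
ACTOR_PATH = "/acts/apify~page-analyzer"   # common prefix of the three "paths"


def build_run_request(token, page_url, keywords,
                      endpoint="run-sync-get-dataset-items"):
    """Build (but do not send) a POST request for a run endpoint.

    Hypothetical helper: `endpoint` may be "run-sync-get-dataset-items",
    "runs", or "run-sync", as listed under "paths".
    """
    if not page_url or not keywords:
        # inputSchema lists both "url" and "keywords" as required.
        raise ValueError("both 'url' and 'keywords' are required")
    query = urllib.parse.urlencode({"token": token})
    body = json.dumps({"url": page_url, "keywords": list(keywords)})
    return urllib.request.Request(
        f"{API_BASE}{ACTOR_PATH}/{endpoint}?{query}",
        data=body.encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def run_actor(request, timeout=300):
    """Send the request and decode the body, assuming it is JSON."""
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))
```

A call such as `run_actor(build_run_request(my_token, page_url, ["A Light in the Attic", "51.77"]))` would then start a run and wait for its dataset items, where `my_token` stands for your own API token. Keeping request construction separate from sending makes it possible to inspect the URL and body without touching the network.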
Page Scraping Analyzer OpenAPI definition
OpenAPI is a standard for designing and describing RESTful APIs, allowing developers to define API structure, endpoints, and data formats in a machine-readable way. It simplifies API development, integration, and documentation.
OpenAPI works well with AI agents and GPTs because it standardizes how these systems interact with different APIs, which makes integrations more reliable and communication more efficient.
By defining machine-readable API specifications, OpenAPI allows AI models like GPTs to understand and use varied data sources, improving accuracy. This accelerates development, reduces errors, and provides context-aware responses, making OpenAPI a core component for AI applications.
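To make "machine-readable" concrete, here is a minimal Python sketch of how a program might walk an OpenAPI document and work out what each operation needs. It runs against a hand-trimmed excerpt of the Page Scraping Analyzer definition; the function names `resolve_ref` and `describe_operations` are hypothetical, and the sketch only handles local `#/...` references and JSON request bodies.

```python
HTTP_METHODS = {"get", "put", "post", "delete", "patch", "head", "options"}

# Hand-trimmed excerpt of the definition, kept small for illustration.
SPEC_EXCERPT = {
    "servers": [{"url": "https://api.apify.com/v2"}],
    "paths": {
        "/acts/apify~page-analyzer/run-sync-get-dataset-items": {
            "post": {
                "operationId": "run-sync-get-dataset-items-apify-page-analyzer",
                "requestBody": {"content": {"application/json": {
                    "schema": {"$ref": "#/components/schemas/inputSchema"}}}},
                "parameters": [
                    {"name": "token", "in": "query", "required": True},
                ],
            }
        },
    },
    "components": {"schemas": {"inputSchema": {
        "type": "object",
        "required": ["url", "keywords"],
    }}},
}


def resolve_ref(spec, ref):
    """Follow a local reference such as '#/components/schemas/inputSchema'."""
    if not ref.startswith("#/"):
        raise ValueError(f"only local references are supported: {ref}")
    node = spec
    for key in ref[2:].split("/"):
        node = node[key]
    return node


def describe_operations(spec):
    """Return one summary dict per operation found under 'paths'."""
    base_url = spec["servers"][0]["url"]
    operations = []
    for path, path_item in spec["paths"].items():
        for method, op in path_item.items():
            if method not in HTTP_METHODS:
                continue  # skip non-operation keys on the path item
            schema = (op.get("requestBody", {}).get("content", {})
                        .get("application/json", {}).get("schema", {}))
            if "$ref" in schema:
                schema = resolve_ref(spec, schema["$ref"])
            operations.append({
                "operationId": op["operationId"],
                "method": method.upper(),
                "url": base_url + path,
                "required_query": [p["name"] for p in op.get("parameters", [])
                                   if p.get("in") == "query" and p.get("required")],
                "required_input": schema.get("required", []),
            })
    return operations
```

Calling `describe_operations(SPEC_EXCERPT)` yields a single entry saying that the operation is a POST, needs a `token` query parameter, and expects `url` and `keywords` in the JSON body, which is the kind of information an agent needs before it can call the endpoint.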
You can download the OpenAPI definitions for Page Scraping Analyzer from the options below:
If you’d like to learn more about how OpenAPI powers GPTs, read our blog post.
Actor Metrics
27 monthly users
11 stars
91% runs succeeded
Created in Feb 2018
Modified 8 months ago