Web Scraper
Crawls arbitrary websites using the Chrome browser and extracts data from pages using provided JavaScript code. The actor supports both recursive crawling and lists of URLs, and it automatically manages concurrency for maximum performance. This is Apify's basic tool for web crawling and scraping.
Error with "pseudoUrls": is it deprecated?
Closed
I am trying to use a regex to filter which links get selected, and with pseudoUrls it is not working, although when I test the pseudoURL, the detection reports OK.
I use this pseudoURL: "pseudoUrls": [ { "purl": "http[s?]://planderecuperacion.gob.es/como-acceder-a-los-fondos/convocatorias?combine=&field_estado_value%5B0%5D=Proximamente&field_estado_value%5B1%5D=Abierta&field_tipo_convocatoria_value%5BAyuda/subvencion%5D=Ayuda/subvencion&page=[\d*]" } ], and I am trying to catch URLs like this one:
<a href="http://planderecuperacion.gob.es/como-acceder-a-los-fondos/convocatorias?combine=&field_estado_value%5B0%5D=Proximamente&field_estado_value%5B1%5D=Abierta&field_tipo_convocatoria_value%5BAyuda/subvencion%5D=Ayuda/subvencion&page=41" title="Ir a pagina 41"> 41 Pagina
But I got this warning:
2024-03-02T14:39:16.217Z WARN pseudoUrls option is deprecated, use globs or regexps instead
And I don't know how to use "regexps" in the web console instead of pseudoUrls.
Thanks in advance.
adamek
Yes, they are deprecated (for more than a year now, I think), but they should still work as before. As a deprecated option, though, it is unlikely that we will fix anything in it, so I would definitely suggest adopting globs.
My guess is that your problem is the escaping of special characters (e.g. `&amp;` vs `&`).
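To illustrate the kind of escaping mismatch guessed at here, the following is a minimal sketch in plain JavaScript. It assumes a shortened, hypothetical version of the URL from the question, and it only shows why a pattern written with a plain `&` does not match a link copied from raw HTML source, where the ampersand is written as `&amp;`. It is not taken from the actor's code.

```javascript
// Hypothetical, shortened version of the link from the question,
// as it would appear in raw HTML source (ampersand written as an entity).
const rawHtmlHref =
  'http://planderecuperacion.gob.es/como-acceder-a-los-fondos/convocatorias?combine=&amp;page=41';

// The same link after entity decoding, which is what a browser exposes as the href.
const decodedHref = rawHtmlHref.replace(/&amp;/g, '&');

// A pattern written against the decoded form, with a plain "&".
const pattern =
  /^https?:\/\/planderecuperacion\.gob\.es\/como-acceder-a-los-fondos\/convocatorias\?combine=&page=\d+$/;

console.log(pattern.test(rawHtmlHref)); // false: "&amp;" does not match "&"
console.log(pattern.test(decodedHref)); // true
```

If a pattern is copied from page source rather than from the resolved link, decoding the entities first (or writing the pattern against the decoded URL) avoids this class of mismatch.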
ArturoS
I finally managed to make it work; it was more of an HTML problem than anything else. Thank you anyway :D. Nonetheless, I would love to see the regexps option in the web console instead of pseudoUrls, since I guess globs only accept wildcards and not regular expressions...
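For readers who want the regular-expression equivalent of the pseudoURL from the question, here is a minimal sketch in plain JavaScript. The helper names (`escapeRegExp`, `selectPaginationLinks`) and the sample links are hypothetical, and the sketch uses `\d+` (at least one digit) where the pseudoURL used `[\d*]`. How the selected links are then enqueued depends on the actor's page function API, which this thread does not cover.

```javascript
// Escape characters that have a special meaning in a regular expression,
// so the literal part of the URL (including "?" and ".") matches itself.
function escapeRegExp(text) {
  return text.replace(/[.*+?^${}()|[\]\\]/g, '\\$&');
}

// Literal part of the URL from the question, without the scheme and page number.
const prefix =
  'planderecuperacion.gob.es/como-acceder-a-los-fondos/convocatorias' +
  '?combine=&field_estado_value%5B0%5D=Proximamente' +
  '&field_estado_value%5B1%5D=Abierta' +
  '&field_tipo_convocatoria_value%5BAyuda/subvencion%5D=Ayuda/subvencion' +
  '&page=';

// http or https, then the literal prefix, then one or more digits.
const paginationRegExp = new RegExp(`^https?://${escapeRegExp(prefix)}\\d+$`);

// Keep only the links that point at a paginated results page.
function selectPaginationLinks(hrefs) {
  return hrefs.filter((href) => paginationRegExp.test(href));
}

// Hypothetical sample links for illustration.
const sampleHrefs = [
  `http://${prefix}41`,
  `https://${prefix}7`,
  'http://planderecuperacion.gob.es/como-acceder-a-los-fondos',
];

console.log(selectPaginationLinks(sampleHrefs).length); // 2
```

Escaping the literal prefix programmatically avoids having to hand-escape the `?` and `.` characters, which is where hand-written patterns for query-string URLs tend to go wrong.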
- 2.3k monthly users
- 119 stars
- 99.9% runs succeeded
- 5.2 days response time
- Created in Mar 2019
- Modified about 1 month ago