Article Text Extractor
Extracts the article text and other metadata from a given URL. Uses https://github.com/ageitgey/node-unfluff, which is a Node.js implementation of https://github.com/grangier/python-goose. See also lukaskrivka/article-extractor-smart.
The output is saved to the default key-value store under the OUTPUT key. The HTML of the given page is stored under the page.html key.
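As a rough sketch of how those two records could be read back after a run, the snippet below builds the record URLs and fetches them over the Apify API. The helper names recordUrl and fetchArticle are hypothetical, and the sketch assumes Node.js 18+ (for the global fetch), the public endpoint layout /v2/key-value-stores/{storeId}/records/{key}, and an optional API token passed as a Bearer header. The store ID has to come from the run itself.

```javascript
// Assumption: base URL of the public Apify API.
const API_BASE = 'https://api.apify.com/v2';

// Hypothetical helper: URL of one record in a key-value store.
function recordUrl(storeId, key) {
  return `${API_BASE}/key-value-stores/${encodeURIComponent(storeId)}/records/${encodeURIComponent(key)}`;
}

// Hypothetical helper: fetches both records this actor writes.
// OUTPUT holds the extracted JSON, page.html holds the raw page HTML.
async function fetchArticle(storeId, token) {
  const headers = token ? { Authorization: `Bearer ${token}` } : {};
  const [outputRes, htmlRes] = await Promise.all([
    fetch(recordUrl(storeId, 'OUTPUT'), { headers }),
    fetch(recordUrl(storeId, 'page.html'), { headers }),
  ]);
  if (!outputRes.ok || !htmlRes.ok) {
    throw new Error(`Record fetch failed: ${outputRes.status} / ${htmlRes.status}`);
  }
  return { article: await outputRes.json(), html: await htmlRes.text() };
}
```

Calling fetchArticle with the run's default store ID should resolve to an object whose article field has the shape shown in the example output.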
Example output:
{
  "title": "Sánchez no logra extender su poder territorial pese al triunfo del 26-M",
  "softTitle": "Sánchez no logra extender su poder territorial pese al triunfo del 26-M",
  "date": "16/06/2019 22:03",
  "author": [
    "Madrid"
  ],
  "publisher": "La Vanguardia",
  "copyright": "La Vanguardia Ediciones Todos los derechos reservados",
  "favicon": "https://www.lavanguardia.com/rsc/images/ico/favicon.ico",
  "description": "El PSOE ganó el pasado 26 de mayo las elecciones municipales y autonómicas de manera 'clara y rotunda', según celebró el propio Pedro Sánchez aquella misma noche. Aunque la victoria socialista se tiñó...",
  "lang": "es",
  "canonicalLink": "https://www.lavanguardia.com/politica/20190617/462906149711/psoe-pedro-sanchez-elecciones-26m-alcaldias-gobiernos-espana.html",
  "tags": [],
  "image": "https://www.lavanguardia.com/r/GODO/LV/p6/WebSite/2019/06/17/Recortada/20190614-636961455890161857_20190614215051428-kvhE-U462903686315FDE-992x558@LaVanguardia-Web.jpg",
  "videos": [],
  "links": [],
  "text": "..."
}
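To illustrate how a consumer might work with an object of this shape, here is a minimal sketch that reduces it to a few summary fields. The function summarizeArticle is hypothetical and not part of the actor or of node-unfluff. It assumes only the field names visible in the example output and treats any of them as possibly missing.

```javascript
// Hypothetical helper: condenses the actor's OUTPUT object into a short summary.
function summarizeArticle(article) {
  // "author" is an array in the example output; fall back to an empty list.
  const authors = Array.isArray(article.author) ? article.author : [];
  const text = typeof article.text === 'string' ? article.text : '';
  // Split on whitespace runs and drop empty strings to count words.
  const words = text.trim().split(/\s+/).filter(Boolean);
  return {
    title: article.title || article.softTitle || '',
    byline: authors.join(', '),
    lang: article.lang || null,
    wordCount: words.length,
    linkCount: Array.isArray(article.links) ? article.links.length : 0,
  };
}
```

The defensive defaults are a design choice for this sketch: extraction results vary from site to site, so fields such as author, tags or links may be empty.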
Actor Metrics
22 monthly users
10 stars
>99% runs succeeded
Created in Mar 2018
Modified a year ago