Article Text Extractor
mtrunkat/article-text-extractor
Simply extracts the article text and other meta information from the given URL. It uses https://github.com/ageitgey/node-unfluff, a Node.js implementation of https://github.com/grangier/python-goose. Also check out lukaskrivka/article-extractor-smart.
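The extraction itself is done by node-unfluff. To illustrate the kind of heuristic that goose-style extractors rely on (counting stopwords per text block and discarding link-heavy blocks), here is a toy TypeScript sketch. It is not node-unfluff's code. The Block type, the short English stopword list, the "more than two stopwords" cut-off and the 0.5 link-density threshold are all simplifying, hypothetical choices.

```typescript
// Toy illustration of a goose-style content heuristic.
// NOT node-unfluff's implementation: the Block type, stopword list and
// thresholds are simplified, hypothetical choices.

interface Block {
  text: string; // visible text of a candidate element, e.g. a <p>
  linkText: string; // the part of that text that sits inside <a> tags
}

// Tiny English stopword list; real extractors ship much larger per-language lists.
const STOPWORDS = new Set<string>([
  "the", "a", "an", "and", "or", "of", "to", "in", "on", "is",
  "was", "for", "that", "with", "as", "by", "it", "at", "from",
]);

// Lower-case ASCII word tokenizer (good enough for this English-only toy).
function words(s: string): string[] {
  return s.toLowerCase().split(/[^a-z0-9]+/).filter((w) => w.length > 0);
}

function stopwordCount(text: string): number {
  return words(text).filter((w) => STOPWORDS.has(w)).length;
}

// Share of a block's words that are link text (1 for an empty block).
function linkDensity(block: Block): number {
  const total = words(block.text).length;
  return total === 0 ? 1 : words(block.linkText).length / total;
}

// Keep blocks that read like prose (more than two stopwords) and are not
// dominated by links, then join them in document order.
function extractArticleText(blocks: Block[], maxLinkDensity = 0.5): string {
  return blocks
    .filter((b) => stopwordCount(b.text) > 2 && linkDensity(b) <= maxLinkDensity)
    .map((b) => b.text.trim())
    .join("\n\n");
}
```

Navigation bars tend to contain few stopwords, and "read more" teasers are mostly link text, so both are dropped, while ordinary sentences pass both checks.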
The output is saved to the default key-value store under the OUTPUT key. The HTML of the given page is stored under the page.html key.
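To read these records after a run finishes, a client needs the ID of the run's default key-value store. The sketch below only builds the request URLs. It assumes the Apify API v2 record endpoint pattern https://api.apify.com/v2/key-value-stores/{storeId}/records/{recordKey}, and the store ID used with it is hypothetical.

```typescript
// Sketch: build Apify API v2 URLs for the two records this Actor writes.
// Assumption: the endpoint pattern
//   https://api.apify.com/v2/key-value-stores/{storeId}/records/{recordKey}
const API_BASE = "https://api.apify.com/v2";

// Record keys named in the Actor's description.
const OUTPUT_KEY = "OUTPUT";
const PAGE_HTML_KEY = "page.html";

// Encode both path segments so unusual IDs or keys cannot break the URL.
function recordUrl(storeId: string, recordKey: string): string {
  return `${API_BASE}/key-value-stores/${encodeURIComponent(storeId)}/records/${encodeURIComponent(recordKey)}`;
}

// URLs for the extracted metadata (JSON) and the raw page HTML.
function articleRecordUrls(storeId: string): { output: string; html: string } {
  return {
    output: recordUrl(storeId, OUTPUT_KEY),
    html: recordUrl(storeId, PAGE_HTML_KEY),
  };
}
```

On Node.js 18 or later, the output URL can be passed to the global fetch and the response parsed with .json(). A store that is not publicly readable will additionally need an Apify API token, which this sketch does not handle.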
Example output:
```json
{
  "title": "Sánchez no logra extender su poder territorial pese al triunfo del 26-M",
  "softTitle": "Sánchez no logra extender su poder territorial pese al triunfo del 26-M",
  "date": "16/06/2019 22:03",
  "author": [
    "Madrid"
  ],
  "publisher": "La Vanguardia",
  "copyright": "La Vanguardia Ediciones Todos los derechos reservados",
  "favicon": "https://www.lavanguardia.com/rsc/images/ico/favicon.ico",
  "description": "El PSOE ganó el pasado 26 de mayo las elecciones municipales y autonómicas de manera 'clara y rotunda', según celebró el propio Pedro Sánchez aquella misma noche. Aunque la victoria socialista se tiñó...",
  "lang": "es",
  "canonicalLink": "https://www.lavanguardia.com/politica/20190617/462906149711/psoe-pedro-sanchez-elecciones-26m-alcaldias-gobiernos-espana.html",
  "tags": [],
  "image": "https://www.lavanguardia.com/r/GODO/LV/p6/WebSite/2019/06/17/Recortada/20190614-636961455890161857_20190614215051428-kvhE-U462903686315FDE-992x558@LaVanguardia-Web.jpg",
  "videos": [],
  "links": [],
  "text": "..."
}
```
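Consumers of the OUTPUT record may want a typed view of it. The following TypeScript sketch describes the fields seen in the example above and parses a raw record defensively. The field list is inferred from this single example rather than from a published schema, so every field is optional, and tags, videos and links are typed as unknown[] because they are empty in the example.

```typescript
// Shape of the OUTPUT record, inferred from one example (not an official schema).
interface ArticleOutput {
  title?: string;
  softTitle?: string;
  date?: string;
  author?: string[];
  publisher?: string;
  copyright?: string;
  favicon?: string;
  description?: string;
  lang?: string;
  canonicalLink?: string;
  tags?: unknown[]; // element type unknown: empty in the example
  image?: string;
  videos?: unknown[]; // element type unknown: empty in the example
  links?: unknown[]; // element type unknown: empty in the example
  text?: string;
}

const STRING_FIELDS = [
  "title", "softTitle", "date", "publisher", "copyright", "favicon",
  "description", "lang", "canonicalLink", "image", "text",
] as const;
const ARRAY_FIELDS = ["tags", "videos", "links"] as const;

function isStringArray(v: unknown): v is string[] {
  return Array.isArray(v) && v.every((x) => typeof x === "string");
}

// Parse raw JSON, keeping only fields whose types match the example;
// anything missing or of an unexpected type is left undefined.
function parseArticleOutput(json: string): ArticleOutput {
  const raw: unknown = JSON.parse(json);
  if (typeof raw !== "object" || raw === null || Array.isArray(raw)) {
    throw new Error("OUTPUT record is not a JSON object");
  }
  const r = raw as Record<string, unknown>;
  const out: ArticleOutput = {};
  for (const key of STRING_FIELDS) {
    const v = r[key];
    if (typeof v === "string") out[key] = v;
  }
  const author = r["author"];
  if (isStringArray(author)) out.author = author;
  for (const key of ARRAY_FIELDS) {
    const v = r[key];
    if (Array.isArray(v)) out[key] = v;
  }
  return out;
}
```

Dropping mistyped fields instead of throwing keeps a single odd page from failing a whole batch; callers that need stricter guarantees can check for undefined fields themselves.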
Developer
Maintained by Community
Actor metrics
- 17 monthly users
- 10 stars
- 99.3% runs succeeded
- Created in Mar 2018
- Modified about 1 year ago