Article Text Extractor

mtrunkat/article-text-extractor

Extracts the article text and other metadata from a given URL. Uses https://github.com/ageitgey/node-unfluff, which is a Node.js implementation of https://github.com/grangier/python-goose.

To run the actor, you'll need an Apify account. Create a new task for the actor, modify the actor input configuration, click Run, and get your results.

API

To run the actor from your code, send an HTTP POST request to the following API endpoint:

https://api.apify.com/v2/acts/mtrunkat~article-text-extractor/runs?token=<YOUR_API_TOKEN>

The POST payload, together with its Content-Type header (usually application/json), is passed as INPUT to the actor. The actor is started with the default options; you can override them using URL query parameters.

Example
curl "https://api.apify.com/v2/acts/mtrunkat~article-text-extractor/runs?token=<YOUR_API_TOKEN>" \
-d '{ "url": "https://techcrunch.com/2018/03/15/blue-vision-labs-which-builds-collaborative-ar-emerges-from-stealth-with-14-5m-led-by-gv/" }' \
-H 'Content-Type: application/json' \
-X POST

To use the API, replace <YOUR_API_TOKEN> with the API token of your Apify account.
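
The same request can be sketched in Python using only the standard library. This is a minimal illustration, not an official client: the function names build_run_request and start_run are hypothetical, and the optional keyword arguments are simply appended to the query string on the assumption that they are valid parameters for this endpoint (check the API reference for the actual names).

```python
import json
import urllib.parse
import urllib.request

# Endpoint taken from the section above.
API_BASE = "https://api.apify.com/v2/acts/mtrunkat~article-text-extractor/runs"


def build_run_request(token, article_url, **query_overrides):
    """Build (but do not send) the POST request that starts an actor run.

    Hypothetical helper: query_overrides are passed through unchanged as
    URL query parameters; their names are not validated here.
    """
    # The API token and any optional overrides go into the query string.
    query = urllib.parse.urlencode({"token": token, **query_overrides})
    # The JSON body is what the actor receives as INPUT.
    body = json.dumps({"url": article_url}).encode("utf-8")
    return urllib.request.Request(
        f"{API_BASE}?{query}",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


def start_run(token, article_url, **query_overrides):
    """Send the request and return the decoded JSON response."""
    request = build_run_request(token, article_url, **query_overrides)
    with urllib.request.urlopen(request) as response:
        return json.load(response)
```

Calling start_run("<YOUR_API_TOKEN>", "https://techcrunch.com/...") with a real token and a full article URL would issue the same POST as the curl command. Splitting request construction from sending keeps the first function easy to inspect without touching the network.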

For more information, see the list of the actor's API endpoints or the full API reference.