Article Text Extractor

  • mtrunkat/article-text-extractor
  • Users 653
  • Runs 118k
  • Created by Marek Trunkát

Extracts the article text and other metadata from a given URL. It uses https://github.com/ageitgey/node-unfluff, which is a Node.js implementation of https://github.com/grangier/python-goose.

To run the code examples, you need to have an Apify account. Replace <YOUR_API_TOKEN> in the code with your API token. For a more detailed explanation, please read about running Actors via the API in Apify Docs.

# Set API token
API_TOKEN=<YOUR_API_TOKEN>

# Prepare Actor input
cat > input.json <<'EOF'
{
  "url": "https://www.bbc.com/news/world-asia-china-48659073"
}
EOF

# Run the Actor
curl "https://api.apify.com/v2/acts/mtrunkat~article-text-extractor/runs?token=$API_TOKEN" \
  -X POST \
  -d @input.json \
  -H 'Content-Type: application/json'
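
The Actor is listed as mtrunkat/article-text-extractor, while the API URL in the curl command writes it as mtrunkat~article-text-extractor, with a tilde in place of the slash. As a minimal sketch of that mapping, the helper below builds the run endpoint from the listed name. The function name actor_run_url is hypothetical and is not part of any Apify tool; it only mirrors the URL shape shown above.

```shell
# Hypothetical helper (an assumption for illustration, not an Apify command):
# turn "username/actor-name" into the run endpoint used by the curl call above.
actor_run_url() {
  # The API path uses "~" where the listing uses "/"
  actor_id=$(printf '%s' "$1" | tr '/' '~')
  printf 'https://api.apify.com/v2/acts/%s/runs' "$actor_id"
}

# Print the endpoint for this Actor (the token is still passed separately)
actor_run_url "mtrunkat/article-text-extractor"
```

The result can be used in place of the hard-coded URL, for example: curl "$(actor_run_url mtrunkat/article-text-extractor)?token=$API_TOKEN" with the same -X, -d and -H options as above.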