Article Text Extractor

  • mtrunkat/article-text-extractor
  • Modified
  • Users 653
  • Runs 117.9k
  • Created by Marek Trunkát

Extracts the article text and other metadata from a given URL. It uses https://github.com/ageitgey/node-unfluff, a Node.js implementation of https://github.com/grangier/python-goose.

To run the code examples, you need to have an Apify account. Replace <YOUR_API_TOKEN> in the code with your API token. For a more detailed explanation, please read about running Actors via the API in Apify Docs.

from apify_client import ApifyClient

# Initialize the ApifyClient with your API token
client = ApifyClient("<YOUR_API_TOKEN>")

# Prepare the Actor input
run_input = { "url": "https://www.bbc.com/news/world-asia-china-48659073" }

# Run the Actor and wait for it to finish
run = client.actor("mtrunkat/article-text-extractor").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset (if there are any)
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item)
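
Each item yielded by the loop above is a plain dictionary. The following is a minimal sketch of how one might condense such an item for display. It assumes the Actor passes through node-unfluff-style keys such as `title`, `author`, and `text`; this page does not confirm the actual output keys, and `summarize_item` is a hypothetical helper, not part of the Apify client.

```python
def summarize_item(item, max_chars=200):
    """Condense one dataset item into a short summary (hypothetical helper).

    Assumes node-unfluff-style keys ("title", "author", "text"); missing
    keys fall back to safe defaults instead of raising.
    """
    text = (item.get("text") or "").strip()

    # Truncate long article bodies to a short preview.
    preview = text[:max_chars]
    if len(text) > max_chars:
        preview = preview.rstrip() + "..."

    # Accept either a single author string or a list of authors.
    author = item.get("author") or []
    if isinstance(author, str):
        author = [author]

    return {
        "title": item.get("title") or "(untitled)",
        "authors": list(author),
        "preview": preview,
        "length": len(text),
    }
```

It could be called inside the loop in place of the plain `print(item)`, for example `print(summarize_item(item))`. Inspect one raw item first and adjust the key names if the Actor's output differs.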