Daily Data Feeds Scraper

Scrapes daily datasets: VC funding, domain drops, patents, crypto prices, and news.

Pricing: from $0.05 / 1,000 results
Rating: 0.0 (0 reviews)
Developer: Soft But Savage (Maintained by Community)
Actor stats: 0 bookmarked · 2 total users · 1 monthly active user · last modified 18 days ago

Get structured daily records across VC funding, patent-related signals, crypto prices, and news in one Actor, with an optional experimental domain-drop feed. The output is normalized for downstream automation with stable record IDs, source URLs, ISO timestamps, is_new flags, and a per-run summary.

What does Daily Data Feeds Scraper do?

This Actor collects multiple daily feeds in one run and normalizes them into one dataset shape. It is built for scheduled collection, internal pipelines, and downstream filtering where traceability matters.

Datasets included:

  • VC Funding — Latest startup funding rounds from TechCrunch
  • Domain Drops — Experimental deleted-domain feed from a public source that is currently unstable
  • Patents — Patent-related company signals, with an explicit fallback path when direct patent sources block requests
  • Crypto Prices — Top 50 cryptocurrencies by market cap with 24h price changes
  • News — Latest articles for configurable topics (AI, startups, tech layoffs, etc.)

Why use Daily Data Feeds Scraper?

  • One run, multiple feeds — Funding, patents, crypto, and news stay in one scheduled workflow, with optional domain monitoring
  • Stable IDs — Each record carries a deterministic record_id for dedupe and change detection
  • Source traceability — Records include source_url, source name, and normalized timestamps
  • Operational visibility — Every run stores a RUN_SUMMARY record in the default key-value store
  • Cross-run change tracking — Records carry is_new and first_seen_at using a persistent named state store
  • Transparent fallbacks — When a primary source blocks requests, records can carry a source_type that shows a fallback path was used
  • Pipeline-friendly shape — Shared fields make downstream filtering and storage simpler

How to use Daily Data Feeds Scraper

  1. Click Try for free to open the Actor
  2. Configure which datasets you want, or leave defaults for the most reliable feeds
  3. Set max_items_per_dataset to control volume and cost
  4. Optionally set custom news topics
  5. Click Run to get immediate results
  6. Set up a Schedule to run daily automatically
  7. Access records via the Dataset tab and run metadata via RUN_SUMMARY
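
Runs can also be started programmatically through the Apify API. The sketch below only builds the request URL and body for the v2 "run Actor" endpoint; the Actor ID (`username~daily-data-feeds-scraper`) and the token are placeholders, not this Actor's real identifiers.

```python
import json
from urllib.parse import quote, urlencode

def build_run_request(actor_id: str, token: str, run_input: dict):
    """Build the URL and JSON body for starting an Apify Actor run.

    Targets the Apify API v2 endpoint POST /v2/acts/{actorId}/runs.
    The actor_id and token passed in below are placeholders.
    """
    url = (
        f"https://api.apify.com/v2/acts/{quote(actor_id, safe='~')}/runs?"
        + urlencode({"token": token})
    )
    body = json.dumps(run_input)
    return url, body

# Start a run with the input documented below (placeholder credentials).
url, body = build_run_request(
    "username~daily-data-feeds-scraper",
    "<YOUR_APIFY_TOKEN>",
    {
        "datasets": ["funding", "crypto_prices", "news"],
        "max_items_per_dataset": 20,
        "news_topics": ["startup funding", "AI"],
    },
)
```

POST the body to the URL with any HTTP client to start the run; the response contains the run ID you can poll for results.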

Input

{
  "datasets": ["funding", "patents", "crypto_prices", "news"],
  "max_items_per_dataset": 20,
  "news_topics": ["startup funding", "AI", "tech layoffs"]
}
Field                 | Type    | Default                                   | Description
datasets              | array   | all                                       | Which datasets to scrape
max_items_per_dataset | integer | 20                                        | Maximum records to collect from each dataset
news_topics           | array   | ["startup funding", "AI", "tech layoffs"] | Topics for news scraping

Output

Results are pushed to the default dataset. Each record includes normalized core fields:

  • record_id
  • dataset
  • entity_name
  • entity_type
  • source_url
  • published_at
  • observed_at
  • first_seen_at
  • run_date
  • is_new

Example funding record:

{
  "record_id": "7f22db8b2f4f4f0d8b1bbf0abf1c6221f6e7d630",
  "dataset": "funding",
  "entity_name": "Startup raises $50M Series B for AI platform",
  "entity_type": "article",
  "source_url": "https://techcrunch.com/2026/04/09/example-round/",
  "title": "Startup raises $50M Series B for AI platform",
  "source": "TechCrunch",
  "published_at": "2026-04-09T10:00:00Z",
  "description": "The company plans to use the funding to...",
  "observed_at": "2026-04-14T16:30:00Z",
  "run_date": "2026-04-14"
}

Example domain drop record:

{
  "record_id": "7ccf3f4ef5f9f2f9b88e6027bcb73dbb287bc790",
  "dataset": "domain_drops",
  "entity_name": "example.com",
  "entity_type": "domain",
  "source_url": "https://www.expireddomains.net/deleted-com-domains/",
  "domain": "example.com",
  "backlinks": "1240",
  "referring_domains": "87",
  "observed_at": "2026-04-14T16:30:00Z",
  "run_date": "2026-04-14"
}

Example crypto price record:

{
  "record_id": "f6b7f0af55c3240d4fe2db85df5391d3b3fd0db5",
  "dataset": "crypto_prices",
  "entity_name": "Bitcoin",
  "entity_type": "crypto_asset",
  "source_url": "https://www.coingecko.com/en/coins/bitcoin",
  "name": "Bitcoin",
  "symbol": "btc",
  "price_usd": 82500.00,
  "change_24h_pct": -2.3,
  "volume_24h": 38000000000,
  "market_cap": 1630000000000,
  "observed_at": "2026-04-14T16:30:00Z",
  "run_date": "2026-04-14"
}
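
Because every record shares the same core fields, downstream routing reduces to grouping on the `dataset` field. A minimal sketch over already-downloaded items (the sample records here are illustrative):

```python
from collections import defaultdict

def split_by_dataset(records):
    """Bucket normalized records by feed so each pipeline stage sees one shape."""
    buckets = defaultdict(list)
    for rec in records:
        buckets[rec["dataset"]].append(rec)
    return dict(buckets)

# Illustrative records with the shared core fields trimmed for brevity.
records = [
    {"record_id": "a1", "dataset": "funding", "entity_name": "Example round"},
    {"record_id": "b2", "dataset": "crypto_prices", "entity_name": "Bitcoin"},
    {"record_id": "c3", "dataset": "funding", "entity_name": "Another round"},
]
buckets = split_by_dataset(records)
# buckets["funding"] holds two records, buckets["crypto_prices"] one.
```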

The Actor also writes a RUN_SUMMARY record to the default key-value store with per-dataset status, counts, and any captured error message.
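
A scheduled pipeline can read RUN_SUMMARY after each run and alert on failed feeds. The per-dataset field names below (`status`, `count`, `error`) are assumptions for illustration; inspect an actual RUN_SUMMARY record for the exact schema.

```python
def failed_datasets(run_summary: dict):
    """Return names of datasets whose status is not 'ok'.

    Field names are illustrative, not the Actor's documented schema.
    """
    return [
        name
        for name, info in run_summary.get("datasets", {}).items()
        if info.get("status") != "ok"
    ]

# A hypothetical RUN_SUMMARY payload.
summary = {
    "run_date": "2026-04-14",
    "datasets": {
        "funding": {"status": "ok", "count": 20},
        "domain_drops": {"status": "error", "count": 0, "error": "source unavailable"},
    },
}
# failed_datasets(summary) -> ["domain_drops"]
```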

Patent records may carry source_type: "news_fallback" when direct patent endpoints block automated access. domain_drops remains available as an input, but it is not part of the default run because its public source is unstable.

Data fields

Field         | Description
record_id     | Stable identifier for dedupe and downstream sync
dataset       | Feed type: funding, domain_drops, patents, crypto_prices, or news
entity_name   | Normalized entity label for the record
entity_type   | Normalized type such as article, domain, patent, or crypto_asset
source_url    | Canonical source URL for the record
source_type   | Whether the record came from the primary source or a fallback path
published_at  | ISO timestamp from the source, when available
observed_at   | ISO timestamp when this run captured the record
first_seen_at | First time this Actor saw the record across runs
run_date      | Date of the Actor run
is_new        | Whether the record is new versus previously seen
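
The Actor's is_new flag tracks novelty against its own named state store; if you persist records yourself, the same check can be done locally against record_id. A minimal sketch (sample IDs taken from the examples above):

```python
def new_records(records, seen_ids):
    """Keep only records whose record_id is not yet in seen_ids, updating the set.

    Mirrors what the Actor's is_new flag does, but against your own store.
    """
    fresh = []
    for rec in records:
        if rec["record_id"] not in seen_ids:
            seen_ids.add(rec["record_id"])
            fresh.append(rec)
    return fresh

# IDs already persisted from a previous run.
seen = {"7f22db8b2f4f4f0d8b1bbf0abf1c6221f6e7d630"}
todays = [
    {"record_id": "7f22db8b2f4f4f0d8b1bbf0abf1c6221f6e7d630", "dataset": "funding"},
    {"record_id": "f6b7f0af55c3240d4fe2db85df5391d3b3fd0db5", "dataset": "crypto_prices"},
]
fresh = new_records(todays, seen)
# fresh contains only the crypto record; seen now holds both IDs.
```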

Pricing

Each Actor run costs a small amount based on compute time and the number of results produced. Use max_items_per_dataset to keep run size and cost predictable.

Estimated cost per run: $0.01–$0.05 depending on memory and result count.

Schedule daily runs to keep your data fresh for pennies per day.

Tips

  • Schedule it — Go to Saved Tasks → Schedule to run automatically every morning
  • Filter by dataset — Pass only the datasets you need to reduce compute time
  • Use reliable defaults — Leave domain_drops off unless you specifically want to test that unstable source
  • Use record IDs — Persist record_id in your own system to detect new vs already-seen records
  • Custom news topics — Set news_topics to track your specific industry or competitors
  • Inspect run health — Read RUN_SUMMARY from the default key-value store after scheduled runs

FAQ

Is this legal to use? This Actor scrapes publicly available data from public RSS feeds and public APIs. Always ensure your use case complies with the terms of service of the data sources and applicable laws in your jurisdiction.

How fresh is the data? As fresh as your last run. Schedule it daily for daily data.

Can I request additional datasets? Open an issue in the Issues tab and describe what data you need.