Pricing

$0.01 / 1,000 valid_records

CatchAll

Submit a CatchAll job, poll until completion, and retrieve all valid records. Results are saved to the Dataset and Key-Value Store.

Developer

Newscatcher-CatchAll

CatchAll — structured web research

CatchAll transforms plain-text questions into structured, validated datasets extracted from billions of web pages. Enter a query like "Series B funding rounds for SaaS startups" and receive structured JSON records with company names, deal sizes, dates, and source citations — no scraping logic required.

CatchAll is not a traditional web scraper. It searches NewsCatcher's proprietary index of 2+ billion web pages, clusters related pages into real-world events, validates relevance, and extracts structured data — all in a single run.

What can CatchAll do?

Find specific events at scale — acquisitions, funding rounds, product launches, regulatory approvals, executive changes, and more
Return structured JSON — each record includes extracted fields, confidence scores, and source citations
Handle the full job lifecycle — this Actor submits a job, polls until completion, and retrieves all results automatically
Save results to Apify storage — records are stored in both a Dataset and a Key-value store for easy export

CatchAll pairs well with the Apify platform. Schedule recurring runs, chain with other Actors using integrations, export results via API, or send data to external services through webhooks.

How to use CatchAll

1. Go to the CatchAll Actor page and click Try for free.
2. Enter your CatchAll API key (get one at platform.newscatcherapi.com).
3. Type a plain-text query describing what you want to find.
4. Optionally adjust the record limit, or add custom validators and enrichments as JSON.
5. Click Save & Start. The Actor submits the job, polls for status, and retrieves results when complete.
6. Open the Output tab to review records, or go to Storage to download the dataset as JSON, CSV, or Excel.

A typical run takes 10–15 minutes depending on query complexity and the number of web pages processed.
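
The same steps can also be driven from code. The sketch below is one possible way to do it, assuming the caller uses the `apify-client` Python package and passes in an `ApifyClient` instance. The Actor ID is left as a parameter because this page does not state it, and `run_catchall` is a hypothetical helper name, not part of any library.

```python
from typing import Any


def run_catchall(client: Any, actor_id: str, run_input: dict) -> list[dict]:
    """Start the Actor, wait for it to finish, and return the dataset items.

    `client` is expected to behave like apify_client.ApifyClient
    (assumption: the caller has installed and configured apify-client).
    """
    # call() starts a run and waits for it to finish, returning the run record.
    run = client.actor(actor_id).call(run_input=run_input)
    if run is None:
        raise RuntimeError("Actor run did not return a run record")
    # Every run has a default dataset; the Actor saves its records there.
    dataset_id = run["defaultDatasetId"]
    return list(client.dataset(dataset_id).iterate_items())


# Usage (placeholders, not real credentials or IDs):
#   from apify_client import ApifyClient
#   client = ApifyClient("YOUR_APIFY_TOKEN")
#   records = run_catchall(client, "<catchall-actor-id>", {
#       "apiKey": "YOUR_CATCHALL_API_KEY",
#       "query": "AI company acquisitions",
#   })
```

Passing the client in, rather than constructing it inside the function, keeps the helper easy to exercise with a stub before spending CatchAll credits on a real run.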

Input

Field	Type	Required	Description
`apiKey`	String	Yes	Your CatchAll API key
`query`	String	Yes	Plain-text question describing what to find
`context`	String	No	Additional guidance to focus extraction
`limit`	Integer	No	Maximum number of records to return (default: 50, minimum: 10)
`validatorsJson`	String	No	JSON array of validator objects. Example: `[{"name":"is_acquisition","description":"...","type":"boolean"}]`
`enrichmentsJson`	String	No	JSON array of enrichment objects. Example: `[{"name":"acquirer_company","description":"...","type":"company"}]`
`pollIntervalSeconds`	Integer	No	How often to check job status, in seconds (default: 60)
`timeoutMinutes`	Integer	No	Stop polling after this many minutes (default: 30)
`pageSize`	Integer	No	Records to fetch per page when pulling results (default: 100)

If you leave validatorsJson and enrichmentsJson empty (or as []), CatchAll generates them automatically based on your query.
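
The `pollIntervalSeconds` and `timeoutMinutes` inputs describe a poll-with-deadline loop. The sketch below illustrates that pattern only; it is not the Actor's actual code. `get_status` is a hypothetical callable standing in for whatever returns the job status, and the status names `completed` and `failed` are assumptions.

```python
import time
from typing import Callable

# Hypothetical status names; the real CatchAll API may use different values.
DONE_STATUSES = {"completed"}
FAILED_STATUSES = {"failed"}


def poll_until_done(
    get_status: Callable[[], str],
    poll_interval_seconds: float = 60,
    timeout_minutes: float = 30,
    sleep: Callable[[float], None] = time.sleep,
    clock: Callable[[], float] = time.monotonic,
) -> str:
    """Call get_status() until it reports a terminal state or time runs out."""
    deadline = clock() + timeout_minutes * 60
    while True:
        status = get_status()
        if status in DONE_STATUSES:
            return status
        if status in FAILED_STATUSES:
            raise RuntimeError(f"job ended with status {status!r}")
        # Give up once the deadline has passed instead of sleeping again.
        if clock() >= deadline:
            raise TimeoutError(
                f"job still {status!r} after {timeout_minutes} minutes"
            )
        sleep(poll_interval_seconds)
```

With the defaults from the table (60 seconds, 30 minutes), a job that never finishes is checked roughly 30 times before the loop gives up. The `sleep` and `clock` parameters exist only so the loop can be tested without waiting.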

Input example

{
  "apiKey": "YOUR_CATCHALL_API_KEY",
  "query": "AI company acquisitions",
  "context": "Focus on deal size and acquiring company details",
  "limit": 10
}

Input example with custom validators and enrichments

{
  "apiKey": "YOUR_CATCHALL_API_KEY",
  "query": "AI company acquisitions",
  "context": "Focus on deal size and acquiring company details",
  "limit": 10,
  "validatorsJson": "[{\"name\":\"is_acquisition\",\"description\":\"true if the page describes a company acquisition\",\"type\":\"boolean\"}]",
  "enrichmentsJson": "[{\"name\":\"acquiring_company\",\"description\":\"Name of the acquiring company\",\"type\":\"company\"},{\"name\":\"deal_value\",\"description\":\"Deal value in USD\",\"type\":\"number\"}]"
}
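
Because `validatorsJson` and `enrichmentsJson` are strings that contain a JSON array, hand-escaping them is error-prone. A small helper like the following could build the input from ordinary Python lists. `build_input` is a hypothetical name used for illustration; only the field names come from the input table.

```python
import json
from typing import Optional


def build_input(
    api_key: str,
    query: str,
    context: Optional[str] = None,
    limit: Optional[int] = None,
    validators: Optional[list] = None,
    enrichments: Optional[list] = None,
) -> dict:
    """Assemble the Actor input, leaving out any optional field not provided."""
    run_input = {"apiKey": api_key, "query": query}
    if context is not None:
        run_input["context"] = context
    if limit is not None:
        run_input["limit"] = limit
    # These two fields are strings holding a JSON array, not nested arrays,
    # so each list is serialized with json.dumps.
    if validators:
        run_input["validatorsJson"] = json.dumps(validators)
    if enrichments:
        run_input["enrichmentsJson"] = json.dumps(enrichments)
    return run_input
```

Omitting both lists leaves the two fields out entirely, which corresponds to the case where CatchAll generates validators and enrichments from the query.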

Output

Each record in the output dataset contains:

Field	Description
`record_id`	Unique identifier for the record
`record_title`	Short title summarizing the event
`enrichment`	Structured data extracted from web pages (dynamic fields)
`enrichment.enrichment_confidence`	Overall confidence score: `low`, `medium`, or `high`
`citations`	Array of source documents, each with `title`, `link` (URL), and `published_date`

The enrichment object uses dynamic schemas — field names are generated based on your query. For example, a funding query might return funding_amount, investee_company, and funding_date. If you need consistent field names across runs, define custom enrichments in enrichmentsJson.
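
Since field names can differ from run to run, it can help to check which enrichment fields actually appear before relying on them downstream. The following is a minimal sketch; both function names are hypothetical, and it assumes each record follows the layout in the table above.

```python
from collections import Counter


def enrichment_field_counts(records: list[dict]) -> Counter:
    """Count how many records contain each enrichment field name."""
    counts: Counter = Counter()
    for record in records:
        counts.update((record.get("enrichment") or {}).keys())
    return counts


def missing_fields(records: list[dict], expected: set[str]) -> dict[str, list[str]]:
    """Map record_id to the expected enrichment fields that record lacks."""
    gaps: dict[str, list[str]] = {}
    for record in records:
        present = set(record.get("enrichment") or {})
        absent = sorted(expected - present)
        if absent:
            gaps[record.get("record_id", "?")] = absent
    return gaps
```

If `missing_fields` reports gaps for fields a pipeline depends on, that is a sign to pin those names with custom enrichments.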

Output example

{
  "record_id": "6983973854314692457",
  "record_title": "VulnCheck Raises $25M Series B Funding",
  "enrichment": {
    "enrichment_confidence": "high",
    "funding_amount": 25000000,
    "funding_currency": "USD",
    "funding_date": "2026-02-17",
    "investee_company": {
      "source_text": "VulnCheck",
      "confidence": 0.99,
      "metadata": {
        "name": "VulnCheck",
        "domain_url": "vulncheck.com",
        "domain_url_confidence": "high"
      }
    },
    "investor_company": {
      "source_text": "Sorenson Capital",
      "confidence": 0.99,
      "metadata": {
        "name": "Sorenson Capital",
        "domain_url": null,
        "domain_url_confidence": null
      }
    }
  },
  "citations": [
    {
      "title": "Exclusive: VulnCheck raises $25M funding to help companies patch software bugs",
      "link": "https://example.com/article",
      "published_date": "2026-02-17T10:00:00Z"
    }
  ]
}
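
For CSV or spreadsheet export, a nested record like the one above usually needs flattening. The sketch below is one way to do that, assuming nested values follow the `source_text` / `metadata.name` shape seen in this single example; other enrichment types may look different. `flatten_record` is a hypothetical helper.

```python
def flatten_record(record: dict) -> dict:
    """Reduce one CatchAll record to a flat dict of simple values."""
    enrichment = record.get("enrichment") or {}
    flat = {
        "record_id": record.get("record_id"),
        "record_title": record.get("record_title"),
    }
    for name, value in enrichment.items():
        if isinstance(value, dict):
            # Nested (company-style) field: prefer the resolved name,
            # fall back to the raw text found on the page.
            metadata = value.get("metadata") or {}
            flat[name] = metadata.get("name") or value.get("source_text")
        else:
            flat[name] = value
    citations = record.get("citations") or []
    flat["source_links"] = [c["link"] for c in citations if c.get("link")]
    return flat
```

Applied to the example record, `investee_company` becomes the plain string `VulnCheck` and `source_links` holds the single citation URL.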

Tips for effective queries

Be specific about what you're looking for. "Series B funding rounds for SaaS startups" works better than "startup funding."
Use the journalist test. If a journalist would write a news article about it, CatchAll can find it.
Target single entities or related entities. "Apple OR Google acquisitions in healthcare" is effective. Mixing unrelated topics in one query reduces accuracy.
Add context to guide extraction. Use the context field to specify what data points matter most.
Start with a small limit for testing. You can expand results later with the CatchAll Continue Actor without reprocessing.

How much does it cost?

This Actor is free to use on Apify — you only pay for Apify platform usage (compute units). However, each run consumes credits from your CatchAll API plan. Check your plan limits at platform.newscatcherapi.com.

Other CatchAll Actors

CatchAll also offers utility Actors for building custom workflows. Each maps to a single API endpoint:

Actor	Description
CatchAll Initialize	Get suggested validators, enrichments, and date ranges before submitting
CatchAll Create Job	Submit a job without polling or fetching results
CatchAll Get Job Status	Check current job status and step progress
CatchAll Pull Results	Retrieve all records from a completed job
CatchAll Early Results	Get partial results before a job completes
CatchAll Continue	Expand a job to process more records
CatchAll Create Monitor	Schedule recurring jobs
CatchAll Update Monitor	Update a monitor's webhook configuration
CatchAll Start/Stop Monitor	Pause or resume a monitor

Chain these Actors using Apify's built-in integrations and webhooks to build automated data pipelines.
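
In a custom pipeline, the result-pulling step typically loops over pages, which is what the `pageSize` input suggests. The following sketch shows that loop in isolation. `fetch_page` is a hypothetical callable wrapping whatever returns one page of records, and 1-indexed pages are an assumption, not documented behavior.

```python
from typing import Callable, Iterator


def iter_all_records(
    fetch_page: Callable[[int, int], list],
    page_size: int = 100,
) -> Iterator[dict]:
    """Yield records page by page until a short or empty page is returned.

    fetch_page(page, page_size) is a hypothetical callable; pages are
    assumed to start at 1.
    """
    page = 1
    while True:
        records = fetch_page(page, page_size)
        yield from records
        # A page shorter than page_size means there is nothing left to fetch.
        if len(records) < page_size:
            return
        page += 1
```

Stopping on a short page costs one extra request when the total is an exact multiple of the page size; if the underlying call reports a total page count, that would be a more direct stop condition.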

FAQ

How long does a run take?

A typical CatchAll job processes 50,000+ web pages and takes 10–15 minutes. The Actor polls the API automatically until the job completes or the timeout is reached (default: 30 minutes).

What are dynamic schemas?

CatchAll generates response schemas dynamically for each job. Field names in the enrichment object can vary between runs, even with the same query. To get consistent field names, define custom enrichments in enrichmentsJson. Learn more in the dynamic schemas guide.

Can I get more results after a run completes?

Yes. Use the CatchAll Continue Actor with the same jobId to process additional records without restarting the job.

Where can I get help?

CatchAll documentation
Write effective queries
Open an issue on the Actor's Issues tab in Apify Console
