
Datapro Financial Intelligence

Under maintenance

Pricing

from $5.00 / 1,000 stock data scrapes


Scrape real-time stock prices, options chains with Greeks, SEC EDGAR filings, FRED economic indicators, and financial news with sentiment analysis. Auto-generate LLM fine-tuning datasets in Alpaca/ShareGPT format. AI-powered investment theses via Gemini 2.5. Export as JSONL, CSV, or Parquet.


Rating

0.0 (0 reviews)

Developer

d.leigh hunte

Maintained by Community

Actor stats

  • Bookmarked: 0
  • Total users: 2
  • Monthly active users: 1
  • Last modified: 5 days ago


DataPro Financial Intelligence — Apify Actor

Production-ready financial data scraping platform with SEC EDGAR filings, macroeconomic data, technical analysis, and LLM training-data generation. Deploys as an Apify Actor for cloud-scale scraping.

Quick Start

Run Locally

# Install dependencies
pip install -r requirements.txt
# Set API keys
export OPENAI_API_KEY=your_key
export FRED_API_KEY=your_key # Free: https://fred.stlouisfed.org/docs/api/api_key.html
export SEC_USER_AGENT="YourName your@email.com" # SEC requires identification
# Start API server
python api_server_ultimate.py

API runs at http://localhost:8000

Deploy to Apify

# Install Apify CLI
npm install -g apify-cli
# Login & push
apify login
apify push

Once deployed, run the Actor from the Apify Console or via API — see Apify Deployment below.


Financial Intelligence Features

  • Stock Market Data - Real-time prices, fundamentals, dividends via yfinance
  • Technical Analysis - RSI, MACD, Bollinger Bands, moving averages, support/resistance
  • SEC EDGAR Filings - 10-K, 10-Q, 8-K, S-1 filings with full-text search
  • FRED Economic Data - 800k+ macroeconomic time series (GDP, CPI, unemployment ...)
  • Financial News - Multi-source aggregation (Yahoo Finance, MarketWatch, RSS)
  • Training Data Pipeline - Convert scraped data into Alpaca / ShareGPT / completion format
  • Apify Actor - One-click cloud deployment with configurable input schema

Plus the original research stack

  • Deep Research - Multi-source synthesis with 6 analyzers
  • RAG Extraction - Clean web content for LLM pipelines
  • Structured Reports - Executive summaries, citations
  • Fact Checking - Cross-source claim verification
  • Iterative Depth - Query decomposition, gap filling
  • Fine-tuning - Full GPU support on DGX Spark
  • Multi-LLM Support - OpenAI, Anthropic, Google, Ollama, vLLM
  • Domain Datasets - Pre-built legal, medical, finance, tech datasets

Apify Deployment

The project ships as a ready-to-deploy Apify Actor. The .actor/ directory and Dockerfile handle everything.

Input Schema (Apify Console)

  • mode (enum, default scrape_only) - One of scrape_only, scrape_and_train, train_only
  • tickers (string[], default ["AAPL","MSFT","GOOGL"]) - Stock / ETF symbols
  • filing_types (string[], default ["10-K","10-Q"]) - SEC filing types
  • fred_series (string[], default ["GDP","UNRATE","CPIAUCSL"]) - FRED series IDs
  • news_sources (string[], default ["yahoo_finance","marketwatch"]) - News sources
  • generate_training_data (bool, default false) - Build LLM training set
  • training_data_format (enum, default alpaca) - One of alpaca, sharegpt, completion
  • max_training_examples (int, default 1000) - Cap on training examples
  • include_technical_analysis (bool, default true) - Add RSI / MACD / Bollinger
  • scrape_depth (enum, default standard) - One of basic, standard, comprehensive
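To illustrate how the defaults and enum constraints above fit together, here is a minimal sketch of input normalization. The real Actor validates input via .actor/input_schema.json; the helper and constant names below are ours, for illustration only.

```python
# Hypothetical sketch of applying the schema defaults to a partial Actor input.
DEFAULTS = {
    "mode": "scrape_only",
    "tickers": ["AAPL", "MSFT", "GOOGL"],
    "filing_types": ["10-K", "10-Q"],
    "fred_series": ["GDP", "UNRATE", "CPIAUCSL"],
    "news_sources": ["yahoo_finance", "marketwatch"],
    "generate_training_data": False,
    "training_data_format": "alpaca",
    "max_training_examples": 1000,
    "include_technical_analysis": True,
    "scrape_depth": "standard",
}

ENUMS = {
    "mode": {"scrape_only", "scrape_and_train", "train_only"},
    "training_data_format": {"alpaca", "sharegpt", "completion"},
    "scrape_depth": {"basic", "standard", "comprehensive"},
}

def normalize_input(raw):
    """Merge user input over defaults and reject unknown enum values."""
    merged = {**DEFAULTS, **raw}
    for key, allowed in ENUMS.items():
        if merged[key] not in allowed:
            raise ValueError(f"{key}={merged[key]!r} not in {sorted(allowed)}")
    return merged
```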

Example Apify API Call

curl -X POST "https://api.apify.com/v2/acts/<YOUR_ACTOR_ID>/runs" \
  -H "Authorization: Bearer <APIFY_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{
    "mode": "scrape_and_train",
    "tickers": ["AAPL", "NVDA", "TSLA"],
    "generate_training_data": true,
    "training_data_format": "sharegpt",
    "scrape_depth": "comprehensive"
  }'

Financial Scraping Examples

SEC EDGAR Filings

from scrapers import SECEdgarScraper
scraper = SECEdgarScraper()
filings = scraper.scrape(["AAPL", "MSFT"], filing_types=["10-K"], max_filings=5)
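SEC EDGAR rejects anonymous clients, which is why SEC_USER_AGENT is required. As a sketch of the kind of headers a compliant request needs (the helper name is ours, not part of SECEdgarScraper's API):

```python
import os

def sec_request_headers():
    """Build SEC-compliant request headers from the SEC_USER_AGENT env var."""
    user_agent = os.environ.get("SEC_USER_AGENT")
    if not user_agent:
        raise RuntimeError("Set SEC_USER_AGENT to 'YourName your@email.com'")
    return {
        "User-Agent": user_agent,          # SEC requires identification
        "Accept-Encoding": "gzip, deflate",
    }
```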

FRED Economic Data

from scrapers import EconomicDataScraper
scraper = EconomicDataScraper(api_key="YOUR_FRED_KEY")
data = scraper.scrape(series_ids=["GDP", "UNRATE", "CPIAUCSL"])
snapshot = scraper.get_economic_snapshot()
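Under the hood, FRED observations come from a simple REST endpoint. A sketch of the request URL such a scraper would build (the helper name is illustrative, not EconomicDataScraper's internals):

```python
from urllib.parse import urlencode

FRED_BASE = "https://api.stlouisfed.org/fred/series/observations"

def fred_observations_url(series_id, api_key):
    """URL for one FRED time series, requested as JSON."""
    query = urlencode({
        "series_id": series_id,
        "api_key": api_key,
        "file_type": "json",
    })
    return f"{FRED_BASE}?{query}"
```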

Technical Analysis

from scrapers import FinanceScraper
scraper = FinanceScraper()
result = scraper.scrape_comprehensive(["AAPL", "TSLA"])
# Includes RSI, MACD, Bollinger Bands, support/resistance
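As a sketch of what the technical indicators involve, here is a minimal, dependency-free RSI calculation (simple-average variant; production code typically uses Wilder smoothing, and this is illustrative rather than the scraper's own implementation):

```python
def rsi(closes, period=14):
    """Relative Strength Index over the last `period` price changes."""
    if len(closes) < period + 1:
        raise ValueError("need at least period + 1 closing prices")
    # Consecutive price changes over the lookback window.
    window = closes[-(period + 1):]
    deltas = [b - a for a, b in zip(window, window[1:])]
    avg_gain = sum(d for d in deltas if d > 0) / period
    avg_loss = sum(-d for d in deltas if d < 0) / period
    if avg_loss == 0:
        return 100.0  # no losses in the window
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)
```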

Generate Training Data

from finetuning import FinancialTrainingPipeline
pipeline = FinancialTrainingPipeline()
dataset = pipeline.generate_full_dataset(
    market_data=market_data,
    sec_filings=sec_data,
    economic_data=econ_data,
    news_data=news_data,
    max_examples=500,
)
# Export in Alpaca, ShareGPT, or completion format
formatted = pipeline.format_dataset(dataset, "sharegpt")
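For reference, the Alpaca and ShareGPT formats differ mainly in shape: Alpaca is flat instruction/input/output records, ShareGPT is a list of conversation turns. A minimal converter between them (illustrative only, not the pipeline's own code):

```python
def alpaca_to_sharegpt(example):
    """Convert one Alpaca record (instruction/input/output) to ShareGPT turns."""
    human = example["instruction"]
    if example.get("input"):
        # Alpaca's optional input is appended to the human turn.
        human = f"{human}\n\n{example['input']}"
    return {
        "conversations": [
            {"from": "human", "value": human},
            {"from": "gpt", "value": example["output"]},
        ]
    }
```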

API Endpoints

Research

# Standard research
POST /api/research/sync
{"query": "Your research question", "use_web": true}
# Deep research (multi-pass)
POST /api/research/deep
{"query": "Complex topic", "depth": 3, "format": "report"}
# Structured report
POST /api/research/report
{"query": "Topic to research"}
# Fact verification
POST /api/research/fact-check
{"claim": "Statement to verify"}

Scraping

# RAG-optimized scraping
POST /api/scrape/rag
{"urls": ["https://example.com"], "output_format": "markdown"}
# Stock market data + technical analysis
POST /api/scrape
{"scraper": "finance", "targets": ["AAPL", "GOOGL"], "include_technicals": true}
# SEC EDGAR filings
POST /api/scrape
{"scraper": "sec_edgar", "targets": ["AAPL"], "filing_types": ["10-K", "10-Q"]}
# FRED economic data
POST /api/scrape
{"scraper": "economic", "series_ids": ["GDP", "UNRATE", "CPIAUCSL"]}
# Financial news aggregation
POST /api/scrape
{"scraper": "financial_news", "sources": ["yahoo_finance", "marketwatch"]}
# News
POST /api/scrape
{"scraper": "news", "targets": ["AI", "technology"]}
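All of these endpoints accept JSON POST bodies. A small stdlib helper for preparing such a request against the local server (illustrative, not part of the project; pass the result to urllib.request.urlopen to send it):

```python
import json
from urllib.request import Request

API_BASE = "http://localhost:8000"  # local default from Quick Start

def build_post(path, payload):
    """Prepare a JSON POST request for the API server."""
    return Request(
        API_BASE + path,
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )
```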

Analysis

POST /api/analyze
{"agent": "sentiment_analysis", "data": {"text": "..."}}

Python Usage

from deep_research_agent import deep_research_agent
from research_report_generator import generate_research_report
from web_scraper import web_scraper
from iterative_researcher import IterativeResearcher
# Standard research
result = deep_research_agent.research("Your query", use_web=True)
# Generate structured report
report = generate_research_report(result)
print(report.to_markdown())
# Deep iterative research
researcher = IterativeResearcher()
deep_result = researcher.research_deep("Complex topic", depth=3)
# RAG-optimized scraping
content = web_scraper.extract_for_rag("https://example.com")

Fine-tuning with Domain Datasets

from finetuning import (
    TrainingDataStudio, DomainTrainer,
    get_dataset, list_available_datasets,
)
# List available pre-built datasets
print(list_available_datasets())
# ['legal', 'medical', 'finance', 'technology', 'research']
# Load a pre-built dataset
legal_data = get_dataset("legal")
print(f"Legal dataset: {len(legal_data)} examples")
# Fine-tune on DGX Spark
trainer = DomainTrainer.from_preset("llama-8b-qlora")
trainer.train(legal_data.to_alpaca_format(), epochs=3)
trainer.save_adapter("./adapters/legal-expert")

Project Structure

Ultimate_DataPro/
├── .actor/ # Apify Actor configuration
│ ├── actor.json # Actor metadata & env vars
│ ├── input_schema.json # Apify Console input form
│ └── output_schema.json # Output data schema
├── src/ # Apify Actor entry point
│ ├── main.py # Actor logic (scrape → train → push)
│ └── __main__.py # Package runner
├── Dockerfile # Apify cloud build
├── scrapers/ # Financial & general scrapers
│ ├── finance_scraper.py # yfinance + technical analysis
│ ├── sec_edgar_scraper.py # SEC EDGAR filings (10-K, 10-Q, 8-K …)
│ ├── economic_data_scraper.py # FRED macroeconomic data (800k+ series)
│ ├── financial_news_scraper.py # Multi-source financial news aggregator
│ ├── news_scraper.py # General news scraper
│ └── base_scraper.py # Abstract base class
├── finetuning/ # LLM fine-tuning (DGX optimized)
│ ├── financial_training_pipeline.py # Scraped data → training datasets
│ ├── domain_datasets.py # Pre-built domain datasets
│ ├── trainer.py # QLoRA/LoRA training with Unsloth
│ ├── data_studio.py # Dataset curation studio
│ └── model_hub.py # Adapter management
├── api_server_ultimate.py # FastAPI server
├── deep_research_agent.py # Core research engine
├── research_report_generator.py # Structured reports
├── iterative_researcher.py # Multi-pass research
├── web_scraper.py # RAG-ready scraping
├── fact_checker.py # Cross-source validation
├── nlp_utils.py # Advanced NLP (spaCy, transformers)
├── actors/ # Website crawler actors
├── integrations/ # External connectors
├── test_financial_pipeline.py # End-to-end pipeline test
└── requirements.txt # All dependencies

Environment Variables

# Required
OPENAI_API_KEY= # OpenAI API (for research & analysis)
# Financial Scrapers
FRED_API_KEY= # Free: https://fred.stlouisfed.org/docs/api/api_key.html
SEC_USER_AGENT= # Your name + email (SEC requires identification)
# Optional
BRAVE_API_KEY= # Brave Search
SERPER_API_KEY= # Serper.dev Search
APIFY_TOKEN= # Apify platform token (for cloud runs)
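A quick way to fail fast on missing configuration is to check the required variables at startup. A minimal sketch (not part of the project) covering OPENAI_API_KEY plus the two variables the financial scrapers need:

```python
import os

REQUIRED = ["OPENAI_API_KEY", "FRED_API_KEY", "SEC_USER_AGENT"]

def check_env():
    """Return the names of required variables that are missing or empty."""
    return [name for name in REQUIRED if not os.environ.get(name)]
```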

Documentation

  • ARCHITECTURE.md - System design
  • SETUP_GUIDE.md - Installation details
  • INTEGRATIONS.md - External connectors
  • docs/ - Additional guides

License

Proprietary - All rights reserved.