Datapro Financial Intelligence

Pricing: from $5.00 / 1,000 stock data scrapes

Scrape real-time stock prices, options chains with Greeks, SEC EDGAR filings, FRED economic indicators, and financial news with sentiment analysis. Auto-generate LLM fine-tuning datasets in Alpaca/ShareGPT format. AI-powered investment theses via Gemini 2.5. Export as JSONL, CSV, or Parquet.
Developer: d.leigh hunte
Last modified: 5 days ago
DataPro Financial Intelligence — Apify Actor
Production-ready financial data scraping platform with SEC EDGAR filings, macroeconomic data, technical analysis, and LLM training-data generation. Deploys as an Apify Actor for cloud-scale scraping.
Quick Start
Run Locally
```bash
# Install dependencies
pip install -r requirements.txt

# Set API keys
export OPENAI_API_KEY=your_key
export FRED_API_KEY=your_key                     # Free: https://fred.stlouisfed.org/docs/api/api_key.html
export SEC_USER_AGENT="YourName your@email.com"  # SEC requires identification

# Start API server
python api_server_ultimate.py
```
The API runs at http://localhost:8000.
Deploy to Apify
```bash
# Install Apify CLI
npm install -g apify-cli

# Login & push
apify login
apify push
```
Once deployed, run the Actor from the Apify Console or via API — see Apify Deployment below.
Financial Intelligence Features
| Feature | Status | Description |
|---|---|---|
| Stock Market Data | ✅ | Real-time prices, fundamentals, dividends via yfinance |
| Technical Analysis | ✅ | RSI, MACD, Bollinger Bands, moving averages, support/resistance |
| SEC EDGAR Filings | ✅ | 10-K, 10-Q, 8-K, S-1 filings with full-text search |
| FRED Economic Data | ✅ | 800k+ macroeconomic time series (GDP, CPI, unemployment, …) |
| Financial News | ✅ | Multi-source aggregation (Yahoo Finance, MarketWatch, RSS) |
| Training Data Pipeline | ✅ | Convert scraped data → Alpaca / ShareGPT / completion format |
| Apify Actor | ✅ | One-click cloud deployment with configurable input schema |
Plus the original research stack
| Feature | Status | Description |
|---|---|---|
| Deep Research | ✅ | Multi-source synthesis with 6 analyzers |
| RAG Extraction | ✅ | Clean web content for LLM pipelines |
| Structured Reports | ✅ | Executive summaries, citations |
| Fact Checking | ✅ | Cross-source claim verification |
| Iterative Depth | ✅ | Query decomposition, gap filling |
| Fine-tuning | ✅ | Full GPU support on DGX Spark |
| Multi-LLM Support | ✅ | OpenAI, Anthropic, Google, Ollama, vLLM |
| Domain Datasets | ✅ | Pre-built legal, medical, finance, tech datasets |
Apify Deployment
The project ships as a ready-to-deploy Apify Actor; the `.actor/` directory and `Dockerfile` handle the configuration and cloud build.
Input Schema (Apify Console)
| Parameter | Type | Default | Description |
|---|---|---|---|
| mode | enum | scrape_only | scrape_only, scrape_and_train, train_only |
| tickers | string[] | ["AAPL","MSFT","GOOGL"] | Stock / ETF symbols |
| filing_types | string[] | ["10-K","10-Q"] | SEC filing types |
| fred_series | string[] | ["GDP","UNRATE","CPIAUCSL"] | FRED series IDs |
| news_sources | string[] | ["yahoo_finance","marketwatch"] | News sources |
| generate_training_data | bool | false | Build LLM training set |
| training_data_format | enum | alpaca | alpaca, sharegpt, completion |
| max_training_examples | int | 1000 | Cap on training examples |
| include_technical_analysis | bool | true | Add RSI / MACD / Bollinger |
| scrape_depth | enum | standard | basic, standard, comprehensive |
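For reference, the defaults in the table above can be captured as a plain input payload. A minimal sketch in Python (the field names match the schema; the `build_input` helper is our own illustration, not part of the shipped code):

```python
# Defaults mirroring the Actor's input schema above.
DEFAULT_INPUT = {
    "mode": "scrape_only",
    "tickers": ["AAPL", "MSFT", "GOOGL"],
    "filing_types": ["10-K", "10-Q"],
    "fred_series": ["GDP", "UNRATE", "CPIAUCSL"],
    "news_sources": ["yahoo_finance", "marketwatch"],
    "generate_training_data": False,
    "training_data_format": "alpaca",
    "max_training_examples": 1000,
    "include_technical_analysis": True,
    "scrape_depth": "standard",
}

def build_input(**overrides):
    """Merge user overrides onto the schema defaults, rejecting unknown keys."""
    unknown = set(overrides) - set(DEFAULT_INPUT)
    if unknown:
        raise ValueError(f"Unknown parameters: {sorted(unknown)}")
    return {**DEFAULT_INPUT, **overrides}

payload = build_input(mode="scrape_and_train", tickers=["NVDA"])
print(payload["mode"])     # scrape_and_train
print(payload["tickers"])  # ['NVDA']
```

Rejecting unknown keys up front catches typos like `ticker` before a cloud run is paid for.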
Example Apify API Call
```bash
curl -X POST "https://api.apify.com/v2/acts/<YOUR_ACTOR_ID>/runs" \
  -H "Authorization: Bearer <APIFY_TOKEN>" \
  -H "Content-Type: application/json" \
  -d '{
        "mode": "scrape_and_train",
        "tickers": ["AAPL", "NVDA", "TSLA"],
        "generate_training_data": true,
        "training_data_format": "sharegpt",
        "scrape_depth": "comprehensive"
      }'
```
Financial Scraping Examples
SEC EDGAR Filings
```python
from scrapers import SECEdgarScraper

scraper = SECEdgarScraper()
filings = scraper.scrape(["AAPL", "MSFT"], filing_types=["10-K"], max_filings=5)
```
FRED Economic Data
```python
from scrapers import EconomicDataScraper

scraper = EconomicDataScraper(api_key="YOUR_FRED_KEY")
data = scraper.scrape(series_ids=["GDP", "UNRATE", "CPIAUCSL"])
snapshot = scraper.get_economic_snapshot()
```
Technical Analysis
```python
from scrapers import FinanceScraper

scraper = FinanceScraper()
result = scraper.scrape_comprehensive(["AAPL", "TSLA"])
# Includes RSI, MACD, Bollinger Bands, support/resistance
```
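For intuition about what the scraper returns, the 14-period RSI can be computed from closing prices alone. A minimal sketch of the standard Wilder formula (this is the textbook calculation, not necessarily the scraper's internal implementation):

```python
def rsi(closes, period=14):
    """Relative Strength Index via Wilder's smoothing over closing prices."""
    if len(closes) < period + 1:
        raise ValueError("need at least period + 1 closing prices")
    # Price changes between consecutive closes
    deltas = [b - a for a, b in zip(closes, closes[1:])]
    gains = [max(d, 0.0) for d in deltas]
    losses = [max(-d, 0.0) for d in deltas]
    # Seed with simple averages, then apply Wilder's exponential smoothing
    avg_gain = sum(gains[:period]) / period
    avg_loss = sum(losses[:period]) / period
    for g, l in zip(gains[period:], losses[period:]):
        avg_gain = (avg_gain * (period - 1) + g) / period
        avg_loss = (avg_loss * (period - 1) + l) / period
    if avg_loss == 0:
        return 100.0  # no down moves in the window
    rs = avg_gain / avg_loss
    return 100.0 - 100.0 / (1.0 + rs)

# A monotonically rising series has no losses, so RSI saturates at 100
print(rsi([float(i) for i in range(1, 17)]))  # 100.0
```

Values above ~70 are conventionally read as overbought, below ~30 as oversold.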
Generate Training Data
```python
from finetuning import FinancialTrainingPipeline

pipeline = FinancialTrainingPipeline()
dataset = pipeline.generate_full_dataset(
    market_data=market_data,
    sec_filings=sec_data,
    economic_data=econ_data,
    news_data=news_data,
    max_examples=500,
)

# Export in Alpaca, ShareGPT, or completion format
formatted = pipeline.format_dataset(dataset, "sharegpt")
```
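The two chat formats differ only in field layout. A minimal sketch of one example rendered both ways (the field names follow the public Alpaca and ShareGPT conventions; the record and its price figure are invented for illustration):

```python
record = {
    "question": "What was AAPL's latest closing price?",
    "answer": "AAPL closed at 189.84.",  # invented figure, illustration only
}

def to_alpaca(rec):
    """Alpaca format: flat instruction / input / output triples."""
    return {"instruction": rec["question"], "input": "", "output": rec["answer"]}

def to_sharegpt(rec):
    """ShareGPT format: a 'conversations' list of human/gpt turns."""
    return {"conversations": [
        {"from": "human", "value": rec["question"]},
        {"from": "gpt", "value": rec["answer"]},
    ]}

print(to_alpaca(record)["instruction"])
print(to_sharegpt(record)["conversations"][1]["from"])  # gpt
```

Alpaca suits single-turn instruction tuning; ShareGPT preserves multi-turn structure.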
API Endpoints
Research
```
# Standard research
POST /api/research/sync
{"query": "Your research question", "use_web": true}

# Deep research (multi-pass)
POST /api/research/deep
{"query": "Complex topic", "depth": 3, "format": "report"}

# Structured report
POST /api/research/report
{"query": "Topic to research"}

# Fact verification
POST /api/research/fact-check
{"claim": "Statement to verify"}
```
Scraping
```
# RAG-optimized scraping
POST /api/scrape/rag
{"urls": ["https://example.com"], "output_format": "markdown"}

# Stock market data + technical analysis
POST /api/scrape
{"scraper": "finance", "targets": ["AAPL", "GOOGL"], "include_technicals": true}

# SEC EDGAR filings
POST /api/scrape
{"scraper": "sec_edgar", "targets": ["AAPL"], "filing_types": ["10-K", "10-Q"]}

# FRED economic data
POST /api/scrape
{"scraper": "economic", "series_ids": ["GDP", "UNRATE", "CPIAUCSL"]}

# Financial news aggregation
POST /api/scrape
{"scraper": "financial_news", "sources": ["yahoo_finance", "marketwatch"]}

# General news
POST /api/scrape
{"scraper": "news", "targets": ["AI", "technology"]}
```
Analysis
```
POST /api/analyze
{"agent": "sentiment_analysis", "data": {"text": "..."}}
```
Python Usage
```python
from deep_research_agent import deep_research_agent
from research_report_generator import generate_research_report
from web_scraper import web_scraper
from iterative_researcher import IterativeResearcher

# Standard research
result = deep_research_agent.research("Your query", use_web=True)

# Generate structured report
report = generate_research_report(result)
print(report.to_markdown())

# Deep iterative research
researcher = IterativeResearcher()
deep_result = researcher.research_deep("Complex topic", depth=3)

# RAG-optimized scraping
content = web_scraper.extract_for_rag("https://example.com")
```
Fine-tuning with Domain Datasets
```python
from finetuning import (
    TrainingDataStudio, DomainTrainer,
    get_dataset, list_available_datasets,
)

# List available pre-built datasets
print(list_available_datasets())
# ['legal', 'medical', 'finance', 'technology', 'research']

# Load a pre-built dataset
legal_data = get_dataset("legal")
print(f"Legal dataset: {len(legal_data)} examples")

# Fine-tune on DGX Spark
trainer = DomainTrainer.from_preset("llama-8b-qlora")
trainer.train(legal_data.to_alpaca_format(), epochs=3)
trainer.save_adapter("./adapters/legal-expert")
```
Project Structure
```
Ultimate_DataPro/
├── .actor/                            # Apify Actor configuration
│   ├── actor.json                     # Actor metadata & env vars
│   ├── input_schema.json              # Apify Console input form
│   └── output_schema.json             # Output data schema
├── src/                               # Apify Actor entry point
│   ├── main.py                        # Actor logic (scrape → train → push)
│   └── __main__.py                    # Package runner
├── Dockerfile                         # Apify cloud build
├── scrapers/                          # Financial & general scrapers
│   ├── finance_scraper.py             # yfinance + technical analysis
│   ├── sec_edgar_scraper.py           # SEC EDGAR filings (10-K, 10-Q, 8-K …)
│   ├── economic_data_scraper.py       # FRED macroeconomic data (800k+ series)
│   ├── financial_news_scraper.py      # Multi-source financial news aggregator
│   ├── news_scraper.py                # General news scraper
│   └── base_scraper.py                # Abstract base class
├── finetuning/                        # LLM fine-tuning (DGX optimized)
│   ├── financial_training_pipeline.py # Scraped data → training datasets
│   ├── domain_datasets.py             # Pre-built domain datasets
│   ├── trainer.py                     # QLoRA/LoRA training with Unsloth
│   ├── data_studio.py                 # Dataset curation studio
│   └── model_hub.py                   # Adapter management
├── api_server_ultimate.py             # FastAPI server
├── deep_research_agent.py             # Core research engine
├── research_report_generator.py       # Structured reports
├── iterative_researcher.py            # Multi-pass research
├── web_scraper.py                     # RAG-ready scraping
├── fact_checker.py                    # Cross-source validation
├── nlp_utils.py                       # Advanced NLP (spaCy, transformers)
├── actors/                            # Website crawler actors
├── integrations/                      # External connectors
├── test_financial_pipeline.py         # End-to-end pipeline test
└── requirements.txt                   # All dependencies
```
Environment Variables
```bash
# Required
OPENAI_API_KEY=    # OpenAI API (for research & analysis)

# Financial scrapers
FRED_API_KEY=      # Free: https://fred.stlouisfed.org/docs/api/api_key.html
SEC_USER_AGENT=    # Your name + email (SEC requires identification)

# Optional
BRAVE_API_KEY=     # Brave Search
SERPER_API_KEY=    # Serper.dev Search
APIFY_TOKEN=       # Apify platform token (for cloud runs)
```
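A missing key usually surfaces as a confusing runtime error deep inside a scraper, so checking required variables at startup can save a debugging round-trip. A small illustrative helper (not part of the shipped code; variable names match the list above):

```python
import os

REQUIRED = ["OPENAI_API_KEY", "FRED_API_KEY", "SEC_USER_AGENT"]

def missing_required(env=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED if not env.get(name)]

# Simulate an environment where only the OpenAI key is set
missing = missing_required({"OPENAI_API_KEY": "sk-..."})
print(missing)  # ['FRED_API_KEY', 'SEC_USER_AGENT']
```

Calling `missing_required()` with no argument checks the real process environment.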
Documentation
- ARCHITECTURE.md - System design
- SETUP_GUIDE.md - Installation details
- INTEGRATIONS.md - External connectors
- docs/ - Additional guides
License
Proprietary - All rights reserved.