Deal Scraper PRNewswire
Pricing
$19.95 / 1,000 results
Scrapes the 10 most recent articles published in the M&A section of PR Newswire and extracts the relevant deal information.
Developer: Brad
Maintained by Community
Last modified: 7 days ago
M&A x402 Service
A FastAPI service for Coinbase AgentKit that scrapes PR Newswire articles and extracts M&A (Mergers & Acquisitions) entities using OpenAI's GPT models.
Features
- Scrapes PR Newswire article links from a given URL
- Extracts full article text content
- Uses OpenAI API to extract structured M&A entities:
  - BUYER: Acquiring companies/entities
  - SELLER: Companies being sold/divested
  - FUND: Investment funds, PE, VC firms
  - LAW_FIRM: Legal firms involved
  - INTERMEDIARY: Investment banks, advisors
  - PROFESSIONAL: Individual professionals
  - MONEY: Deal values and financial figures
  - DATE: Transaction dates
  - DEAL_TYPE: Type of transaction
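The entity types above map naturally onto a small typed container. The following is a minimal sketch, not the service's actual model: the service lists Pydantic in its tech stack, while this example uses standard-library dataclasses to stay dependency-free. The names `MnaEntities` and `entities_from_dict` are hypothetical, and the field names are assumed from the sample response shown later in this document.

```python
from dataclasses import dataclass, field, fields
from typing import List, Optional


@dataclass
class MnaEntities:
    """Hypothetical container with one field per entity type listed above."""
    buyer: List[str] = field(default_factory=list)         # acquiring companies/entities
    seller: List[str] = field(default_factory=list)        # companies being sold/divested
    fund: List[str] = field(default_factory=list)          # investment funds, PE, VC firms
    law_firm: List[str] = field(default_factory=list)      # legal firms involved
    intermediary: List[str] = field(default_factory=list)  # investment banks, advisors
    professional: List[str] = field(default_factory=list)  # individual professionals
    money: List[str] = field(default_factory=list)         # deal values and financial figures
    date: List[str] = field(default_factory=list)          # transaction dates
    deal_type: Optional[str] = None                        # type of transaction


def entities_from_dict(raw: dict) -> MnaEntities:
    """Build MnaEntities from a parsed JSON object, dropping unknown keys."""
    known = {f.name for f in fields(MnaEntities)}
    return MnaEntities(**{k: v for k, v in raw.items() if k in known})


example = entities_from_dict(
    {"buyer": ["Company A"], "deal_type": "Acquisition", "unexpected": 1}
)
```

Entity types that do not appear in an article default to an empty list, so callers can iterate over any field without checking for missing keys.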
Tech Stack
- Python 3.11
- FastAPI: Modern web framework
- Uvicorn: ASGI server
- BeautifulSoup4: HTML parsing
- Newspaper3k: Article extraction
- OpenAI API: Entity extraction
- Pydantic: Data validation
Setup
1. Clone and Install Dependencies
$ cd mna-x402-service
$ pip install -r requirements.txt
2. Environment Variables
Copy .env.example to .env and add your OpenAI API key:
$ cp .env.example .env
Edit .env:
OPENAI_API_KEY=sk-your-actual-key-here
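A service like this typically benefits from failing fast when the key is missing. The sketch below is an illustration only: `load_openai_key` is a hypothetical helper, and it assumes the variables from .env have already been loaded into the process environment (for example through Docker's `--env-file` option).

```python
import os
from typing import Mapping

PLACEHOLDER_KEY = "sk-your-actual-key-here"  # placeholder value shown in the .env example


def load_openai_key(env: Mapping[str, str] = os.environ) -> str:
    """Return OPENAI_API_KEY, or raise if it is missing or still the placeholder."""
    key = env.get("OPENAI_API_KEY", "").strip()
    if not key or key == PLACEHOLDER_KEY:
        raise RuntimeError("OPENAI_API_KEY is not set; edit .env before starting the service")
    return key
```

Passing the environment as an argument keeps the check easy to test without touching the real process environment.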
3. Run Locally
$ uvicorn app.main:app --reload --host 0.0.0.0 --port 8000
The API will be available at http://localhost:8000
API Endpoints
Health Check
GET /
GET /health
Scrape Articles
POST /x402/scrape
Content-Type: application/json

{
  "site_url": "https://www.prnewswire.com/news-releases/financial-services-latest-news/acquisitions-mergers-and-takeovers-list/",
  "max_articles": 10
}
Parameters:
- site_url (required): URL of the PR Newswire page to scrape
- max_articles (optional): Maximum number of newest articles to process. Defaults to 10, maximum 100. The most recently posted articles are returned first. Limiting the count helps prevent timeouts and keeps processing time under control.
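As an illustration of the request above, here is a minimal client-side sketch using only the Python standard library. `build_scrape_request` is a hypothetical helper; the base URL assumes the local setup described earlier, and the lower bound of 1 for max_articles is an assumption, since only the default (10) and the maximum (100) are documented.

```python
import json
import urllib.request

DEFAULT_MAX_ARTICLES = 10  # documented default
MAX_ARTICLES_CAP = 100     # documented maximum


def build_scrape_request(base_url: str, site_url: str,
                         max_articles: int = DEFAULT_MAX_ARTICLES) -> urllib.request.Request:
    """Build (but do not send) a POST /x402/scrape request."""
    # The lower bound of 1 is an assumption; only the cap of 100 is documented.
    if not 1 <= max_articles <= MAX_ARTICLES_CAP:
        raise ValueError(f"max_articles must be between 1 and {MAX_ARTICLES_CAP}")
    body = json.dumps({"site_url": site_url, "max_articles": max_articles}).encode("utf-8")
    return urllib.request.Request(
        url=base_url.rstrip("/") + "/x402/scrape",
        data=body,
        headers={"Content-Type": "application/json"},
        method="POST",
    )


request = build_scrape_request(
    "http://localhost:8000",
    "https://www.prnewswire.com/news-releases/financial-services-latest-news/"
    "acquisitions-mergers-and-takeovers-list/",
    max_articles=5,
)
# To send it against a running instance:
#   with urllib.request.urlopen(request) as resp:
#       result = json.load(resp)
```

Validating max_articles on the client avoids a round trip for values the service would presumably reject.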
Response:
{
  "site_url": "https://...",
  "articles": [
    {
      "url": "https://...",
      "content": "Full article text...",
      "entities": {
        "buyer": ["Company A"],
        "seller": ["Company B"],
        "fund": ["PE Fund XYZ"],
        "law_firm": ["Law Firm ABC"],
        "intermediary": ["Investment Bank DEF"],
        "professional": ["John Doe"],
        "money": ["$100M", "$50 million"],
        "date": ["2024-01-15", "Q1 2024"],
        "deal_type": "Acquisition"
      }
    }
  ],
  "count": 10,
  "total_found": 25,
  "processed": 10,
  "limit": 10
}
Response Fields:
- site_url: The URL that was scraped
- articles: Array of article results with extracted entities
- count: Number of articles in the response
- total_found: Total number of articles found on the page
- processed: Number of articles actually processed
- limit: The limit that was applied (max_articles parameter)
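To make the relationship between these counters concrete, the following sketch assembles a response envelope from a list of links. It is an assumption about how the fields relate, not the service's actual code: `build_response` and the `process` callback are hypothetical, and the links are assumed to arrive already ordered newest first.

```python
from typing import Callable, Dict, List


def build_response(site_url: str, links: List[str], limit: int,
                   process: Callable[[str], Dict]) -> Dict:
    """Assemble a response envelope from the links found on a page."""
    selected = links[:limit]  # assumes links are already ordered newest first
    articles = [process(url) for url in selected]
    return {
        "site_url": site_url,
        "articles": articles,
        "count": len(articles),      # articles in this response
        "total_found": len(links),   # links discovered on the page
        "processed": len(selected),  # links actually handed to the processor
        "limit": limit,              # the max_articles value that was applied
    }


links = [f"https://example.com/article-{i}" for i in range(25)]
response = build_response(
    "https://example.com/list", links, 10,
    lambda url: {"url": url, "content": "", "entities": {}},
)
```

With 25 links found and a limit of 10, this yields the same counter values as the sample response: count 10, total_found 25, processed 10, limit 10.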
Docker
Build
$ docker build -t mna-x402-service .
Run
$ docker run -p 8000:8000 --env-file .env mna-x402-service
API Documentation
Once running, visit:
- Swagger UI: http://localhost:8000/docs
- ReDoc: http://localhost:8000/redoc
Project Structure
mna-x402-service/
├── app/
│   ├── __init__.py
│   ├── main.py          # FastAPI application
│   ├── models.py        # Pydantic models
│   ├── scraper.py       # PR Newswire scraping
│   └── extractor.py     # OpenAI entity extraction
├── requirements.txt
├── Dockerfile
├── .env.example
└── README.md
Error Handling
The service handles the following failure cases:
- Network errors during scraping
- Article extraction failures
- OpenAI API errors
- JSON parsing errors
Failed articles are included in the response with an error field.
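One way to get this behavior is to catch failures per article, so that a single bad article does not abort the whole batch. The sketch below illustrates that pattern under stated assumptions: `process_articles`, `fetch_text`, and `extract_entities` are hypothetical names, and the exact shape of the error entry in the real service may differ.

```python
from typing import Callable, Dict, Iterable, List


def process_articles(urls: Iterable[str],
                     fetch_text: Callable[[str], str],
                     extract_entities: Callable[[str], Dict]) -> List[Dict]:
    """Process each URL independently; failures become entries with an error field."""
    results: List[Dict] = []
    for url in urls:
        try:
            content = fetch_text(url)             # may fail: network or extraction error
            entities = extract_entities(content)  # may fail: API or JSON parsing error
            results.append({"url": url, "content": content, "entities": entities})
        except Exception as exc:  # broad on purpose: keep going after any single failure
            results.append({"url": url, "error": f"{type(exc).__name__}: {exc}"})
    return results


def fake_fetch(url: str) -> str:
    """Stub fetcher that fails for one URL to demonstrate the error path."""
    if url.endswith("/broken"):
        raise ConnectionError("simulated network failure")
    return "Company A acquires Company B"


results = process_articles(
    ["https://example.com/ok", "https://example.com/broken"],
    fake_fetch,
    lambda text: {"buyer": ["Company A"], "seller": ["Company B"]},
)
```

Callers can then separate successes from failures by checking whether an entry contains the error key.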
Production Considerations
- Add rate limiting
- Implement caching for repeated requests
- Add authentication/API keys
- Set up monitoring and logging
- Configure proper CORS origins
- Use environment-specific configurations
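As an example of the caching item above, here is a minimal in-memory cache with a time-to-live, keyed on the request parameters. It is a sketch only: `TTLCache` is a hypothetical class, the 300-second lifetime is an arbitrary choice, and a multi-process deployment would likely need a shared store instead.

```python
import time
from typing import Any, Callable, Dict, Hashable, Tuple


class TTLCache:
    """Tiny in-memory cache whose entries expire after ttl_seconds."""

    def __init__(self, ttl_seconds: float, clock: Callable[[], float] = time.monotonic):
        self._ttl = ttl_seconds
        self._clock = clock  # injectable so expiry can be tested without sleeping
        self._store: Dict[Hashable, Tuple[float, Any]] = {}

    def get_or_compute(self, key: Hashable, compute: Callable[[], Any]) -> Any:
        """Return the cached value for key, recomputing it if absent or expired."""
        now = self._clock()
        hit = self._store.get(key)
        if hit is not None and now - hit[0] < self._ttl:
            return hit[1]
        value = compute()
        self._store[key] = (now, value)
        return value


calls = []
fake_now = [0.0]
cache = TTLCache(ttl_seconds=300, clock=lambda: fake_now[0])


def scrape() -> dict:
    """Stand-in for an expensive scrape; records how often it runs."""
    calls.append(1)
    return {"count": len(calls)}


key = ("https://example.com/list", 10)    # (site_url, max_articles)
first = cache.get_or_compute(key, scrape)
second = cache.get_or_compute(key, scrape)  # served from the cache
fake_now[0] = 301.0
third = cache.get_or_compute(key, scrape)   # entry expired, so scrape runs again
```

Keying on both site_url and max_articles keeps responses with different limits from being mixed up.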
License
MIT