Deal Scraper PRNewswire

Scrapes the last 10 articles published in the M&A section of PRNewswire and provides relevant deal info.

Pricing: $19.95 / 1,000 results

Developer: Brad (Maintained by Community)

M&A x402 Service

A FastAPI service for Coinbase AgentKit that scrapes PR Newswire articles and extracts M&A (Mergers & Acquisitions) entities using OpenAI's GPT models.

Features

  • πŸ” Scrapes PR Newswire article links from a given URL
  • πŸ“„ Extracts full article text content
  • πŸ€– Uses OpenAI API to extract structured M&A entities:
    • BUYER: Acquiring companies/entities
    • SELLER: Companies being sold/divested
    • FUND: Investment funds, PE, VC firms
    • LAW_FIRM: Legal firms involved
    • INTERMEDIARY: Investment banks, advisors
    • PROFESSIONAL: Individual professionals
    • MONEY: Deal values and financial figures
    • DATE: Transaction dates
    • DEAL_TYPE: Type of transaction

Tech Stack

  • Python 3.11
  • FastAPI: Modern web framework
  • Uvicorn: ASGI server
  • BeautifulSoup4: HTML parsing
  • Newspaper3k: Article extraction
  • OpenAI API: Entity extraction
  • Pydantic: Data validation

Setup

1. Clone and Install Dependencies

cd mna-x402-service
pip install -r requirements.txt
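The contents of requirements.txt are not shown here; based on the tech stack above they would look roughly like the following (unpinned sketch; requests and python-dotenv are assumptions, not confirmed by this README):

```text
fastapi
uvicorn
beautifulsoup4
newspaper3k
openai
pydantic
requests
python-dotenv
```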

2. Environment Variables

Copy .env.example to .env and add your OpenAI API key:

cp .env.example .env

Edit .env:

OPENAI_API_KEY=sk-your-actual-key-here

3. Run Locally

uvicorn app.main:app --reload --host 0.0.0.0 --port 8000

The API will be available at http://localhost:8000

API Endpoints

Health Check

GET /
GET /health

Scrape Articles

POST /x402/scrape
Content-Type: application/json

{
  "site_url": "https://www.prnewswire.com/news-releases/financial-services-latest-news/acquisitions-mergers-and-takeovers-list/",
  "max_articles": 10
}

Parameters:

  • site_url (required): URL of the PR Newswire page to scrape
  • max_articles (optional): Maximum number of newest articles to process. Defaults to 10, maximum 100. Returns the most recently posted articles first. Helps prevent timeouts and control processing time.

Response:

{
  "site_url": "https://...",
  "articles": [
    {
      "url": "https://...",
      "content": "Full article text...",
      "entities": {
        "buyer": ["Company A"],
        "seller": ["Company B"],
        "fund": ["PE Fund XYZ"],
        "law_firm": ["Law Firm ABC"],
        "intermediary": ["Investment Bank DEF"],
        "professional": ["John Doe"],
        "money": ["$100M", "$50 million"],
        "date": ["2024-01-15", "Q1 2024"],
        "deal_type": "Acquisition"
      }
    }
  ],
  "count": 10,
  "total_found": 25,
  "processed": 10,
  "limit": 10
}

Response Fields:

  • site_url: The URL that was scraped
  • articles: Array of article results with extracted entities
  • count: Number of articles in the response
  • total_found: Total number of articles found on the page
  • processed: Number of articles actually processed
  • limit: The limit that was applied (max_articles parameter)
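As a usage sketch, the response can be post-processed to map each article to its buyers, skipping articles that failed (failed articles carry an error field, as noted under Error Handling; the sample data here is illustrative):

```python
def buyers_by_article(response: dict) -> dict[str, list[str]]:
    """Map each article URL to the buyers extracted from it,
    skipping articles that carry an 'error' field."""
    result = {}
    for article in response.get("articles", []):
        if "error" in article:
            continue
        result[article["url"]] = article.get("entities", {}).get("buyer", [])
    return result

sample = {
    "articles": [
        {"url": "https://example.com/a", "entities": {"buyer": ["Company A"]}},
        {"url": "https://example.com/b", "error": "fetch failed"},
    ]
}
buyers = buyers_by_article(sample)
```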
Docker

Build

docker build -t mna-x402-service .

Run

docker run -p 8000:8000 --env-file .env mna-x402-service

API Documentation

Once running, visit:

  • Swagger UI: http://localhost:8000/docs
  • ReDoc: http://localhost:8000/redoc

Project Structure

mna-x402-service/
β”œβ”€β”€ app/
β”‚   β”œβ”€β”€ __init__.py
β”‚   β”œβ”€β”€ main.py        # FastAPI application
β”‚   β”œβ”€β”€ models.py      # Pydantic models
β”‚   β”œβ”€β”€ scraper.py     # PR Newswire scraping
β”‚   └── extractor.py   # OpenAI entity extraction
β”œβ”€β”€ requirements.txt
β”œβ”€β”€ Dockerfile
β”œβ”€β”€ .env.example
└── README.md

Error Handling

The service includes comprehensive error handling:

  • Network errors during scraping
  • Article extraction failures
  • OpenAI API errors
  • JSON parsing errors

Failed articles are included in the response with an error field.
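The per-article error pattern described above might look like this (a sketch; process_articles and fake_extract are illustrative names, not the service's actual internals):

```python
def process_articles(urls, extract):
    """Process each URL, recording failures as {'url': ..., 'error': ...}
    instead of aborting the whole batch."""
    results = []
    for url in urls:
        try:
            results.append({"url": url, "entities": extract(url)})
        except Exception as exc:  # network, extraction, OpenAI, or JSON errors
            results.append({"url": url, "error": str(exc)})
    return results

def fake_extract(url):
    """Stand-in for the real scrape-and-extract step."""
    if "bad" in url:
        raise RuntimeError("fetch failed")
    return {"buyer": ["Company A"]}

results = process_articles(["https://ok.example", "https://bad.example"], fake_extract)
```

This keeps one broken article from failing the whole batch, which matches the documented behavior of including failed articles in the response.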

Production Considerations

  • Add rate limiting
  • Implement caching for repeated requests
  • Add authentication/API keys
  • Set up monitoring and logging
  • Configure proper CORS origins
  • Use environment-specific configurations

License

MIT