PDF Proofreader
Pricing
Pay per usage
Analyzes PDF documents to detect basic spelling and grammar issues by extracting text content. Provides a proofreading quality score and highlights common writing mistakes to help improve document clarity and correctness.
Developer

Gautam Rana
An API that analyzes PDF documents to detect basic spelling and grammar issues and provides a proofreading quality score.
Description
PDF Proofreader extracts text content from PDF files and analyzes it to identify common spelling and grammar mistakes. It also calculates a proofreading quality score to help users quickly assess the overall writing quality of their documents.
This API is useful for:
- Students reviewing assignments or reports
- Developers building document analysis tools
- Content teams checking document quality
- Automation workflows that process PDFs at scale
Features
- Extracts text from PDF files
- Detects common spelling mistakes
- Detects basic grammar issues
- Calculates total word count
- Generates a proofreading quality score
- Returns structured JSON output
- Supports multiple PDF URLs per request
Tech Stack
- Platform: Apify Actor
- Language: (your implementation language)
- PDF Parsing: (e.g., pdf-parse, PyPDF, etc.)
- Grammar Engine: Rule-based / NLP
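To make the "rule-based" idea concrete, here is a minimal sketch of how such a check could work. It is not the Actor's actual implementation: the `COMMON_MISSPELLINGS` table, the `proofread` function, and the scoring rule (100 minus issues per 100 words) are all assumptions made for illustration. Only the issue message format and the output field names are taken from the example output in this document.

```python
import re

# Hypothetical lookup table; a real engine would use a full dictionary.
COMMON_MISSPELLINGS = {"teh": "the", "recieve": "receive", "definately": "definitely"}


def proofread(text):
    """Run two simple rules over the text and return a result dictionary."""
    words = re.findall(r"[A-Za-z']+", text)
    issues = []

    # Rule 1: the same word twice in a row (case-insensitive).
    for prev, word in zip(words, words[1:]):
        if prev.lower() == word.lower():
            issues.append(f"Repeated word: {word.lower()}")

    # Rule 2: words found in the misspelling table.
    for word in words:
        if word.lower() in COMMON_MISSPELLINGS:
            issues.append(f"Misspelled word: {word.lower()}")

    # Assumed scoring rule: subtract issues per 100 words from 100, floor at 0.
    if words:
        score = max(0, round(100 - 100 * len(issues) / len(words)))
    else:
        score = 100

    return {
        "wordCount": len(words),
        "detectedIssues": issues,
        "issueCount": len(issues),
        "proofreadingScore": score,
    }


result = proofread("This is teh the the report.")
print(result)
```

Rules this simple produce false positives (for example, legitimate repetitions such as "had had"), which is consistent with the limitations listed further down.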
Input Format
The API accepts a JSON input with a list of PDF URLs.
Example input.json
```json
{
  "pdfUrls": [
    {
      "url": "https://raw.githubusercontent.com/Gautamrana14/pdf-test-files/main/DBMS-10%20(1).pdf"
    }
  ]
}
```
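Since the input supports several PDF URLs per request, the payload can be generated from a plain list of URLs. The sketch below shows one way to do that; the `build_input` helper and the example.com URLs are hypothetical. It also percent-encodes spaces in the path, as the sample URL above does with `%20`.

```python
import json
from urllib.parse import quote, urlsplit, urlunsplit


def build_input(urls):
    """Wrap plain URL strings in the {"pdfUrls": [{"url": ...}]} shape."""
    items = []
    for raw in urls:
        parts = urlsplit(raw)
        # Percent-encode spaces and other unsafe characters in the path,
        # leaving existing %XX escapes and parentheses untouched.
        safe_path = quote(parts.path, safe="/%()")
        items.append({"url": urlunsplit(parts._replace(path=safe_path))})
    return {"pdfUrls": items}


payload = build_input([
    "https://example.com/reports/annual report.pdf",  # hypothetical URL
    "https://example.com/sample.pdf",
])

# Print the JSON; redirect it to input.json to use it with the cURL example.
print(json.dumps(payload, indent=2))
```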
Usage
Base Endpoint (Apify Actor)
https://api.apify.com/v2/acts/<your-actor-id>/run-sync-get-dataset-items
Example Request (cURL)
```bash
curl -X POST \
  -H "Content-Type: application/json" \
  -d @input.json \
  "https://api.apify.com/v2/acts/<your-actor-id>/run-sync-get-dataset-items?token=YOUR_API_TOKEN"
```
Example using JavaScript
```javascript
import fetch from "node-fetch";

const response = await fetch(
  "https://api.apify.com/v2/acts/<your-actor-id>/run-sync-get-dataset-items?token=YOUR_API_TOKEN",
  {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({
      pdfUrls: [{ url: "https://example.com/sample.pdf" }]
    })
  }
);

const data = await response.json();
console.log(data);
```
Example using Python
```python
import requests

url = "https://api.apify.com/v2/acts/<your-actor-id>/run-sync-get-dataset-items"
params = {"token": "YOUR_API_TOKEN"}
payload = {"pdfUrls": [{"url": "https://example.com/sample.pdf"}]}

response = requests.post(url, params=params, json=payload)
print(response.json())
```
Output Format
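The Actor stores the proofreading results in a dataset and returns the dataset URL. When called through the run-sync-get-dataset-items endpoint, the response body contains the dataset items themselves.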
Output Schema
```json
{
  "dataset": "https://api.apify.com/v2/datasets/xxxx/items"
}
```
Dataset Schema
Each dataset item has the following structure:
```json
{
  "pdfUrl": "https://example.com/sample.pdf",
  "wordCount": 1520,
  "detectedIssues": [
    "Misspelled word: teh",
    "Incorrect verb tense in paragraph 3",
    "Repeated word: the"
  ],
  "issueCount": 3,
  "proofreadingScore": 92
}
```
Field Description
| Field | Description |
|---|---|
| pdfUrl | URL of the analyzed PDF file |
| wordCount | Total number of words extracted |
| detectedIssues | List of detected spelling/grammar issues |
| issueCount | Total number of issues found |
| proofreadingScore | Quality score (0–100) |
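As a usage illustration for these fields, the sketch below filters a list of dataset items down to the documents that fall below a score threshold. The `summarize` helper, the threshold of 80, and the second sample item are assumptions for the example; the first item is the one shown in the dataset schema.

```python
def summarize(items, min_score=80):
    """Return (pdfUrl, score, issue count) rows for items scoring below min_score."""
    flagged = []
    for item in items:
        score = item.get("proofreadingScore")
        if score is None:
            # Skip items without a score rather than guessing one.
            continue
        issues = item.get("detectedIssues", [])
        # Fall back to the list length if issueCount is missing.
        count = item.get("issueCount", len(issues))
        if score < min_score:
            flagged.append((item["pdfUrl"], score, count))
    return flagged


items = [
    {
        "pdfUrl": "https://example.com/sample.pdf",
        "wordCount": 1520,
        "detectedIssues": [
            "Misspelled word: teh",
            "Incorrect verb tense in paragraph 3",
            "Repeated word: the",
        ],
        "issueCount": 3,
        "proofreadingScore": 92,
    },
    {
        # Hypothetical second item, made up for this example.
        "pdfUrl": "https://example.com/draft.pdf",
        "wordCount": 300,
        "detectedIssues": ["Misspelled word: recieve"],
        "issueCount": 1,
        "proofreadingScore": 64,
    },
]

for url, score, count in summarize(items):
    print(f"{url}: score {score}, {count} issue(s)")
```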
Limitations
- Only detects basic grammar and spelling issues
- Accuracy depends on text extraction quality
- Not a replacement for professional proofreading
Rate Limits
Rate limits depend on your Apify plan and Actor configuration.
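When processing PDFs at scale, a client may want to retry requests that hit a rate limit. The following is a minimal sketch of exponential backoff, assuming rate-limited requests are answered with HTTP status 429. The `call_with_backoff` helper and its default values are hypothetical and not part of this Actor or of any Apify client.

```python
import time


def call_with_backoff(send, max_attempts=4, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 response or attempts run out.

    `send` is any zero-argument callable returning an object with a
    `status_code` attribute, such as a requests.Response.
    """
    response = None
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        if attempt < max_attempts - 1:
            # Double the wait after each rate-limited attempt.
            sleep(base_delay * 2 ** attempt)
    # Still rate limited after the last attempt; hand back that response.
    return response
```

With the Python example from the Usage section, this could be called as `call_with_backoff(lambda: requests.post(url, params=params, json=payload))`. Passing `sleep` as a parameter keeps the helper easy to test without real delays.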
Roadmap
- Advanced grammar detection
- Language support beyond English
- Issue categorization (spelling vs grammar)
- Suggestions for corrections
- Highlight positions in original PDF
Contributing
Contributions are welcome.
- Fork the repository
- Create a feature branch
- Commit your changes
- Open a pull request
License
MIT License
Author
Gautam Rana
GitHub: https://github.com/Gautamrana14