PDF Proofreader

Pricing

Pay per usage

Analyzes PDF documents to detect basic spelling and grammar issues by extracting text content. Provides a proofreading quality score and highlights common writing mistakes to help improve document clarity and correctness.

Developer

Gautam Rana

Maintained by Community

An API that analyzes PDF documents to detect basic spelling and grammar issues and provides a proofreading quality score.


Description

PDF Proofreader extracts text content from PDF files and analyzes it to identify common spelling and grammar mistakes. It also calculates a proofreading quality score to help users quickly assess the overall writing quality of their documents.

This API is useful for:

  • Students reviewing assignments or reports
  • Developers building document analysis tools
  • Content teams checking document quality
  • Automation workflows that process PDFs at scale

Features

  • Extracts text from PDF files
  • Detects common spelling mistakes
  • Detects basic grammar issues
  • Calculates total word count
  • Generates a proofreading quality score
  • Returns structured JSON output
  • Supports multiple PDF URLs per request
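
To make the feature list concrete, here is a minimal sketch of what a rule-based check on extracted text could look like. It is not the Actor's actual implementation, which is not documented here: the function name `check_text`, the tiny misspelling table, and the word pattern are all assumptions made for illustration. The scoring formula is not documented either, so the sketch leaves the score out.

```python
import re

# Tiny illustrative misspelling table (an assumption, not the Actor's data);
# a real checker would rely on a proper dictionary.
COMMON_MISSPELLINGS = {"teh": "the", "recieve": "receive", "adress": "address"}

# Words are runs of ASCII letters and apostrophes (a simplification).
WORD_RE = re.compile(r"[A-Za-z']+")


def check_text(text):
    """Count words and flag known misspellings and immediately repeated words."""
    words = WORD_RE.findall(text)
    issues = []
    previous = None
    for word in words:
        lower = word.lower()
        if lower in COMMON_MISSPELLINGS:
            issues.append(f"Misspelled word: {lower}")
        # Flag a word that directly follows an identical word.
        if previous is not None and lower == previous:
            issues.append(f"Repeated word: {lower}")
        previous = lower
    return {
        "wordCount": len(words),
        "detectedIssues": issues,
        "issueCount": len(issues),
    }


if __name__ == "__main__":
    print(check_text("This is teh the the end."))
```

A check this simple also flags legitimate repetitions such as "had had" and repetitions across sentence boundaries, which is one reason results from basic rule-based checks should be reviewed by a person.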

Tech Stack

  • Platform: Apify Actor
  • Language: (your implementation language)
  • PDF Parsing: (e.g., pdf-parse, PyPDF, etc.)
  • Grammar Engine: Rule-based / NLP

Input Format

The API accepts a JSON input with a list of PDF URLs.

Example input.json

{
  "pdfUrls": [
    {
      "url": "https://raw.githubusercontent.com/Gautamrana14/pdf-test-files/main/DBMS-10%20(1).pdf"
    }
  ]
}
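
For larger batches, the input can be generated instead of written by hand. The following is a minimal sketch that assumes only the `pdfUrls` / `url` shape shown in the example input; the helper name `build_input` and the http(s) check are illustrative and not part of the Actor.

```python
import json
from urllib.parse import urlparse


def build_input(urls):
    """Build the Actor input dict from an iterable of PDF URLs.

    Hypothetical helper: only the {"pdfUrls": [{"url": ...}]} shape
    comes from the example input.
    """
    entries = []
    for raw in urls:
        url = raw.strip()
        parsed = urlparse(url)
        # Reject anything that is not an absolute http(s) URL.
        if parsed.scheme not in ("http", "https") or not parsed.netloc:
            raise ValueError(f"not an absolute http(s) URL: {raw!r}")
        entries.append({"url": url})
    if not entries:
        raise ValueError("at least one PDF URL is required")
    return {"pdfUrls": entries}


if __name__ == "__main__":
    payload = build_input(["https://example.com/sample.pdf"])
    # Print the JSON document; redirect it to input.json if needed.
    print(json.dumps(payload, indent=2))
```

Validating URLs before the run starts keeps obviously malformed entries out of a batch, so a typo does not surface only after the request has been sent.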

Usage

Base Endpoint (Apify Actor)

https://api.apify.com/v2/acts/<your-actor-id>/run-sync-get-dataset-items

Example Request (cURL)

curl -X POST \
  -H "Content-Type: application/json" \
  -d @input.json \
  "https://api.apify.com/v2/acts/<your-actor-id>/run-sync-get-dataset-items?token=YOUR_API_TOKEN"

Example using JavaScript

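// Requires Node.js with ES modules (top-level await) and the node-fetch package.
import fetch from "node-fetch";

const endpoint =
  "https://api.apify.com/v2/acts/<your-actor-id>/run-sync-get-dataset-items?token=YOUR_API_TOKEN";

const response = await fetch(endpoint, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    pdfUrls: [{ url: "https://example.com/sample.pdf" }],
  }),
});

// Fail loudly on non-2xx responses instead of parsing an error body as results.
if (!response.ok) {
  throw new Error(`Request failed with status ${response.status}`);
}

const data = await response.json();
console.log(data);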

Example using Python

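import requests

url = "https://api.apify.com/v2/acts/<your-actor-id>/run-sync-get-dataset-items"
params = {"token": "YOUR_API_TOKEN"}
payload = {
    "pdfUrls": [
        {"url": "https://example.com/sample.pdf"}
    ]
}

# Synchronous runs can take a while, so allow a generous timeout.
response = requests.post(url, params=params, json=payload, timeout=300)
response.raise_for_status()  # raise on HTTP errors instead of printing an error body
print(response.json())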

Output Format

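The Actor's output is a link to the dataset that holds the proofreading results. When the Actor is called through the run-sync-get-dataset-items endpoint, as in the examples above, the response body contains the dataset items directly.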

Output Schema

{
  "dataset": "https://api.apify.com/v2/datasets/xxxx/items"
}

Dataset Schema

Each dataset item has the following structure:

{
  "pdfUrl": "https://example.com/sample.pdf",
  "wordCount": 1520,
  "detectedIssues": [
    "Misspelled word: teh",
    "Incorrect verb tense in paragraph 3",
    "Repeated word: the"
  ],
  "issueCount": 3,
  "proofreadingScore": 92
}

Field Description

Field                Description
pdfUrl               URL of the analyzed PDF file
wordCount            Total number of words extracted
detectedIssues       List of detected spelling/grammar issues
issueCount           Total number of issues found
proofreadingScore    Quality score (0–100)
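
Client code usually wants these fields as typed values rather than raw dictionaries. The sketch below assumes only the field names documented above; the `ProofreadResult` class, the `parse_item` and `below_threshold` helpers, and the cutoff of 80 are illustrative choices, not part of the API.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class ProofreadResult:
    pdf_url: str
    word_count: int
    detected_issues: List[str]
    issue_count: int
    proofreading_score: int


def parse_item(item: dict) -> ProofreadResult:
    """Convert one dataset item (documented field names) into a typed record."""
    return ProofreadResult(
        pdf_url=item["pdfUrl"],
        word_count=item["wordCount"],
        detected_issues=list(item.get("detectedIssues", [])),
        issue_count=item["issueCount"],
        proofreading_score=item["proofreadingScore"],
    )


def below_threshold(items, min_score=80):
    """Return parsed results scoring under min_score (an arbitrary cutoff)."""
    results = [parse_item(item) for item in items]
    return [r for r in results if r.proofreading_score < min_score]


if __name__ == "__main__":
    # Sample item copied from the dataset schema example.
    sample = {
        "pdfUrl": "https://example.com/sample.pdf",
        "wordCount": 1520,
        "detectedIssues": [
            "Misspelled word: teh",
            "Incorrect verb tense in paragraph 3",
            "Repeated word: the",
        ],
        "issueCount": 3,
        "proofreadingScore": 92,
    }
    for result in below_threshold([sample], min_score=95):
        print(result.pdf_url, result.proofreading_score)
```

In a batch workflow, a filter like `below_threshold` can route only the low-scoring documents to a human reviewer.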

Limitations

  • Only detects basic grammar and spelling issues
  • Accuracy depends on text extraction quality
  • Not a replacement for professional proofreading

Rate Limits

Rate limits depend on your Apify plan and the Actor's configuration.
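
Clients that send many requests may want to retry when they are throttled. The following is a minimal sketch of exponential backoff, assuming the API signals throttling with HTTP status 429; the helper name `call_with_backoff` and its default values are illustrative and should be tuned to your plan.

```python
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a non-429 response or attempts run out.

    send is any zero-argument callable returning an object with a
    status_code attribute, such as a requests.Response.
    """
    if max_attempts < 1:
        raise ValueError("max_attempts must be at least 1")
    for attempt in range(max_attempts):
        response = send()
        if response.status_code != 429:
            return response
        if attempt < max_attempts - 1:
            # Wait 1x, 2x, 4x, ... the base delay before the next attempt.
            sleep(base_delay * (2 ** attempt))
    # Every attempt was throttled; hand the last response back to the caller.
    return response
```

With the Python example from the Usage section, the call could be wrapped as `call_with_backoff(lambda: requests.post(url, params=params, json=payload))`.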


Roadmap

  • Advanced grammar detection
  • Language support beyond English
  • Issue categorization (spelling vs grammar)
  • Suggestions for corrections
  • Highlight positions in original PDF

Contributing

Contributions are welcome.

  1. Fork the repository
  2. Create a feature branch
  3. Commit your changes
  4. Open a pull request

License

MIT License


Author

Gautam Rana
GitHub: https://github.com/Gautamrana14