Pricing

Pay per usage

PDF Proofreader

Analyzes PDF documents to detect basic spelling and grammar issues by extracting text content. Provides a proofreading quality score and highlights common writing mistakes to help improve document clarity and correctness.

Developer

Gautam Rana

Last modified: 2 months ago

Description

PDF Proofreader extracts text content from PDF files and analyzes it to identify common spelling and grammar mistakes. It also calculates a proofreading quality score to help users quickly assess the overall writing quality of their documents.

This API is useful for:

Students reviewing assignments or reports
Developers building document analysis tools
Content teams checking document quality
Automation workflows that process PDFs at scale

Features

Extracts text from PDF files
Detects common spelling mistakes
Detects basic grammar issues
Calculates total word count
Generates a proofreading quality score
Returns structured JSON output
Supports multiple PDF URLs per request
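
The rule-based checks listed above can be illustrated with a short sketch. This is not the Actor's actual implementation, which is not documented here: the `COMMON_MISSPELLINGS` table, the `analyze_text` function, and the tokenizing regex are all assumptions made for illustration. Only the output field names and issue message formats follow the dataset example in this document.

```python
import re

# Hypothetical lookup table for illustration; the Actor's real word list is not documented.
COMMON_MISSPELLINGS = {"teh": "the", "recieve": "receive", "definately": "definitely"}

# Simplified tokenizer: runs of letters and apostrophes count as words.
WORD_RE = re.compile(r"[A-Za-z']+")


def analyze_text(text: str) -> dict:
    """Count words and flag known misspellings and immediately repeated words."""
    words = WORD_RE.findall(text)
    issues = []
    previous = None
    for word in words:
        lower = word.lower()
        if lower in COMMON_MISSPELLINGS:
            issues.append(f"Misspelled word: {lower}")
        # Simplification: repeats are flagged even across sentence boundaries.
        if previous is not None and lower == previous:
            issues.append(f"Repeated word: {lower}")
        previous = lower
    return {
        "wordCount": len(words),
        "detectedIssues": issues,
        "issueCount": len(issues),
    }


print(analyze_text("This is teh the the report."))
```

The sketch leaves out the proofreading score on purpose, because the formula the Actor uses is not described on this page.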

Tech Stack

Platform: Apify Actor
Language: (your implementation language)
PDF Parsing: (e.g., pdf-parse, PyPDF, etc.)
Grammar Engine: Rule-based / NLP

Input Format

The API accepts a JSON input with a list of PDF URLs.

Example `input.json`

{
  "pdfUrls": [
    {
      "url": "https://raw.githubusercontent.com/Gautamrana14/pdf-test-files/main/DBMS-10%20(1).pdf"
    }
  ]
}
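
File names with spaces, like the one in the example above, need to be percent-encoded before they go into `pdfUrls`. The following sketch builds the input object from a list of raw URLs. The helper names `encode_pdf_url` and `build_input` are hypothetical, and treating `%` as safe (so that already-encoded URLs are not encoded twice) is an assumption that would not suit paths containing a literal percent sign.

```python
import json
from urllib.parse import quote, urlsplit, urlunsplit


def encode_pdf_url(raw_url: str) -> str:
    """Percent-encode unsafe characters (such as spaces) in the URL path."""
    parts = urlsplit(raw_url)
    # "/" keeps the path structure, "%" preserves existing escapes,
    # and "()" are left as-is to match the example input above.
    safe_path = quote(parts.path, safe="/%()")
    return urlunsplit((parts.scheme, parts.netloc, safe_path, parts.query, parts.fragment))


def build_input(urls: list[str]) -> dict:
    """Wrap each URL in the {"url": ...} object the input format expects."""
    return {"pdfUrls": [{"url": encode_pdf_url(u)} for u in urls]}


payload = build_input(["https://example.com/reports/DBMS 10 (1).pdf"])
print(json.dumps(payload, indent=2))
```

The resulting dictionary can be written to `input.json` or passed directly as the JSON body of the request.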

Usage

Base Endpoint (Apify Actor)

https://api.apify.com/v2/acts/<your-actor-id>/run-sync-get-dataset-items

Example Request (cURL)

curl -X POST \
  -H "Content-Type: application/json" \
  -d @input.json \
  "https://api.apify.com/v2/acts/<your-actor-id>/run-sync-get-dataset-items?token=YOUR_API_TOKEN"

Example using JavaScript

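import fetch from "node-fetch"; // Node.js 18 and later also provide a built-in fetch

// Replace <your-actor-id> and YOUR_API_TOKEN with your own values.
const endpoint = "https://api.apify.com/v2/acts/<your-actor-id>/run-sync-get-dataset-items?token=YOUR_API_TOKEN";

const response = await fetch(endpoint, {
  method: "POST",
  headers: { "Content-Type": "application/json" },
  body: JSON.stringify({
    pdfUrls: [
      { url: "https://example.com/sample.pdf" }
    ]
  })
});

// Stop on HTTP errors instead of treating an error body as results.
if (!response.ok) {
  throw new Error(`Request failed with status ${response.status}`);
}

const data = await response.json();
console.log(data);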

Example using Python

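import requests

# Replace <your-actor-id> and YOUR_API_TOKEN with your own values.
url = "https://api.apify.com/v2/acts/<your-actor-id>/run-sync-get-dataset-items"
params = {"token": "YOUR_API_TOKEN"}

payload = {
    "pdfUrls": [
        {"url": "https://example.com/sample.pdf"}
    ]
}

# Synchronous runs can take a while; adjust the timeout to your documents.
response = requests.post(url, params=params, json=payload, timeout=300)
response.raise_for_status()  # Raise on HTTP errors instead of parsing an error body
print(response.json())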

Output Format

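The Actor stores the proofreading results in an Apify dataset and returns the URL of that dataset. Note that the `run-sync-get-dataset-items` endpoint used in the examples above returns the dataset items themselves in the response body.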

Output Schema

{
  "dataset": "https://api.apify.com/v2/datasets/xxxx/items"
}
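
If you start from the dataset URL rather than the synchronous response, the items can be paged through with query parameters. The sketch below only builds the request URL. The helper name `dataset_items_url` and the default page size are assumptions; `format`, `clean`, `limit`, and `offset` are query parameters of the Apify dataset items endpoint.

```python
from urllib.parse import parse_qsl, urlencode, urlsplit, urlunsplit


def dataset_items_url(dataset_url: str, limit: int = 100, offset: int = 0) -> str:
    """Add paging and format parameters to a dataset items URL."""
    parts = urlsplit(dataset_url)
    # Keep any query parameters that are already present (for example a token).
    query = dict(parse_qsl(parts.query))
    query.update({
        "format": "json",
        "clean": "true",  # skip empty items and hidden fields
        "limit": str(limit),
        "offset": str(offset),
    })
    return urlunsplit((parts.scheme, parts.netloc, parts.path, urlencode(query), parts.fragment))


print(dataset_items_url("https://api.apify.com/v2/datasets/xxxx/items", limit=50))
```

Requesting the resulting URL with any HTTP client should return a JSON array of items in the dataset schema.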

Dataset Schema

Each dataset item has the following structure:

{
  "pdfUrl": "https://example.com/sample.pdf",
  "wordCount": 1520,
  "detectedIssues": [
    "Misspelled word: teh",
    "Incorrect verb tense in paragraph 3",
    "Repeated word: the"
  ],
  "issueCount": 3,
  "proofreadingScore": 92
}

Field Description

Field	Description
pdfUrl	URL of the analyzed PDF file
wordCount	Total number of words extracted
detectedIssues	List of detected spelling/grammar issues
issueCount	Total number of issues found
proofreadingScore	Quality score (0–100)
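
For downstream processing it can help to map each dataset item onto a typed structure. The following is a minimal sketch based on the fields above. The `ProofreadingResult` class, the `needs_review` helper, and the threshold of 80 are hypothetical choices, not part of the Actor.

```python
from dataclasses import dataclass


@dataclass
class ProofreadingResult:
    pdf_url: str
    word_count: int
    detected_issues: list[str]
    issue_count: int
    proofreading_score: float

    @classmethod
    def from_item(cls, item: dict) -> "ProofreadingResult":
        """Build a result from one dataset item, tolerating missing optional fields."""
        issues = list(item.get("detectedIssues", []))
        return cls(
            pdf_url=item["pdfUrl"],
            word_count=int(item.get("wordCount", 0)),
            detected_issues=issues,
            # Fall back to counting the issues if issueCount is absent.
            issue_count=int(item.get("issueCount", len(issues))),
            proofreading_score=float(item.get("proofreadingScore", 0)),
        )


def needs_review(results, threshold=80):
    """Return results whose score is below the (assumed) review threshold."""
    return [r for r in results if r.proofreading_score < threshold]


item = {
    "pdfUrl": "https://example.com/sample.pdf",
    "wordCount": 1520,
    "detectedIssues": ["Misspelled word: teh", "Repeated word: the"],
    "issueCount": 2,
    "proofreadingScore": 92,
}
result = ProofreadingResult.from_item(item)
print(result.pdf_url, result.proofreading_score, len(needs_review([result])))
```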

Limitations

Only detects basic grammar and spelling issues
Accuracy depends on text extraction quality
Not a replacement for professional proofreading

Rate Limits

Rate limits depend on your Apify plan and the Actor's configuration.
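
When a rate limit is hit, the Apify API responds with HTTP status 429, and retrying with exponential backoff is a common way to handle it. The sketch below is one possible approach, not an official client. The `call_with_backoff` helper, its defaults, and the `send` callable (which is assumed to return a `(status, body)` tuple) are all hypothetical.

```python
import random
import time


def call_with_backoff(send, max_attempts=5, base_delay=1.0, sleep=time.sleep):
    """Call send() until it returns a status other than 429 or attempts run out."""
    status, body = None, None
    for attempt in range(max_attempts):
        status, body = send()
        if status != 429:
            return status, body
        if attempt < max_attempts - 1:
            # Double the wait on each retry and add jitter to spread out clients.
            delay = base_delay * (2 ** attempt) + random.uniform(0, base_delay)
            sleep(delay)
    # Still rate limited after the last attempt; return the last response.
    return status, body


# Simulated sender for demonstration: rate limited once, then successful.
responses = iter([(429, "rate limited"), (200, "ok")])
status, body = call_with_backoff(lambda: next(responses), sleep=lambda seconds: None)
print(status, body)
```

In real use, `send` would wrap the HTTP request to the Actor endpoint and return the response status code and parsed body.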

Roadmap

Advanced grammar detection
Language support beyond English
Issue categorization (spelling vs grammar)
Suggestions for corrections
Highlight positions in original PDF

Contributing

Contributions are welcome.

Fork the repository
Create a feature branch
Commit your changes
Open a pull request

License

MIT License

Author

Gautam Rana
GitHub: https://github.com/Gautamrana14
