Docs To Rag
Pricing
Pay per usage
Transform documentation websites into RAG-ready chunks with semantic understanding, quality scoring, and direct vector database integration.
Developer

Gabriel Antony Xaviour
Website Health Monitor - Apify Actor Reference
A comprehensive reference implementation demonstrating all major Apify Actor features using CheerioCrawler. This Actor monitors website health by checking URLs for status codes, load times, and broken links.
Purpose
This template serves as a copy-paste reference for building Apify Actors. It demonstrates every major Apify feature in a single, working Actor that you can use as a starting point for your own projects.
Features Demonstrated
1. Actor Lifecycle
// main.ts - Actor.main() handles init/exit automatically
Actor.main(async () => {
    // Your Actor code here
    // Actor.init() called automatically at start
    // Actor.exit() called automatically at end
});
2. Input Handling
// Get typed input from Actor.getInput()
const rawInput = await Actor.getInput<ActorInput>();
const input = validateInput(rawInput);
Input Schema (input_schema.json):
- urls (array, required) - URLs to monitor
- maxConcurrency (integer, default: 5) - Concurrent requests
- proxyConfig (object) - Apify proxy configuration
- notifyOnFailure (boolean) - Enable failure notifications
- notificationActorId (string) - Actor to call on failures
- webhookUrl (string) - Webhook to trigger on completion
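The README calls `validateInput()` but does not show it. The following is a minimal sketch of what such a helper might look like, assuming the field names and the `maxConcurrency` default from the input schema above. The `ActorInput` shape, the `isHttpUrl` helper, and the specific validation rules are assumptions for illustration, not the template's actual implementation.

```typescript
// Hypothetical input shape, mirroring the fields listed in the input schema.
interface ActorInput {
    urls: string[];
    maxConcurrency: number;
    proxyConfig?: Record<string, unknown>;
    notifyOnFailure: boolean;
    notificationActorId?: string;
    webhookUrl?: string;
}

// Hypothetical helper: true only for strings that parse as http(s) URLs.
function isHttpUrl(value: unknown): value is string {
    if (typeof value !== 'string') return false;
    try {
        const { protocol } = new URL(value);
        return protocol === 'http:' || protocol === 'https:';
    } catch {
        return false;
    }
}

// Assumed validation rules; the real validateInput() may differ.
function validateInput(raw: unknown): ActorInput {
    if (typeof raw !== 'object' || raw === null) {
        throw new Error('Input must be a JSON object');
    }
    const input = raw as Record<string, unknown>;

    // "urls" is the only required field.
    const urls = input.urls;
    if (!Array.isArray(urls) || urls.length === 0) {
        throw new Error('"urls" must be a non-empty array');
    }
    const invalid = urls.filter((url) => !isHttpUrl(url));
    if (invalid.length > 0) {
        throw new Error(`Invalid URL(s): ${invalid.join(', ')}`);
    }

    // Apply the schema default of 5 when the field is missing.
    const maxConcurrency = input.maxConcurrency ?? 5;
    if (typeof maxConcurrency !== 'number' || !Number.isInteger(maxConcurrency) || maxConcurrency < 1) {
        throw new Error('"maxConcurrency" must be a positive integer');
    }

    return {
        urls: urls as string[],
        maxConcurrency,
        proxyConfig: input.proxyConfig as Record<string, unknown> | undefined,
        notifyOnFailure: input.notifyOnFailure === true,
        notificationActorId: typeof input.notificationActorId === 'string' ? input.notificationActorId : undefined,
        webhookUrl: typeof input.webhookUrl === 'string' ? input.webhookUrl : undefined,
    };
}
```

Because `Actor.getInput()` can return `null` when no input is provided, the sketch accepts `unknown` and fails early with a readable message rather than letting the crawler start with an empty URL list.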
3. Storage - Dataset
// Push results to dataset during crawling
await Dataset.pushData(healthCheckResult);

// Access dataset info
const dataset = await Actor.openDataset();
const info = await dataset.getInfo();
Dataset Output Schema:
{
    url: string;
    status: number;
    loadTime: number;
    pageTitle: string | null;
    brokenLinks: string[];
    totalLinks: number;
    isHealthy: boolean;
    errorMessage: string | null;
    timestamp: string;
}
4. Storage - Key-Value Store
// Open default Key-Value Store
const kvStore = await Actor.openKeyValueStore();

// Read INPUT (alternative to Actor.getInput())
const input = await kvStore.getValue('INPUT');

// Write OUTPUT summary
await kvStore.setValue('OUTPUT', summary);

// Write file with content type
await kvStore.setValue('SCREENSHOT_STATUS', content, {
    contentType: 'application/json',
});
5. Storage - Request Queue
// Open request queue
const requestQueue = await Actor.openRequestQueue();

// Add requests with user data
await requestQueue.addRequest({
    url: 'https://example.com',
    userData: { originalUrl: url, startTime: Date.now() },
});
6. Crawlee Integration (CheerioCrawler)
import { CheerioCrawler, Dataset, createCheerioRouter } from 'crawlee';

// Create router with handlers
const router = createCheerioRouter();

router.addDefaultHandler(async ({ request, response, $, log }) => {
    // Extract data using Cheerio
    const title = $('title').text();
    const links = $('a[href]').map((_, el) => $(el).attr('href')).get();

    // Push to dataset
    await Dataset.pushData({ url: request.url, title, links });
});

// Create crawler (requestQueue is the queue opened in the previous section)
const crawler = new CheerioCrawler({
    requestQueue,
    requestHandler: router,
    failedRequestHandler: async ({ request, error }) => {
        // Handle failed requests
    },
    maxConcurrency: 5,
    maxRequestRetries: 3,
});

// Run crawler
await crawler.run();
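The handler above collects raw `href` values, which may be relative paths, fragments, or non-HTTP schemes. Before such links can be checked for breakage they would typically need to be resolved against the page URL. The sketch below shows one way to do that using only the standard `URL` class; the `resolveLinks` name and the choice to drop fragments and duplicates are assumptions, not something the README specifies.

```typescript
// Hypothetical helper: turn raw href values into unique absolute http(s) URLs.
function resolveLinks(hrefs: string[], baseUrl: string): string[] {
    const resolved = new Set<string>();

    for (const href of hrefs) {
        let url: URL;
        try {
            // Relative hrefs are resolved against the page they were found on.
            url = new URL(href, baseUrl);
        } catch {
            continue; // Skip values that cannot be parsed as a URL.
        }

        // Skip mailto:, tel:, javascript: and similar non-HTTP schemes.
        if (url.protocol !== 'http:' && url.protocol !== 'https:') continue;

        // Fragments point inside a page, so they are dropped before deduplication.
        url.hash = '';
        resolved.add(url.href);
    }

    return Array.from(resolved);
}

// Example usage with values a handler might extract from a page.
const absoluteLinks = resolveLinks(
    ['/about', 'mailto:team@example.com', '/about#team', 'https://other.example/x'],
    'https://example.com/docs/',
);
console.log(absoluteLinks);
```

Deduplicating before checking keeps the number of outgoing requests proportional to the number of distinct targets rather than the number of anchor tags on the page.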
7. Proxy Configuration
// Create proxy from input configuration
const proxyConfiguration = await Actor.createProxyConfiguration({
    groups: ['RESIDENTIAL'],
    countryCode: 'US',
});

// Use in crawler
const crawler = new CheerioCrawler({
    proxyConfiguration,
    // ...
});
8. Actor-to-Actor Communication
// Call another Actor and wait for it to finish
const run = await Actor.call(
    'apify/send-email', // Actor ID
    {
        // Input for called Actor
        subject: 'Alert',
        message: 'Something happened',
    },
    {
        memory: 256, // Memory in MB
        timeout: 60, // Timeout in seconds
    },
);

// Start Actor without waiting (fire and forget)
const startedRun = await Actor.start('apify/some-actor', input);

// Call a task, then read the items from its default dataset
const taskRun = await Actor.callTask('user/my-task', input);
const { items } = await Actor.newClient().dataset(taskRun.defaultDatasetId).listItems();
9. Logging
// Different log levels
Actor.log.debug('Detailed debug info');
Actor.log.info('General information');
Actor.log.warning('Non-critical warning');
Actor.log.error('Error occurred', { error: err.message });

// Log with structured data
Actor.log.info('Processing URL', {
    url: 'https://example.com',
    status: 200,
    loadTime: 150,
});
10. Status Messages
// Update Actor status (visible in Apify Console)
await Actor.setStatusMessage('Processing URL 5/10...');
await Actor.setStatusMessage('✓ Completed successfully');
11. Environment Information
// Get Actor environment variables
const env = Actor.getEnv();
console.log({
    actorId: env.actorId,
    actorRunId: env.actorRunId,
    userId: env.userId,
    memoryMbytes: env.memoryMbytes,
    isAtHome: env.isAtHome,
    defaultDatasetId: env.defaultDatasetId,
    defaultKeyValueStoreId: env.defaultKeyValueStoreId,
    startedAt: env.startedAt,
    timeoutAt: env.timeoutAt,
});
12. Graceful Shutdown
// Handle Actor migration (server change)
Actor.on('migrating', async () => {
    // Save state before migration
    const kvStore = await Actor.openKeyValueStore();
    await kvStore.setValue('MIGRATION_STATE', currentState);
});

// Handle Actor abort
Actor.on('aborting', async () => {
    // Save partial results
    await Dataset.pushData(partialResults);
});

// Other events: 'persistState', 'systemInfo'
13. Standby Mode (HTTP Server)
// Create HTTP server for standby mode
const server = await Actor.createServer(async (req, res) => {
    const url = new URL(req.url || '/', `http://${req.headers.host}`);

    if (url.pathname === '/health') {
        res.writeHead(200, { 'Content-Type': 'application/json' });
        res.end(JSON.stringify({
            status: 'running',
            urlsProcessed: 10,
            memoryUsageMB: 128,
        }));
    } else {
        res.writeHead(404);
        res.end('Not found');
    }
});

// Server is automatically bound to Actor's port
File Structure
templates/cheerio-reference/
├── src/
│   ├── main.ts            # Entry point with Actor.main()
│   ├── routes.ts          # Crawlee router handlers
│   ├── types.ts           # TypeScript interfaces
│   └── utils.ts           # Helper functions
├── package.json           # Dependencies
├── tsconfig.json          # TypeScript config
├── Dockerfile             # Multi-stage build
├── .actor/
│   ├── actor.json         # Actor metadata
│   └── input_schema.json  # Input schema
└── README.md              # This file
Running Locally
# Install dependencies
npm install

# Build TypeScript
npm run build

# Run with test input
echo '{"urls": ["https://example.com"]}' | npx apify-cli run -p
Deploying to Apify
# Login to Apify
npx apify-cli login

# Push to Apify platform
npx apify-cli push
Output
Dataset Items
Each checked URL produces a dataset item:
{
    "url": "https://example.com",
    "status": 200,
    "loadTime": 523,
    "pageTitle": "Example Domain",
    "brokenLinks": [],
    "totalLinks": 15,
    "isHealthy": true,
    "errorMessage": null,
    "timestamp": "2024-01-15T10:30:00.000Z"
}
Key-Value Store OUTPUT
Summary of the health check run:
{
    "totalChecked": 10,
    "failedCount": 2,
    "successCount": 8,
    "avgLoadTime": 450,
    "totalBrokenLinks": 5,
    "failedUrls": ["https://broken.example.com"],
    "startTime": "2024-01-15T10:00:00.000Z",
    "endTime": "2024-01-15T10:05:00.000Z",
    "durationSeconds": 300
}
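The README does not show how the OUTPUT summary is derived from the individual dataset items. As a rough sketch, it could be computed by a pure aggregation function like the one below. The `buildSummary` name, the rule that a URL counts as failed when `isHealthy` is false, and the rounding of `avgLoadTime` to whole milliseconds are all assumptions; the field names come from the two JSON examples above.

```typescript
// Shape of one dataset item, as documented in the Dataset Output Schema.
interface HealthCheckResult {
    url: string;
    status: number;
    loadTime: number;
    pageTitle: string | null;
    brokenLinks: string[];
    totalLinks: number;
    isHealthy: boolean;
    errorMessage: string | null;
    timestamp: string;
}

// Shape of the OUTPUT record, taken from the summary example.
interface RunSummary {
    totalChecked: number;
    failedCount: number;
    successCount: number;
    avgLoadTime: number;
    totalBrokenLinks: number;
    failedUrls: string[];
    startTime: string;
    endTime: string;
    durationSeconds: number;
}

// Hypothetical aggregation; the Actor's real logic may differ.
function buildSummary(results: HealthCheckResult[], startTime: Date, endTime: Date): RunSummary {
    // Assumption: a URL counts as failed when it is not healthy.
    const failed = results.filter((result) => !result.isHealthy);
    const totalLoadTime = results.reduce((sum, result) => sum + result.loadTime, 0);

    return {
        totalChecked: results.length,
        failedCount: failed.length,
        successCount: results.length - failed.length,
        // Guard against division by zero when nothing was checked.
        avgLoadTime: results.length > 0 ? Math.round(totalLoadTime / results.length) : 0,
        totalBrokenLinks: results.reduce((sum, result) => sum + result.brokenLinks.length, 0),
        failedUrls: failed.map((result) => result.url),
        startTime: startTime.toISOString(),
        endTime: endTime.toISOString(),
        durationSeconds: Math.round((endTime.getTime() - startTime.getTime()) / 1000),
    };
}
```

The resulting object could then be stored with `kvStore.setValue('OUTPUT', summary)`, as shown in the Key-Value Store section.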
Quick Reference: Common Patterns
Reading Files from Key-Value Store
const kvStore = await Actor.openKeyValueStore();
const data = await kvStore.getValue('MY_DATA');
Writing Binary Files
await kvStore.setValue('image.png', buffer, {
    contentType: 'image/png',
});
Named Stores
// Open named stores (persist across runs)
const kvStore = await Actor.openKeyValueStore('my-store');
const dataset = await Actor.openDataset('my-dataset');
const queue = await Actor.openRequestQueue('my-queue');
Metamorph (Transform Actor)
// Transform into another Actor
await Actor.metamorph('apify/web-scraper', newInput);
Abort Run
// Mark the run as failed with a status message
await Actor.fail('Critical error occurred');

// Exit successfully
await Actor.exit('Completed');
License
ISC