Pricing: Pay per event
Developer: Stas Persiianenko
🪵 Apache & Nginx Log Parser
Parse Apache access logs, Nginx logs, and IIS W3C logs in seconds. Extract structured data — IP addresses, timestamps, HTTP methods, paths, status codes, user agents — and get instant traffic analytics without installing any tools.
No proxy required. No web scraping. Pure computation.
🔍 What does it do?
Apache & Nginx Log Parser reads raw access log files and converts them into clean, structured records you can filter, sort, and export. It works with:
- Apache Combined Log Format (the default for most Apache 2.x installs)
- Nginx access logs (same format by default)
- Common Log Format (CLF) — legacy Apache without referrer/user-agent
- IIS W3C Extended Log Format — Microsoft IIS server logs
Give it a log file URL or paste raw log content, and it returns:
- One structured record per log line — IP, timestamp (ISO 8601), method, path, status code, response bytes, referrer, user agent
- A summary record — top pages, top IPs, status code distribution, hourly traffic breakdown, top user agents, HTTP method breakdown
👤 Who is it for?
🔧 SEO Auditors
Identify your top crawled pages, detect Googlebot activity, find 404s and redirect chains — all from raw access logs without Google Analytics or Search Console.
🖥️ DevOps & SysAdmins
Quickly diagnose traffic spikes, identify bad bots hammering your server, find which endpoints are slowest, and spot error patterns — without SSH access or log aggregation tools.
📊 Data Analysts
Load access logs into your data pipeline, spreadsheet, or BI tool. Get structured JSON out of messy log text in one step.
🛡️ Security Analysts
Find brute force attempts, unusual IP patterns, and suspicious user agents from raw access logs with no special software.
💡 Why use Apache & Nginx Log Parser?
Traditional log analysis requires command-line tools (awk, grep, GoAccess, AWStats) or a full ELK stack. This actor:
- ✅ Works in your browser — no SSH, no terminal
- ✅ Outputs clean JSON you can download, query via API, or pipe into n8n/Make
- ✅ Auto-detects log format — no configuration needed
- ✅ Handles large log files (millions of lines) efficiently
- ✅ Integrates with Apify's dataset storage — keep your analysis history
- ✅ Zero proxy cost — pure computation, no web requests to target sites
📋 What data is extracted?
Each parsed log line produces a structured record:
| Field | Type | Description |
|---|---|---|
| ip | string | Client IP address |
| timestamp | string | Raw timestamp from log (e.g. 10/Oct/2000:13:55:36 -0700) |
| timestampIso | string | ISO 8601 timestamp (e.g. 2000-10-10T13:55:36-07:00) |
| method | string | HTTP method (GET, POST, PUT, DELETE, etc.) |
| path | string | Request path (may include query string) |
| protocol | string | HTTP protocol (HTTP/1.0, HTTP/1.1, HTTP/2) |
| statusCode | number | HTTP status code (200, 301, 404, 500, etc.) |
| responseBytes | number | Response size in bytes |
| referrer | string | HTTP Referer header value |
| userAgent | string | User-Agent header value |
| logFormat | string | Detected format (apache_combined, common, iis_w3c) |
| parseError | string | Error message if the line couldn't be parsed |
| rawLine | string | Original raw log line |
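Put together, a single parsed record might look like this (values are illustrative; field names follow the table above):

```json
{
  "ip": "203.0.113.42",
  "timestamp": "10/Oct/2000:13:55:36 -0700",
  "timestampIso": "2000-10-10T13:55:36-07:00",
  "method": "GET",
  "path": "/index.html?ref=home",
  "protocol": "HTTP/1.1",
  "statusCode": 200,
  "responseBytes": 2326,
  "referrer": "https://example.com/start.html",
  "userAgent": "Mozilla/5.0",
  "logFormat": "apache_combined",
  "rawLine": "203.0.113.42 - - [10/Oct/2000:13:55:36 -0700] \"GET /index.html?ref=home HTTP/1.1\" 200 2326 \"https://example.com/start.html\" \"Mozilla/5.0\""
}
```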
Plus a summary record (when includeStats: true):
| Field | Description |
|---|---|
| topPages | Top N most-requested paths |
| topIPs | Top N most-active IP addresses |
| statusCodes | Count per HTTP status code |
| hourlyTraffic | Request count per hour |
| topUserAgents | Top N user agent strings |
| topMethods | HTTP method distribution |
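A summary record could then look roughly like this (the exact shape of the nested values is an assumption; only the top-level keys come from the table above):

```json
{
  "topPages": [{ "path": "/index.html", "count": 1532 }],
  "topIPs": [{ "ip": "203.0.113.42", "count": 418 }],
  "statusCodes": { "200": 8120, "301": 95, "404": 312, "500": 14 },
  "hourlyTraffic": { "13": 420, "14": 397 },
  "topUserAgents": [{ "userAgent": "Mozilla/5.0", "count": 240 }],
  "topMethods": { "GET": 8700, "POST": 241 }
}
```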
💰 How much does it cost to parse Apache logs?
This actor uses Pay-Per-Event (PPE) pricing — you only pay for what you use:
| Event | FREE tier | DIAMOND tier |
|---|---|---|
| Run start | $0.005 | $0.0025 |
| Per 1,000 log lines | $0.008 | $0.004 |
Example costs
| File size | Lines | Estimated cost (FREE) |
|---|---|---|
| Small log | 1,000 lines | ~$0.013 |
| Medium log | 10,000 lines | ~$0.085 |
| Large log | 100,000 lines | ~$0.805 |
| Huge log (1M lines) | 1,000,000 | ~$8.005 |
Parsing is pure computation with zero proxy cost — the only expense is the nominal per-batch charge.
Use maxLines to cap processing and control costs on very large files.
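The example costs above follow from a simple formula: the flat run-start fee plus the per-1,000-line fee times the number of 1,000-line batches. A quick sketch (assuming partial batches are billed as whole batches, which the pricing table does not specify):

```python
import math

def estimate_cost_usd(lines: int, run_start: float = 0.005,
                      per_thousand: float = 0.008) -> float:
    """FREE-tier estimate: run-start fee plus fee per 1,000-line batch.

    Assumes a partial batch is billed as a whole batch; if usage is
    prorated instead, fractional batches would cost slightly less.
    """
    batches = math.ceil(lines / 1000)
    return run_start + batches * per_thousand

print(f"${estimate_cost_usd(10_000):.3f}")  # matches the table: $0.085
```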
🚀 How to use it
Step 1 — Provide your log data
Option A: Paste log content
Set logText to your raw log lines. Ideal for small snippets or when testing.
Option B: Fetch from URL
Set logUrl to any publicly accessible log file URL. The actor fetches it over HTTP — no auth or proxy needed.
Step 2 — Choose format (optional)
Leave logFormat as auto to detect automatically. Set it explicitly if auto-detection fails:
- apache_combined — Apache Combined Log Format
- nginx — Nginx access log (same as Apache Combined by default)
- common — Common Log Format (CLF), no referrer/user-agent
- iis_w3c — IIS W3C Extended Log Format
Step 3 — Run and download results
Results are saved to the actor's dataset. Download as JSON, CSV, or JSONL — or query them via the Apify API.
⚙️ Input parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| logText | string | — | Paste raw log lines directly |
| logUrl | string | — | URL of a log file to fetch |
| logFormat | string | auto | Log format: auto, apache_combined, nginx, common, iis_w3c |
| maxLines | integer | 0 (unlimited) | Max lines to parse (0 = parse all) |
| includeStats | boolean | true | Include summary statistics record |
| topN | integer | 10 | How many entries in top-N summaries |
Either logText or logUrl is required.
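A minimal input using the parameters above might look like:

```json
{
  "logUrl": "https://example.com/access.log",
  "logFormat": "auto",
  "maxLines": 0,
  "includeStats": true,
  "topN": 10
}
```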
📤 Output format
Results are stored in two dataset views:
log-entries view
One record per parsed log line. Use this to filter, sort, or search individual requests.
summary view
A single record with aggregated statistics. Use this for the big-picture traffic breakdown.
🔧 Tips & best practices
Handling large log files
- Use maxLines to process only the most recent N lines (new entries are appended at the end of the file)
- For very large files (> 1 GB), consider splitting them before uploading
- Use includeStats: false if you only need raw entries and want to compute stats yourself
Format auto-detection
Auto-detection samples the first 20 non-comment lines. If your log file has a long header or preamble, auto-detection may fail — set logFormat explicitly in that case.
IIS logs
IIS W3C logs have a #Fields: header that defines column order. The actor reads this header automatically and adjusts field mapping accordingly. Multiple #Fields: headers in a single file (rare but valid) are handled correctly.
Filtering by status code
After parsing, filter the dataset by statusCode in the Apify console or via API:
```
GET https://api.apify.com/v2/datasets/{DATASET_ID}/items?fields=ip,path,statusCode&limit=1000
```
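The same filtering can also be done client-side after fetching the items. A minimal sketch — the records here are hard-coded samples standing in for the dataset API response:

```python
# In practice you would fetch items from the dataset API, e.g.:
#   GET https://api.apify.com/v2/datasets/{DATASET_ID}/items?format=json
# Hard-coded samples below stand in for that response.
items = [
    {"ip": "203.0.113.1", "path": "/index.html", "statusCode": 200},
    {"ip": "203.0.113.2", "path": "/missing", "statusCode": 404},
    {"ip": "203.0.113.3", "path": "/old-page", "statusCode": 404},
]

not_found = [i for i in items if i["statusCode"] == 404]
print(f"{len(not_found)} requests returned 404")
```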
Nginx default format
Nginx's default access_log format is identical to Apache Combined Log Format. The actor parses it without any special configuration.
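For reference, Nginx's predefined combined format is defined as:

```nginx
log_format combined '$remote_addr - $remote_user [$time_local] '
                    '"$request" $status $body_bytes_sent '
                    '"$http_referer" "$http_user_agent"';
```

This matches Apache Combined field-for-field, which is why no special configuration is needed.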
🔌 Integrations
With n8n
Use the Apify n8n node to trigger the parser, then pass the dataset URL to an HTTP Request node to fetch results. Feed into Spreadsheet File or Google Sheets nodes for pivot tables.
With Make (Integromat)
Trigger the actor via Apify > Run Actor module, then use Apify > Get Dataset Items to fetch structured records into any downstream module (Google Sheets, Airtable, Slack).
With Zapier
Use Apify > Run Actor trigger, connect to Google Sheets > Create Spreadsheet Row — each parsed log line becomes a row.
With your data pipeline
Use the dataset API to stream results:
```shell
curl "https://api.apify.com/v2/datasets/{DATASET_ID}/items?format=json" \
  -H "Authorization: Bearer YOUR_TOKEN" | jq '.[] | select(.statusCode == 404)'
```
🤖 API usage
Node.js
```javascript
import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_TOKEN' });

const run = await client.actor('automation-lab/apache-log-parser').call({
  logUrl: 'https://example.com/access.log',
  logFormat: 'auto',
  includeStats: true,
  topN: 10,
});

const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items.filter((i) => i.statusCode === 404));
```
Python
```python
from apify_client import ApifyClient

client = ApifyClient(token='YOUR_TOKEN')

run = client.actor('automation-lab/apache-log-parser').call(run_input={
    'logUrl': 'https://example.com/access.log',
    'logFormat': 'auto',
    'includeStats': True,
    'topN': 10,
})

items = client.dataset(run['defaultDatasetId']).list_items().items
errors = [i for i in items if i.get('statusCode', 0) >= 500]
print(f"Found {len(errors)} server errors")
```
cURL
```shell
TOKEN="YOUR_TOKEN"

# Start the run
RUN=$(curl -s -X POST "https://api.apify.com/v2/acts/automation-lab~apache-log-parser/runs?token=$TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"logUrl":"https://example.com/access.log","includeStats":true}')

DATASET_ID=$(echo "$RUN" | python3 -c "import sys,json; print(json.load(sys.stdin)['data']['defaultDatasetId'])")

# Fetch results
curl "https://api.apify.com/v2/datasets/$DATASET_ID/items?token=$TOKEN" | python3 -m json.tool
```
🧠 Use with Claude (MCP)
You can use this actor directly from Claude via the Apify MCP server:
Claude Desktop — Add to claude_desktop_config.json:
```json
{
  "mcpServers": {
    "apify": {
      "command": "npx",
      "args": ["-y", "@apify/mcp-server?tools=automation-lab/apache-log-parser"],
      "env": { "APIFY_TOKEN": "YOUR_TOKEN" }
    }
  }
}
```
Claude Code — Run in terminal:
```shell
claude mcp add --transport http apify "https://mcp.apify.com?tools=automation-lab/apache-log-parser"
```
Example prompts:
- "Parse this Apache log and show me the top 10 pages with 404 errors"
- "Fetch my Nginx log from https://example.com/access.log and give me hourly traffic for yesterday"
- "Analyze this IIS log: [paste log content] — which IPs are hitting us hardest?"
⚖️ Legality & privacy
This actor processes log data you provide — it does not scrape any website. Log files typically contain IP addresses and user agent strings, which may be considered personal data under GDPR and similar regulations.
Ensure you:
- Have the right to process the log data you provide
- Comply with your organization's data retention policies
- Anonymize or pseudonymize IP addresses if required by your jurisdiction
❓ FAQ
Q: My log lines aren't being parsed — what format should I use?
A: Copy one line from your log and check if it matches Apache Combined Format:
```
IP - user [timestamp] "METHOD /path HTTP/1.1" STATUS BYTES "referrer" "user-agent"
```
If so, use apache_combined. If it's missing the referrer/user-agent at the end, use common. If it has a #Fields: header, use iis_w3c.
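If you want to check a line yourself before running the actor, a simplified regex for the Combined format can help (this is an illustrative sketch, not the actor's internal parser):

```python
import re

# Simplified Apache Combined Log Format pattern (illustrative only).
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ (?P<user>\S+) \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<bytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<user_agent>[^"]*)"'
)

line = ('203.0.113.42 - - [10/Oct/2000:13:55:36 -0700] '
        '"GET /index.html HTTP/1.1" 200 2326 '
        '"https://example.com/" "Mozilla/5.0"')

m = COMBINED.match(line)
print(m.group("method"), m.group("status"))  # a match means the line is Combined format
```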
Q: How many lines can it process?
A: There is no hard limit — the actor processes all lines in memory. For very large files (millions of lines), Apify's 256 MB memory limit applies. Use maxLines to cap processing if you get out-of-memory errors.
Q: Can it parse custom Nginx log formats?
A: Currently supports the Nginx default format (which matches Apache Combined). Custom log_format directives produce non-standard output that may not parse correctly. Support for custom format strings is planned for a future version.
Q: The URL fetch fails — what can I do?
A: The log URL must be publicly accessible without authentication. If your log is behind auth or a firewall, download it locally and paste the content using logText instead.
Q: Why are some lines showing parseError?
A: Lines that don't match the expected format (e.g., blank lines, comment lines in Apache config files accidentally included, or log rotation markers) produce a parseError. The actor continues processing remaining lines — one bad line doesn't stop the run.
Q: Does it support compressed (.gz) log files?
A: Not currently. Decompress the file first (gunzip access.log.gz) and then provide the plain text content via logText or host it at a URL.
🔗 Related actors from automation-lab
- JSON Schema Generator — Generate JSON Schema from sample JSON documents
- Color Contrast Checker — WCAG 2.1 AA/AAA color contrast validator
- JSON CSV Converter — Convert between JSON and CSV formats