Apache & Nginx Log Parser

Parse Apache, Nginx, and IIS access logs into structured JSON. Extracts IPs, timestamps, HTTP methods, paths, status codes, bytes, user agents, and referrers. Includes traffic analytics.

Pricing: Pay per event
Developer: Stas Persiianenko (Maintained by Community)

🪵 Apache & Nginx Log Parser

Parse Apache access logs, Nginx logs, and IIS W3C logs in seconds. Extract structured data — IP addresses, timestamps, HTTP methods, paths, status codes, user agents — and get instant traffic analytics without installing any tools.

No proxy required. No web scraping. Pure computation.


🔍 What does it do?

Apache & Nginx Log Parser reads raw access log files and converts them into clean, structured records you can filter, sort, and export. It works with:

  • Apache Combined Log Format (the default for most Apache 2.x installs)
  • Nginx access logs (same format by default)
  • Common Log Format (CLF) — legacy Apache without referrer/user-agent
  • IIS W3C Extended Log Format — Microsoft IIS server logs

Give it a log file URL or paste raw log content, and it returns:

  1. One structured record per log line — IP, timestamp (ISO 8601), method, path, status code, response bytes, referrer, user agent
  2. A summary record — top pages, top IPs, status code distribution, hourly traffic breakdown, top user agents, HTTP method breakdown
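For illustration, here is how a single Combined Log Format line maps onto that per-line record shape, sketched in Python with a regex. This is a standalone example, not the actor's internal code; note that the regex yields strings, whereas the actor reports statusCode and responseBytes as numbers.

```python
import re

# Illustrative regex for Apache/Nginx Combined Log Format
COMBINED = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<timestamp>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<statusCode>\d{3}) (?P<responseBytes>\d+|-) '
    r'"(?P<referrer>[^"]*)" "(?P<userAgent>[^"]*)"'
)

line = ('127.0.0.1 - frank [10/Oct/2000:13:55:36 -0700] '
        '"GET /apache_pb.gif HTTP/1.0" 200 2326 '
        '"http://www.example.com/start.html" "Mozilla/4.08 [en] (Win98; I ;Nav)"')

record = COMBINED.match(line).groupdict()
print(record['ip'], record['method'], record['path'], record['statusCode'])
# → 127.0.0.1 GET /apache_pb.gif 200
```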

👤 Who is it for?

🔧 SEO Auditors

Identify your top crawled pages, detect Googlebot activity, find 404s and redirect chains — all from raw access logs without Google Analytics or Search Console.

🖥️ DevOps & SysAdmins

Quickly diagnose traffic spikes, identify bad bots hammering your server, find which endpoints are slowest, and spot error patterns — without SSH access or log aggregation tools.

📊 Data Analysts

Load access logs into your data pipeline, spreadsheet, or BI tool. Get structured JSON out of messy log text in one step.

🛡️ Security Analysts

Find brute force attempts, unusual IP patterns, and suspicious user agents from raw access logs with no special software.


💡 Why use Apache & Nginx Log Parser?

Traditional log analysis requires command-line tools (awk, grep, GoAccess, AWStats) or a full ELK stack. This actor:

  • ✅ Works in your browser — no SSH, no terminal
  • ✅ Outputs clean JSON you can download, query via API, or pipe into n8n/Make
  • ✅ Auto-detects log format — no configuration needed
  • ✅ Handles large log files (millions of lines) efficiently
  • ✅ Integrates with Apify's dataset storage — keep your analysis history
  • ✅ Zero proxy cost — pure computation, no web requests to target sites

📋 What data is extracted?

Each parsed log line produces a structured record:

| Field | Type | Description |
| --- | --- | --- |
| ip | string | Client IP address |
| timestamp | string | Raw timestamp from log (e.g. 10/Oct/2000:13:55:36 -0700) |
| timestampIso | string | ISO 8601 timestamp (e.g. 2000-10-10T13:55:36-07:00) |
| method | string | HTTP method (GET, POST, PUT, DELETE, etc.) |
| path | string | Request path (may include query string) |
| protocol | string | HTTP protocol (HTTP/1.0, HTTP/1.1, HTTP/2) |
| statusCode | number | HTTP status code (200, 301, 404, 500, etc.) |
| responseBytes | number | Response size in bytes |
| referrer | string | HTTP Referer header value |
| userAgent | string | User-Agent header value |
| logFormat | string | Detected format (apache_combined, common, iis_w3c) |
| parseError | string | Error message if the line couldn't be parsed |
| rawLine | string | Original raw log line |

Plus a summary record (when includeStats: true):

| Field | Description |
| --- | --- |
| topPages | Top N most-requested paths |
| topIPs | Top N most-active IP addresses |
| statusCodes | Count per HTTP status code |
| hourlyTraffic | Request count per hour |
| topUserAgents | Top N user agent strings |
| topMethods | HTTP method distribution |
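If you prefer to compute aggregates like these yourself from the per-line records, a minimal sketch with Python's collections.Counter works. The field names follow the table above; the sample records are made up for the demonstration.

```python
from collections import Counter

# Hypothetical per-line records, shaped like the actor's output
entries = [
    {'path': '/', 'ip': '1.2.3.4', 'statusCode': 200},
    {'path': '/', 'ip': '1.2.3.4', 'statusCode': 200},
    {'path': '/missing', 'ip': '5.6.7.8', 'statusCode': 404},
]

top_n = 2
summary = {
    'topPages': Counter(e['path'] for e in entries).most_common(top_n),
    'topIPs': Counter(e['ip'] for e in entries).most_common(top_n),
    'statusCodes': dict(Counter(e['statusCode'] for e in entries)),
}
print(summary['topPages'])   # → [('/', 2), ('/missing', 1)]
```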

💰 How much does it cost to parse Apache logs?

This actor uses Pay-Per-Event (PPE) pricing — you only pay for what you use:

| Event | FREE tier | DIAMOND tier |
| --- | --- | --- |
| Run start | $0.005 | $0.0025 |
| Per 1,000 log lines | $0.008 | $0.004 |

Example costs

| File size | Lines | Estimated cost (FREE) |
| --- | --- | --- |
| Small log | 1,000 | ~$0.013 |
| Medium log | 10,000 | ~$0.085 |
| Large log | 100,000 | ~$0.805 |
| Huge log | 1,000,000 | ~$8.005 |
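The estimates above are simply the run-start charge plus the per-1,000-lines rate. A small helper makes the arithmetic explicit (FREE-tier rates taken from the pricing table):

```python
def estimated_cost(lines, run_start=0.005, per_1k=0.008):
    """Estimate FREE-tier cost in USD for a single run."""
    return run_start + (lines / 1000) * per_1k

for n in (1_000, 10_000, 100_000, 1_000_000):
    print(f"{n:>9,} lines ≈ ${estimated_cost(n):.3f}")
```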

Parsing is pure computation with zero proxy cost — the only expenses are the nominal run-start and per-1,000-lines charges.

Use maxLines to cap processing and control costs on very large files.


🚀 How to use it

Step 1 — Provide your log data

Option A: Paste log content Set logText to your raw log lines. Ideal for small snippets or when testing.

Option B: Fetch from URL Set logUrl to any publicly accessible log file URL. The actor fetches it over HTTP — no auth or proxy needed.

Step 2 — Choose format (optional)

Leave logFormat as auto to detect automatically. Set it explicitly if auto-detection fails:

  • apache_combined — Apache Combined Log Format
  • nginx — Nginx access log (same as Apache Combined by default)
  • common — Common Log Format (CLF), no referrer/user-agent
  • iis_w3c — IIS W3C Extended Log Format

Step 3 — Run and download results

Results are saved to the actor's dataset. Download as JSON, CSV, or JSONL — or query them via the Apify API.


⚙️ Input parameters

| Parameter | Type | Default | Description |
| --- | --- | --- | --- |
| logText | string | — | Paste raw log lines directly |
| logUrl | string | — | URL of a log file to fetch |
| logFormat | string | auto | Log format: auto, apache_combined, nginx, common, iis_w3c |
| maxLines | integer | 0 (unlimited) | Max lines to parse (0 = parse all) |
| includeStats | boolean | true | Include summary statistics record |
| topN | integer | 10 | How many entries in top-N summaries |
Either logText or logUrl is required.
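A minimal valid input, expressed here as the run_input dict used in the Python API example later in this page (the URL is a placeholder):

```python
run_input = {
    'logUrl': 'https://example.com/access.log',  # or provide 'logText' instead
    'logFormat': 'auto',
    'maxLines': 0,          # 0 = parse everything
    'includeStats': True,
    'topN': 10,
}
# one of logText / logUrl is required
assert 'logText' in run_input or 'logUrl' in run_input
```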


📤 Output format

Results are stored in two dataset views:

log-entries view

One record per parsed log line. Use this to filter, sort, or search individual requests.

summary view

A single record with aggregated statistics. Use this for the big-picture traffic breakdown.


🔧 Tips & best practices

Handling large log files

  • Use maxLines to process only the most recent N lines (new entries are appended at the end of a log file)
  • For very large files (> 1GB), consider splitting them before uploading
  • Use includeStats: false if you only need raw entries and want to compute stats yourself

Format auto-detection

Auto-detection samples the first 20 non-comment lines. If your log file has a long header or preamble, auto-detection may fail — set logFormat explicitly in that case.
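A simplified sketch of what such detection might look like — this is an assumption about the approach for illustration; the actor's actual heuristic may differ:

```python
def detect_format(lines, sample_size=20):
    """Guess the log format from a sample of non-comment lines (illustrative)."""
    # IIS W3C files declare their columns in a #Fields: directive
    if any(l.startswith('#Fields:') for l in lines):
        return 'iis_w3c'
    sample = [l for l in lines if l.strip() and not l.startswith('#')][:sample_size]
    # Combined format lines end with two quoted strings (referrer, user agent)
    if sample and all(l.rstrip().endswith('"') for l in sample):
        return 'apache_combined'
    return 'common'

combined = ['1.2.3.4 - - [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0" 200 512 "-" "Mozilla/5.0"']
print(detect_format(combined))  # → apache_combined
```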

IIS logs

IIS W3C logs have a #Fields: header that defines column order. The actor reads this header automatically and adjusts field mapping accordingly. Multiple #Fields: headers in a single file (rare but valid) are handled correctly.
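That header-driven mapping can be sketched in a few lines of Python (illustrative only; the sample log and field subset are made up, and re-reading #Fields: on each occurrence handles multiple headers per file):

```python
def parse_iis(lines):
    """Parse IIS W3C lines using the most recent #Fields: header."""
    fields, records = [], []
    for line in lines:
        if line.startswith('#Fields:'):
            fields = line.split()[1:]   # column names after '#Fields:'
        elif line.startswith('#') or not line.strip():
            continue                    # other directives and blank lines
        elif fields:
            records.append(dict(zip(fields, line.split())))
    return records

log = [
    '#Software: Microsoft Internet Information Services',
    '#Fields: date time c-ip cs-method cs-uri-stem sc-status',
    '2024-01-15 08:30:00 192.168.1.10 GET /index.html 200',
]
print(parse_iis(log)[0]['sc-status'])  # → 200
```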

Filtering by status code

After parsing, filter the dataset by statusCode in the Apify console or via API:

GET https://api.apify.com/v2/datasets/{DATASET_ID}/items?fields=ip,path,statusCode&limit=1000

Nginx default format

Nginx's default access_log format is identical to Apache Combined Log Format. The actor parses it without any special configuration.


🔌 Integrations

With n8n

Use the Apify n8n node to trigger the parser, then pass the dataset URL to a HTTP Request node to fetch results. Feed into Spreadsheet File or Google Sheets nodes for pivot tables.

With Make (Integromat)

Trigger the actor via Apify > Run Actor module, then use Apify > Get Dataset Items to fetch structured records into any downstream module (Google Sheets, Airtable, Slack).

With Zapier

Use Apify > Run Actor trigger, connect to Google Sheets > Create Spreadsheet Row — each parsed log line becomes a row.

With your data pipeline

Use the dataset API to stream results:

curl "https://api.apify.com/v2/datasets/{DATASET_ID}/items?format=json" \
  -H "Authorization: Bearer YOUR_TOKEN" | jq '.[] | select(.statusCode == 404)'

🤖 API usage

Node.js

import { ApifyClient } from 'apify-client';

const client = new ApifyClient({ token: 'YOUR_TOKEN' });
const run = await client.actor('automation-lab/apache-log-parser').call({
  logUrl: 'https://example.com/access.log',
  logFormat: 'auto',
  includeStats: true,
  topN: 10,
});
const { items } = await client.dataset(run.defaultDatasetId).listItems();
console.log(items.filter(i => i.statusCode === 404));

Python

from apify_client import ApifyClient

client = ApifyClient(token='YOUR_TOKEN')
run = client.actor('automation-lab/apache-log-parser').call(run_input={
    'logUrl': 'https://example.com/access.log',
    'logFormat': 'auto',
    'includeStats': True,
    'topN': 10,
})
items = client.dataset(run['defaultDatasetId']).list_items().items
errors = [i for i in items if i.get('statusCode', 0) >= 500]
print(f"Found {len(errors)} server errors")

cURL

TOKEN="YOUR_TOKEN"
# Start the run
RUN=$(curl -s -X POST "https://api.apify.com/v2/acts/automation-lab~apache-log-parser/runs?token=$TOKEN" \
  -H "Content-Type: application/json" \
  -d '{"logUrl":"https://example.com/access.log","includeStats":true}')
DATASET_ID=$(echo "$RUN" | python3 -c "import sys,json; print(json.load(sys.stdin)['data']['defaultDatasetId'])")
# Fetch results
curl "https://api.apify.com/v2/datasets/$DATASET_ID/items?token=$TOKEN" | python3 -m json.tool

🧠 Use with Claude (MCP)

You can use this actor directly from Claude via the Apify MCP server:

Claude Desktop — Add to claude_desktop_config.json:

{
  "mcpServers": {
    "apify": {
      "command": "npx",
      "args": ["-y", "@apify/mcp-server?tools=automation-lab/apache-log-parser"],
      "env": { "APIFY_TOKEN": "YOUR_TOKEN" }
    }
  }
}

Claude Code — Run in terminal:

claude mcp add --transport http apify "https://mcp.apify.com?tools=automation-lab/apache-log-parser"

Example prompts:

  • "Parse this Apache log and show me the top 10 pages with 404 errors"
  • "Fetch my Nginx log from https://example.com/access.log and give me hourly traffic for yesterday"
  • "Analyze this IIS log: [paste log content] — which IPs are hitting us hardest?"

⚖️ Legality & privacy

This actor processes log data you provide — it does not scrape any website. Log files typically contain IP addresses and user agent strings, which may be considered personal data under GDPR and similar regulations.

Ensure you:

  • Have the right to process the log data you provide
  • Comply with your organization's data retention policies
  • Anonymize or pseudonymize IP addresses if required by your jurisdiction

❓ FAQ

Q: My log lines aren't being parsed — what format should I use? A: Copy one line from your log and check whether it matches Apache Combined Format: IP - user [timestamp] "METHOD /path HTTP/1.1" STATUS BYTES "referrer" "user-agent". If so, use apache_combined. If it's missing the referrer/user-agent at the end, use common. If it has a #Fields: header, use iis_w3c.

Q: How many lines can it process? A: There is no hard limit — the actor processes all lines in memory. For very large files (millions of lines), Apify's 256 MB memory limit applies. Use maxLines to cap processing if you get out-of-memory errors.

Q: Can it parse custom Nginx log formats? A: Currently supports the Nginx default format (which matches Apache Combined). Custom log_format directives produce non-standard output that may not parse correctly. Support for custom format strings is planned for a future version.

Q: The URL fetch fails — what can I do? A: The log URL must be publicly accessible without authentication. If your log is behind auth or a firewall, download it locally and paste the content using logText instead.

Q: Why are some lines showing parseError? A: Lines that don't match the expected format (e.g., blank lines, comment lines in Apache config files accidentally included, or log rotation markers) produce a parseError. The actor continues processing remaining lines — one bad line doesn't stop the run.

Q: Does it support compressed (.gz) log files? A: Not currently. Decompress the file first (gunzip access.log.gz) and then provide the plain text content via logText or host it at a URL.
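If you are scripting this workaround, Python's gzip module can decompress in memory so the plain text can be passed as logText. In the sketch below, the compressed file is created first purely for the demonstration; normally access.log.gz already exists on disk.

```python
import gzip

# Create a sample compressed log (demonstration only; normally this file exists)
with gzip.open('access.log.gz', 'wt', encoding='utf-8') as f:
    f.write('127.0.0.1 - - [10/Oct/2000:13:55:36 -0700] "GET / HTTP/1.0" 200 512 "-" "-"\n')

# Decompress in memory; log_text can then be passed to the actor as logText
with gzip.open('access.log.gz', 'rt', encoding='utf-8') as f:
    log_text = f.read()

print(log_text.count('\n'))  # → 1
```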