Naver Blog Scraper - Clean Content & Review Intelligence
Developer: 한프로
The only Naver Blog scraper that properly extracts clean content from Naver's iframe-based architecture.
Why This Actor?
Naver Blog loads post content inside an iframe, which means standard scrapers return raw page HTML filled with navigation menus, ads, and sidebar widgets. This Actor:
- Detects the iframe and fetches the actual content page separately
- Parses SmartEditor 2 and 3 (SE2/SE3) markup to extract structured text
- Returns clean, readable content — no menus, no ads, no junk HTML
Other scrapers on the Apify Store charge $30/month and still can't solve the iframe problem. This one does it for free.
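The iframe-resolution step can be sketched in a few lines of stdlib Python. This is a simplified illustration, not the Actor's internal code; the `mainFrame` id and `PostView.naver` path reflect Naver's current markup and may change:

```python
import re
from urllib.parse import urljoin

def resolve_iframe_url(page_url, html):
    """Find the content iframe in a Naver Blog outer page and return its absolute URL."""
    # The outer page is mostly navigation chrome; the post body lives in an
    # iframe (commonly <iframe id="mainFrame" src="/PostView.naver?...">).
    m = re.search(r'<iframe[^>]*id="mainFrame"[^>]*src="([^"]+)"', html)
    return urljoin(page_url, m.group(1)) if m else None

# Example against a minimal snippet shaped like Naver's outer HTML:
outer = '<iframe id="mainFrame" src="/PostView.naver?blogId=user&logNo=123"></iframe>'
print(resolve_iframe_url("https://blog.naver.com/user/123", outer))
# -> https://blog.naver.com/PostView.naver?blogId=user&logNo=123
```

The resolved URL is then fetched separately, and only that second page is parsed for SE2/SE3 content.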
Key Features
- Clean content extraction — Properly handles Naver's iframe + SmartEditor 2/3 architecture
- Keyword search — Search Naver Blog by any Korean keyword
- Direct URL input — Scrape specific blog post URLs directly
- Sponsor detection — Automatically detects 협찬 (sponsored) vs 내돈내산 (organic) posts
- Sponsored content filter — Show all, organic only, or sponsored only
- Comment count — Extracted from the `listNumCommentJS` variable
- Like count (공감) — Fetched via Naver's reaction API
- Category extraction — Blog post category from the `categoryNameJS` variable
- Tag/hashtag extraction — Parsed from the `tagsJS` array in inline scripts
- Blog profile — Nickname, profile image, blog ID, visitor count
- Date range filter — Filter keyword search results by publish date
- HTML or text mode — Get clean text or original HTML markup
- Auto deduplication — Skip duplicate posts by URL
- Deleted/private post handling — Automatically skips unavailable posts
- Run statistics — totalFound, uniqueCount, duplicatesRemoved, sponsoredSkipped
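The sponsor detector's exact rules are internal to the Actor; a simplified keyword-based sketch, using illustrative (not the Actor's actual) keyword lists, shows how posts can be mapped to the `sponsorType` values above:

```python
# Illustrative keyword lists -- the Actor's real heuristics are more involved.
SPONSORED_HINTS = ("협찬", "제공받아", "원고료")  # "sponsored", "received for free", "paid"
ORGANIC_HINTS = ("내돈내산",)                     # "bought with my own money"

def classify_sponsorship(text):
    """Map post text to the sponsorType values the Actor emits."""
    if any(hint in text for hint in SPONSORED_HINTS):
        return "sponsored"
    if any(hint in text for hint in ORGANIC_HINTS):
        return "organic"
    return "unknown"

print(classify_sponsorship("업체로부터 협찬 받아 작성한 후기입니다"))  # -> sponsored
print(classify_sponsorship("내돈내산 솔직 후기"))                      # -> organic
```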
Output Fields
| Field | Type | Example |
|---|---|---|
| title | string | "강남 맛집 추천 TOP 10" |
| url | string | "https://blog.naver.com/user/123" |
| author | string | "맛집탐험가" |
| date | string | "2024-01-15" |
| category | string | "맛집 리뷰" |
| content | string | Clean blog body text (or HTML) |
| tags | string[] | ["강남맛집", "데이트코스"] |
| images | string[] | List of image URLs |
| commentCount | integer | 12 |
| likeCount | integer | 45 |
| isSponsored | boolean | false |
| sponsorType | string | "organic" / "sponsored" / "unknown" |
| blogProfile | object | {"nickname": "J베이지", "blogId": "jbeige_review", ...} |
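Dataset items with these fields can be post-processed directly. For example, keeping organic posts and ranking them by engagement (the equal like/comment weighting below is an illustrative choice, not part of the Actor):

```python
def top_organic(items, n=5):
    """Keep non-sponsored posts and rank them by likes + comments."""
    organic = [it for it in items if not it.get("isSponsored")]
    return sorted(
        organic,
        key=lambda it: it.get("likeCount", 0) + it.get("commentCount", 0),
        reverse=True,
    )[:n]

posts = [
    {"title": "내돈내산 후기", "isSponsored": False, "likeCount": 45, "commentCount": 12},
    {"title": "협찬 후기", "isSponsored": True, "likeCount": 90, "commentCount": 30},
    {"title": "솔직 리뷰", "isSponsored": False, "likeCount": 10, "commentCount": 2},
]
print([p["title"] for p in top_organic(posts)])  # -> ['내돈내산 후기', '솔직 리뷰']
```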
Quick Start
Search by keyword
```json
{
  "keyword": "강남 맛집",
  "maxPosts": 20,
  "sort": "date",
  "sponsoredFilter": "organic_only"
}
```
Scrape specific URLs
```json
{
  "startUrls": [
    { "url": "https://blog.naver.com/jbeige_review/224143632158" },
    { "url": "https://blog.naver.com/another_user/223456789012" }
  ]
}
```
Filter by date range
```json
{
  "keyword": "제주도 여행",
  "maxPosts": 50,
  "dateStart": "2024-01-01",
  "dateEnd": "2024-06-30"
}
```
Get HTML content
```json
{
  "keyword": "인테리어",
  "maxPosts": 10,
  "contentType": "html"
}
```
Use Cases
- Marketing analysis — Track brand mentions and sentiment across Naver Blog posts
- Competitor research — Monitor competitor product reviews and engagement metrics
- Review monitoring — Separate genuine reviews (내돈내산) from sponsored content (협찬) automatically
- NLP data collection — Collect clean Korean text data for training language models or sentiment analysis
- SEO research — Analyze top-ranking blog content for specific keywords
- Influencer discovery — Find bloggers with high engagement in your niche using comment/like counts and profile data
Input Options
| Field | Type | Default | Description |
|---|---|---|---|
| startUrls | array | - | Direct blog post URLs to scrape |
| keyword | string | - | Korean keyword for Naver Blog search |
| maxPosts | integer | 10 | Max posts to scrape (1-500) |
| sort | string | "sim" | "sim" (relevance) or "date" (newest) |
| dateStart | string | - | Start date filter, YYYY-MM-DD |
| dateEnd | string | - | End date filter, YYYY-MM-DD |
| contentType | string | "text" | "text" or "html" |
| sponsoredFilter | string | "all" | "all", "organic_only", "sponsored_only" |
| deduplicate | boolean | true | Remove duplicate posts by URL |
Integration Guide
Python: Export to Google Sheets
```python
import gspread
from apify_client import ApifyClient

# 1. Run the scraper
client = ApifyClient("YOUR_APIFY_TOKEN")
run = client.actor("YOUR_ACTOR_ID").call(run_input={
    "keyword": "강남 맛집",
    "maxPosts": 50,
    "sponsoredFilter": "organic_only",
})

# 2. Get results
items = list(client.dataset(run["defaultDatasetId"]).iterate_items())

# 3. Write to Google Sheets
gc = gspread.service_account(filename="credentials.json")
sheet = gc.open("Naver Blog Data").sheet1
headers = ["title", "url", "author", "date", "category",
           "commentCount", "likeCount", "isSponsored", "sponsorType", "tags"]
sheet.update("A1", [headers])

rows = []
for item in items:
    rows.append([
        item.get("title", ""),
        item.get("url", ""),
        item.get("author", ""),
        item.get("date", ""),
        item.get("category", ""),
        item.get("commentCount", 0),
        item.get("likeCount", 0),
        item.get("isSponsored", False),
        item.get("sponsorType", ""),
        ", ".join(item.get("tags", [])),
    ])
sheet.update(f"A2:J{len(rows)+1}", rows)
print(f"Exported {len(rows)} rows to Google Sheets")
```
Prerequisites: `pip install apify-client gspread`, plus a Google Cloud service account with the Sheets API enabled.
n8n Workflow
- Schedule Trigger — Set cron (e.g., every Monday 9 AM)
- HTTP Request — POST to `https://api.apify.com/v2/acts/YOUR_ACTOR_ID/runs?token=YOUR_TOKEN` with input JSON body
- Wait — Poll `GET /v2/actor-runs/{runId}` until status is `SUCCEEDED`
- HTTP Request — GET `https://api.apify.com/v2/datasets/{datasetId}/items?token=YOUR_TOKEN`
- Google Sheets — Append rows to your spreadsheet
- Slack (optional) — Send summary notification
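The start/wait/fetch steps above follow the standard Apify run-and-poll pattern; a stdlib-only Python sketch of the same API calls (with `YOUR_ACTOR_ID` and `YOUR_TOKEN` as placeholders) might look like:

```python
import json
import time
import urllib.request

API = "https://api.apify.com/v2"
# Apify run statuses that mean the run has finished
TERMINAL = {"SUCCEEDED", "FAILED", "ABORTED", "TIMED-OUT"}

def start_run(actor_id, token, run_input):
    """POST the input JSON to start an Actor run; returns the run id."""
    req = urllib.request.Request(
        f"{API}/acts/{actor_id}/runs?token={token}",
        data=json.dumps(run_input).encode("utf-8"),
        headers={"Content-Type": "application/json"},
    )
    with urllib.request.urlopen(req) as resp:
        return json.load(resp)["data"]["id"]

def wait_for_run(run_id, token, poll_seconds=10):
    """Poll the run until it reaches a terminal status, then return it."""
    while True:
        with urllib.request.urlopen(f"{API}/actor-runs/{run_id}?token={token}") as resp:
            run = json.load(resp)["data"]
        if run["status"] in TERMINAL:
            return run
        time.sleep(poll_seconds)
```

Once `wait_for_run` returns with status `SUCCEEDED`, fetch `run["defaultDatasetId"]` items as in the preceding steps.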
Make (Integromat) Workflow
- Scheduler — Set interval
- Apify: Run an Actor — Select Naver Blog Scraper, configure input
- Apify: Get Dataset Items — Fetch results
- Iterator — Loop through items
- Google Sheets: Add a Row — Map fields to columns
- Slack: Send a Message (optional) — Post summary
Apify Scheduling + Slack Webhook
- Schedule: Apify Console → Actor → Schedules → Create schedule with cron (e.g., `0 9 * * 1`)
- Slack Webhook: Create Slack app → Enable Incoming Webhooks → Copy webhook URL
- Connect: Apify Console → Actor → Integrations → Add webhook (Event: `Actor run succeeded`, URL: Slack webhook)
- Payload:

```json
{
  "text": "Naver Blog Scraper completed!\nDataset: https://api.apify.com/v2/datasets/{{resource.defaultDatasetId}}/items?format=json"
}
```