Naver Blog Scraper - Clean Content & Review Intelligence

The only Naver Blog scraper that properly extracts clean content from Naver's iframe architecture. Sponsor detection, metadata, engagement stats, and Google Sheets integration.

  • Pricing: Pay per usage
  • Rating: 0.0 (0)
  • Developer: 한프로 (Maintained by Community)
  • Actor stats: 0 bookmarked, 2 total users, 0 monthly active users, last modified 3 days ago

Naver Blog Scraper - Clean Content & Review Intelligence

The only Naver Blog scraper that properly extracts clean content from Naver's iframe-based architecture.

Why This Actor?

Naver Blog loads post content inside an iframe, which means standard scrapers return raw page HTML filled with navigation menus, ads, and sidebar widgets. This Actor:

  1. Detects the iframe and fetches the actual content page separately
  2. Parses SmartEditor 2 and 3 (SE2/SE3) markup to extract structured text
  3. Returns clean, readable content — no menus, no ads, no junk HTML

Other scrapers on the Apify Store charge $30/month and still can't solve the iframe problem. This one does it for free.

Key Features

  1. Clean content extraction — Properly handles Naver's iframe + SmartEditor 2/3 architecture
  2. Keyword search — Search Naver Blog by any Korean keyword
  3. Direct URL input — Scrape specific blog post URLs directly
  4. Sponsor detection — Automatically detects 협찬 (sponsored) vs 내돈내산 (organic) posts
  5. Sponsored content filter — Show all, organic only, or sponsored only
  6. Comment count — Extracted from listNumComment JS variable
  7. Like count (공감) — Fetched via Naver's reaction API
  8. Category extraction — Blog post category from categoryName JS variable
  9. Tag/hashtag extraction — Parsed from tags JS array in inline scripts
  10. Blog profile — Nickname, profile image, blog ID, visitor count
  11. Date range filter — Filter keyword search results by publish date
  12. HTML or text mode — Get clean text or original HTML markup
  13. Auto deduplication — Skip duplicate posts by URL
  14. Deleted/private post handling — Automatically skips unavailable posts
  15. Run statistics — totalFound, uniqueCount, duplicatesRemoved, sponsoredSkipped
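
Several features above (comment count, category, tags) rely on reading values out of inline `<script>` variables in the post page. A minimal sketch of that technique for two of them, assuming the variable names listed above (`listNumComment`, `tags`) appear as plain JS assignments; the real markup may differ:

```python
# Pull scalar and array values out of inline JavaScript assignments.
import json
import re

def extract_js_vars(html: str) -> dict:
    """Extract commentCount and tags from inline script variables."""
    out = {}
    m = re.search(r'var\s+listNumComment\s*=\s*[\'"]?(\d+)', html)
    if m:
        out["commentCount"] = int(m.group(1))
    m = re.search(r'var\s+tags\s*=\s*(\[[^\]]*\])', html)
    if m:
        # The array literal is valid JSON when it uses double quotes.
        out["tags"] = json.loads(m.group(1))
    return out

sample = 'var listNumComment = "12"; var tags = ["강남맛집", "데이트코스"];'
print(extract_js_vars(sample))
# → {'commentCount': 12, 'tags': ['강남맛집', '데이트코스']}
```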

Output Fields

| Field | Type | Example |
|---|---|---|
| title | string | "강남 맛집 추천 TOP 10" |
| url | string | "https://blog.naver.com/user/123" |
| author | string | "맛집탐험가" |
| date | string | "2024-01-15" |
| category | string | "맛집 리뷰" |
| content | string | Clean blog body text (or HTML) |
| tags | string[] | ["강남맛집", "데이트코스"] |
| images | string[] | List of image URLs |
| commentCount | integer | 12 |
| likeCount | integer | 45 |
| isSponsored | boolean | false |
| sponsorType | string | "organic" / "sponsored" / "unknown" |
| blogProfile | object | {"nickname": "J베이지", "blogId": "jbeige_review", ...} |
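
Assembled from the example values above, a single dataset item looks roughly like this (the record shape is illustrative):

```json
{
  "title": "강남 맛집 추천 TOP 10",
  "url": "https://blog.naver.com/user/123",
  "author": "맛집탐험가",
  "date": "2024-01-15",
  "category": "맛집 리뷰",
  "content": "Clean blog body text...",
  "tags": ["강남맛집", "데이트코스"],
  "images": [],
  "commentCount": 12,
  "likeCount": 45,
  "isSponsored": false,
  "sponsorType": "organic",
  "blogProfile": {"nickname": "J베이지", "blogId": "jbeige_review"}
}
```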

Quick Start

Search by keyword

```json
{
  "keyword": "강남 맛집",
  "maxPosts": 20,
  "sort": "date",
  "sponsoredFilter": "organic_only"
}
```

Scrape specific URLs

```json
{
  "startUrls": [
    { "url": "https://blog.naver.com/jbeige_review/224143632158" },
    { "url": "https://blog.naver.com/another_user/223456789012" }
  ]
}
```

Filter by date range

```json
{
  "keyword": "제주도 여행",
  "maxPosts": 50,
  "dateStart": "2024-01-01",
  "dateEnd": "2024-06-30"
}
```

Get HTML content

```json
{
  "keyword": "인테리어",
  "maxPosts": 10,
  "contentType": "html"
}
```

Use Cases

  • Marketing analysis — Track brand mentions and sentiment across Naver Blog posts
  • Competitor research — Monitor competitor product reviews and engagement metrics
  • Review monitoring — Separate genuine reviews (내돈내산) from sponsored content (협찬) automatically
  • NLP data collection — Collect clean Korean text data for training language models or sentiment analysis
  • SEO research — Analyze top-ranking blog content for specific keywords
  • Influencer discovery — Find bloggers with high engagement in your niche using comment/like counts and profile data

Input Options

| Field | Type | Default | Description |
|---|---|---|---|
| startUrls | array | - | Direct blog post URLs to scrape |
| keyword | string | - | Korean keyword for Naver Blog search |
| maxPosts | integer | 10 | Max posts to scrape (1-500) |
| sort | string | "sim" | "sim" (relevance) or "date" (newest) |
| dateStart | string | - | Start date filter, YYYY-MM-DD |
| dateEnd | string | - | End date filter, YYYY-MM-DD |
| contentType | string | "text" | "text" or "html" |
| sponsoredFilter | string | "all" | "all", "organic_only", "sponsored_only" |
| deduplicate | boolean | true | Remove duplicate posts by URL |

Integration Guide

Python: Export to Google Sheets

```python
import gspread
from apify_client import ApifyClient

# 1. Run the scraper
client = ApifyClient("YOUR_APIFY_TOKEN")
run = client.actor("YOUR_ACTOR_ID").call(run_input={
    "keyword": "강남 맛집",
    "maxPosts": 50,
    "sponsoredFilter": "organic_only",
})

# 2. Get results
items = list(client.dataset(run["defaultDatasetId"]).iterate_items())

# 3. Write to Google Sheets
gc = gspread.service_account(filename="credentials.json")
sheet = gc.open("Naver Blog Data").sheet1

headers = ["title", "url", "author", "date", "category",
           "commentCount", "likeCount", "isSponsored", "sponsorType", "tags"]
sheet.update("A1", [headers])

rows = []
for item in items:
    rows.append([
        item.get("title", ""),
        item.get("url", ""),
        item.get("author", ""),
        item.get("date", ""),
        item.get("category", ""),
        item.get("commentCount", 0),
        item.get("likeCount", 0),
        item.get("isSponsored", False),
        item.get("sponsorType", ""),
        ", ".join(item.get("tags", [])),
    ])
sheet.update(f"A2:J{len(rows)+1}", rows)
print(f"Exported {len(rows)} rows to Google Sheets")
```

Prerequisites: `pip install apify-client gspread`, plus a Google Cloud service account with the Sheets API enabled.

n8n Workflow

  1. Schedule Trigger — Set cron (e.g., every Monday 9 AM)
  2. HTTP Request — POST to https://api.apify.com/v2/acts/YOUR_ACTOR_ID/runs?token=YOUR_TOKEN with input JSON body
  3. Wait — Poll GET /v2/actor-runs/{runId} until status is SUCCEEDED
  4. HTTP Request — GET https://api.apify.com/v2/datasets/{datasetId}/items?token=YOUR_TOKEN
  5. Google Sheets — Append rows to your spreadsheet
  6. Slack (optional) — Send summary notification

Make (Integromat) Workflow

  1. Scheduler — Set interval
  2. Apify: Run an Actor — Select Naver Blog Scraper, configure input
  3. Apify: Get Dataset Items — Fetch results
  4. Iterator — Loop through items
  5. Google Sheets: Add a Row — Map fields to columns
  6. Slack: Send a Message (optional) — Post summary

Apify Scheduling + Slack Webhook

  1. Schedule: Apify Console → Actor → Schedules → Create schedule with cron (e.g., 0 9 * * 1)
  2. Slack Webhook: Create Slack app → Enable Incoming Webhooks → Copy webhook URL
  3. Connect: Apify Console → Actor → Integrations → Add webhook (Event: Actor run succeeded, URL: Slack webhook)
  4. Payload:
    ```json
    {
      "text": "Naver Blog Scraper completed!\nDataset: https://api.apify.com/v2/datasets/{{resource.defaultDatasetId}}/items?format=json"
    }
    ```