🧪High-Volume Website Content & Media Scraper

Pricing

$1.25 / 1,000 units

🧪Crawling Done Right! Let me know what you think and where or how I can improve my actor; I am all for constructive criticism. Please message me if you have any questions. Enjoy and have a good day.

Rating

5.0 (2)

Developer

Jeff Halverson

Maintained by Community

Actor stats

  • Bookmarked: 6
  • Total users: 138
  • Monthly active users: 8
  • Last modified: 3 days ago

ALL Social Media/WebScraper

Extract structured content from public social profile pages, article pages, landing pages, and other JavaScript-heavy websites. This actor focuses on turning a page into a clean record of text blocks, metadata, images, video references, and outgoing links.

What it does

  • Opens each public URL in a browser session
  • Extracts the page title and basic metadata
  • Captures article-like text blocks from the page
  • Collects image URLs, embedded video URLs, direct video source URLs, and outbound links
  • Optionally filters Facebook links out of the outbound link list
  • Stores diagnostic screenshots for failed pages
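
The optional Facebook filter in the list above can be pictured with a minimal sketch. The actor's actual matching rule is not documented here, so the hostname list and the helper names `is_facebook_link` and `filter_links` are assumptions made for illustration only.

```python
from urllib.parse import urlsplit

# Assumption: these hostnames count as "Facebook"; the actor's real rule may differ.
FACEBOOK_HOSTS = ("facebook.com", "fb.com")


def is_facebook_link(url: str) -> bool:
    """Return True if the URL's hostname is a Facebook domain or one of its subdomains."""
    host = (urlsplit(url).hostname or "").lower()
    return any(host == d or host.endswith("." + d) for d in FACEBOOK_HOSTS)


def filter_links(links: list[str], include_facebook_links: bool = True) -> list[str]:
    """Drop Facebook links unless include_facebook_links is True; input order is preserved."""
    if include_facebook_links:
        return list(links)
    return [link for link in links if not is_facebook_link(link)]
```

Matching on the parsed hostname rather than on a substring keeps unrelated domains that merely contain "facebook" in their name in the result.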

Good fit

  • Public Instagram profile pages
  • Blog articles and news pages
  • Marketing sites and landing pages
  • Content research and competitor monitoring
  • Collecting media/link inventories from public pages

Not a good fit

  • Logged-in or private content
  • Full API-style social scraping for each platform
  • Comments, followers, or hidden profile data
  • Sites that require persistent authenticated sessions

Input example

{
  "startUrls": [
    { "url": "https://instagram.com/muddlemix_" },
    { "url": "https://example.com/blog/example-article" }
  ],
  "includeFacebookLinks": true,
  "headless": true,
  "maxConcurrency": 3,
  "requestHandlerTimeoutSecs": 90,
  "navigationTimeoutSecs": 90,
  "waitAfterLoadSecs": 0.5,
  "saveErrorScreenshots": true
}
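
If the input is assembled in code rather than written by hand, a small helper along these lines may be useful. It is only a sketch: `build_run_input` is a hypothetical name, the defaults are copied from the example above rather than from the actor's input schema, and the empty-list check is an assumption of this helper, not a documented rule.

```python
def build_run_input(urls, include_facebook_links=True, max_concurrency=3):
    """Build an input object with the same keys as the example input."""
    if not urls:
        # Assumption of this sketch: a run without start URLs is not useful.
        raise ValueError("at least one start URL is required")
    return {
        "startUrls": [{"url": u} for u in urls],
        "includeFacebookLinks": include_facebook_links,
        "headless": True,
        "maxConcurrency": max_concurrency,
        # Timeout and wait values below are taken from the example input.
        "requestHandlerTimeoutSecs": 90,
        "navigationTimeoutSecs": 90,
        "waitAfterLoadSecs": 0.5,
        "saveErrorScreenshots": True,
    }
```

Passing the returned dictionary through `json.dumps` produces a JSON document with the same keys as the example.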

Output fields

Each dataset item can include:

  • url
  • title
  • meta
  • articles
  • images
  • videos
  • links
  • scraped
  • scrapeTime
  • processingTimeMs
  • contentType
  • error
  • diag
  • status

Notes

  • The default dataset is the main output.
  • Failed pages are still pushed into the dataset with status, error, and optional diagnostic screenshot URL so runs stay debuggable.
  • This actor is best positioned as a public-page media and content extractor, not a full per-platform private-data scraper.
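
Because failed pages land in the same dataset as successful ones, downstream code will usually want to separate the two. The sketch below assumes that a failed item carries a non-empty `error` field; the exact values of `status` and `error` are not documented here, and `split_results` is a hypothetical helper name.

```python
def split_results(items):
    """Separate dataset items into (ok, failed) lists.

    Assumption: an item counts as failed when its `error` field is present and non-empty.
    """
    ok, failed = [], []
    for item in items:
        (failed if item.get("error") else ok).append(item)
    return ok, failed


def failure_report(failed):
    """Return (url, error) pairs for quick inspection of failed pages."""
    return [(item.get("url"), item.get("error")) for item in failed]
```

The items passed in would be the plain dictionaries read from the default dataset.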