Your Better Instagram Scraper

Pricing

$1.10 / 1,000 results
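As a rough illustration of pay-per-result pricing, the hypothetical helper below estimates a run's cost from its result count. It assumes each post, comment, or reply in the output is billed as one result; the store's pricing terms are authoritative.

```python
from decimal import Decimal, ROUND_HALF_UP

# Listing price: $1.10 per 1,000 results (assumption: one result per output item)
PRICE_PER_1000_RESULTS = Decimal("1.10")


def estimate_cost(num_results: int) -> Decimal:
    """Estimate the charge in USD for a run returning num_results items, rounded to cents."""
    if num_results < 0:
        raise ValueError("num_results must be non-negative")
    cost = PRICE_PER_1000_RESULTS * num_results / 1000
    return cost.quantize(Decimal("0.01"), rounding=ROUND_HALF_UP)


print(estimate_cost(2500))  # 2.75
```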


Developed by

Austin C

Maintained by Community

Robust scraping:
  • Supports posts, comments, and replies
  • Support for Reels and profile-based scraping is planned

Rating: 5.0 (2)

Total users: 5
Monthly users: 5
Runs succeeded: 0%
Last modified: 3 days ago

Instagram Hashtag Scraper

Developed by: @hvgupta and @austinmyc

⚡ What does this Actor do?

This Actor searches Instagram hashtags and extracts:

  • Posts from hashtag pages
  • Comments on each post
  • Replies to comments
  • Author and content details (username, text, timestamps)
  • Hierarchical data structure showing relationships between posts, comments, and replies

All results are filtered to the date range you set with date_from and date_to, so only content from that period is returned.
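A minimal sketch of how such a filter could be applied to output records. It assumes the range is inclusive on both ends and that each timestamp is compared by its calendar date in its own UTC offset; the Actor's actual boundary handling is not documented here, and `in_date_range` is a hypothetical name.

```python
from datetime import date, datetime


def in_date_range(record: dict, date_from: date, date_to: date) -> bool:
    """True if the record's ISO-8601 timestamp falls on a day in [date_from, date_to]."""
    # fromisoformat accepts offsets such as "+08:00" (Python 3.7+)
    posted_on = datetime.fromisoformat(record["datetime"]).date()
    return date_from <= posted_on <= date_to


records = [
    {"content": "A", "datetime": "2024-06-01T09:00:00+08:00"},
    {"content": "B", "datetime": "2024-05-31T23:59:00+08:00"},
]
kept = [r for r in records if in_date_range(r, date(2024, 6, 1), date(2024, 6, 24))]
print([r["content"] for r in kept])  # ['A']
```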

📋 Input Parameters

Parameter     | Type    | Description                                    | Default
--------------|---------|------------------------------------------------|----------
keyword       | String  | Hashtag keyword to search (without #)          | -
date_from     | Date    | Start date for content filtering (YYYY-MM-DD)  | 1 day ago
date_to       | Date    | End date for content filtering (YYYY-MM-DD)    | Today
hashtag_limit | Integer | Number of hashtags to explore                  | 1
post_limit    | Integer | Posts to scrape per hashtag                    | 2
comment_limit | Integer | Comments to scrape per post                    | 10
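The defaults above can be applied client-side before starting a run. This is a sketch with a hypothetical `resolve_input` helper: it treats `keyword` as required because the table lists no default for it, and it interprets "1 day ago" as the previous calendar date. Both are assumptions about how the Actor behaves.

```python
from datetime import date, timedelta
from typing import Optional

# Defaults from the parameter table
DEFAULT_LIMITS = {"hashtag_limit": 1, "post_limit": 2, "comment_limit": 10}


def resolve_input(user_input: dict, today: Optional[date] = None) -> dict:
    """Return a copy of user_input with omitted parameters filled in (hypothetical helper)."""
    today = today or date.today()
    if not user_input.get("keyword"):
        # Assumption: keyword is required, since it has no default
        raise ValueError("keyword is required")
    resolved = {**DEFAULT_LIMITS, **user_input}
    # Assumption: "1 day ago" means the previous calendar date
    resolved.setdefault("date_from", (today - timedelta(days=1)).isoformat())
    resolved.setdefault("date_to", today.isoformat())
    return resolved


print(resolve_input({"keyword": "travel", "post_limit": 5}, today=date(2024, 6, 24)))
```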

📊 Output Format

The Actor returns structured JSON data with the following format:

{
  "user_id": "username",
  "datetime": "2024-06-24T10:30:00+08:00",
  "content": "Post or comment content",
  "post_id": "unique_post_identifier",
  "url": "https://instagram.com/p/...",
  "category": "Original Post",
  "current_hash": "abc123def456...",
  "parent_hash": "parent_content_hash_if_applicable"
}
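If you process results in Python, a typed wrapper makes the fields explicit. This sketch uses a hypothetical `ScrapedItem` dataclass; the field types are inferred from the sample above, and `parent_hash` defaults to `None` for original posts.

```python
import datetime as dt
import json
from dataclasses import dataclass, fields
from typing import Optional


@dataclass
class ScrapedItem:
    user_id: str
    datetime: dt.datetime
    content: str
    post_id: str
    url: str
    category: str  # "Original Post", "Comment" or "Reply"
    current_hash: str
    parent_hash: Optional[str] = None  # None for original posts

    @classmethod
    def from_dict(cls, raw: dict) -> "ScrapedItem":
        known = {f.name for f in fields(cls)}
        data = {k: v for k, v in raw.items() if k in known}  # ignore unexpected keys
        # Parse the ISO-8601 timestamp, keeping its UTC offset
        data["datetime"] = dt.datetime.fromisoformat(raw["datetime"])
        return cls(**data)


raw = json.loads('''{"user_id": "username", "datetime": "2024-06-24T10:30:00+08:00",
  "content": "Post or comment content", "post_id": "unique_post_identifier",
  "url": "https://instagram.com/p/...", "category": "Original Post",
  "current_hash": "abc123def456...", "parent_hash": null}''')
item = ScrapedItem.from_dict(raw)
print(item.datetime.astimezone(dt.timezone.utc).isoformat())  # 2024-06-24T02:30:00+00:00
```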

Category Types:

  • "Original Post" - Main hashtag posts
  • "Comment" - Comments on posts
  • "Reply" - Replies to comments

Hash Fields (for hierarchical structure):

  • current_hash - Hash of the item itself, used for deduplication and identification
  • parent_hash - Hash of the parent item (the post for a comment, the comment for a reply)

These hash fields enable the hierarchical relationship structure:

  • Original Posts: parent_hash is null
  • Comments: parent_hash references the original post's current_hash
  • Replies: parent_hash references the parent comment's current_hash
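Because the output is flat, the hierarchy can be rebuilt by grouping items on `parent_hash`. A minimal sketch with a hypothetical `build_tree` helper; items whose parent is missing from the dataset are not reachable from any root and are dropped.

```python
from collections import defaultdict
from typing import List


def build_tree(items: List[dict]) -> List[dict]:
    """Nest a flat list of items under their parents using current_hash/parent_hash."""
    children = defaultdict(list)
    for item in items:
        children[item.get("parent_hash")].append(item)

    def attach(node: dict) -> dict:
        # Copy the node and recursively attach its direct children
        return {**node, "children": [attach(c) for c in children[node["current_hash"]]]}

    # Roots are original posts, whose parent_hash is null (None in Python)
    return [attach(root) for root in children[None]]


flat = [
    {"current_hash": "post123", "parent_hash": None, "category": "Original Post"},
    {"current_hash": "comment456", "parent_hash": "post123", "category": "Comment"},
    {"current_hash": "reply789", "parent_hash": "comment456", "category": "Reply"},
]
tree = build_tree(flat)
print(tree[0]["children"][0]["children"][0]["current_hash"])  # reply789
```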

⚙️ How it works

  1. Search: Finds hashtags related to your keyword
  2. Content Extraction:
    • Opens each hashtag page
    • Extracts posts within your date range
    • Collects comments and replies for each post
    • Handles Instagram's page navigation and dynamically loaded content
  3. Data Processing: Structures data hierarchically with relationships
  4. Output: Returns structured JSON data

📈 Performance & Features

  • Smart Navigation: Handles Instagram's dynamic content loading
  • Hierarchical Structure: Maintains relationships between posts, comments, and replies
  • Memory Efficient: Processes data incrementally to handle large datasets

📝 Example Usage

Using the Apify API client (Python)

from apify_client import ApifyClient
# Initialize the ApifyClient with your API token
client = ApifyClient("YOUR_APIFY_TOKEN")
# Prepare the Actor input
run_input = {
    "keyword": "travel",
    "date_from": "2024-06-01",
    "date_to": "2024-06-24",
    "hashtag_limit": 2,
    "post_limit": 5,
    "comment_limit": 15,
}
# Run the Actor and wait for it to finish
run = client.actor("YOUR_ACTOR_ID").call(run_input=run_input)
# Fetch and print Actor results from the run's dataset
# (list_items() returns a page object; its .items attribute is a list, not a method)
scraped_output_list = client.dataset(run["defaultDatasetId"]).list_items()
for single_scraped_output in scraped_output_list.items:
    print(single_scraped_output)

Using the Apify API client (JavaScript)

import { ApifyClient } from 'apify-client';
// Initialize the ApifyClient with your API token
const client = new ApifyClient({
    token: 'YOUR_APIFY_TOKEN',
});
// Prepare the Actor input
const input = {
    keyword: "travel",
    date_from: "2024-06-01",
    date_to: "2024-06-24",
    hashtag_limit: 2,
    post_limit: 5,
    comment_limit: 15,
};
// Run the Actor and wait for it to finish
// (top-level await requires an ES module, e.g. a .mjs file)
const run = await client.actor('YOUR_ACTOR_ID').call(input);
// Fetch and log Actor results from the run's dataset
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});

Basic JSON Input

{
  "keyword": "travel",
  "date_from": "2024-06-01",
  "date_to": "2024-06-24",
  "hashtag_limit": 2,
  "post_limit": 5,
  "comment_limit": 15
}

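The keyword must not contain spaces: like a hashtag, it should be a single word (for example "solotravel", not "solo travel"), entered without the leading #.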
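A quick pre-flight check for the two keyword rules stated in this README (no leading `#`, no spaces). `check_keyword` is a hypothetical helper, not part of the Actor, and the Actor may enforce additional rules of its own.

```python
import re


def check_keyword(keyword: str) -> str:
    """Hypothetical check: one hashtag word, no '#' prefix, no whitespace. Returns the keyword."""
    if not keyword:
        raise ValueError("keyword must not be empty")
    if keyword.startswith("#"):
        raise ValueError("pass the keyword without the leading '#'")
    if re.search(r"\s", keyword):
        raise ValueError("keyword must not contain spaces")
    return keyword


check_keyword("solotravel")       # passes
# check_keyword("solo travel")    # raises ValueError
```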

🔍 Data Quality Features

  • Automatic Deduplication: Uses content hashing to prevent duplicate entries
  • Input Validation: Validates date ranges and parameter constraints
  • Comprehensive Logging: Detailed logs for monitoring and debugging
  • Data Integrity: Maintains accurate relationships between posts, comments, and replies
  • Time Zone Handling: Properly handles Instagram's timestamp formats
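When combining datasets from several runs (for example, runs with overlapping date ranges), the same `current_hash` field can be used to drop repeats on the client side. This sketch assumes `current_hash` is stable for the same content across runs, which the description above suggests but does not guarantee; `merge_runs` is a hypothetical helper.

```python
from typing import Iterable, List


def merge_runs(*runs: Iterable[dict]) -> List[dict]:
    """Combine items from several runs, keeping the first item seen for each current_hash."""
    seen = set()
    merged = []
    for run_items in runs:
        for item in run_items:
            key = item["current_hash"]
            if key not in seen:
                seen.add(key)
                merged.append(item)
    return merged


run_a = [{"current_hash": "post123"}, {"current_hash": "comment456"}]
run_b = [{"current_hash": "comment456"}, {"current_hash": "reply789"}]
print([i["current_hash"] for i in merge_runs(run_a, run_b)])  # ['post123', 'comment456', 'reply789']
```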

📊 Output Structure

The Actor returns a flat array of JSON objects; the hierarchy is preserved through the current_hash and parent_hash links described earlier:

Example Relationship:

[
  {
    "current_hash": "post123",
    "parent_hash": null,
    "category": "Original Post",
    "content": "Amazing sunset today! #travel",
    "user_id": "traveler_jane",
    "datetime": "2024-06-24T10:30:00+08:00",
    "post_id": "p/ABC123",
    "url": "https://instagram.com/p/ABC123"
  },
  {
    "current_hash": "comment456",
    "parent_hash": "post123",
    "category": "Comment",
    "content": "Beautiful photo!",
    "user_id": "photo_lover",
    "datetime": "2024-06-24T11:15:00+08:00",
    "post_id": "p/ABC123/comment456",
    "url": "https://instagram.com/p/ABC123"
  },
  {
    "current_hash": "reply789",
    "parent_hash": "comment456",
    "category": "Reply",
    "content": "I agree, stunning!",
    "user_id": "nature_fan",
    "datetime": "2024-06-24T12:00:00+08:00",
    "post_id": "p/ABC123/comment456/reply789",
    "url": "https://instagram.com/p/ABC123"
  }
]
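To check that a dataset follows the linking rules described in this README (original posts have no parent, comments point to posts, replies point to comments), here is a sketch with a hypothetical `find_link_problems` helper, run against the example above (trimmed to the linking fields):

```python
from typing import List

# Expected parent category for each item category (None = no parent)
EXPECTED_PARENT = {"Original Post": None, "Comment": "Original Post", "Reply": "Comment"}


def find_link_problems(items: List[dict]) -> List[str]:
    """Describe every broken parent link; an empty list means the links are consistent."""
    by_hash = {item["current_hash"]: item for item in items}
    problems = []
    for item in items:
        name, category = item["current_hash"], item.get("category")
        parent_hash = item.get("parent_hash")
        if category not in EXPECTED_PARENT:
            problems.append(f"{name}: unknown category {category!r}")
            continue
        expected = EXPECTED_PARENT[category]
        if expected is None:
            if parent_hash is not None:
                problems.append(f"{name}: original post should not have a parent")
            continue
        parent = by_hash.get(parent_hash)
        if parent is None:
            problems.append(f"{name}: parent {parent_hash!r} not found in dataset")
        elif parent.get("category") != expected:
            problems.append(f"{name}: parent is {parent.get('category')!r}, expected {expected!r}")
    return problems


example = [
    {"current_hash": "post123", "parent_hash": None, "category": "Original Post"},
    {"current_hash": "comment456", "parent_hash": "post123", "category": "Comment"},
    {"current_hash": "reply789", "parent_hash": "comment456", "category": "Reply"},
]
print(find_link_problems(example))  # []
```

A reported missing parent does not necessarily mean the data is wrong; the parent may simply not have been included in that run's results.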