Your Better Instagram Scraper avatar

Your Better Instagram Scraper

Pricing

$1.10 / 1,000 results

Go to Apify Store
Your Better Instagram Scraper

Your Better Instagram Scraper

Robust scraping: - Support Posts, Comments, Replies - Will add support for Reels and Profile-based scraping

Pricing

$1.10 / 1,000 results

Rating

5.0

(6)

Developer

Austin C

Austin C

Maintained by Community

Actor stats

2

Bookmarked

53

Total users

2

Monthly active users

3 months ago

Last modified

Share

Instagram Hashtag Scraper

Developed by: @hvgupta and @austinmyc

โšก What does this Actor do?

This Actor searches Instagram hashtags and extracts:

  • Posts from hashtag pages
  • Comments on each post
  • Replies to comments
  • User information (username, content, timestamps)
  • Hierarchical data structure showing relationships between posts, comments, and replies

All data is filtered by your specified date range to get only relevant content.

๐Ÿ“‹ Input Parameters

ParameterTypeRequiredDescriptionDefault
keywordStringโœ…Hashtag keyword to search (without #)-
hashtag_limitIntegerโŒNumber of hashtags to explore1
post_limitIntegerโŒPosts to scrape per hashtag2
comment_limitIntegerโŒComments to scrape per post10

๐Ÿ“Š Output Format

The Actor returns structured JSON data with the following format:

{
"user_id": "username",
"datetime": "2024-06-24T10:30:00+08:00",
"content": "Post or comment content",
"post_id": "unique_post_identifier",
"url": "https://instagram.com/p/...",
"category": "Original Post",
"current_hash": "abc123def456...",
"parent_hash": "parent_content_hash_if_applicable"
}

Category Types:

  • "Original Post" - Main hashtag posts
  • "Comment" - Comments on posts
  • "Reply" - Replies to comments

Hash Fields (for hierarchical structure):

  • current_hash - hash for deduplication and identification
  • parent_hash - Hash of the parent

These hash fields enable the hierarchical relationship structure:

  • Original Posts: parent_hash is null
  • Comments: parent_hash references the original post's current_hash
  • Replies: parent_hash references the parent comment's current_hash

โš™๏ธ How it works

  1. Search: Finds hashtags related to your keyword
  2. Content Extraction:
    • Opens each hashtag page
    • Extracts posts within your date range
    • Collects comments and replies for each post
    • Handles Instagram's navigation
  3. Data Processing: Structures data hierarchically with relationships
  4. Output: Returns structured JSON data

๐Ÿ“ˆ Performance & Features

  • Smart Navigation: Handles Instagram's dynamic content loading
  • Hierarchical Structure: Maintains relationships between posts, comments, and replies
  • Memory Efficient: Processes data incrementally to handle large datasets

๐Ÿ“ Example Usage

Using Apify SDK (Python)

from apify_client import ApifyClient
# Initialize the ApifyClient with your API token
client = ApifyClient("YOUR_APIFY_TOKEN")
# Prepare the Actor input
run_input = {
"keyword": "travel",
"hashtag_limit": 2,
"post_limit": 5,
"comment_limit": 15
}
# Run the Actor and wait for it to finish
run = client.actor("YOUR_ACTOR_ID").call(run_input=run_input)
# Fetch and print Actor results from the run's dataset
scraped_output_list = client.dataset(run["defaultDatasetId"]).list_items()
for single_scraped_output in scraped_output_list.items():
print(single_scraped_output)

Using Apify SDK (JavaScript)

import { ApifyApi } from 'apify-client';
// Initialize the ApifyApi with your API token
const client = new ApifyApi({
token: 'YOUR_APIFY_TOKEN',
});
// Prepare the Actor input
const input = {
keyword: "travel",
hashtag_limit: 2,
post_limit: 5,
comment_limit: 15
};
// Run the Actor and wait for it to finish
const run = await client.actor('YOUR_ACTOR_ID').call(input);
// Fetch and log Actor results from the run's dataset
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
console.dir(item);
});

Basic JSON Input

{
"keyword": "travel",
"hashtag_limit": 2,
"post_limit": 5,
"comment_limit": 15
}

inputs to the keyword should not contain any " "s

๐Ÿ” Data Quality Features

  • Automatic Deduplication: Uses content hashing to prevent duplicate entries
  • Input Validation: Validates date ranges and parameter constraints
  • Comprehensive Logging: Detailed logs for monitoring and debugging
  • Data Integrity: Maintains accurate relationships between posts, comments, and replies
  • Time Zone Handling: Properly handles Instagram's timestamp formats

๐Ÿ“Š Output Structure

The Actor returns a flat array of JSON objects, but maintains hierarchical relationships through the previously mentioned hash linking:

Example Relationship:

[
{
"current_hash": "post123",
"parent_hash": null,
"category": "Original Post",
"content": "Amazing sunset today! #travel",
"user_id": "traveler_jane",
"datetime": "2024-06-24T10:30:00+08:00",
"post_id": "p/ABC123",
"url": "https://instagram.com/p/ABC123"
},
{
"current_hash": "comment456",
"parent_hash": "post123",
"category": "Comment",
"content": "Beautiful photo!",
"user_id": "photo_lover",
"datetime": "2024-06-24T11:15:00+08:00",
"post_id": "p/ABC123/comment456",
"url": "https://instagram.com/p/ABC123"
},
{
"current_hash": "reply789",
"parent_hash": "comment456",
"category": "Reply",
"content": "I agree, stunning!",
"user_id": "nature_fan",
"datetime": "2024-06-24T12:00:00+08:00",
"post_id": "p/ABC123/comment456/reply789",
"url": "https://instagram.com/p/ABC123"
}
]