
Your Better Instagram Scraper
Pricing
$1.10 / 1,000 results

Robust scraping:
- Supports posts, comments, and replies
- Support for Reels and profile-based scraping is planned
Instagram Hashtag Scraper
Developed by: @hvgupta and @austinmyc
⚡ What does this Actor do?
This Actor searches Instagram hashtags and extracts:
- Posts from hashtag pages
- Comments on each post
- Replies to comments
- User information (username, content, timestamps)
- Hierarchical data structure showing relationships between posts, comments, and replies
All data is filtered by your specified date range to get only relevant content.
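The date filter can be pictured as an inclusive range check on each item's timestamp. The sketch below is an assumption about that behaviour, not the Actor's actual code: the helper name `in_date_range` is hypothetical, and it assumes timestamps are ISO 8601 strings with a UTC offset that are compared by their local calendar date.

```python
from datetime import date, datetime


def in_date_range(timestamp: str, date_from: date, date_to: date) -> bool:
    """Return True if the timestamp's calendar date lies in [date_from, date_to]."""
    # fromisoformat parses offsets such as +08:00; .date() keeps the local date.
    local_date = datetime.fromisoformat(timestamp).date()
    return date_from <= local_date <= date_to


start, end = date(2024, 6, 1), date(2024, 6, 24)
print(in_date_range("2024-06-24T10:30:00+08:00", start, end))  # True
print(in_date_range("2024-06-25T00:05:00+08:00", start, end))  # False
```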
📋 Input Parameters
Parameter | Type | Required | Description | Default |
---|---|---|---|---|
keyword | String | ✅ | Hashtag keyword to search (without #) | - |
date_from | Date | ❌ | Start date for content filtering (YYYY-MM-DD) | 1 day ago |
date_to | Date | ❌ | End date for content filtering (YYYY-MM-DD) | Today |
hashtag_limit | Integer | ❌ | Number of hashtags to explore | 1 |
post_limit | Integer | ❌ | Posts to scrape per hashtag | 2 |
comment_limit | Integer | ❌ | Comments to scrape per post | 10 |
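As a rough illustration of the table above, the following sketch assembles an input object on the client side and fills in the listed defaults. The function name `build_run_input` is hypothetical, and the checks (no whitespace in the keyword, limits of at least 1, `date_from` not later than `date_to`) are assumptions; the Actor's own validation rules may differ.

```python
from datetime import date, timedelta
from typing import Optional


def build_run_input(
    keyword: str,
    date_from: Optional[date] = None,
    date_to: Optional[date] = None,
    hashtag_limit: int = 1,
    post_limit: int = 2,
    comment_limit: int = 10,
) -> dict:
    """Assemble Actor input, applying the defaults listed in the table."""
    # Convenience (assumption): drop a leading '#', since the keyword is given without it.
    keyword = keyword.lstrip("#")
    if not keyword or any(ch.isspace() for ch in keyword):
        raise ValueError("keyword must be non-empty and contain no spaces")

    today = date.today()
    date_to = date_to or today                          # default: today
    date_from = date_from or today - timedelta(days=1)  # default: 1 day ago
    if date_from > date_to:
        raise ValueError("date_from must not be later than date_to")

    limits = {
        "hashtag_limit": hashtag_limit,
        "post_limit": post_limit,
        "comment_limit": comment_limit,
    }
    for name, value in limits.items():
        if value < 1:  # assumed constraint, not confirmed by the Actor
            raise ValueError(f"{name} must be at least 1")

    return {
        "keyword": keyword,
        "date_from": date_from.isoformat(),  # YYYY-MM-DD
        "date_to": date_to.isoformat(),
        **limits,
    }


print(build_run_input("travel", date(2024, 6, 1), date(2024, 6, 24), 2, 5, 15))
```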
📊 Output Format
The Actor returns structured JSON data with the following format:
```json
{
  "user_id": "username",
  "datetime": "2024-06-24T10:30:00+08:00",
  "content": "Post or comment content",
  "post_id": "unique_post_identifier",
  "url": "https://instagram.com/p/...",
  "category": "Original Post",
  "current_hash": "abc123def456...",
  "parent_hash": "parent_content_hash_if_applicable"
}
```
Category Types:
- "Original Post": Main hashtag posts
- "Comment": Comments on posts
- "Reply": Replies to comments
Hash Fields (for hierarchical structure):
- current_hash: Hash of the item itself, used for deduplication and identification
- parent_hash: Hash of the parent item
These hash fields enable the hierarchical relationship structure:
- Original Posts: parent_hash is null
- Comments: parent_hash references the original post's current_hash
- Replies: parent_hash references the parent comment's current_hash
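The linking rules above are enough to rebuild a nested structure from the flat output. The following is a minimal sketch, assuming every non-null `parent_hash` matches the `current_hash` of another item in the same result set; the function name `build_tree` and the `children` key are illustrative and not part of the Actor's output.

```python
from __future__ import annotations

from collections import defaultdict


def build_tree(records: list[dict]) -> list[dict]:
    """Nest flat records into posts -> comments -> replies via parent_hash."""
    # Index records by the hash of their parent (None for original posts).
    children = defaultdict(list)
    for record in records:
        children[record.get("parent_hash")].append(record)

    def attach(record: dict) -> dict:
        node = dict(record)  # copy so the input records stay untouched
        node["children"] = [attach(child) for child in children[record["current_hash"]]]
        return node

    # Records whose parent_hash is null (None in Python) are the original posts.
    return [attach(root) for root in children[None]]


records = [
    {"current_hash": "post123", "parent_hash": None, "category": "Original Post"},
    {"current_hash": "comment456", "parent_hash": "post123", "category": "Comment"},
    {"current_hash": "reply789", "parent_hash": "comment456", "category": "Reply"},
]
tree = build_tree(records)
print(tree[0]["children"][0]["children"][0]["category"])  # Reply
```

Items whose parent is missing from the result set (for example, because the parent fell outside the date range) are not reachable from any root in this sketch and would need separate handling.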
⚙️ How it works
- Search: Finds hashtags related to your keyword
- Content Extraction:
  - Opens each hashtag page
  - Extracts posts within your date range
  - Collects comments and replies for each post
  - Handles Instagram's navigation
- Data Processing: Structures data hierarchically with relationships
- Output: Returns structured JSON data
📈 Performance & Features
- Smart Navigation: Handles Instagram's dynamic content loading
- Hierarchical Structure: Maintains relationships between posts, comments, and replies
- Memory Efficient: Processes data incrementally to handle large datasets
📝 Example Usage
Using Apify SDK (Python)
```python
from apify_client import ApifyClient

# Initialize the ApifyClient with your API token
client = ApifyClient("YOUR_APIFY_TOKEN")

# Prepare the Actor input
run_input = {
    "keyword": "travel",
    "date_from": "2024-06-01",
    "date_to": "2024-06-24",
    "hashtag_limit": 2,
    "post_limit": 5,
    "comment_limit": 15,
}

# Run the Actor and wait for it to finish
run = client.actor("YOUR_ACTOR_ID").call(run_input=run_input)

# Fetch and print Actor results from the run's dataset.
# list_items() returns a page object; its .items attribute is a list.
scraped_output_list = client.dataset(run["defaultDatasetId"]).list_items()
for single_scraped_output in scraped_output_list.items:
    print(single_scraped_output)
```
Using Apify SDK (JavaScript)
```javascript
import { ApifyClient } from 'apify-client';

// Initialize the ApifyClient with your API token
const client = new ApifyClient({
    token: 'YOUR_APIFY_TOKEN',
});

// Prepare the Actor input
const input = {
    keyword: "travel",
    date_from: "2024-06-01",
    date_to: "2024-06-24",
    hashtag_limit: 2,
    post_limit: 5,
    comment_limit: 15
};

// Run the Actor and wait for it to finish
const run = await client.actor('YOUR_ACTOR_ID').call(input);

// Fetch and log Actor results from the run's dataset
const { items } = await client.dataset(run.defaultDatasetId).listItems();
items.forEach((item) => {
    console.dir(item);
});
```
Basic JSON Input
```json
{
  "keyword": "travel",
  "date_from": "2024-06-01",
  "date_to": "2024-06-24",
  "hashtag_limit": 2,
  "post_limit": 5,
  "comment_limit": 15
}
```
Note: the keyword value must not contain any spaces.
🔍 Data Quality Features
- Automatic Deduplication: Uses content hashing to prevent duplicate entries
- Input Validation: Validates date ranges and parameter constraints
- Comprehensive Logging: Detailed logs for monitoring and debugging
- Data Integrity: Maintains accurate relationships between posts, comments, and replies
- Time Zone Handling: Properly handles Instagram's timestamp formats
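How the Actor computes `current_hash` is not documented here, so the sketch below treats it as an opaque key. It shows one way to apply the same deduplication idea on the client side, for instance when merging the results of several runs; the function name `deduplicate` is hypothetical.

```python
def deduplicate(records):
    """Keep the first record seen for each current_hash, preserving order."""
    seen = set()
    unique = []
    for record in records:
        key = record["current_hash"]
        if key in seen:
            continue  # same hash already kept, so skip the duplicate
        seen.add(key)
        unique.append(record)
    return unique


merged = [
    {"current_hash": "post123", "content": "Amazing sunset today! #travel"},
    {"current_hash": "comment456", "content": "Beautiful photo!"},
    {"current_hash": "post123", "content": "Amazing sunset today! #travel"},
]
print(len(deduplicate(merged)))  # 2
```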
📊 Output Structure
The Actor returns a flat array of JSON objects, but maintains hierarchical relationships through the previously mentioned hash linking:
Example Relationship:
```json
[
  {
    "current_hash": "post123",
    "parent_hash": null,
    "category": "Original Post",
    "content": "Amazing sunset today! #travel",
    "user_id": "traveler_jane",
    "datetime": "2024-06-24T10:30:00+08:00",
    "post_id": "p/ABC123",
    "url": "https://instagram.com/p/ABC123"
  },
  {
    "current_hash": "comment456",
    "parent_hash": "post123",
    "category": "Comment",
    "content": "Beautiful photo!",
    "user_id": "photo_lover",
    "datetime": "2024-06-24T11:15:00+08:00",
    "post_id": "p/ABC123/comment456",
    "url": "https://instagram.com/p/ABC123"
  },
  {
    "current_hash": "reply789",
    "parent_hash": "comment456",
    "category": "Reply",
    "content": "I agree, stunning!",
    "user_id": "nature_fan",
    "datetime": "2024-06-24T12:00:00+08:00",
    "post_id": "p/ABC123/comment456/reply789",
    "url": "https://instagram.com/p/ABC123"
  }
]
```
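Because the output is flat, simple aggregations need no tree reconstruction. The sketch below counts items per category for each post. It assumes, as in the example above, that a post and all of its comments and replies share the same `url` value; the function name `summarize_by_post` is hypothetical.

```python
from collections import Counter, defaultdict


def summarize_by_post(records):
    """Count records per category, grouped by the post URL they belong to."""
    summary = defaultdict(Counter)
    for record in records:
        summary[record["url"]][record["category"]] += 1
    # Convert to plain dicts so the result is easy to print or serialize.
    return {url: dict(counts) for url, counts in summary.items()}


items = [
    {"url": "https://instagram.com/p/ABC123", "category": "Original Post"},
    {"url": "https://instagram.com/p/ABC123", "category": "Comment"},
    {"url": "https://instagram.com/p/ABC123", "category": "Reply"},
]
print(summarize_by_post(items))
```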