Instagram Hashtag Scraper

zuzka/instagram-hashtag-scraper

Scrape Instagram hashtags the fast and easy way. Just add one or more hashtags and the scraper will extract posts, images, URLs, comments, likes, users, locations, timestamps, and more. Download structured data in JSON, CSV, XML, Excel, and HTML to use in applications, reports, and spreadsheets.

Author: Zuzka Pelechová
  • Users: 6
  • Runs: 51

Features

Our free Instagram Hashtag Scraper allows you to scrape posts from a hashtag search.

This unofficial Instagram API is designed to give you back access to public data that was removed from the Instagram API in 2020. It also lets anyone extract public data from Instagram, regardless of whether you have an Instagram Business or Creator account and regardless of whether you are accessing public consumer account data.

Why scrape Instagram?

Instagram has about 1 billion monthly active users and is especially popular with younger users, a demographic that can otherwise be difficult for brands to reach. With so many active users, you can imagine that there is a lot of useful data on the site.

So what could you do with that data? Here are some ideas:

  • Scrape hashtags and likes to see what's becoming popular. Maybe you can get involved early or create a niche product to take advantage of short-term trends.
  • Get data based on location to discover opportunities or risks that might affect your investment or business decisions.
  • Find Instagram influencers who could help you promote your products, and track their engagement in real time.
  • Collect a constantly updated dataset on your industry, city, or interests and gain insights into ongoing change.
  • Carry out market or academic research that goes beyond surveys and polls.

If you want more ideas, check out our industries pages for ways web scraping is already being used in a wide range of companies.

Cost of usage

There are two main factors to take into account if you want to run Instagram Scraper on the Apify platform:

Using proxies

Instagram now aggressively blocks scrapers and redirects them to a login page. Currently, the only reliable solution to this problem is to use residential proxies.

Apify residential proxies

The Apify platform provides residential proxies if you have a paid subscription. These proxies can only be used by actors running on the Apify platform, not externally. If you are interested in using residential proxies for this scraper, contact support@apify.com via email or in-app chat to get the proxies enabled.

Posts scraping

Scraping 1,000 posts requires about:

  • 5 Compute units
  • 0.24 GB of proxy traffic

Example pricing

Based on Apify's pricing at the time of writing, scraping 1,000 posts would cost 5 CU * $0.25 per CU + 0.24 GB * $12.50 per GB = $1.25 + $3.00, which is a total of $4.25. The Personal plan ($49) would allow you to scrape about 11,500 Instagram posts monthly.
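
The arithmetic above can be reproduced with a short Python sketch. It assumes the rates in the text mean $0.25 per compute unit and $12.50 per GB of residential proxy traffic; the constant and function names are illustrative and are not part of the actor or the Apify API. Check Apify's current pricing before relying on the numbers.

```python
# Rates as read from the text above (assumed: USD per compute unit, USD per GB).
COMPUTE_UNIT_PRICE_USD = 0.25
PROXY_PRICE_USD_PER_GB = 12.5

# Resource usage per 1,000 posts, from the "Posts scraping" figures.
COMPUTE_UNITS_PER_1000_POSTS = 5
PROXY_GB_PER_1000_POSTS = 0.24


def cost_per_1000_posts() -> float:
    """Return the estimated USD cost of scraping 1,000 posts."""
    compute_cost = COMPUTE_UNITS_PER_1000_POSTS * COMPUTE_UNIT_PRICE_USD
    proxy_cost = PROXY_GB_PER_1000_POSTS * PROXY_PRICE_USD_PER_GB
    return compute_cost + proxy_cost


def posts_for_budget(budget_usd: float) -> int:
    """Return roughly how many posts a given budget covers."""
    return int(budget_usd / cost_per_1000_posts() * 1000)


if __name__ == "__main__":
    print(f"Cost per 1,000 posts: ${cost_per_1000_posts():.2f}")
    print(f"Posts covered by a $49 budget: {posts_for_budget(49)}")
```

Changing the two rate constants is enough to re-run the estimate against a different plan or updated prices.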

Input parameters

The input of this scraper should be JSON containing the hashtag or list of hashtags that should be visited. The input fields are:

Field         Type              Description
hashtags      Array (required)  Hashtags to search Instagram for
resultsLimit  Integer           How many posts should be loaded from each hashtag (limit is per hashtag)

Scrolling through large profiles or posts

Instagram imposes rate limits that will block scrolling if you want to scroll for more than 1,000 posts or comments. To work around this issue, the scraper starts injecting randomized wait times once you reach 1,000 posts or comments. This is configurable by means of the scrollWaitSecs input parameter. If you get the message that you were rate limited, consider increasing this parameter for the specific profile or post.
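
To illustrate the idea of randomized waits, here is a minimal Python sketch. The actor's actual waiting policy is not documented here, so the jitter range (50% to 150% of scrollWaitSecs) and the function name scroll_wait_seconds are assumptions made for illustration only.

```python
import random
from typing import Optional

RATE_LIMIT_THRESHOLD = 1000  # posts or comments, as stated in the text above


def scroll_wait_seconds(
    items_loaded: int,
    scroll_wait_secs: float,
    rng: Optional[random.Random] = None,
) -> float:
    """Return how long to pause before the next scroll.

    Hypothetical policy: no wait below the threshold, then a random
    wait between 50% and 150% of scroll_wait_secs.
    """
    if items_loaded < RATE_LIMIT_THRESHOLD:
        return 0.0
    generator = rng if rng is not None else random.Random()
    return generator.uniform(0.5 * scroll_wait_secs, 1.5 * scroll_wait_secs)


if __name__ == "__main__":
    for loaded in (500, 1000, 2500):
        wait = scroll_wait_seconds(loaded, scroll_wait_secs=10)
        print(f"{loaded} items loaded -> wait {wait:.1f} s")
```

Under this policy, raising scrollWaitSecs lengthens every pause proportionally, which matches the advice to increase the parameter when you are rate limited.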

Instagram Hashtag Scraper input example

{
    "hashtags": ["apify", "webscraping"],
    "resultsLimit": 100
}

During the actor run

During the run, the actor will output messages letting you know what's going on. Each message contains a short label specifying which page from the provided list is currently being scraped. When items are loaded from a page, you should in most cases see a message with the number of items loaded and the total item count for that page.

If you provide incorrect input to the actor, it will immediately stop with a failure state and output an explanation of what is wrong.
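
To catch mistakes before starting a run, you can check the input locally first. The sketch below is a minimal example based only on the two fields described in the input parameters; the actor's real validation rules are not documented here, so the specific checks (non-empty hashtag strings, a positive resultsLimit) and the function name validate_input are assumptions.

```python
from typing import Any, Dict, List


def validate_input(run_input: Dict[str, Any]) -> List[str]:
    """Return a list of problems found; an empty list means the input looks valid."""
    problems: List[str] = []

    hashtags = run_input.get("hashtags")
    if not isinstance(hashtags, list) or not hashtags:
        problems.append("'hashtags' is required and must be a non-empty array")
    elif not all(isinstance(tag, str) and tag.strip() for tag in hashtags):
        problems.append("every entry in 'hashtags' must be a non-empty string")

    limit = run_input.get("resultsLimit")
    if limit is not None:
        # bool is a subclass of int in Python, so reject it explicitly.
        if isinstance(limit, bool) or not isinstance(limit, int) or limit < 1:
            problems.append("'resultsLimit' must be a positive integer")

    return problems


if __name__ == "__main__":
    example = {"hashtags": ["apify", "webscraping"], "resultsLimit": 100}
    print(validate_input(example) or "input looks valid")
```

Returning a list of problems rather than raising on the first one lets you report everything that is wrong with the input in a single pass.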

Instagram output format

The actor stores its results in a dataset. Each scraped post is stored as a separate item in the dataset.

You can work with the results in any language (Python, PHP, Node.js). See our API reference to learn more about getting results from the Instagram Scraper.

Each scraped Instagram post is stored as an item with the following structure:

{
    "queryTag": "apify",
    "position": 5,
    "type": "Image",
    "shortCode": "CPX3fh-sCAu",
    "caption": "SEO GRILL MASTER🥩🔥\n•\n•\n#startuplife #apify #teambuilding",
    "hashtags": [
        "startuplife",
        "apify",
        "teambuilding"
    ],
    "mentions": [],
    "url": "https://www.instagram.com/p/CPX3fh-sCAu",
    "commentsCount": 0,
    "latestComments": [],
    "dimensionsHeight": 1080,
    "dimensionsWidth": 1080,
    "displayUrl": "https://scontent-iev1-1.cdninstagram.com/v/t51.2885-15/e35/s1080x1080/191303163_3815835068538270_8324023149941865199_n.jpg?_nc_ht=scontent-iev1-1.cdninstagram.com&_nc_cat=106&_nc_ohc=exAwEYfHv2AAX8Wv4L_&edm=ABZsPhsBAAAA&ccb=7-4&oh=c785300c3ce8c893579c0be6bc418513&oe=61A4054C&_nc_sid=4efc9f",
    "images": [],
    "id": "2582776970667368494",
    "alt": "Photo by Apify in Přístav 18600. May be an image of one or more people and outdoors.",
    "likesCount": 6,
    "timestamp": "2021-05-27T10:23:49.000Z",
    "locationName": null,
    "locationId": null,
    "ownerUsername": "apifytech",
    "ownerId": "29230178602",
    "fullName": "Apify"
}
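
Once you have downloaded the dataset as JSON, items with this structure are easy to process. The following Python sketch assumes the items have already been loaded into a list of dictionaries using the field names shown above; the helper names parse_timestamp and summarize_by_hashtag are illustrative and not part of any Apify library.

```python
from collections import defaultdict
from datetime import datetime, timezone
from typing import Any, Dict, Iterable


def parse_timestamp(value: str) -> datetime:
    """Parse a timestamp such as "2021-05-27T10:23:49.000Z" into a UTC datetime."""
    parsed = datetime.strptime(value, "%Y-%m-%dT%H:%M:%S.%fZ")
    return parsed.replace(tzinfo=timezone.utc)


def summarize_by_hashtag(items: Iterable[Dict[str, Any]]) -> Dict[str, Dict[str, int]]:
    """Count posts, likes, and comments for each searched hashtag (queryTag)."""
    summary: Dict[str, Dict[str, int]] = defaultdict(
        lambda: {"posts": 0, "likes": 0, "comments": 0}
    )
    for item in items:
        stats = summary[item["queryTag"]]
        stats["posts"] += 1
        # Counts may be missing or null, so fall back to 0.
        stats["likes"] += item.get("likesCount") or 0
        stats["comments"] += item.get("commentsCount") or 0
    return dict(summary)


if __name__ == "__main__":
    # A trimmed copy of the example item above.
    sample = [{
        "queryTag": "apify",
        "likesCount": 6,
        "commentsCount": 0,
        "timestamp": "2021-05-27T10:23:49.000Z",
    }]
    print(summarize_by_hashtag(sample))
    print(parse_timestamp(sample[0]["timestamp"]).isoformat())
```

Grouping by queryTag keeps results from different hashtags separate when a single run searches for several of them.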

Personal data

You should be aware that your results could contain personal data. Personal data is protected by GDPR in the European Union and by other regulations around the world. You should not scrape personal data unless you have a legitimate reason to do so. If you're unsure whether your reason is legitimate, consult your lawyers. You can also read our blog post on the legality of web scraping.