Instagram Reel Scraper


apify/instagram-reel-scraper

Scrape data from Instagram reels. Just add one or more Instagram usernames and get your data in seconds including hashtags, mentions, comments, images, likes, locations, and metadata. Export scraped data, run the scraper via API, schedule and monitor runs or integrate with other tools.


New instagram profiles are scraped that weren't requested

Closed

heliotrope_rumor opened this issue
5 months ago

When I run a scraping job of 100 reels per profile and give a list of 500 profiles, I expect to download 50000 reels ideally. However, my profile list is not current, and there could be profiles that do not exist or have gone private. Therefore, I see around 25k reels from the scraping job. However, I notice that there are 581 unique profiles scraped in the results, with 224 new profile handles that I did not provide. I wanted to know how these new profile handles are decided. Also, I see in my results that most of the profiles have around 50 reels per profile when I set the limit to 100, and there are some profiles with more than 800 reels/videos. I wanted to know if there is something I am missing.


heliotrope_rumor

5 months ago

Also, I cannot download videos from most of the videoURLs after 24hrs. Is this expected? What is the link expiration duration?

User avatar

Hi, thanks for reaching out, we're looking into it and will let you know!

User avatar

Hi! The total number of reels looks correct: some profiles have fewer than 100 reels, so the Actor returns as many as are available rather than 100. I count 445 unique profiles in https://console.apify.com/view/runs/29z150BhIzRqrp7LS. Can you please provide an example where a reel is not from an input profile? In my evaluation I could not find such cases. Media URLs expire after several hours; we don't know the exact duration. This is normal and expected for all Meta static data (images or videos).
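Since the media links expire within hours, one option is to download the files right after the run finishes. Here is a minimal sketch of that idea, using only the Python standard library. The `videoUrl` field name and the `.mp4` extension are assumptions for illustration; check the actual field names in your dataset items.

```python
import shutil
import urllib.request
from pathlib import Path


def download_media(items, dest_dir, url_field="videoUrl", timeout=30):
    """Save each item's media file into dest_dir and return the paths written.

    url_field is an assumed field name; adjust it to match your dataset.
    """
    dest = Path(dest_dir)
    dest.mkdir(parents=True, exist_ok=True)
    saved = []
    for item in items:
        url = item.get(url_field)
        short_code = item.get("shortCode")
        if not url or not short_code:
            continue  # nothing to download for this item
        target = dest / f"{short_code}.mp4"  # extension is an assumption
        if target.exists():
            continue  # already downloaded earlier
        # Stream the response to disk instead of holding it in memory.
        with urllib.request.urlopen(url, timeout=timeout) as resp, open(target, "wb") as out:
            shutil.copyfileobj(resp, out)
        saved.append(target)
    return saved
```

Running this immediately after fetching the dataset items avoids depending on links that may already be dead the next day.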


heliotrope_rumor

4 months ago

Hi, following is the analysis from this run using the Python API. There are 695 unique profiles in the resulting dataset, with 329 new profiles that did not exist in the input I gave when calling the actor. You can see some examples in the attached screenshot.

User avatar

Hi! I stripped the dataset down to https://api.apify.com/v2/datasets/cikzqPO3KoSJoLYK1/items?clean=true&fields=inputUrl,shortCode,ownerUsername&format=json. From manual testing, it looks like the mismatches occur when a reel includes multiple users, e.g.:

{
  "inputUrl": "https://www.instagram.com/travelholicsouls",
  "shortCode": "CzV7rOgM4Qp",
  "ownerUsername": "travellovebirds_"
}

and https://www.instagram.com/reel/CzV7rOgM4Qp/ is shown as posted by "travellovebirds_ and travelholicsouls". So the Actor just follows the logic of the reel itself: if a reel has an "owner" together with an "other profile", it also appears in the other profile's reels feed (see the attached screenshot).
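To list such cases in your own results, you could compare the handle in `inputUrl` with `ownerUsername` for every item. The following is a rough sketch, assuming items shaped like the stripped dataset above; the helper names are made up for illustration, and the second sample item is hypothetical.

```python
from urllib.parse import urlparse


def username_from_url(profile_url):
    """Extract the handle from a profile URL like https://www.instagram.com/<handle>."""
    path = urlparse(profile_url).path
    return path.strip("/").split("/")[0].lower()


def find_owner_mismatches(items):
    """Return one record per item whose owner differs from the requested profile."""
    mismatches = []
    for item in items:
        requested = username_from_url(item["inputUrl"])
        owner = item["ownerUsername"].lower()
        if owner != requested:
            mismatches.append(
                {"shortCode": item["shortCode"], "requested": requested, "owner": owner}
            )
    return mismatches


items = [
    # Item taken from the example above.
    {
        "inputUrl": "https://www.instagram.com/travelholicsouls",
        "shortCode": "CzV7rOgM4Qp",
        "ownerUsername": "travellovebirds_",
    },
    # Hypothetical item where owner and requested profile agree.
    {
        "inputUrl": "https://www.instagram.com/travelholicsouls/",
        "shortCode": "EXAMPLE_CODE",
        "ownerUsername": "travelholicsouls",
    },
]

for row in find_owner_mismatches(items):
    print(row)
```

Every row printed this way should be a reel that reached the dataset through a requested profile's feed but is owned by another account.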


heliotrope_rumor

3 months ago

Hi. Thank you for clarifying that. Does that mean there will be duplicate entries for a reel from all the profiles in "other profile"?

I have one more question: Is there a way to control what reels will be scraped? I want to scrape more reels from the same profile but avoid scraping what I collected from previous runs.

User avatar

Hi! Reels are deduplicated by URL in the default dataset, so, for example, if both travellovebirds_ and travelholicsouls are scraped in a single run, the reel instagram.com/p/SHORTCODE_NNN will be saved to the dataset once, on the first match.
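If you merge results from several runs yourself, the same "first match wins" behavior can be approximated on the client side. This is only a sketch of the idea, not the Actor's internal code; it keys on `shortCode` as a stand-in for the reel URL, which is an assumption.

```python
def dedupe_first_match(items, key="shortCode"):
    """Keep only the first item seen for each key value, preserving order."""
    seen = set()
    unique = []
    for item in items:
        value = item[key]
        if value in seen:
            continue  # a copy of this reel was already kept
        seen.add(value)
        unique.append(item)
    return unique


merged = [
    {"shortCode": "SHORTCODE_NNN", "inputUrl": "https://www.instagram.com/travellovebirds_"},
    {"shortCode": "SHORTCODE_NNN", "inputUrl": "https://www.instagram.com/travelholicsouls"},
]
print(dedupe_first_match(merged))
```

Note that the surviving item keeps the `inputUrl` of whichever profile was encountered first.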

More options to control what gets scraped are provided by a separate Actor, https://apify.com/apify/instagram-scraper; however, its logic is not tied to any previous run or runs. You can use the "newer than" option and fetch new posts on a schedule.
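Because neither Actor tracks earlier runs, skipping reels you already collected has to happen on your side. One possible approach is to persist the short codes you have seen and filter each new batch against them. The file layout and function names below are hypothetical, and the sketch assumes every item carries a `shortCode` field.

```python
import json
from pathlib import Path


def load_seen(path):
    """Load previously collected short codes, or an empty set on the first run."""
    p = Path(path)
    return set(json.loads(p.read_text())) if p.exists() else set()


def keep_new(items, seen):
    """Return only the items whose shortCode has not been collected before."""
    return [item for item in items if item["shortCode"] not in seen]


def save_seen(path, seen):
    """Write the set of collected short codes back to disk as a JSON list."""
    Path(path).write_text(json.dumps(sorted(seen)))


def process_run(items, state_path):
    """Filter one run's items against stored state, then update the state."""
    seen = load_seen(state_path)
    fresh = keep_new(items, seen)
    seen.update(item["shortCode"] for item in fresh)
    save_seen(state_path, seen)
    return fresh
```

This does not reduce what the Actor scrapes, only what you keep afterwards; the "newer than" option is what limits the scraping itself.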

User avatar

Follow-up: added coauthorProducers; see the sample run https://console.apify.com/view/runs/yHUTAotERpbITXa2U

I'm going to close the issue now, but if there is anything else we can help with, please let us know.

Developer
Maintained by Apify
Actor metrics
  • 296 monthly users
  • 100.0% runs succeeded
  • 1.1 days response time
  • Created in Nov 2022
  • Modified 3 days ago