
Fast Instagram Post Scraper 🚀
No credit card required

Fast Instagram Post Scraper 🚀
No credit card required
Instagram Post Scraper. Post data: hashtags, comment_count, like_count, usertags, images, videos, shortcode and etc. Scrape Instagram Posts with Ease and Speed
Actor Metrics
31 monthly users
No reviews yet
1 bookmark
98% runs succeeded
2.3 days response time
Created in Feb 2025
Modified 13 days ago
FEATURE REQUEST / View count
Hey,
Is it possible to add view_count to the post item?
Best
Labsed
And also the video duration.
Instagram Scraper (instagram-scraper)
view_count has been added, but it looks like it's basically null, and I need to make sure that the video duration is available via the video_dash_manifest info.
Labsed
Thanks for considering the feedback. Attaching my crawler in case it can help you with the manifest manipulation, or any other means.
1import { 2 createHttpRouter, 3 HttpCrawler, 4 HttpCrawlerOptions, 5 HttpCrawlingContext, 6 ProxyConfiguration, 7} from "crawlee"; 8import jp from "jsonpath"; 9import * as cheerio from "cheerio"; 10import Video from "../../entities/video"; 11import Channel from "../../entities/channel"; 12import ChannelStats from "../../entities/channel-stats"; 13import VideoStats from "../../entities/video-stats"; 14import { CrawlerInterface, Platform } from "../../types"; 15 16export class InstagramCrawler 17 extends HttpCrawler<any> 18 implements CrawlerInterface 19{ 20 constructor(options?: HttpCrawlerOptions) { 21 const router = createHttpRouter(); 22 23 router.addDefaultHandler(async ({ request, json }: HttpCrawlingContext) => { 24 const nodes: Record<string, any>[] = jp.query( 25 json, 26 "$.data.user.edge_owner_to_timeline_media.edges[?(@.node.__typename=='GraphVideo')].node", 27 ); 28 29 for (const node of nodes) { 30 const video = Video.create({ 31 channel: request.userData.channel, 32 videoId: node.shortcode, 33 title: node.edge_media_to_caption?.edges[0].node.text ?? "", 34 duration: extractDurationFromManifest( 35 node.dash_info.video_dash_manifest, 36 ), 37 publishedAt: new Date(node.taken_at_timestamp * 1000), 38 }); 39 40 await Video.upsert(video, ["channel", "videoId"]); 41 42 await VideoStats.save({ 43 video, 44 views: node.video_view_count, 45 comments: node.edge_media_to_comment.count, 46 reactions: node.edge_media_preview_like.count, 47 topReactions: [ 48 { name: "Like", count: node.edge_media_preview_like.count }, 49 ], 50 }); 51 } 52 }); 53 54 router.addHandler( 55 "stats", 56 async ({ json, request }: HttpCrawlingContext) => { 57 const data: Record<string, any> = jp.value( 58 json, 59 "$..data.xdt_shortcode_media", 60 ); 61 62 await Video.update( 63 { id: request.userData.video.id }, 64 { 65 title: data.edge_media_to_caption?.edges[0].node.text ?? "", 66 duration: parseInt(data.video_duration), 67 publishedAt: new Date(data.taken_at_timestamp * 1000), 68 }, 69 ); 70 71 await VideoStats.save({ 72 video: request.userData.video, 73 views: data.video_view_count, 74 comments: data.edge_media_to_parent_comment.count, 75 reactions: data.edge_media_preview_like.count, 76 topReactions: [ 77 { name: "Like", count: data.edge_media_preview_like.count }, 78 ], 79 }); 80 }, 81 ); 82 83 router.addHandler("cadd", async ({ log, json }: HttpCrawlingContext) => { 84 const data: Record<string, any> = jp.value(json, "$..data.user"); 85 86 log.info("Got channel data", data); 87 88 const channel = Channel.create({ 89 platform: Platform.INSTAGRAM, 90 channelId: data.id, 91 username: data.username, 92 name: data.full_name, 93 }); 94 95 await Channel.upsert(channel, ["platform", "channelId"]); 96 97 log.info("Channel saved", channel); 98 }); 99 100 router.addHandler( 101 "cstats", 102 async ({ request, json }: HttpCrawlingContext) => { 103 await ChannelStats.save({ 104 channel: request.userData.channel, 105 followers: json.data.user.edge_followed_by.count, 106 }); 107 }, 108 ); 109 110 let proxyConfiguration: ProxyConfiguration | undefined; 111 112 if (process.env.IG_PROXY_URL) { 113 proxyConfiguration = new ProxyConfiguration({ 114 proxyUrls: [process.env.IG_PROXY_URL], 115 }); 116 } 117 118 super({ 119 ...options, 120 proxyConfiguration, 121 requestHandler: router, 122 preNavigationHooks: [ 123 async (_: HttpCrawlingContext, gotOptions: any) => { 124 gotOptions.headers = { 125 "X-IG-App-ID": "936619743392459", 126 }; 127 }, 128 ], 129 }); 130 } 131 132 addChannelAddRequest(url: string) { 133 return this.addRequests([{ url, label: "cadd" }]); 134 } 135 136 addChannelStatsRequests(channels: Channel[]) { 137 return this.addRequests( 138 channels.map((channel) => ({ 139 url: `https://www.instagram.com/api/v1/users/web_profile_info/?username=${channel.username}`, 140 label: "cstats", 141 userData: { channel }, 142 })), 143 ); 144 } 145 146 addChannelVideoRequests(channels: Channel[]) { 147 return this.addRequests( 148 channels.map((channel) => ({ 149 url: `https://www.instagram.com/api/v1/users/web_profile_info/?username=${channel.username}`, 150 userData: { channel }, 151 })), 152 ); 153 } 154 155 addVideoStatsRequests(videos: Video[]) { 156 return this.addRequests( 157 videos.map((video) => ({ 158 url: "https://www.instagram.com/graphql/query", 159 method: "POST", 160 headers: { "Content-Type": "application/x-www-form-urlencoded" }, 161 payload: new URLSearchParams({ 162 doc_id: "8845758582119845", 163 variables: JSON.stringify({ shortcode: video.videoId }), 164 }).toString(), 165 useExtendedUniqueKey: true, 166 label: "stats", 167 userData: { video }, 168 })), 169 ); 170 } 171} 172 173function extractDurationFromManifest(manifestXml: string): number { 174 const match = cheerio 175 .load(manifestXml, { xmlMode: true })("MPD") 176 .attr("mediaPresentationDuration") 177 ?.match(/^PT([\d.]+)S$/); 178 179 return match?.[1] ? parseInt(match[1]) : 0; 180}
Instagram Scraper (instagram-scraper)
Thanks a lot, I found the video_duration parameter, I grabbed the post collection and the parameter was not as comprehensive as the single page information