
Fast Instagram Post Scraper 🚀
Pricing
Pay per usage

Fast Instagram Post Scraper 🚀
Instagram Post Scraper. Post data: hashtags, comment_count, like_count, usertags, images, videos, shortcode and etc. Scrape Instagram Posts with Ease and Speed
5.0 (1)
Pricing
Pay per usage
3
Total users
108
Monthly users
55
Runs succeeded
98%
Issue response
2.3 days
Last modified
21 days ago
FEATURE REQUEST / View count
Closed
Hey,
Is it possible to add view_count to the post item?
Best
Labsed
And also the video duration.
Instagram Scraper (instagram-scraper)
view_count has been added, but it looks like it's basically null, and I need to make sure that the video duration is available via the video_dash_manifest info.
Labsed
Thanks for considering the feedback. Attaching my crawler in case it can help you with the manifest manipulation, or any other means.
import {createHttpRouter,HttpCrawler,HttpCrawlerOptions,HttpCrawlingContext,ProxyConfiguration,} from "crawlee";import jp from "jsonpath";import * as cheerio from "cheerio";import Video from "../../entities/video";import Channel from "../../entities/channel";import ChannelStats from "../../entities/channel-stats";import VideoStats from "../../entities/video-stats";import { CrawlerInterface, Platform } from "../../types";export class InstagramCrawlerextends HttpCrawler<any>implements CrawlerInterface{constructor(options?: HttpCrawlerOptions) {const router = createHttpRouter();router.addDefaultHandler(async ({ request, json }: HttpCrawlingContext) => {const nodes: Record<string, any>[] = jp.query(json,"$.data.user.edge_owner_to_timeline_media.edges[?(@.node.__typename=='GraphVideo')].node",);for (const node of nodes) {const video = Video.create({channel: request.userData.channel,videoId: node.shortcode,title: node.edge_media_to_caption?.edges[0].node.text ?? "",duration: extractDurationFromManifest(node.dash_info.video_dash_manifest,),publishedAt: new Date(node.taken_at_timestamp * 1000),});await Video.upsert(video, ["channel", "videoId"]);await VideoStats.save({video,views: node.video_view_count,comments: node.edge_media_to_comment.count,reactions: node.edge_media_preview_like.count,topReactions: [{ name: "Like", count: node.edge_media_preview_like.count },],});}});router.addHandler("stats",async ({ json, request }: HttpCrawlingContext) => {const data: Record<string, any> = jp.value(json,"$..data.xdt_shortcode_media",);await Video.update({ id: request.userData.video.id },{title: data.edge_media_to_caption?.edges[0].node.text ?? "",duration: parseInt(data.video_duration),publishedAt: new Date(data.taken_at_timestamp * 1000),},);await VideoStats.save({video: request.userData.video,views: data.video_view_count,comments: data.edge_media_to_parent_comment.count,reactions: data.edge_media_preview_like.count,topReactions: [{ name: "Like", count: data.edge_media_preview_like.count },],});},);router.addHandler("cadd", async ({ log, json }: HttpCrawlingContext) => {const data: Record<string, any> = jp.value(json, "$..data.user");log.info("Got channel data", data);const channel = Channel.create({platform: Platform.INSTAGRAM,channelId: data.id,username: data.username,name: data.full_name,});await Channel.upsert(channel, ["platform", "channelId"]);log.info("Channel saved", channel);});router.addHandler("cstats",async ({ request, json }: HttpCrawlingContext) => {await ChannelStats.save({channel: request.userData.channel,followers: json.data.user.edge_followed_by.count,});},);let proxyConfiguration: ProxyConfiguration | undefined;if (process.env.IG_PROXY_URL) {proxyConfiguration = new ProxyConfiguration({proxyUrls: [process.env.IG_PROXY_URL],});}super({...options,proxyConfiguration,requestHandler: router,preNavigationHooks: [async (_: HttpCrawlingContext, gotOptions: any) => {gotOptions.headers = {"X-IG-App-ID": "936619743392459",};},],});}addChannelAddRequest(url: string) {return this.addRequests([{ url, label: "cadd" }]);}addChannelStatsRequests(channels: Channel[]) {return this.addRequests(channels.map((channel) => ({url: `https://www.instagram.com/api/v1/users/web_profile_info/?username=${channel.username}`,label: "cstats",userData: { channel },})),);}addChannelVideoRequests(channels: Channel[]) {return this.addRequests(channels.map((channel) => ({url: `https://www.instagram.com/api/v1/users/web_profile_info/?username=${channel.username}`,userData: { channel },})),);}addVideoStatsRequests(videos: Video[]) {return this.addRequests(videos.map((video) => ({url: "https://www.instagram.com/graphql/query",method: "POST",headers: { "Content-Type": "application/x-www-form-urlencoded" },payload: new URLSearchParams({doc_id: "8845758582119845",variables: JSON.stringify({ shortcode: video.videoId }),}).toString(),useExtendedUniqueKey: true,label: "stats",userData: { video },})),);}}function extractDurationFromManifest(manifestXml: string): number {const match = cheerio.load(manifestXml, { xmlMode: true })("MPD").attr("mediaPresentationDuration")?.match(/^PT([\d.]+)S$/);return match?.[1] ? parseInt(match[1]) : 0;}
Instagram Scraper (instagram-scraper)
Thanks a lot, I found the video_duration parameter, I grabbed the post collection and the parameter was not as comprehensive as the single page information