Pricing

$30.00/month + usage

Linkedin Posts Informations Scraper

Developed by

SASWAVE

Scrape LinkedIn posts from LinkedIn post search results, a post URL, or a LinkedIn member profile. Supports advanced LinkedIn search filters. Extract post data at scale.

3.2 (3)

Total users

713

Monthly users

Runs succeeded

89%

Issues response

13 days

Last modified

11 days ago

Automation

Lead generation

Broken UTF-8 encoding

Closed

openjoy opened this issue

Hello. There's another issue we found and it's a bit weird. We see some problems with text encoding (e.g. in post contents). Some Unicode characters are represented incorrectly, mostly emojis but also some punctuation marks and even non-breaking spaces. But it's not like 100% of Unicode chars are broken. Some emojis, for example, are represented correctly. We tried using different tools/libs to fix the encoding but without success. And we see the broken chars already in APIFY datasets, so our guess is that the issue is somewhere in the actor (or surrounding libs/infra). Could you please take a look?

Example input:

{
  "cookies": [...],
  "days_since_post": 14,
  "max_posts": 0,
  "url_search": "https://www.linkedin.com/in/danielmoka/"
}

Example post: https://www.linkedin.com/posts/danielmoka_50-off-black-friday-deal-on-learning-activity-7267795633724407808-epJX

What was scraped (copied from APIFY web UI but we see the same picture from other tools): What youâll get: â¢ 4+ hours of hands-on 𝐯𝐢𝐝𝐞𝐨 𝐭𝐮𝐭𝐨𝐫𝐢𝐚𝐥𝐬 on TDD â¢ A 𝐓𝐃𝐃 𝐞-𝐛𝐨𝐨𝐤 packed with 10+ years of experience â¢ Pro tips on mastering 𝐭𝐞𝐬𝐭𝐢𝐧𝐠 𝐚𝐧𝐝 𝐫𝐞𝐟𝐚𝐜𝐭𝐨𝐫𝐢𝐧𝐠 â¢ 3 𝐫𝐞𝐚𝐥-𝐰𝐨𝐫𝐥𝐝 projects written in C#/.NET

SASWAVE (saswave)

Added to the to-do list for tomorrow morning. We have noticed, though, that text saved in an Apify dataset doesn't always have the same encoding as the text we print to the logs just before saving (if this makes sense to you).

We will check whether the problem is in our code or on the Apify side when we push to storage.

openjoy

Thank you for the quick response, as always. I understand that the encoding can get broken in various places. Admittedly, we haven't checked the run logs to investigate. In the end, of course, we just want to grab a dataset file, so I hope this can be fixed. I don't think we had this issue with other actors, but that could be because the results there were within the ASCII charset.

If it's any help, one low-level example of broken encoding: This character "♻️" was encoded as C3 A2 C2 99 C2 BB C3 AF C2 B8 C2 8F instead of E2 99 BB EF B8 8F. My not very educated guess is that this could be double-encoding but I don't think this explains all the symptoms.

openjoy

I just checked and it's indeed double-encoding. We can probably do some post-processing on our side but it's better to fix the root cause of course.
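The byte sequences quoted above can be reproduced in a few lines. This is only a minimal sketch: it assumes the intermediate misread was Latin-1 (ISO-8859-1), which is what the quoted bytes are consistent with, and it says nothing about where in the actor or the storage layer the misread actually happens. The helper names `utf8_hex` and `double_encode` are made up for illustration.

```python
def utf8_hex(text: str) -> str:
    """Return the UTF-8 bytes of `text` as space-separated uppercase hex."""
    # bytes.hex() accepts a separator argument since Python 3.8.
    return text.encode("utf-8").hex(" ").upper()


def double_encode(text: str) -> str:
    """Simulate double encoding: UTF-8 bytes misread as Latin-1 characters.

    Assumption: the wrong intermediate charset is Latin-1. Each original
    byte becomes one character in U+0080..U+00FF, which is then written
    out again as a two-byte UTF-8 sequence.
    """
    return text.encode("utf-8").decode("latin-1")


recycle = "\u267b\ufe0f"  # the recycling emoji from the example above

print(utf8_hex(recycle))                 # E2 99 BB EF B8 8F
print(utf8_hex(double_encode(recycle)))  # C3 A2 C2 99 C2 BB C3 AF C2 B8 C2 8F
```

Both printed lines match the bytes reported in the thread, which supports the double-encoding reading for this character.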

SASWAVE (saswave)

We have updated the actor; please give it a try.

The problem was probably related to the way we were decoding LinkedIn text content. We removed the decoding step and now return what LinkedIn returns.

openjoy

Thank you for looking into it. I tried re-running one of the tasks with the latest build (0.0.119) and the issue still persists unfortunately. Seems to be on the same level as before.

SASWAVE (saswave)

If it didn't help we can't do much.

At this point we return the content exactly as LinkedIn returns it to us.

I know the encoding in the dataset is not always the same as that of the data we initially want to save (we faced this kind of issue with another actor).

Do you want us to handle Unicode cleaning (i.e. drop out-of-scope characters and emojis)?

openjoy

My assumption was that this could be some configuration issue in the HTTP client, headless browser (if that's how this works), crawling library, etc. The content returned from LinkedIn to the browser seems to be correct (I double-checked the binary representation), so if you're saying the response was incorrect, it could be worth investigating the differences in how the data is requested. Or, like you said, if the issue is with the dataset then maybe APIFY devs can suggest something.

In any case, while this is not ideal, we can implement an ugly workaround on our side. Please don't filter out the content, as the data is still there. Also, I was wrong to call the encoding "broken". It's perfectly valid UTF-8, just with some double encoding here and there. Filtering it would be as difficult as fixing the data.
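One possible shape for such a client-side workaround is sketched below. It is not the actor's code, and the names `fix_double_encoding`, `_repair` and `_MOJIBAKE` are hypothetical. It assumes the same Latin-1 misread as in the byte example and repairs only those character runs that look like a complete UTF-8 multi-byte sequence, so correctly encoded characters elsewhere in the same string are left alone.

```python
import re

# Characters in U+0080..U+00FF that, read as bytes, form one complete
# UTF-8 multi-byte sequence (lead byte plus continuation bytes).
_MOJIBAKE = re.compile(
    "[\u00c2-\u00df][\u0080-\u00bf]"        # 2-byte sequence
    "|[\u00e0-\u00ef][\u0080-\u00bf]{2}"    # 3-byte sequence
    "|[\u00f0-\u00f4][\u0080-\u00bf]{3}"    # 4-byte sequence
)


def _repair(match: "re.Match[str]") -> str:
    chunk = match.group(0)
    try:
        # Undo the assumed Latin-1 misread, then decode as real UTF-8.
        return chunk.encode("latin-1").decode("utf-8")
    except UnicodeDecodeError:
        # Not a valid sequence after all: keep the original characters.
        return chunk


def fix_double_encoding(text: str) -> str:
    """Repair double-encoded runs and leave everything else untouched."""
    return _MOJIBAKE.sub(_repair, text)


broken = "What you\u00e2\u0080\u0099ll get: \u00e2\u0080\u00a2 4+ hours"
print(fix_double_encoding(broken) == "What you\u2019ll get: \u2022 4+ hours")  # True
```

The pattern can misfire on text that legitimately contains such a run of Latin-1 characters (a literal "Ã©", for example), so it is worth spot-checking the output on real dataset items before relying on it.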

Linkedin post scraper

curious_coder/linkedin-post-search-scraper

Scrape linkedin posts or updates from linkedin post search results. Supports advanced linkedin search filters. Extract posts from any linkedin member

Curious Coder

6.9K

3.5

LinkedIn Posts Scraper

pratikdani/linkedin-posts-scraper

Scrape LinkedIn posts data from LinkedIn Post URLs.

Pratik Dani

340

5.0

Linkedin Post Data Scraper | No Cookies

apimaestro/linkedin-post-detail

Linkedin Posts Data Scraper: Extract detailed information from LinkedIn profiles including work experience, education history, and location details.

API Maestro

152

5.0

Linkedin Post Scraper ✅ No cookies ✅ $2 per 1k posts

supreme_coder/linkedin-post

Scrape unlimited Linkedin posts without risking your Linkedin account. Live data, Super fast scraping at affordable cost. High success rate

Supreme Coder

3.5

Linkedin post reactions scraper

curious_coder/linkedin-post-reactions-scraper

Scrape reactions or likes from linkedin post

Curious Coder

500

Linkedin Profile Posts Scraper [NO COOKIES]

apimaestro/linkedin-profile-posts

Scrape LinkedIn posts data for a given LinkedIn profile including post content, reactions, comments count, and media attachments

API Maestro

4.7K

4.1

Linkedin Posts Search Scraper | No Cookies

apimaestro/linkedin-posts-search-scraper-no-cookies

Scrape LinkedIn posts by keyword without login. Get post content, reactions, author details, and media. Sort by relevance or date. Perfect for research, analysis, and monitoring trends.

API Maestro

1.3K

4.3

Linkedin Posts Reactions Scraper

saswave/linkedin-posts-interactions-parser

Extract people who comments, mentions and likes from linkedin post or article. Allows you to extract all interactions / reactions from a url. Input can be a /posts url or article url. Also provide a /company or /in url and it will parse multiple posts from the source (organic and promoted LinkedIn)

SASWAVE

298

5.0

Linkedin Post Comments Scraper

bhansalisoft/linkedin-post-comments-scraper

Linkedin Post Comments Scraper - Easily extract comments from any LinkedIn post with our LinkedIn Post Comments Scraper. Fast, secure, and no coding required.

bhansalisoft

Linkedin Posts Scraper (users,companies,groups) ✅ No cookies ✅

scraping_solutions/linkedin-posts-scraper-users-groups-schools-no-cookies

LinkedIn Timelines Posts Scraper Scrape LinkedIn timeline posts from users, groups, showcase pages, and schools quickly and efficiently. This actor extracts detailed post data, including content, timestamps, engagement metrics (likes, comments, shares), and media (images, videos, links).