Video Transcript Scraper: Youtube, X (twitter), Tiktok, etc.

invideoiq/video-transcript-scraper
Scrapes transcripts from any online video or audio content on any platform (YouTube, X, etc.) in any available language. It delivers output in both JSON and LLM-ready formats, making it ideal for analytics and AI-based applications. Perfect for research and for building intelligent conversational agents.

Failure to get any response due to codec encoding issue

Closed

samdesign opened this issue
a day ago

I tested on 2 random videos and they both fail:

This one, https://youtu.be/ULSqoPsXqhA?si=2w23aQaOelRswxVb, got the following error:

Traceback (most recent call last):
  File "d:\documents\dev\telegram-bots\youtube-analyzer\test\apify", line 15, in <module>
    print(response.json())
  File "C:\Users\ssipa\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\U0001f4fa' in position 186: character maps to <undefined>

This one, https://www.youtube.com/watch?v=PCt243ogcd8, got the following error:

Traceback (most recent call last):
  File "d:\documents\dev\telegram-bots\youtube-analyzer\test\apify", line 16, in <module>
    print(json.dumps(response.json(), indent=4, ensure_ascii=False))
    ^^^^^^^^^^^^^^^^^^^^^^
  File "C:\Users\ssipa\AppData\Local\Programs\Python\Python311\Lib\encodings\cp1252.py", line 19, in encode
    return codecs.charmap_encode(input,self.errors,encoding_table)[0]
           ^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^^
UnicodeEncodeError: 'charmap' codec can't encode character '\uff1a' in position 530: character maps to <undefined>

hatem_mezlini-owner

13 hours ago

Sorry for these errors. I will fix them within a couple of hours.

invideoiq

Dear Sam, I pushed a bug fix yesterday at 4 pm CET to address these issues. I ran the Actor with the URLs you provided and I can't reproduce the errors you are experiencing. Did you perhaps specify a language? If so, would you mind sending me the whole input?

samdesign

7 hours ago

Thanks for your reply. Here is the simple Python script I used to test, with the associated error output in the screenshot. I tested it again today and got the same error.

invideoiq

Thank you for sharing your code Sam!

I tested your code and it works just fine for me. So here is what I suspect: the UnicodeEncodeError happens because Python is trying to print Unicode characters to a console whose encoding (cp1252, according to your traceback) cannot represent them. Are you using the Windows console by any chance? Could you try it in a Linux or WSL console?
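To illustrate the suspected cause, here is a minimal sketch that needs neither the Actor nor a Windows console. It checks whether the two characters from the tracebacks earlier in this thread can be encoded in cp1252 and in UTF-8. The helper name can_encode is made up for this example.

```python
# The two characters reported in the tracebacks in this thread.
SAMPLES = ["\U0001f4fa", "\uff1a"]


def can_encode(text: str, encoding: str) -> bool:
    """Return True if `text` can be represented in `encoding` (hypothetical helper)."""
    try:
        text.encode(encoding)
        return True
    except UnicodeEncodeError:
        return False


for ch in SAMPLES:
    # ascii() keeps this line printable even on a cp1252 console.
    print(ascii(ch), "cp1252:", can_encode(ch, "cp1252"), "utf-8:", can_encode(ch, "utf-8"))
```

Neither character exists in cp1252, which is consistent with the error being raised by print() on the client side rather than by the Actor.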

If that doesn't work, perhaps you can set the encoding of your stdout to UTF-8 by adding these two lines at the top of your script:

    import sys
    sys.stdout.reconfigure(encoding='utf-8')
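A slightly more defensive sketch of the same idea, in case stdout has been replaced by an object that has no reconfigure() method (some IDEs and test runners do this). The function name force_utf8 and the errors="replace" choice are assumptions made for this example, not something the Actor requires.

```python
import sys


def force_utf8(stream=None) -> bool:
    """Switch a text stream to UTF-8 if it supports reconfigure(); return True on success."""
    stream = sys.stdout if stream is None else stream
    reconfigure = getattr(stream, "reconfigure", None)
    if reconfigure is None:
        return False  # the stream is not an io.TextIOWrapper (Python 3.7+ method)
    # errors="replace" is an assumption: prefer a substitute character over a crash.
    reconfigure(encoding="utf-8", errors="replace")
    return True


if force_utf8():
    print("\U0001f4fa \uff1a")
```

Setting the PYTHONIOENCODING environment variable to utf-8, or running the script with python -X utf8, should have a similar effect without touching the code.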

If that doesn't work, you can always encode the output in UTF-8, as follows:

    print(json.dumps(response.json(), indent=4, ensure_ascii=False).encode('utf-8'))

This prints the result as UTF-8 encoded bytes; to get the text back, decode it again with .decode('utf-8').
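Another option, sketched below, is to leave ensure_ascii at its default of True, so that json.dumps escapes every non-ASCII character and the output is printable on any console. The payload dictionary is a made-up stand-in for response.json(); the real shape of the Actor's output is not shown in this thread.

```python
import json

# Hypothetical stand-in for response.json(); not the Actor's real output format.
payload = {"title": "\U0001f4fa Demo\uff1a transcript"}

# ensure_ascii defaults to True, so non-ASCII characters become \uXXXX escapes.
text = json.dumps(payload, indent=4)
print(text)

# The escapes are lossless: parsing the text gives back the original data.
restored = json.loads(text)
```

The trade-off is readability: the console shows escape sequences instead of the characters themselves, but nothing is lost.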

Finally, your result is probably already available as a Python dict from the response.json() call. Can you check that my_dict = response.json() actually works and contains the data?
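One way to run that check without involving the console encoding at all is to write the data to a UTF-8 file and print only ASCII-safe details. This is a sketch: my_dict below is a made-up placeholder for the value returned by response.json(), and the file name transcript_check.json is arbitrary.

```python
import json
import tempfile
from pathlib import Path


def save_json(data, path: Path) -> None:
    """Write `data` as human-readable JSON, always encoded as UTF-8."""
    path.write_text(json.dumps(data, indent=4, ensure_ascii=False), encoding="utf-8")


# Hypothetical placeholder for my_dict = response.json().
my_dict = {"title": "\U0001f4fa Demo\uff1a transcript", "language": "en"}

out_path = Path(tempfile.gettempdir()) / "transcript_check.json"
save_json(my_dict, out_path)

# ascii() escapes anything a cp1252 console could not display.
print(type(my_dict).__name__, ascii(sorted(my_dict)))
```

Opening the saved file in an editor that understands UTF-8 then shows the full content, emoji included.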

Thank you for your patience. I hope this helps!

samdesign

3 hours ago

Thank you for your help and patience. You are right! I tested on macOS and it works fine there too. I think your hypothesis is correct: the issue is with the terminal I was using on Windows. Your response was definitely helpful. Sorry for the false alarm.

Developer
Maintained by Community
Actor metrics
  • 90 monthly users
  • 23 stars
  • 98.7% runs succeeded
  • 3.7 hours response time
  • Created in Oct 2024
  • Modified about 2 hours ago