OSINT Scraper
3 days trial then $10.00/month - No credit card required now
Harness the power of OSINT data with our advanced OSINT Scraper. Discover keywords and leaked information from platforms like Ideone, Dumpz, Github Gist, Pastebin, Pasteorg and Textbin. You can specify search terms, customize and retrieve OSINT data out of the box.
Actor - OSINT Scraper
This actor helps you find sensitive data that may have been leaked on public paste and code-sharing websites.
The OSINT data scraper supports the following features:
- Search any keyword - Search for any keyword you choose and get the matching results.
- Scrape multiple websites - Scrape Codepad, Dumpz, Github Gist, Ideone, Pastebin, Pasteorg, Textbin and many more websites.
- Customizable - Create a custom scraping function that runs on the results to fit your needs.
Bugs, fixes, updates, and changelog
This scraper is under active development. If you have any feature requests you can create an issue from here.
Input Parameters
The input of this scraper should be a JSON object specifying the keywords to search for and the websites to scrape. Possible fields are:
- searchKeywords: (Required) (Array of strings) Keywords that you want to search for on the websites.
- codepad: (Optional) (Boolean) Enables the codepad module, which scrapes http://codepad.org/.
- dumpz: (Optional) (Boolean) Enables the dumpz module, which scrapes https://dumpz.org/.
- githubgist: (Optional) (Boolean) Enables the Github Gist module, which scrapes https://gist.github.com/.
- ideone: (Optional) (Boolean) Enables the ideone module, which scrapes https://ideone.com/.
- pastebin: (Optional) (Boolean) Enables the pastebin module, which scrapes https://pastebin.com/.
- pasteorg: (Optional) (Boolean) Enables the pasteorg module, which scrapes https://www.paste.org/.
- textbin: (Optional) (Boolean) Enables the textbin module, which scrapes https://textbin.net/.
- proxy: (Required) (Proxy Object) Proxy configuration.
- extendOutputFunction: (Optional) (String) Function that takes a JQuery handle ($) as an argument and returns an object with data.
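Since `extendOutputFunction` is passed as a string, it can help to write the function normally and serialize it. The following is a minimal sketch, assuming the actor calls the function with a jQuery-like `$` for the result page; the `title` selector, the `pageTitle` field and the `extendOutput` name are illustrative and not taken from this README.

```javascript
// Hypothetical extend function: receives a jQuery-like handle ($) and
// returns an object with extra data. The 'title' selector is illustrative.
const extendOutput = ($) => {
  return {
    pageTitle: $('title').text().trim(),
  };
};

// The input field expects a string, so serialize the function source.
const inputWithExtend = {
  searchKeywords: ['db_pass'],
  proxy: { useApifyProxy: true },
  extendOutputFunction: extendOutput.toString(),
};

console.log(inputWithExtend.extendOutputFunction);
```

How the returned object is combined with the default output item is not described here, so check a test run before relying on it.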
This solution requires proxy servers: either your own or Apify Proxy.
Tip
To scrape only some of the modules, set the flags of those modules to true and the actor will scrape only those websites.
If you want to scrape Pastebin, please try to use US-based proxies because Pastebin is restricted in many countries.
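As a rough illustration of the tip above, the sketch below builds an input object that enables only the requested modules. The `buildInput` helper and the `MODULES` constant are hypothetical names introduced for this example; the flag names themselves come from the input parameters listed earlier.

```javascript
// Module flag names as listed in the input parameters.
const MODULES = ['codepad', 'dumpz', 'githubgist', 'ideone', 'pastebin', 'pasteorg', 'textbin'];

// Hypothetical helper: build an input that enables only the chosen modules.
function buildInput(searchKeywords, enabledModules, proxy = { useApifyProxy: true }) {
  const unknown = enabledModules.filter((name) => !MODULES.includes(name));
  if (unknown.length > 0) {
    throw new Error(`Unknown module(s): ${unknown.join(', ')}`);
  }
  const input = { searchKeywords, proxy };
  for (const name of enabledModules) {
    input[name] = true; // only the requested modules get a flag
  }
  return input;
}

const selectedModulesInput = buildInput(['db_pass'], ['pastebin', 'dumpz']);
console.log(JSON.stringify(selectedModulesInput, null, 2));
```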
Compute Unit Consumption
The actor is optimized to run fast and scrape as many listings as possible. Therefore, it prioritizes all listing detail requests. If the actor is not blocked very often, a run consumes ~0.01-0.03 compute units per 100 pages.
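For budgeting, the quoted range can be turned into a quick estimate. This is only a back-of-the-envelope sketch: it assumes consumption scales linearly with page count and that the run is rarely blocked, and `estimateComputeUnits` is a hypothetical helper name.

```javascript
// Rough range from the text: ~0.01-0.03 compute units per 100 pages.
const CU_PER_100_PAGES = { low: 0.01, high: 0.03 };

// Hypothetical helper: estimate the compute-unit range for a page count,
// assuming linear scaling.
function estimateComputeUnits(pages) {
  if (!Number.isFinite(pages) || pages < 0) {
    throw new RangeError('pages must be a non-negative number');
  }
  const hundreds = pages / 100;
  return {
    low: hundreds * CU_PER_100_PAGES.low,
    high: hundreds * CU_PER_100_PAGES.high,
  };
}

const estimate = estimateComputeUnits(5000);
console.log(`${estimate.low.toFixed(2)}-${estimate.high.toFixed(2)} compute units`);
```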
OSINT Scraper Input example
{
  "searchKeywords": [
    "@gmail",
    "db_pass"
  ],
  "codepad": true,
  "dumpz": true,
  "githubgist": true,
  "ideone": true,
  "pastebin": true,
  "pasteorg": true,
  "textbin": true,
  "proxy": {
    "useApifyProxy": true
  }
}
During the Run
During the run, the actor will output messages letting you know what is going on. Each message always contains a short label specifying which page from the provided list is currently being processed. When items are loaded from the page, you should see a message about this event with a loaded item count and total item count for each page.
If you provide incorrect input to the actor, it will immediately stop with a failure state and output an explanation of what is wrong.
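To catch mistakes before starting a run, you could check the input locally first. The sketch below is a hypothetical pre-flight check based on the documented fields and the input example; it is not the actor's own validation, and the names `validateInput` and `MODULE_FLAGS` are made up for this example.

```javascript
// Module flags documented in the input parameters.
const MODULE_FLAGS = ['codepad', 'dumpz', 'githubgist', 'ideone', 'pastebin', 'pasteorg', 'textbin'];

// Hypothetical pre-flight check; returns a list of problems (empty if none).
function validateInput(input) {
  if (typeof input !== 'object' || input === null) {
    return ['input must be a JSON object'];
  }
  const errors = [];
  const keywords = input.searchKeywords;
  const keywordsOk = Array.isArray(keywords)
    && keywords.length > 0
    && keywords.every((k) => typeof k === 'string' && k.length > 0);
  if (!keywordsOk) {
    errors.push('searchKeywords must be a non-empty array of strings');
  }
  if (typeof input.proxy !== 'object' || input.proxy === null) {
    errors.push('proxy must be an object');
  }
  for (const flag of MODULE_FLAGS) {
    if (flag in input && typeof input[flag] !== 'boolean') {
      errors.push(`${flag} must be a boolean`);
    }
  }
  if ('extendOutputFunction' in input && typeof input.extendOutputFunction !== 'string') {
    errors.push('extendOutputFunction must be a string');
  }
  return errors;
}

console.log(validateInput({ searchKeywords: ['@gmail'], pastebin: 'yes' }));
```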
OSINT Export
During the run, the actor stores results into a dataset. Each result is stored as a separate item in the dataset.
You can manage the results in any language (Python, PHP, Node JS/NPM). See the FAQ or our API reference to learn more about getting results from this OSINT actor.
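As one way to read the results from Node.js, the sketch below requests dataset items from the Apify API. It assumes a runtime with a global `fetch` (Node 18 or later); the dataset ID and token are placeholders you would take from your own run and account, and the helper names are hypothetical.

```javascript
// Builds the Apify API URL for a dataset's items.
// datasetId and token are placeholders supplied by the caller.
function datasetItemsUrl(datasetId, token) {
  const url = new URL(`https://api.apify.com/v2/datasets/${encodeURIComponent(datasetId)}/items`);
  url.searchParams.set('format', 'json');
  if (token) {
    url.searchParams.set('token', token);
  }
  return url.toString();
}

// Fetches the items as parsed JSON; needs a global fetch (Node 18+).
async function fetchItems(datasetId, token) {
  const response = await fetch(datasetItemsUrl(datasetId, token));
  if (!response.ok) {
    throw new Error(`Request failed with status ${response.status}`);
  }
  return response.json();
}

// Example call (not executed here; replace the placeholder ID):
// fetchItems('<DATASET_ID>', process.env.APIFY_TOKEN).then(console.log);
console.log(datasetItemsUrl('<DATASET_ID>'));
```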
Scraped Output
The structure of each item in OSINT Scraper looks like this:
Item Detail
{
  "keyword": "@gmail",
  "url": "https://dumpz.org/bmwhk2yRt45M"
}
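Because each item pairs one keyword with one URL, a common next step is to group the URLs per keyword. The following is a small sketch under that assumption; `groupUrlsByKeyword` is a hypothetical helper and the second sample URL is a made-up placeholder.

```javascript
// Groups dataset items ({ keyword, url }) into keyword -> set of unique URLs.
function groupUrlsByKeyword(items) {
  const grouped = new Map();
  for (const { keyword, url } of items) {
    if (!grouped.has(keyword)) {
      grouped.set(keyword, new Set());
    }
    grouped.get(keyword).add(url);
  }
  return grouped;
}

const sampleItems = [
  { keyword: '@gmail', url: 'https://dumpz.org/bmwhk2yRt45M' },
  { keyword: '@gmail', url: 'https://dumpz.org/bmwhk2yRt45M' }, // duplicate hit
  { keyword: 'db_pass', url: 'https://example.com/placeholder' }, // made-up URL
];

const groupedUrls = groupUrlsByKeyword(sampleItems);
for (const [keyword, urls] of groupedUrls) {
  console.log(keyword, [...urls]);
}
```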
Contact
Please visit epctex.com to see all the products that are available for you. If you are looking for a custom integration, please reach out to us through the chat box on epctex.com. In need of support? devops@epctex.com is at your service.
Actor Metrics
21 monthly users
11 stars
95% runs succeeded
Created in Apr 2022
Modified 9 hours ago