Apify Crawler is being phased out: please read this announcement to find out why we are retiring the Crawler product, what it means for you, and how you can migrate your crawlers to the new actor, including the integrations.
- App: Users can now upload a custom profile picture in account settings.
- App: Users can upload an image to a published actor. The image will soon appear on the actor's public page in Apify Store.
- Actor: Actor startup times were optimized using a CPU boost during the first 10 seconds of a run.
- Actor: An actor run, along with its data, can now be shared via a public link available under the "Info" tab.
- Actor: Tasks now support only JSON-encoded input. This also affects the API, which now returns actor task input directly as an object under the input property instead of as a JSON-encoded pair of properties. See the API documentation of the get actor task endpoint.
- Actor: Actor task input can be overridden in the scheduler.
- Actor: Added a limit of 300 characters for the actor description.
- Actor: New Dockerfile templates for multifile source code allow faster builds.
- API: The rate limit for the dataset push items endpoint was increased to 300 requests/s per store.
- API: Added the actor author's username to the list actor tasks endpoint.
- API: Added the input schema to the build detail endpoint.
- Scheduler: Schedules that use a predefined CRON expression such as @hourly now randomly shift their base times to ensure that schedules with the same expression don't all start at the same time. This measure improves startup times and the performance of your actors and crawlers.
- Actor: Tasks can now easily be published as actors. Check out the knowledge base article to learn more.
- Webhooks: The request payload can now be modified in the webhook configuration. Check out the webhooks documentation to learn more.
- Webhooks: Ad hoc webhooks now support an idempotency key to ensure that duplicate webhooks won't get created when an actor gets restarted. Check out the webhooks documentation to learn more.
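The deduplication idea behind this can be sketched as follows. The `idempotencyKey` field name follows the webhooks documentation; the in-memory `Map` below is only a local stand-in for the platform's behaviour, not the real implementation.

```javascript
// Sketch: an idempotency key makes re-submitting the same ad hoc webhook a no-op.
const seen = new Map(); // stand-in for the platform's webhook store

function addAdHocWebhook(webhook) {
  // A restarted actor may submit the same webhook definition again;
  // only the first submission with a given idempotency key creates a webhook.
  if (webhook.idempotencyKey && seen.has(webhook.idempotencyKey)) {
    return seen.get(webhook.idempotencyKey); // duplicate ignored
  }
  const created = { id: `wh-${seen.size + 1}`, ...webhook };
  if (webhook.idempotencyKey) seen.set(webhook.idempotencyKey, created);
  return created;
}

const def = {
  eventTypes: ['ACTOR.RUN.SUCCEEDED'],
  requestUrl: 'https://example.com/hook',
  idempotencyKey: 'my-run-123', // e.g. derived from the actor run ID
};
const first = addAdHocWebhook(def);
const second = addAdHocWebhook(def); // simulated restart
console.log(first.id === second.id); // true – no duplicate created
```

Using the actor run ID as the key is a natural choice, since a restarted run keeps its ID.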
- Actor: A web server running in an actor is no longer required to start within 120 s; it can start at any time during the lifespan of its container.
- Actor: The Git deployment key is now available via the API (get actor endpoint).
- Actor: The "Use spare CPU capacity" configuration was removed.
- Actor: Increased the maximum memory for actor runs to 32 GB.
- Actor: The input UI for actors now validates the proxy configuration.
- API: Added a set of API endpoints to manage webhooks and retrieve webhook dispatches.
- Actor: New validation options added to actor input schema field definitions: minimum length for string and array fields, and a regular expression pattern for values of string list fields.
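As an illustration, a schema fragment using the new options might look like the following. The property names (minLength, pattern, minItems, patternValue) and the overall shape are assumptions to be checked against the input schema documentation, not a verbatim excerpt from it.

```json
{
  "title": "Example input",
  "type": "object",
  "schemaVersion": 1,
  "properties": {
    "startUrl": {
      "title": "Start URL",
      "type": "string",
      "editor": "textfield",
      "minLength": 10,
      "pattern": "^https?://"
    },
    "keywords": {
      "title": "Keywords",
      "type": "array",
      "editor": "stringList",
      "minItems": 1,
      "patternValue": "^[a-z]+$"
    }
  },
  "required": ["startUrl"]
}
```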
- Proxy: The Apify Proxy URL now supports a new parameter, country, that restricts proxy IP selection to the given country.
- Actor: A run can now metamorph into a run of another actor.
- General: The original Apify Crawler has been open-sourced as the actor apify/legacy-phantomjs-crawler. This actor has the same input and output format as the original Apify Crawler.
- API: A new set of API endpoints to retrieve and manage the last actor (task) run and its default storages. Check the API documentation for more information.
- Actor: The source code editor was extended with multifile support; more in the documentation.
- Actor: Runs in the RUNNING state are now pinned to the top of the actor runs list.
- Actor: New input UI fields added (key-value pairs, string list, hidden fields). All the field types now support additional options; see the documentation page for more information.
- Actor: Improved the actor publication page.
January 2019 🎆
- Actor: The new webhooks component enables integration of actors with external services and orchestration of multiple actors into a single pipeline.
- Actor: The run console was improved and now provides a quick overview of actor run storages.
- Actor: Published actors have a new title that is displayed on their public library page.
- Dataset: Added support for hidden fields (i.e. fields starting with the # character). These fields may be used to store debug information such as errors, response codes, etc. that can easily be omitted from the output.
- Dataset: Added new parameters to the API endpoints returning dataset items: skipEmpty=true to omit empty items, skipHidden=1 to omit hidden fields, and clean=true as a shortcut for both.
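The semantics of these parameters can be sketched locally. The function below is an illustrative stand-in for the filtering the endpoint applies, based only on the behaviour described above; it is not the server's actual code.

```javascript
// Local sketch of the dataset-items filtering: skipHidden drops fields whose
// names start with '#', skipEmpty drops items with no remaining fields, and
// clean is a shortcut that enables both.
function filterItems(items, { skipHidden = false, skipEmpty = false, clean = false } = {}) {
  if (clean) { skipHidden = true; skipEmpty = true; }
  let result = items;
  if (skipHidden) {
    result = result.map((item) =>
      Object.fromEntries(Object.entries(item).filter(([key]) => !key.startsWith('#'))));
  }
  if (skipEmpty) {
    result = result.filter((item) => Object.keys(item).length > 0);
  }
  return result;
}

const items = [
  { url: 'https://example.com', '#debug': { statusCode: 500 } },
  { '#error': 'timed out' }, // only hidden fields – becomes empty after cleaning
];
console.log(filterItems(items, { clean: true }));
// → [ { url: 'https://example.com' } ]
```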
- API: All endpoints with a [username]~[resourceName] parameter in the URL now also support the resource ID.
- App: The code editor used in the Apify app was replaced with the modern Monaco editor, which supports all ES6 features.
- Actor: The memory limit for free accounts was increased to 8 GB.
- Actor: The input UI for a request list now supports a web-hosted or uploaded file with a list of URLs. Try out Crawler - cheerio to see it in action (the Start URLs field).
- Actor: A published actor can now be marked as deprecated. Deprecated actors are omitted from public library search and flagged as deprecated. Use this feature to tell people your actor is no longer being developed, since removing it might break integrations that depend on the actor.
App: Replaced code editor with Monaco Editor
- API: Removed the meta.clientIp field from several API endpoints due to privacy concerns.
- Actor: Updated the base Apify Docker images to use the CMD rather than the ENTRYPOINT instruction to launch the code. If you're using a custom Dockerfile based on the Apify base images, make sure your CMD instruction is correct. See the Dockerfile example for more information.
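A minimal sketch of what such a custom Dockerfile might look like; the base image tag and the `main.js` entry point are illustrative assumptions, not values prescribed by the entry above.

```dockerfile
# Illustrative custom Dockerfile based on an Apify base image.
FROM apify/actor-node-chrome

# Install production dependencies first to take advantage of layer caching.
COPY package.json ./
RUN npm install --only=prod --no-optional

# Copy the rest of the actor's source code.
COPY . ./

# Use CMD (not ENTRYPOINT) to launch the code, matching the updated base images.
CMD ["node", "main.js"]
```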
- Web: Added featured actors and crawlers to the library. Added the input schema and an example run to the actor detail page.
- App: Added a new section with third-party login services to the Account page.
- General: Numerous performance and stability improvements and bugfixes.
- App: The dataset detail page now shows a preview of the data.
- CLI: Added new commands to manage secret environment variables; run apify secrets help for more details.
- CLI: Updated the apify.json file structure. It will be updated automatically before execution of the apify run and apify push commands. Read more in the documentation.
- App: Added a new Orders section to enable customers to keep track of their custom projects. Read more in the blog post.
- App: A large number of user interface and performance improvements.
- Billing: You can now set an additional billing email address that will receive a copy of all invoices. To set it, go to your Subscription page, click Edit, set the Billing email, and click Update subscription.
- API: Apify Storage API endpoints that use an HTTP method other than GET are now authorized using the user's API token. Please see the API documentation for more information. Note that we made a special exception in the system to ensure that affected users can continue using the API the old way. We'll send additional information to these users.
- API: Added new endpoints providing access to a particular version of an actor.
- API: Actor task input can now be overridden via the API. See the API documentation for more information.
- Actor: Private Git repositories are now supported. Check the documentation for more information.
- Actor: Improved actor UI: the run console and the source page have been redesigned for a better developer experience.
- Web: Improved search in the library.
- Web: A new page with awesome case studies was published.
- Web: Actors and crawlers in the library are now organized by categories.
- Actor: The "Is exclusive" scheduler option now also supports actors. If this option is checked, the scheduler won't start another run as long as the previous one is still running.
- SDK: New documentation for the Apify SDK is now available at https://sdk.apify.com.
- Actor: The input of an actor and its input UI can now be described by an input schema.
- Actor: Many new public actors with input UIs were released in the library: ... check out the library for more.
- Tasks: Released Apify actor tasks. Using them, you can create multiple configurations of a single actor and then run the selected configuration directly from the Apify platform, the scheduler, or the API.
- Proxy: New documentation for Apify Proxy was released. It contains examples in multiple languages and a detailed description of all provided proxies, including Google SERP.
- SDK: Released a new major version, v0.7, of the apify NPM package. Check the changelog for more information.
- CLI: Changed the behaviour of the apify run command and the apify local storage directory name. Check the migration guide if you are updating from version v0.1.*.
- Actor: Added actor live view that enables connecting to running containers - read more on Apify Blog
- App: Major internal code consolidation and performance improvements
- API: Various bugfixes and improvements in code and documentation
- Proxy: Improvements in Google SERP proxies, adding additional providers
- Keboola integration: Added support for input file from other steps.
- Actor: The memory option for actor runs now supports only values that are a power of 2 (i.e. 128 MB, 256 MB, 512 MB, 1024 MB, 2048 MB, ...)!
- Crawler: The proxy configuration of a crawler now offers an "automatic" mode that rotates all the proxies available to the user.
- Actor: Each actor run can now start a web server accessible at a unique URL. This enables you to run a web server inside the actor to provide real-time snapshots or receive tasks on the fly. See the documentation for more details.
- API: Added API endpoints to abort an actor run or build.
- Proxy: New Apify Proxy service launched!
- Keboola integration: Added support for running actors. Check the knowledge base article for more information.
- Actor: The minimum memory for actor runs is now 128 MB.
- CLI: Added log streaming for the apify push and apify call commands.
- CLI: Added a parameter to clean storages before running an actor locally. Check the docs for more information.
- SDK: A bunch of improvements and new features. Check the changelog.
- Crawler: It is no longer possible to combine custom proxies with Apify Proxy groups.
- Actor: The run console now shows information about current/max/average CPU and memory usage.
- Actor: Actors are now notified 120 s before migration to another worker machine. Check the documentation for more information.
- API: Added a new API endpoint to obtain information about a user account.
- API: The Storage API now also supports use of [username]~[storage-name] instead of the dataset ID or key-value store ID.
- CLI: We have just released the Apify CLI (command-line tool) to simplify local development, debugging, and deployment to Apify.
- Request queue: A new storage type for the Actor platform that helps manage a dynamic queue of URLs to be processed. Check the storage documentation for more information.
- SDK: The apify NPM package contains a lot of new features. Check its changelog for details.
- Actor: The limit on the number of processes per actor run was increased to 2 × [memory megabytes], so with 2 GB of memory your limit is 4096 processes.
- Actor: The host machine now sends a migrating event to the actor process in case of an upcoming restart or shutdown. Check the documentation.
- Actor: Actor runs now have a fixed amount of CPU capacity reserved, so each run should take about the same time. We also added a new "Use spare CPU capacity" checkbox in actor settings, allowing actors to use spare CPU capacity on the host machine as a free boost.
- Community: We released a new version of our open-source apify NPM package, containing a lot of new stuff to help you with your web scraping and automation projects. Check its npm page, the source code in the GitHub repository, and the documentation.
- Actor: The apify/actor-node-puppeteer Docker image is now deprecated. Use the apify/actor-node-chrome image instead.
- Actor: We have added an apify/actor-node-chrome-xvfb image that supports non-headless Chrome. If you choose this image, Apify.launchPuppeteer() opens Puppeteer with non-headless Chrome by default.
- Actor: We made infrastructure improvements to speed up actor starts and overall performance.
- Actor: Logs are now rate-limited. Each actor run and build has a 10,000-line log credit, with 10 lines added each second. Log lines over the limit won't be available in either the UI or the API.
- Web: Launched the Page Analyzer tool to enable setting up crawlers with fewer manual steps. Read more on the blog.
- Infrastructure: Major improvements to our Linux server configuration to improve the stability and performance of the system.
- Actor: Actors can now run with 16 GB of memory (available for users with Medium and larger plans; see https://apify.com/docs/actor#limits).
- Actor: Actor runs and their default key-value stores and datasets are now deleted after the data retention period.
- App: We've added support for PayPal payments for all subscription plans.
- Actor: The actor source code can now come from a GitHub Gist, which is much simpler than having a full Git repository (read the docs).
- Support: We have re-launched the Knowledge Base with a new design and much better search options.
- API: Added an API endpoint to run an actor and get its output in a single HTTP request.
- Actor: We've added a new storage type, Dataset, which enables you to store results in a way similar to Apify Crawler.
- Actor: Actor usage statistics are now available in the user account.
- Community: Released the proxy-chain NPM package as open source.
- Actor: Smarter allocation of tasks to servers to improve performance.
- Actor: Environment variables can now also be passed to actor builds (as Docker build arguments).
- Actor: Added an option to automatically restart actor runs on error.
- Crawler: Fixed the URL in the link element of the RSS-formatted last crawler execution result. This bug was causing some RSS readers to never refresh the data.
- Crawler: Added support for automatic rotation of user agents.
- Open source: Released a new NPM package called proxy-chain to support the usage of password-protected proxies from headless Chrome.
- API: Added support for the XLSX output format for crawler results.
- App: Upgraded the web app to Meteor 1.6 and thus greatly improved the speed of the app.
- Internal: Improved internal notifications, performance and infrastructure
- Actor: Added a feature to enable an actor to be anonymously runnable.
- Apifier is dead, long live Apify! On 9th October we launched our biggest upgrade yet.
- The old website at www.apifier.com was split into the public static website www.apify.com and the app running at my.apify.com.
- A new product called Actor was introduced. Read more on our blog.
- Added actor support to the scheduler.
- Git and Zip file source types were added to actors.
- API: The API endpoint providing results in XML format now allows setting custom XML tag names.
- API: Added support for JSONL output format
- Web: Created a Crawler request form to help customers specify the crawlers they would like to have built.
- Crawler: Added a finish webhook data feature that enables sending additional info in the webhook request.
- Web: Added a feature to delete a user account.
- Internal: Improvements in the logging system.
- General: Officially launched the Zapier integration.
- Crawler: Added a new context.actId property that enables users to fetch information about their crawler.
- Internal: Consolidated logging in the web application, improvements in Admin
- Crawler: Added a proxy groups crawler setting to simplify the usage of proxies.
- Web: Added a Schedule button to the crawler details page to simplify scheduling of crawlers.
- Internal: Improvements in the administration interface.
- Web: Performance optimizations in the UI.
- Web: Added a tool to test the crawler on a single URL only (see the Run console on the crawler details page).
- Internal: Improved reports in the admin section.
- Web: Changed Twitter handle from @ApifierInfo to @apifier.
- Crawler: Bugfix - cookies set in the last page function were not persisted
- Internal: Deployed some upgrades to our data storage infrastructure to improve performance and reduce costs.
- Web: Added sorting to Community crawlers.
- Web: Bugfixes, performance and cosmetic improvements.
- Internal: improvements in administration interface.
- Web: Extended public user profile pages in Community
- API: Bugfix in exports of results in XML format.
- Crawler: Added a new context.actExecutionId property that enables users to stop the crawler during its execution, fetch results, etc.
- Web: Improvements in internal administration interface.
- Web: Launched an external Apifier status page to keep our users informed about system status and potential outages.
- Web: Numerous improvements on the Community crawlers page, added a user profile page, enabled anonymous sharing.
- API: Improved sorting of columns in the CSV/HTML results table: values are now sorted according to numerical indexes (e.g. "val/0", ..., "val/9", "val/10").
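The idea is natural (numeric-aware) ordering rather than plain lexicographic ordering, which would put "val/10" before "val/2". A minimal comparator sketching this behaviour, not the platform's actual code:

```javascript
// Compare column names by splitting them into runs of digits and non-digits,
// so numeric runs are compared as numbers ("val/10" sorts after "val/9").
function compareColumns(a, b) {
  const re = /(\d+)|(\D+)/g;
  const as = a.match(re);
  const bs = b.match(re);
  for (let i = 0; i < Math.min(as.length, bs.length); i++) {
    if (as[i] === bs[i]) continue;
    const an = parseInt(as[i], 10);
    const bn = parseInt(bs[i], 10);
    if (!Number.isNaN(an) && !Number.isNaN(bn)) return an - bn; // numeric run
    return as[i] < bs[i] ? -1 : 1; // plain string run
  }
  return as.length - bs.length; // shorter name first when one is a prefix
}

const columns = ['val/10', 'val/2', 'val/1'];
console.log(columns.sort(compareColumns)); // [ 'val/1', 'val/2', 'val/10' ]
```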
- Web: Launched Apifier community page
- General: Invoices are now in the PDF format and are sent to customers by email
- We didn't launch anything today, just wishing you a happy Valentine's Day
- Web: New testimonials from ePojisteni.cz on our Customers page. Thanks Dušan and Andy!
- Web: Released a major upgrade of our billing and invoicing infrastructure to support European value-added tax (VAT).
- Web: Added a new Video tutorials page
- Crawler: Improved the normalization of URLs, which the crawler uses to determine whether a page has already been visited (see the Request.uniqueKey property in the docs for more details).
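URL normalization for deduplication can be sketched as follows. The entry above doesn't list the crawler's actual rules, so the steps below (dropping the fragment, lowercasing the host, sorting query parameters) and the `computeUniqueKey` name are illustrative assumptions.

```javascript
// Illustrative URL normalization for visited-page deduplication.
function computeUniqueKey(rawUrl) {
  const url = new URL(rawUrl);
  url.hash = '';              // "#section" doesn't identify a different page
  url.hostname = url.hostname.toLowerCase();
  url.searchParams.sort();    // ?b=2&a=1 is equivalent to ?a=1&b=2
  return url.toString();
}

const a = computeUniqueKey('https://Example.com/page?b=2&a=1#top');
const b = computeUniqueKey('https://example.com/page?a=1&b=2');
console.log(a === b); // true – treated as the same already-visited page
```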
- Infrastructure: Changed CDN provider from CloudFlare to AWS CloudFront to improve the performance of the web and API.
- API: Bugfix in the start execution API endpoint: the synchronous wait would sometimes time out after 60 seconds.
- Internal: Further improvements in the administration interface.
- Web: Improved aggregation of usage statistics; they now refresh automatically.
- Crawler: Request.proxy is now available even inside the page function.
- Web: Improved the Invoices page.
- Internal: Improvements in the administration interface.
- Web: Displaying a snapshot of the crawling queue in the Run console.
- API: All paginated API endpoints now support the desc=1 query parameter to sort records in descending order.
- API: Added support for XML attributes in results.
- General: Added support for the RSS output format to enable creating RSS feeds for any website.
- General: Launched a new discussion forum.
- Crawler: The custom proxy used by a particular request is now saved (see Custom proxies in the docs).
- Crawler: Performance improvements.
- API: Enabled rate limiting.
Major API upgrades:
- added new API endpoints to update and delete crawlers
- support for synchronous execution of crawlers
- all endpoints that return lists now support pagination
- the API Reference was greatly improved
- Web: Added new Tag and Do not start crawler if previous still running settings to schedules.
- General: Added a new Initial cookies setting to enable users to edit the cookies used by their crawlers.
- Web: Added a list of invoices to the Account page.
- Web: Added a new usage stats chart to Account page
- Internal: Large improvements in the deployment system completed
- General: Increased the length limit for Start URLs to 2000 characters.
- Web: Showing more relevant statistics in the crawler progress bar.
- Web: Released a new shiny API reference.
- Internal: Performance and usability improvements in the admin interface.
- Internal: Migrated our main database to MongoDB 3.2, deployed a new integration test suite, and added new metrics to the admin interface.
- Web: Showing current service limits on the Account page; various internal improvements in the user handling code.
- Web: Added new example crawlers to demonstrate how to use a page's internals.
- New feature: Released Schedules, which enable you to automatically run crawlers at certain times.
- Web: Switched to Intercom to manage communication with our users.
- Web: Added functionality to test finish webhooks.
- Web: Security fix: added rel="noopener" to all external links in order to avoid exploitation of the window.opener vulnerability.
- Web: Displaying the Internal ID field on the crawler details page, and the User ID and API token on the Account page to simplify the setup of API integrations.
- Web: Added a new Jobs page, because we're hiring!
- Web: Deployed various performance optimizations and bugfixes
- Internal: Updated our Meteor application to use ES2015 modules
- Web: Published a new testimonial from Shopwings on our Customers page. Thanks Guillaume!
- Crawler: queuePosition can now also be overridden.
- Web: Performance improvements of results exports.
- Web: Added a new example crawler to demonstrate a basic SEO analysis tool.
- Internal: Upgraded Meteor platform from version 1.3 to 1.4
- Docs: Added the API property name and type next to each crawler setting (see docs).
- Crawler: Added a new context.stats property to pass statistics from the current crawler to user code.
- Crawler: Added a new function signature that enables placing new pages at the beginning of the crawling queue and overriding their queuePosition.
- Crawler: Enabled users to define a custom User-Agent HTTP header and updated the default value to resemble the latest Chrome on Windows.
- Web: Implemented an optimization that enables users to export even large result sets to CSV/HTML format.
- Web: Created this wonderful page to keep our users up-to-date with new features.