Profesia.sk Scraper
3 days trial then $25.00/month - No credit card required now
One-stop shop for all data on Profesia.sk. Extract job offers, lists of companies, positions, locations, and more. Job offers include salary, textual info, company, and more.
Dataset type
datasetType
EnumOptional
Use this option if you want to scrape a whole dataset, not just specific URLs.
This option is ignored if Start URLs are given.
Value options:
"jobOffers": string"industries": string"professions": string"companies": string"languages": string"locations": string"partners": string
Default value of this property is "jobOffers"
Detailed
jobOfferDetailed
booleanOptional
If checked, the scraper will obtain more detailed info for job offers by visiting the detail page of each job offer.
If unchecked, only the data from the listing page is extracted.
For details, please refer to https://apify.com/jurooravec/profesia-sk-scraper#output
Default value of this property is true
Search keywords (full-text search)
jobOfferFilterQuery
stringOptional
Comma-separated list of keywords. If given, only entries matching the keywords will be retrieved (full-text search)
Min salary
jobOfferFilterMinSalaryValue
integerOptional
If set, only entries offering this salary or higher will be extracted.
Min salary per hour/month
jobOfferFilterMinSalaryPeriod
EnumOptional
Choose whether the minimum salary is per hour or per month.
Value options:
"month": string"hour": string
Default value of this property is "month"
Type of employment
jobOfferFilterEmploymentType
EnumOptional
If set, only entries with this employment type will be extracted.
Value options:
"fte": string"pte": string"selfemploy": string"voluntary": string"internship": string
Remote vs On-site
jobOfferFilterRemoteWorkType
EnumOptional
If set, only entries matching this remote work arrangement will be extracted.
Value options:
"fullRemote": string"partialRemote": string"noRemote": string
Last N days
jobOfferFilterLastNDays
integerOptional
If set, only entries up to this many days old will be extracted. E.g. 7 = max 1 week old, 31 = max 1 month old, ...
Count the matched job offers
jobOfferCountOnly
booleanOptional
If checked, no data is extracted. Instead, the count of matched job offers is printed in the log.
Default value of this property is false
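Taken together, the filter options above form part of a single Actor input. A minimal sketch of an input object that scrapes detailed, recent, full-remote offers (all filter values are illustrative, not recommendations):

    {
      "datasetType": "jobOffers",
      "jobOfferDetailed": true,
      "jobOfferFilterQuery": "python, data engineer",
      "jobOfferFilterMinSalaryValue": 1500,
      "jobOfferFilterMinSalaryPeriod": "month",
      "jobOfferFilterEmploymentType": "fte",
      "jobOfferFilterRemoteWorkType": "fullRemote",
      "jobOfferFilterLastNDays": 7
    }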
Extend Actor input from URL
inputExtendUrl
stringOptional
Extend Actor input with a config from a URL.
For example, you can store your Actor input in source control and import it here.
In case of a conflict (if a field is defined both in the Actor input and in the imported input), the Actor input overwrites the imported fields.
The URL is requested with the GET method and must point to a JSON file containing a single object (the config).
If you need to send a POST request or to modify the response further, use inputExtendFromFunction instead.
Extend Actor input from custom function
inputExtendFromFunction
stringOptional
Extend Actor input with a config defined by a custom function.
For example, you can store your Actor input in source control and import it here.
In case of a conflict (if a field is defined both in the Actor input and in the imported input), the Actor input overwrites the imported fields.
The function must return an object (the config).
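A minimal sketch of an inputExtendFromFunction value, assuming the function takes no arguments and may be async; the URL and payload are hypothetical:

    // Fetch a shared config with a POST request and return it (the config
    // object). Fields set directly in the Actor input still win on conflict.
    async () => {
      const response = await fetch('https://example.com/actor-config.json', {
        method: 'POST',
        body: JSON.stringify({ team: 'data' }), // hypothetical payload
      });
      return await response.json();
    };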
Start URLs from Dataset
startUrlsFromDataset
stringOptional
Import URLs to scrape from an existing Dataset.
The dataset and the field to import from should be written as {datasetID}#{field}.
Example: datasetid123#url will take URLs from the field url of dataset datasetid123.
Start URLs from custom function
startUrlsFromFunction
stringOptional
Import or generate URLs to scrape using a custom function.
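A minimal sketch of a startUrlsFromFunction value, assuming the function takes no arguments and returns an array of URLs; the pagination parameter is illustrative:

    // Generate the first three listing pages to scrape.
    async () => {
      const baseUrl = 'https://www.profesia.sk/praca/';
      return [1, 2, 3].map((page) => `${baseUrl}?page_num=${page}`);
    };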
Include personal data
includePersonalData
booleanOptional
By default, fields that are potential personal data are censored. Toggle this option on to get the uncensored values.
WARNING: Turn this on ONLY if you have consent or a legal basis for using the data, or proceed at your own risk. Learn more
Default value of this property is false
Limit the number of requests
requestMaxEntries
integerOptional
If set, at most this many requests will be processed.
The count is determined from the RequestQueue that's used for the Actor run.
This means that if requestMaxEntries is set to 50, but the associated queue has already handled 40 requests, then only 10 new requests will be handled.
Transform requests
requestTransform
stringOptional
Freely transform the request object using a custom function.
If not set, the request will remain as is.
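A minimal sketch of a requestTransform value, assuming the function receives the request object and returns the (possibly modified) request:

    // Tag every request so later hooks can recognize it; the userData
    // key 'tag' is illustrative.
    (request) => {
      request.userData = { ...request.userData, tag: 'profesia' };
      return request;
    };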
Transform requests - Setup
requestTransformBefore
stringOptional
Use this if you need to run one-time initialization code before requestTransform.
Transform requests - Teardown
requestTransformAfter
stringOptional
Use this if you need to run one-time teardown code after requestTransform.
Filter requests
requestFilter
stringOptional
Decide which requests should be processed by using a custom function.
If not set, all requests will be included.
This is done after requestTransform.
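A minimal sketch of a requestFilter value, assuming the function receives the request object and returns a boolean (true = process the request); the URL pattern is illustrative:

    // Skip everything that is not a job-offer listing or detail page.
    (request) => request.url.includes('/praca/');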
Filter requests - Setup
requestFilterBefore
stringOptional
Use this if you need to run one-time initialization code before requestFilter.
Filter requests - Teardown
requestFilterAfter
stringOptional
Use this if you need to run one-time teardown code after requestFilter.
RequestQueue ID
requestQueueId
stringOptional
By default, requests are stored in the default request queue.
Set this option if you want to use a non-default queue.
Learn more
NOTE: RequestQueue name can only contain letters 'a' through 'z', the digits '0' through '9', and the hyphen ('-') but only in the middle of the string (e.g. 'my-value-1')
Limit the number of scraped entries
outputMaxEntries
integerOptional
If set, at most this many entries will be scraped.
The count is determined from the Dataset that's used for the Actor run.
This means that if outputMaxEntries is set to 50, but the associated Dataset already has 40 items in it, then only 10 new entries will be saved.
Rename dataset fields
outputRenameFields
objectOptional
Rename fields (columns) of the output data.
If not set, all fields will have their original names.
This is done before outputPickFields.
Keys can be nested, e.g. "someProp.value[0]".
Nested path is resolved using Lodash.get().
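A minimal sketch of an outputRenameFields value, assuming a mapping from the old (possibly nested) field path to the new field name; the field names are illustrative, not the actual output schema:

    {
      "employer.name": "companyName",
      "salary.value[0]": "minSalary"
    }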
Pick dataset fields
outputPickFields
arrayOptional
Select a subset of fields of an entry that will be pushed to the dataset.
If not set, all fields on an entry will be pushed to the dataset.
This is done after outputRenameFields.
Keys can be nested, e.g. "someProp.value[0]".
Nested path is resolved using Lodash.get().
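A minimal sketch of an outputPickFields value; the field names are illustrative, not the actual output schema:

    ["id", "url", "companyName", "minSalary"]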
Transform entries
outputTransform
stringOptional
Freely transform the output data object using a custom function.
If not set, the data will remain as is.
This is done after outputPickFields and outputRenameFields.
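A minimal sketch of an outputTransform value, assuming the function receives a scraped entry and returns the transformed entry:

    // Add a derived timestamp field to every entry.
    (entry) => ({ ...entry, scrapedAt: new Date().toISOString() });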
Transform entries - Setup
outputTransformBefore
stringOptional
Use this if you need to run one-time initialization code before outputTransform.
Transform entries - Teardown
outputTransformAfter
stringOptional
Use this if you need to run one-time teardown code after outputTransform.
Filter entries
outputFilter
stringOptional
Decide which scraped entries should be included in the output by using a custom function.
If not set, all scraped entries will be included.
This is done after outputPickFields, outputRenameFields, and outputTransform.
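A minimal sketch of an outputFilter value, assuming the function receives a scraped entry and returns a boolean (true = keep the entry); the field name is illustrative:

    // Keep only entries that state a salary.
    (entry) => entry.salary != null;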
Filter entries - Setup
outputFilterBefore
stringOptional
Use this if you need to run one-time initialization code before outputFilter.
Filter entries - Teardown
outputFilterAfter
stringOptional
Use this if you need to run one-time teardown code after outputFilter.
Dataset ID
outputDatasetId
stringOptional
By default, data is written to the default Dataset.
Set this option if you want to write data to a non-default Dataset.
Learn more
NOTE: Dataset name can only contain letters 'a' through 'z', the digits '0' through '9', and the hyphen ('-') but only in the middle of the string (e.g. 'my-value-1')
Cache ID
outputCacheStoreId
stringOptional
Set this option if you want to cache scraped entries in Apify's Key-value store.
This is useful, for example, when you want to scrape only NEW entries. In such a case, you can use the outputFilter option to define a custom function to filter out entries already found in the cache.
Learn more
NOTE: Cache name can only contain letters 'a' through 'z', the digits '0' through '9', and the hyphen ('-') but only in the middle of the string (e.g. 'my-value-1')
Cache primary keys
outputCachePrimaryKeys
arrayOptional
Specify fields that uniquely identify entries (primary keys), so entries can be compared against the cache.
NOTE: If not set, the entries are hashed based on all fields
Cache action on result
outputCacheActionOnResult
EnumOptional
Specify whether scraped results should be added to, removed from, or overwrite the cache.
- add - Adds scraped results to the cache
- remove - Removes scraped results from the cache
- overwrite - First clears all entries from the cache, then adds scraped results to the cache
NOTE: No action happens when this field is empty.
Value options:
"add": string"remove": string"overwrite": string
maxRequestRetries
maxRequestRetries
integerOptional
Indicates how many times the request is retried if BasicCrawlerOptions.requestHandler fails.
maxRequestsPerMinute
maxRequestsPerMinute
integerOptional
The maximum number of requests per minute the crawler should run. You can pass any positive, non-zero integer.
maxRequestsPerCrawl
maxRequestsPerCrawl
integerOptional
Maximum number of pages that the crawler will open. The crawl will stop when this limit is reached.
NOTE: In cases of parallel crawling, the actual number of pages visited might be slightly higher than this value.
minConcurrency
minConcurrency
integerOptional
Sets the minimum concurrency (parallelism) for the crawl.
WARNING: If you set this value too high relative to the available system memory and CPU, your crawler will run extremely slowly or crash. If unsure, it's better to keep the default value, and the concurrency will scale up automatically.
maxConcurrency
maxConcurrency
integerOptional
Sets the maximum concurrency (parallelism) for the crawl.
navigationTimeoutSecs
navigationTimeoutSecs
integerOptional
Timeout in which the HTTP request to the resource needs to finish, given in seconds.
requestHandlerTimeoutSecs
requestHandlerTimeoutSecs
integerOptional
Timeout in which the function passed as BasicCrawlerOptions.requestHandler needs to finish, in seconds.
keepAlive
keepAlive
booleanOptional
Allows you to keep the crawler alive even if the RequestQueue gets empty. With keepAlive: true, the crawler will keep running, waiting for more requests to come.
ignoreSslErrors
ignoreSslErrors
booleanOptional
If set to true, SSL certificate errors will be ignored.
additionalMimeTypes
additionalMimeTypes
arrayOptional
An array of MIME types you want the crawler to load and process. By default, only text/html and application/xhtml+xml MIME types are supported.
suggestResponseEncoding
suggestResponseEncoding
stringOptional
By default this crawler will extract the correct encoding from the HTTP response headers. Some websites, however, use invalid headers; their responses are then decoded as UTF-8 by default, so if those sites actually use a different encoding, the response will be corrupted. You can use suggestResponseEncoding to fall back to a certain encoding if you know that your target website uses it. To force a certain encoding, disregarding the response headers, use forceResponseEncoding.
forceResponseEncoding
forceResponseEncoding
stringOptional
By default this crawler will extract the correct encoding from the HTTP response headers. Use forceResponseEncoding to force a certain encoding, disregarding the response headers. To only provide a default for missing encodings, use suggestResponseEncoding.
Batch requests
perfBatchSize
integerOptional
If set, multiple Requests will be handled by a single Actor instance.
Example: If set to 20, then up to 20 requests will be handled in a single "go", after which the actor instance will reset.
See Apify documentation.
Wait (in seconds) between processing requests in a single batch
perfBatchWaitSecs
integerOptional
How long to wait between entries within a single batch.
Increase this value if you're using batching and you're sending requests to the scraped website too fast.
Example: If set to 1, then after each entry in a batch, wait 1 second before continuing.
Log Level
logLevel
EnumOptional
Select how detailed the logging should be.
Value options:
"off": string"debug": string"info": string"warn": string"error": string
Default value of this property is "info"
Error reporting dataset ID
errorReportingDatasetId
stringOptional
Dataset ID to which errors should be captured. Default: 'REPORTING'.
NOTE: Dataset name can only contain letters 'a' through 'z', the digits '0' through '9', and the hyphen ('-') but only in the middle of the string (e.g. 'my-value-1')
Default value of this property is "REPORTING"
Send errors to Sentry
errorSendToTelemetry
booleanOptional
Whether to report actor errors to telemetry such as Sentry.
This info is used by the author of this actor to identify broken integrations,
and track down and fix issues.
Default value of this property is true
Metamorph actor ID - metamorph to another actor at the end
metamorphActorId
stringOptional
Use this option if you want to run another Actor with the same dataset after this Actor has finished (AKA metamorph into another Actor). Learn more
The target Actor is identified by its ID, e.g. "apify/web-scraper".
Metamorph actor build
metamorphActorBuild
stringOptional
Tag or number of the target actor build to metamorph into (e.g. 'beta' or '1.2.345')
Metamorph actor input
metamorphActorInput
objectOptional
Input object passed to the follow-up (metamorph) actor. Learn more
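A minimal sketch of the metamorph options combined, with the Actor ID taken from the example above and the build and input values illustrative:

    {
      "metamorphActorId": "apify/web-scraper",
      "metamorphActorBuild": "latest",
      "metamorphActorInput": {
        "startUrls": [{ "url": "https://example.com" }]
      }
    }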