Website Content Crawler

apify/website-content-crawler
Crawl websites and extract text content to feed AI models, LLM applications, vector databases, or RAG pipelines. The Actor supports rich formatting using Markdown, cleans the HTML, downloads files, and integrates well with 🦜🔗LangChain, LlamaIndex, and the wider LLM ecosystem.
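If you want to run the Actor outside the Apify Console, a minimal sketch using the official apify-client Python package might look like this (the token and start URL are placeholders, and the crawl input is reduced to the bare minimum):

```python
from apify_client import ApifyClient

client = ApifyClient("<YOUR_APIFY_TOKEN>")

# Run the Actor and wait for it to finish.
run = client.actor("apify/website-content-crawler").call(
    run_input={"startUrls": [{"url": "https://docs.apify.com/"}]}
)

# Page records (url, text, markdown, ...) land in the run's default dataset.
for item in client.dataset(run["defaultDatasetId"]).iterate_items():
    print(item["url"])
```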

passthrough variables?

Open

knowledgeable_grimace opened this issue
2 months ago

Is it possible to add variables when a run is triggered (via Make.com), so that they are included in the payload when the dataset is complete and loaded? The Apify action doesn't need to do anything with these variables ... just pass them along to the output.

jan_van_schaffelaar

2 months ago

Did you find a solution for this? I am struggling with the exact same scenario.

ianpoley

2 months ago

Would love an answer to this as well!

knowledgeable_grimace

2 months ago

Nothing so far. I've contemplated some workarounds, but they are all messy.

jindrich.bar

Hello and thank you (all) for your interest in this Actor!

I've relayed this question to our Integrations developers and am currently waiting for an answer. Unfortunately, our team is only responsible for the Actor development itself, so I cannot provide much help (I'm unfamiliar with Make.com).

I'll keep you posted once I get info from the Integration devs.

Thank you for your patience!

jindrich.bar

Maybe in the meantime, can you describe your use case in more detail? What are you trying to achieve? Which Apify integrations (and which Make.com modules/actions) are you using?

knowledgeable_grimace

2 months ago

Hello! Certainly … I’m kicking off a run in Make.com using its integration with Apify. There’s a module for “Run a task”. Then, once the task is complete, Apify triggers a webhook at Make.com. I then use “get dataset” to get all the data from the run and do all kinds of things.

The purpose is that I want to take different kinds of actions following the run depending on various conditions that are determined at the start of the run.

What I’d like to do is, depending on various factors, pass a variable into the Apify run like {"custom_variables":[{"update_level":"full"}]}. Then, I’d like the Apify task to “preserve” this custom_variables array, just storing it. And eventually, when I run “get dataset” in Make.com, I’d like access to that array (along with all the usual dataset data). Then, in Make.com, I will do different things depending on what’s in those variables.
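For illustration, here is a rough sketch of that idea against the raw Apify API, assuming (as the maintainer confirms further down) that extra input keys are preserved with the run; the task ID and token are placeholders, and the field names are taken from the example above:

```python
import requests

APIFY_TOKEN = "<YOUR_APIFY_TOKEN>"
TASK_ID = "<YOUR_TASK_ID>"

# The POST body is merged into the task's saved input as overrides; the extra
# "custom_variables" key is not used by the crawler, it just rides along.
resp = requests.post(
    f"https://api.apify.com/v2/actor-tasks/{TASK_ID}/runs",
    params={"token": APIFY_TOKEN},
    json={"custom_variables": [{"update_level": "full"}]},
)
resp.raise_for_status()
run = resp.json()["data"]
print(run["id"], run["defaultKeyValueStoreId"])
```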

ianpoley

2 months ago

The use case I'm trying to accomplish is to simply pass along a few custom variables to the Website Content Crawler Actor from our API call. Ultimately, I want to pass those variables through to our Pinecone integration so that they're included in the metadata of the newly inserted records.

jan_van_schaffelaar

2 months ago

Perhaps my workaround works for you too, ianpoley.

I make sure that every scrape request to Apify also represents a row in Airtable. There it is easy to collect all the data you want together in one record and use it later for any purpose.

Make.com flow 1: When I trigger the Apify Actor in Make.com, I get a dataset ID. I then add a module in my scenario to store this Apify dataset ID in Airtable together with all my metadata.

Make.com flow 2: Once the Actor run finishes in Apify, it triggers my Make.com scenario via a webhook message. I then make an API call to Apify to retrieve the scraped data for the specific dataset ID. As the next step, I search Airtable for the record matching that dataset ID (you can do this with a formula in the Airtable module). In the response I receive all my metadata.

I am now struggling a bit with Pinecone as a next step, but at least all my data is available in my Make.com flow to pass along.
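For anyone replicating this outside Make.com, a rough sketch of flow 2 might look like the following; the Airtable search is stubbed out as a placeholder, and the payload shape assumes Apify's default webhook template, which nests the run object under "resource":

```python
import requests

APIFY_TOKEN = "<YOUR_APIFY_TOKEN>"

def lookup_airtable_record(dataset_id: str) -> dict:
    # Stub for the Airtable search module in flow 2: find the row that
    # flow 1 stored under this dataset ID and return its metadata fields.
    return {"dataset_id": dataset_id, "update_level": "full"}

def handle_webhook(payload: dict) -> None:
    # Apify's default webhook payload nests the run object under "resource".
    dataset_id = payload["resource"]["defaultDatasetId"]

    # Retrieve the scraped data for this specific dataset ID.
    items = requests.get(
        f"https://api.apify.com/v2/datasets/{dataset_id}/items",
        params={"token": APIFY_TOKEN, "format": "json"},
    ).json()

    metadata = lookup_airtable_record(dataset_id)
    print(f"{len(items)} items scraped; metadata: {metadata}")
```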

knowledgeable_grimace

2 months ago

Thanks, Jan ... I thought of doing something like this too, as my originating data is also in Airtable. I just didn't want to have to maintain that metadata if there was a way for Apify to help.

jindrich.bar

Hello again!

After talking to my colleagues, it turns out they could only come up with "hacky" solutions like yours... though theirs is perhaps a bit easier than what you're doing.

In the Actor input, you can pass an arbitrary JSON object - of course, it should contain the required input options, but you can also pass extra keys there. In Make.com, you can do this with the "Input JSON overrides" option on the Run Task node (see the first attached image).

Later on (once the Apify webhook calls your other Make flow), you can read the Actor's input with the HTTP - Download File action. Here, you can make an API call to the key-value store at Apify and retrieve the INPUT.JSON file (containing both the Actor inputs and your passed custom metadata). See the second screenshot for details. You can parse this as JSON, get your values from there, and use it as you wish.
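Done directly against the API rather than through Make.com's HTTP module, that retrieval step might look roughly like this; the store ID placeholder would come from the run object delivered by the webhook, and the INPUT record key is where Apify stores a run's input:

```python
import requests

APIFY_TOKEN = "<YOUR_APIFY_TOKEN>"
STORE_ID = "<DEFAULT_KEY_VALUE_STORE_ID>"  # from the webhook's run object

# The run's input is stored as the INPUT record of its default key-value store.
resp = requests.get(
    f"https://api.apify.com/v2/key-value-stores/{STORE_ID}/records/INPUT",
    params={"token": APIFY_TOKEN},
)
resp.raise_for_status()

run_input = resp.json()  # full Actor input, custom keys included
print(run_input.get("custom_variables"))  # e.g. [{"update_level": "full"}]
```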

Unfortunately, we couldn't come up with anything more elegant... but we'll keep thinking about it.

If you feel like this answers your question, feel free to close this issue. Or, if you have any ideas on how we could make this more user-friendly - let us know! Cheers!

Maintained by Apify
Actor metrics
  • 2.8k monthly users
  • 317 stars
  • 100.0% runs succeeded
  • 4 days response time
  • Created in Mar 2023
  • Modified 1 day ago