
Return plan for extraction #389

Open
danthegoodman1 opened this issue Jan 9, 2025 · 9 comments
Labels
extract These changes pertain to the extract function

Comments

@danthegoodman1

The extract functionality is really cool, but it would be even cooler if it could export the static code/parsing that it did so it can be reused across many instances without wastefully invoking the LLM again.

For example, in the provided Loom video you extract the description of a GitHub repo. If you then had to do 100 more, it would be wasteful to have the AI repeat that work 100 times, when it could figure out how to do it once and let the developer replay that action across many pages.

@kamath kamath added the extract These changes pertain to the extract function label Jan 9, 2025
@kamath kamath added this to Stagehand Jan 9, 2025
@kamath
Contributor

kamath commented Jan 9, 2025

This is really interesting; definitely something we'll take a look at. Adding @seanmcguire12 here, who's spent considerable time making extract better.

@drewbaker

Would be nice if there was some sort of fallback: the first time the LLM is used, subsequent times it uses the pre-determined path, and if that fails it tries the LLM again. That would protect against DOM changes breaking the pre-determined path.

@seanmcguire12
Collaborator

@danthegoodman1 Thank you for adding this issue! I think you bring up a great point, & we've been chatting about this internally as of late. I think the tough part about this is finding a way to balance resilience to website changes with the cost of re-processing the DOM with every extract call.

One possible approach is hashing the state of the DOM, and then re-hashing it & comparing it with the most recent hash to detect whether it has changed... If it has changed, then we proceed with re-processing the DOM from scratch as usual.

Afaik, the core challenge of this approach is generalizing a hashing function that will be specifically focused on the parts of the DOM that are relevant to the extraction task.

e.g., if a webpage has elements a, b, c, x, y, z, and you need to extract data from elements x, y, and z, we don't really care if a, b, and c change... but how do we build a hashing function such that the hash will only change if the relevant elements/candidate elements change?
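The selective-hashing idea above could be sketched roughly like this: hash only the serialized fragments of the candidate elements, so changes to unrelated elements don't invalidate the cached plan. The function below is purely illustrative (not a Stagehand API), and assumes the candidate fragments, e.g. each element's outerHTML, have already been collected in the browser:

```typescript
// Sketch (not Stagehand API): hash only the DOM fragments relevant to
// the extraction task, so changes to a, b, c don't invalidate a plan
// that only touches x, y, z.
import { createHash } from "node:crypto";

// `fragments` would be e.g. the outerHTML of each candidate element,
// collected in the browser before this is called.
function hashRelevantDom(fragments: string[]): string {
  const h = createHash("sha256");
  for (const frag of fragments) {
    h.update(frag);
    h.update("\u0000"); // separator so fragment boundaries affect the hash
  }
  return h.digest("hex");
}
```

On each extract call you'd recompute the hash over the same candidate set and only fall back to full DOM re-processing when it differs from the stored one; the open question from above (which elements count as candidates) is exactly the part this sketch leaves out.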

Again, thanks for calling this out -- reducing cost & increasing accuracy are two things that are very high on our priority list right now.

Would love to hear any additional thoughts/opinions you may have on tackling this issue

@seanmcguire12
Collaborator

@drewbaker thank you for your comment!! To echo what I mentioned above: definitely keen on the idea of reuse and reducing calls to an LLM. Can you explain more what you mean by a pre-determined path here? Dynamically evaluating whether extraction has failed is definitely challenging to generalize beyond a specific website where expected values are known... i.e., how do we know when we should re-process the DOM/feed it back into the LLM?

@drewbaker

Can you explain more on what you mean by a pre-determined path here?

I mean the "static code/parsing" that OP is referring to.

Perhaps the “has failed” state can be determined by a function. For example, our use case is to login and download a statement and return that statement as JSON. So if the function fails to return JSON of a certain shape, then we know that it failed.

For example:

```js
extract("find the price of pad thai", (result) => {
    // if this returns true, then extract is deemed a success
    return result.includes("$")
})
```
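The statement use case described above ("JSON of a certain shape") could define the failed state with a shape check like this. The `Statement` fields are made up for illustration, and none of this is an actual Stagehand API:

```typescript
// Illustrative only: the expected shape of the extracted statement.
interface Statement {
  accountId: string;
  period: string;
  transactions: { date: string; amount: number }[];
}

// User-defined success condition: extraction is deemed failed if the
// result does not match the expected JSON shape.
function isStatement(result: unknown): result is Statement {
  if (typeof result !== "object" || result === null) return false;
  const r = result as Record<string, unknown>;
  return (
    typeof r.accountId === "string" &&
    typeof r.period === "string" &&
    Array.isArray(r.transactions)
  );
}
```

A check like this would answer the "how do we know when to re-process" question for this one site, at the cost of being use-case specific.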

@filip-michalsky
Collaborator

filip-michalsky commented Jan 9, 2025

@drewbaker do you suggest that a user of stagehand would themselves define what "a failed state of extract" means for a particular use case?

@danthegoodman1
Author

Would be nice if there was some sort of fallback, so first time the LLM is used, second time it uses the pre-determined path, but if that failed then it tried the LLM again. That would protect against DOM changes breaking the predetermined path.

This is what I would expect. A naive version might be giving it methods storeProcedure and fetchProcedure so we can integrate it with an existing DB (and thus use it across many instances as well).

@danthegoodman1
Author

Or they could operate independently, but if you have 1000 instances running that could get pretty chaotic. Maybe that's a separate problem domain, though.

@drewbaker

drewbaker commented Jan 10, 2025

@drewbaker do you suggest that a user of stagehand would themselves define what "a failed state of extract" means for a particular use case?

Yes, in code I'd write something like this:

```js
extract("find the price of pad thai", (result) => {
    // if this returns true, then extract is deemed a success.
    return result.includes("$")
})
```

An alternative idea would be to have a new function that returns the non-LLM recipe that the LLM found.

```js
const myRecipe = recipe("find the price of pad thai", (result) => {
    // if this returns true, then extract is deemed a success.
    return result.includes("$")
})
```

And the returned result would be a Playwright-generated function that could be fed into extract the next time?

```js
const successCondition = (result) => {
    // if this returns true, then extract is deemed a success.
    return result.includes("$")
}

const myRecipe = recipe("find the price of pad thai", {
    successCondition
})

const data = extract("find the price of pad thai", {
    recipe: myRecipe,
    successCondition
})
```

Then extract would first try the recipe, and if it fails, it would use the LLM.
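That recipe-first, LLM-fallback flow could be sketched like this. `runRecipe` and `runLlm` stand in for the cached Playwright-generated function and the LLM-driven extraction respectively; the signatures are illustrative, not actual Stagehand APIs:

```typescript
// Sketch (not Stagehand API): try the cached recipe first, validate the
// result with the user's successCondition, and fall back to the
// LLM-driven path only when the recipe fails or misbehaves.
type SuccessCondition<T> = (result: T) => boolean;

async function extractWithFallback<T>(
  runRecipe: (() => Promise<T>) | null, // cached recipe, if one exists
  runLlm: () => Promise<T>,             // full LLM-driven extraction
  successCondition: SuccessCondition<T>,
): Promise<T> {
  if (runRecipe) {
    try {
      const result = await runRecipe();
      if (successCondition(result)) return result; // recipe still works
    } catch {
      // The DOM likely changed; fall through to the LLM path.
    }
  }
  // Re-derive the extraction (and ideally store a fresh recipe).
  return runLlm();
}
```

The successCondition doubles as the "has failed" detector from earlier in the thread: it decides both whether a cached recipe is still trustworthy and when to pay for another LLM call.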
