Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

salesforce update query function #1092

Open
wants to merge 18 commits into
base: main
Choose a base branch
from
Open

salesforce update query function #1092

wants to merge 18 commits into from

Conversation

mtuchi
Copy link
Collaborator

@mtuchi mtuchi commented Mar 21, 2025

Summary

Refactor query function in salesforce adaptor to use the latest options in connection.query()

Fixes #1030

Details

  • Update query function to use thew new jsforce v3 connection. The new connection.query() support query options for fetching more data we reached query limit hence removing the current implementation of autoFetch
  • Update query options documentation link. Newly supported options are [maxFetch, headers, responseTarget, scanAll]

AI Usage

Please disclose how you've used AI in this work (it's cool, we just want to
know!):

  • Code generation (copilot but not intellisense)
  • Learning or fact checking
  • Strategy / design
  • Optimisation / refactoring
  • Translation / spellchecking / doc gen
  • Other
  • I have not used AI

You can read more details in our
Responsible AI Policy

Review Checklist

Before merging, the reviewer should check the following items:

  • Does the PR do what it claims to do?
  • NA: If this is a new adaptor, added the adaptor on marketing website ?
  • NA: If this PR includes breaking changes, do we need to update any jobs in
    production? Is it safe to release?
  • Are there any unit tests?
  • Is there a changeset associated with this PR? Should there be? Note that
    dev only changes don't need a changeset.
  • Have you ticked a box under AI Usage?

@github-project-automation github-project-automation bot moved this to New Issues in v2 Mar 21, 2025
@mtuchi
Copy link
Collaborator Author

mtuchi commented Mar 21, 2025

I will wait for #1070 to be merged so that in can fix unit test for query function

@mtuchi
Copy link
Collaborator Author

mtuchi commented Mar 25, 2025

This is almost done, i just need to re-think the query options documentation. Because by default autoFetch: false and maxFetch: 10000 and jsdoc doesn't have good documentation. They have note section that says When maxFetch option is not set, the default value (10,000) is applied. See docs here.

I would like to point users to this query opts docs to see supported options, but it doesn't say the default values for each options

@mtuchi mtuchi marked this pull request as ready for review March 25, 2025 14:53
@mtuchi mtuchi requested a review from hunterachieng March 25, 2025 14:54
@mtuchi
Copy link
Collaborator Author

mtuchi commented Mar 25, 2025

Hiya @hunterachieng please have a look at changes when you got time

mtuchi and others added 3 commits March 31, 2025 14:14
* update jsforce to v3

* add changeset
* wip refactor create connection

* fix unit test

* wip fix intergration test

* remove log

* revert use of flatten function

remove removeConnection function

* improve basicAuth and tokenAuth

moved createConnection to Adaptor.js

* setup mock connection

* remove .skip and .only

* cleanup

* remove logLevel

* remove default configuration

* add changeset

* use checkConnection in basicAuth and tokenAuth only

* move basicAuth, tokenAuth and checkConnection to Adaptor.js

* remove array in execute() function

* remove flatten from lodash

* remove lodash package

* update connect function
* wip refactor create connection

* fix unit test

* wip fix intergration test

* remove log

* revert use of flatten function

remove removeConnection function

* improve basicAuth and tokenAuth

moved createConnection to Adaptor.js

* setup mock connection

* remove .skip and .only

* cleanup

* remove logLevel

* remove default configuration

* add changeset

* use checkConnection in basicAuth and tokenAuth only

* move basicAuth, tokenAuth and checkConnection to Adaptor.js

* remove array in execute() function

* remove flatten from lodash

* remove lodash package

* move http function to http.js

* build docs

* update unit test

* use request helper methods

* add unit test for http

* add integration test for http

* improve logs

* add test for query params

* add unit test for http.post

* add salesforceRequest function

use this function in http.js

* update test

* add changeset

* remove json option in FullRequestOption

* remove sinon

* remove sinon package

* move request into http

---------

Co-authored-by: Joe Clark <[email protected]>
@mtuchi mtuchi force-pushed the 1030-sf-query branch 2 times, most recently from c7e3fc4 to 7b4c24c Compare March 31, 2025 11:55
@mtuchi mtuchi requested a review from josephjclark March 31, 2025 11:58
console.log(`Executing query: ${resolvedQuery}`);
const autoFetch = resolvedOptions.autoFetch || resolvedOptions.autofetch;
const { autoFetch, maxFetch } = resolvedOptions;
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is autoFetch actually doing anything now?

Presumably if you pass maxFetch: 10m, it'll just download 10m records regardless of whether autofetch is enabled? This, by the way, is the correct pagination behaviour. Ask for a million records and you should be given a million records.

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Okay that make sense, but i wonder if we need a better way of querying for more results. Because making the query by default autoPage make sense but you still have limitation of 10000 max records by default, if you need more you just need to adjust maxFetch which still is not a guarantee if it will return all results

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think autoFetch is still useful for when user wants to get the default maximum number of results and get the nextRecordsUrl incase they need to use it in their job code

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

But autofetch isn't doing anything right now. So if you explicitly want all records, what do you do?

Also we haven't documented the 10k default? It's probably worth us duplicating documentation on options we think are going to be super common. Like maxFetch right now (which in other adaptors we call limit, so maybe we should think about that) is basically essential to large queries

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

autoFetch allow you to fetch max of 10k records by default now. If it's not enabled it will return 2k max of results.
Regarding documentation i am happy to create our own typedef for query options but i don't think we need to remap these options to match other adaptor. I think they just need a clear description of what they do

console.log('total in database : ' + totalSize);
console.log('total fetched : ' + totalFetched);
}
const allResults = { done, records, totalSize, totalFetched };
Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this still the correct data structure to return?

  • done doesn't seem very useful. Especially now that we're using jsforce for pagination
  • totalFetched === records.length so isn't adding a lot
  • I am not sure I see the value of totalSize. Is it actually needed by anyone for anything?

Copy link
Collaborator

@josephjclark josephjclark Mar 31, 2025

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I'm not necessarily asking for a change here, just sounding out the question

Copy link
Collaborator Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If autoFetch: false and we have 8000 records in the database

done: false,
totalFetched: 2000, 
records: [{}] // 2000 objects
totalSize: 8000,
nextRecordsUrl: 'url-for-next-batch'

If autoFetch: true

done: true,
totalFetched: 8000, 
records: [] // 8000 records
totalSize: 8000,

I can remove totalFetched and leave the log only that says total fetched : records.length. I thought it's useful because query has a limitation of 2000 records if autFetch: false and 10000 max records if autoFetch: true and maxFetch value is not specified

Copy link
Collaborator

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Ok. Let me go back and remind myself about this data structure - I seem to remember talking about it before.

My instinct is that we should just have an API that says "get me all records" or "get me this specific set of records", and nothing else. And in that API you'd just have a flat state.data results array.

I don't know if we've talked about this, but if we were to expose the jsforce object directly, we'd have a lot of license to have more opinionated APIs for 90% of use-case, and point users to the client for the edge-cases that aren't covered by by this.

I suppose that depends on how common it is to use nextRecordsUrl

@josephjclark
Copy link
Collaborator

@mtuchi how about this:

We make the signature query(q, { limit }). We always autoFetch, and we explicitly default the limit to 10k. Pass limit: false or limit:Infinity to download everything.

No other jsforce options are supported.

We return a result array to state.data.

We return the { done, totalSize, records, nextRecordsUrl } object returned by jsforce to state.response

Users wanting to implement their own pagination need to do http.get($.response.nextRecordsUrl). I don't really want them to do this (because it's too hard in openfn). They should be using a cursor to query for the correct range of data, and processing that entire range in the step.

This is not a jsforce query wrapper. This is a salesforce query operation.

This would be a breaking change and should be released in the next major (not 6), along with the bulk API changes.

@mtuchi mtuchi mentioned this pull request Apr 2, 2025
Base automatically changed from epic/salesforce to main April 2, 2025 08:18
@mtuchi mtuchi self-assigned this Apr 3, 2025
@mtuchi mtuchi linked an issue Apr 3, 2025 that may be closed by this pull request
@mtuchi
Copy link
Collaborator Author

mtuchi commented Apr 3, 2025

Okay got you @josephjclark, I will work on your suggestion and fix the conflicts in the next sprint

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
Status: New Issues
Development

Successfully merging this pull request may close these issues.

salesforce refactor query function
3 participants