-
Notifications
You must be signed in to change notification settings - Fork 230
feat(compass-collection): Process schema into format for LLM submission for Mock Data Generator – CLOUDP-337090 #7205
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
… CLOUDP-337090_v2
packages/compass-collection/src/transform-schema-to-field-info.spec.ts
Outdated
Show resolved
Hide resolved
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Copilot encountered an error and was unable to review this pull request. You can try again by re-requesting a review.
|
||
const result = processSchema(schema); | ||
|
||
expect(result).to.deep.equal({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah this is a good example of how the dot notation combined with the constraints we've placed simplifies the parsing and surface area for the LLM to write to
{ | ||
name: 'Array', | ||
bsonType: 'Array', | ||
path: ['cube'], |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
To confirm my understanding, the path stays as ['cube'] because the named field/path captures arrays-within-arrays at all levels (and until there's a document)?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Yes, from my understanding
|
||
const result = processSchema(schema); | ||
|
||
expect(result).to.deep.equal({ |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
good example here and below 👍🏼
fieldProbability?: number, | ||
arraySampleValues?: unknown[] | ||
): void { | ||
if (type.name === 'Array' || type.bsonType === 'Array') { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should use the type-predicate validators like isArraySchemaType
in compass-schema/src/components/field.tsx
Type guards will give you the branch in a program enough type information to prevent casts like type as ArraySchemaType
See https://www.typescriptlang.org/docs/handbook/2/narrowing.html#using-type-predicates
|
||
const arrayPath = `${currentPath}[]`; | ||
const sampleValues = | ||
arraySampleValues || getSampleValues(arrayType).slice(0, 3); // Limit full-context array sample values to 3 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: instead of the magic 3
, use a constant
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
had some minor feedback; lgtm overall
@@ -30,9 +29,16 @@ export type SchemaAnalysisErrorState = { | |||
error: SchemaAnalysisError; | |||
}; | |||
|
|||
export interface FieldInfo { | |||
type: string; // MongoDB type (eg. String, Double, Array, Document) |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
could type
be a string union type?
@@ -66,7 +66,8 @@ | |||
"react": "^17.0.2", | |||
"react-redux": "^8.1.3", | |||
"redux": "^4.2.1", | |||
"redux-thunk": "^2.4.2" | |||
"redux-thunk": "^2.4.2", | |||
"bson": "^6.10.1" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
since bson
is only used for its types, it should be a dev dependency so it doesn't get bundled to prod
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The dependency checker was still complaining even with using import type
.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I tried adding 'bson'
to the ignores
in packages/compass-collection/.depcheckrc
. But not sure
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
|
||
function isPrimitiveSchemaType(type: SchemaType): type is PrimitiveSchemaType { | ||
return ( | ||
!isConstantSchemaType(type) && |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
[nit] null
and undefined
classify as primitives too
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
weird, let's go with consistency then. I disagree with that source's typing but it may be working around some type issue
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Left a comment on where this helper lands. Not really a blocker. Approving incase this blocks other work, but it would be nice to have this in the shared spot if possible and y'all think its worth having there.
* Transforms a raw mongodb-schema Schema into a flat Record<string, FieldInfo> | ||
* using dot notation for nested fields and bracket notation for arrays. | ||
*/ | ||
export function processSchema(schema: Schema): Record<string, FieldInfo> { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Should we add this to the https://github.com/mongodb-js/mongodb-schema package?
That way places like the VSCode extension, and other parts of Compass can use it more easily.
Looks like there's already a getSchemaPaths
we have on the schema analysis that we're using in Compass' import-export already. It's similar but not the same.
https://github.com/mongodb-js/mongodb-schema/blob/6f22cb72aaaf97a0bbbb6ae92a4fdadc4290db67/src/schema-analyzer.ts#L640
In import-export:
async function _gatherFields({ |
Maybe we add some options to the getSchemaPaths
or have it as a separate function on the SchemaAnalyzer
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
That's a good idea. How does this sound: if/when this code could be re-used effectively, we can consider whether moving it to the https://github.com/mongodb-js/mongodb-schema package would be good. (Note also that we are releasing this feature as an experiment (A/B test) that we may iterate on)
Description
Converts MongoDB raw schema analysis object into a flat, LLM-friendly format for the Mock Data Generator feature.
Notation examples:
Checklist
Motivation and Context
The existing mongodb-schema structure is overly verbose for our feature and contains nested structures that are both difficult for LLMs to parse and do not correspond to the requirements of LLM structured outputs (eg. no optional fields).
Types of changes