@alzarei alzarei commented Sep 28, 2025

Modernize GoogleTextSearch connector with ITextSearch<TRecord> interface

Problem Statement

The GoogleTextSearch connector currently implements only the legacy ITextSearch interface, which forces users to build clause-based TextSearchFilter objects instead of type-safe LINQ expressions. Because filter properties are named with strings, a typo in a property name surfaces as a runtime error, and Google search filters get no compile-time validation.

Technical Approach

This PR modernizes the GoogleTextSearch connector to implement the generic ITextSearch<GoogleWebPage> interface alongside the existing legacy interface. The implementation converts LINQ expressions into Google Custom Search API parameters and supports equality, Contains, NOT operations, FileFormat filtering, and compound AND expressions.

Implementation Details

Core Changes

  • Implement the ITextSearch<GoogleWebPage> interface with full generic method support
  • Add LINQ expression analysis supporting equality, contains, NOT operations, and compound AND expressions
  • Map LINQ expressions to Google Custom Search API parameters (exactTerms, orTerms, excludeTerms, fileType, siteSearch)
  • Support advanced filtering patterns with type-safe property access

Property Mapping Strategy
The Google Custom Search API supports substantial filtering through predefined parameters:

  • exactTerms: Exact title/content match
  • siteSearch: Site/domain filtering
  • fileType: File extension filtering
  • excludeTerms: Negation filtering
  • Additional parameters: country restrict, language, date filtering
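The mapping above can be sketched as a small expression-tree walker. This is a minimal illustration rather than the connector's actual code: PageSketch and GoogleFilterSketch are hypothetical stand-ins, and the choice of which property maps to which parameter (for example Contains to orTerms, DisplayLink to siteSearch) is an assumption based on the parameter list in this description.

```csharp
using System;
using System.Collections.Generic;
using System.Linq.Expressions;

// Hypothetical stand-in for GoogleWebPage, reduced to the properties used here.
public sealed class PageSketch
{
    public string Title { get; set; } = string.Empty;
    public string Snippet { get; set; } = string.Empty;
    public string DisplayLink { get; set; } = string.Empty;
    public string FileFormat { get; set; } = string.Empty;
}

// Hypothetical translator; the parameter choices below are assumptions.
public static class GoogleFilterSketch
{
    // Returns (parameter name, value) pairs in the order the conditions appear.
    public static List<(string Name, string Value)> Translate(Expression<Func<PageSketch, bool>> filter)
    {
        var output = new List<(string Name, string Value)>();
        Visit(filter.Body, false, output);
        return output;
    }

    private static void Visit(Expression node, bool negated, List<(string Name, string Value)> output)
    {
        if (node is BinaryExpression { NodeType: ExpressionType.AndAlso } andAlso && !negated)
        {
            // a && b: translate both operands, left to right.
            Visit(andAlso.Left, false, output);
            Visit(andAlso.Right, false, output);
        }
        else if (node is UnaryExpression { NodeType: ExpressionType.Not } negation)
        {
            // !expr: flip the negation flag and translate the operand.
            Visit(negation.Operand, !negated, output);
        }
        else if (node is BinaryExpression { NodeType: ExpressionType.Equal, Left: MemberExpression left, Right: ConstantExpression { Value: string equalTo } } && !negated)
        {
            // page.Property == "constant"
            output.Add((EqualityParameter(left.Member.Name), equalTo));
        }
        else if (node is MethodCallExpression call
                 && call.Method.DeclaringType == typeof(string)
                 && call.Method.Name == "Contains"
                 && call.Object is MemberExpression target
                 && call.Arguments.Count == 1
                 && call.Arguments[0] is ConstantExpression { Value: string text })
        {
            // page.Property.Contains("constant"), optionally negated.
            output.Add((negated ? "excludeTerms" : ContainsParameter(target.Member.Name), text));
        }
        else
        {
            // Everything else (OR, negated equality, captured variables) is out of scope for this sketch.
            throw new NotSupportedException($"Unsupported filter expression: {node}");
        }
    }

    private static string EqualityParameter(string propertyName) => propertyName switch
    {
        nameof(PageSketch.FileFormat) => "fileType",
        nameof(PageSketch.DisplayLink) => "siteSearch",
        _ => "exactTerms",
    };

    private static string ContainsParameter(string propertyName) =>
        propertyName == nameof(PageSketch.DisplayLink) ? "siteSearch" : "orTerms";
}
```

Under these assumptions, a filter such as page => page.FileFormat == "pdf" && !page.Snippet.Contains("deprecated") yields the pairs fileType=pdf and excludeTerms=deprecated, in that order.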

Code Examples

Before (Legacy Interface)

var options = new TextSearchOptions
{
    Filter = new TextSearchFilter().Equality("siteSearch", "microsoft.com")
};

After (Generic Interface)

// Simple filtering
var options = new TextSearchOptions<GoogleWebPage>
{
    Filter = page => page.DisplayLink.Contains("microsoft.com")
};

// Complex filtering
var complexOptions = new TextSearchOptions<GoogleWebPage>
{
    Filter = page => page.DisplayLink.Contains("microsoft.com") &&
                    page.Title.Contains("AI") &&
                    page.FileFormat == "pdf" &&
                    !page.Snippet.Contains("deprecated")
};

Implementation Benefits

Type Safety & Developer Experience

  • Compile-time validation of GoogleWebPage property access
  • IntelliSense support for all GoogleWebPage properties
  • Eliminates runtime errors from property name typos in filters

Enhanced Filtering Capabilities

  • Equality filtering: page.Property == "value"
  • Contains filtering: page.Property.Contains("text")
  • NOT operations: !page.Property.Contains("text")
  • FileFormat filtering: page.FileFormat == "pdf"
  • Compound AND expressions with multiple conditions

Validation Results

Build Verification

  • Command: dotnet build --configuration Release --interactive
  • Result: Build succeeded in 3451.8s (57.5 minutes) - all projects compiled successfully
  • Status: ✅ PASSED (0 errors, 0 warnings)

Test Results
Full Test Suite:

  • Passed: 7,177 (core functionality tests)
  • Failed: 2,421 (external API configuration issues)
  • Skipped: 31
  • Duration: 4 minutes 57 seconds

Core Unit Tests:

  • Semantic Kernel unit tests: 1,574/1,574 tests passed (100%)
  • Google Connector Tests: 29 tests passed (23 legacy + 6 generic)

Test Failure Analysis
The 2,421 test failures are infrastructure/configuration issues, not code defects:

  • Azure OpenAI API Configuration: Missing API keys for external service integration tests
  • AWS Bedrock Configuration: Integration tests requiring live AWS services
  • Docker Dependencies: Vector database containers not available in development environment
  • External Service Dependencies: Integration tests requiring live API services (Bing, Google, etc.)

These failures are expected in development environments without external API configurations.

Method Ambiguity Resolution
Fixed compilation issues when both legacy and generic interfaces are implemented:

// Before (ambiguous):
await textSearch.SearchAsync("query", new() { Top = 4, Skip = 0 });

// After (explicit):
await textSearch.SearchAsync("query", new TextSearchOptions { Top = 4, Skip = 0 });
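The ambiguity can be reproduced without Semantic Kernel. The sketch below uses hypothetical stand-in types (LegacyOptionsSketch, GenericOptionsSketch<TRecord>, DualSearchSketch) in place of TextSearchOptions, TextSearchOptions<TRecord>, and the connector. A target-typed new() converts equally well to either options type, so the compiler cannot choose an overload; naming the type resolves it.

```csharp
// Hypothetical stand-in for the legacy TextSearchOptions.
public sealed class LegacyOptionsSketch
{
    public int Top { get; set; }
    public int Skip { get; set; }
}

// Hypothetical stand-in for the generic TextSearchOptions<TRecord>.
public sealed class GenericOptionsSketch<TRecord>
{
    public int Top { get; set; }
    public int Skip { get; set; }
}

// Hypothetical stand-in for a connector that exposes both overloads.
public sealed class DualSearchSketch
{
    public string Search(string query, LegacyOptionsSketch options) => "legacy";

    public string Search(string query, GenericOptionsSketch<string> options) => "generic";
}

public static class AmbiguitySketch
{
    // The call below does not compile (CS0121), because new() fits both overloads:
    //     search.Search("query", new() { Top = 4, Skip = 0 });

    // Naming the options type selects the legacy overload.
    public static string CallLegacy(DualSearchSketch search) =>
        search.Search("query", new LegacyOptionsSketch { Top = 4, Skip = 0 });

    // Naming the generic options type selects the generic overload.
    public static string CallGeneric(DualSearchSketch search) =>
        search.Search("query", new GenericOptionsSketch<string> { Top = 4, Skip = 0 });
}
```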

Files Modified

dotnet/src/Plugins/Plugins.Web/Google/GoogleWebPage.cs (NEW)
dotnet/src/Plugins/Plugins.Web/Google/GoogleTextSearch.cs (MODIFIED)
dotnet/samples/Concepts/TextSearch/Google_TextSearch.cs (ENHANCED)
dotnet/samples/GettingStartedWithTextSearch/Step1_Web_Search.cs (FIXED)

Breaking Changes

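None at runtime: all existing GoogleTextSearch functionality is preserved. Call sites that pass a target-typed new() as the search options must now name the options type explicitly, as described under Method Ambiguity Resolution.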

Multi-PR Context

This is PR 4 of 6 in the structured implementation approach for Issue #10456. This PR extends LINQ filtering support to the GoogleTextSearch connector, following the established pattern from BingTextSearch modernization.

@moonbox3 moonbox3 added the .NET Issue or Pull requests regarding .NET code label Sep 28, 2025
@alzarei alzarei marked this pull request as ready for review September 28, 2025 07:59
@alzarei alzarei requested a review from a team as a code owner September 28, 2025 07:59
@alzarei alzarei marked this pull request as draft October 3, 2025 07:58
@alzarei alzarei force-pushed the feature-text-search-linq-pr4 branch from 87cc4ba to 7576d94 Compare October 4, 2025 08:01
@alzarei alzarei marked this pull request as ready for review October 4, 2025 08:25
{
Top = 4,
Skip = 0,
Filter = page => page.FileFormat == "pdf" && page.Title != null && page.Title.Contains("AI") && page.Snippet != null && !page.Snippet.Contains("deprecated")
Contributor

Some tests to verify the filter URL that is created from the different LINQ expressions would be good.
I'm assuming these tests just check that the code doesn't fail, but don't actually verify that the output filter query is correct?

#region ITextSearch<GoogleWebPage> Implementation

/// <inheritdoc/>
public async Task<KernelSearchResults<object>> GetSearchResultsAsync(string query, TextSearchOptions<GoogleWebPage>? searchOptions = null, CancellationToken cancellationToken = default)
Contributor

Would it make sense to have the return type here be Task<KernelSearchResults<GoogleWebPage>>?
So on ITextSearch<TRecord> it would be Task<KernelSearchResults<TRecord>>.

Author

@westey-m - Thank you for catching this! Correct observation about the return type.

Current Issue

Task<KernelSearchResults<object>> GetSearchResultsAsync(...)  // Forces casting

Fix

I'll create a separate PR to fix the interface (since we're targeting a feature branch), then rebase this PR onto it.

Plan:

  1. New PR: Update interface to return KernelSearchResults<TRecord>
  2. This PR: Rebase and implement with correct type
  3. Other PRs: Rebase onto interface fix

Benefits:

  • Eliminates casting overhead
  • Provides type safety and IntelliSense
  • Keeps interface change isolated for clean rebases

Usage before and after the fix:

// Before
KernelSearchResults<object> results = await search.GetSearchResultsAsync(...);
var page = (GoogleWebPage)results.Results.First();

// After
KernelSearchResults<GoogleWebPage> results = await search.GetSearchResultsAsync(...);
var page = results.Results.First();

Will create the interface fix PR first.


Thoughts?

Author

PR created: #13318

Contributor

@westey-m westey-m Oct 31, 2025

I'm sure it's fine to check in this one in first as well, and then fix after, just as you wish. I'll check out the other PR as well anyway.

…c interface tests

- Fix CS0121 compilation errors by explicitly specifying TextSearchOptions instead of new()
- Add 3 comprehensive tests for ITextSearch<GoogleWebPage> generic interface:
  * GenericSearchAsyncReturnsSuccessfullyAsync
  * GenericGetTextSearchResultsReturnsSuccessfullyAsync
  * GenericGetSearchResultsReturnsSuccessfullyAsync
- All 22 Google tests now pass (19 legacy + 3 generic)
- Validates both backward compatibility and new type-safe functionality
- Add Contains() operation support for string properties (Title, Snippet, Link)
- Implement intelligent mapping: Contains() -> orTerms for flexible matching
- Add 2 new test methods to validate LINQ filtering with Contains and equality
- Fix method ambiguity (CS0121) in GoogleTextSearchTests by using explicit TextSearchOptions types
- Fix method ambiguity in Google_TextSearch.cs sample by specifying explicit option types
- Enhance error messages with clear guidance on supported LINQ patterns and properties

This enhancement extends the basic LINQ filtering (equality only) to include
string Contains operations, providing more natural and flexible filtering
patterns while staying within Google Custom Search API capabilities.

All tests passing: 25/25 Google tests (22 existing + 3 new)

- Add ITextSearch<GoogleWebPage> interface implementation
- Support equality, contains, NOT operations, and compound AND expressions
- Map LINQ expressions to Google Custom Search API parameters
- Add GoogleWebPage strongly-typed model for search results
- Support FileFormat filtering via Google's fileType parameter
- Add comprehensive test coverage (29 tests) for all filtering patterns
- Include practical examples demonstrating enhanced filtering capabilities
- Maintain backward compatibility with existing ITextSearch interface

Resolves enhanced LINQ filtering requirements for Google Text Search plugin.

- Add UsingGoogleTextSearchWithEnhancedLinqFilteringAsync method to Google_TextSearch.cs
  * Demonstrates 6 practical LINQ filtering patterns
  * Includes equality, contains, NOT operations, FileFormat, compound AND examples
  * Shows real-world usage of ITextSearch<GoogleWebPage> interface

- Fix method ambiguity in Step1_Web_Search.cs
  * Explicitly specify TextSearchOptions type instead of target-typed new()
  * Resolves CS0121 compilation error when both legacy and generic interfaces implemented
  * Maintains tutorial clarity for getting started guide

These enhancements complete the sample code demonstrating the new LINQ filtering
capabilities while ensuring all existing tutorials continue to compile correctly.

1. Added LINQ Filter Verification Tests
- Added 7 test methods verifying LINQ expressions produce correct Google API URLs
- Tests cover equality, contains, inequality, compound filters with URL validation
- Expanded test suite from 29 to 36 tests (all passing)
- Addresses reviewer comment: 'Some tests to verify the filter url that is created from the different linq expressions would be good'
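
A URL-verification check of this kind could look roughly like the following sketch. GoogleUrlSketch.BuildQueryString is a hypothetical helper, not the connector's code; it only shows the idea of asserting on the generated query string rather than on the absence of exceptions.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical helper that turns a query plus filter pairs into a query string.
public static class GoogleUrlSketch
{
    public static string BuildQueryString(string query, IEnumerable<(string Name, string Value)> filters)
    {
        // Percent-encode every value so spaces and symbols survive the round trip.
        var parts = new List<string> { "q=" + Uri.EscapeDataString(query) };
        parts.AddRange(filters.Select(f => f.Name + "=" + Uri.EscapeDataString(f.Value)));
        return string.Join("&", parts);
    }
}
```

A test would then compare the produced string, or the request URI captured by a fake HTTP handler, against the expected parameters such as fileType=pdf.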

2. Fixed Documentation Standards
- Updated all property summaries in GoogleWebPage.cs to use 'Gets or sets the' convention
- Applied to all 11 properties following .NET documentation standards
- Addresses reviewer comment: 'These property summaries should start with Gets or sets the to conform to the documentation standard'

3. Performance Optimization
- Added static readonly s_supportedPatterns array to avoid allocations in error paths
- Moved error messages from inline array allocation to static field
- Addresses reviewer comment: 'Consider making this a static field on the class. No need to allocate a new array of strings for each failed invocation'

4. Code Consolidation
- Extracted shared LINQ processing logic into helper methods
- Eliminated duplication between ConvertLinqExpressionToGoogleFilter and CollectAndCombineFilters
- Applied DRY principles throughout LINQ expression processing
- Addresses reviewer comment: 'This code seems very similar to that in CollectAndCombineFilters. Can this be consolidated?'

Validation:
- Build: Release configuration successful
- Tests: 36/36 passing
- Format: dotnet format compliance verified
- Regression: All existing functionality preserved

Note: API design question about return type consistency deferred for architectural discussion
…ions

Enhance GoogleTextSearch LINQ filter processing to explicitly detect and reject collection Contains operations (e.g., array.Contains(page.Property)) with a clear, actionable error message.

Changes:
- Added IsMemoryExtensionsContains helper method to detect C# 14+ span-based Contains resolution
- Enhanced TryProcessSingleExpression to distinguish between:
  * String.Contains (supported for substring search)
  * Enumerable.Contains / MemoryExtensions.Contains (not supported - now explicitly rejected)
- Throws NotSupportedException with guidance on alternatives when collection Contains is detected
- Handles both C# 13- (Enumerable.Contains) and C# 14+ (MemoryExtensions.Contains) resolution paths

Test Coverage:
- Added CollectionContainsFilterThrowsNotSupportedExceptionAsync test
- Verifies exception is thrown for collection Contains operations
- Validates error message contains actionable guidance about OR logic limitations
- Ensures consistent behavior across C# language versions (13 vs 14 Contains resolution)

Rationale:
Google Custom Search API does not support OR logic across multiple values. Collection Contains filters would require OR semantics that cannot be expressed via Google's query parameters. This change provides clear, early feedback to developers attempting to use unsupported filter patterns.
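
The detection described above can be sketched by inspecting the declaring type of the Contains method in the expression tree. ContainsKind and ContainsClassifierSketch are hypothetical names for illustration; the actual helper in this change is IsMemoryExtensionsContains, whose body is not reproduced here.

```csharp
using System;
using System.Linq;
using System.Linq.Expressions;

public enum ContainsKind { NotContains, StringSubstring, Collection }

// Hypothetical classifier, not the connector's implementation.
public static class ContainsClassifierSketch
{
    public static ContainsKind Classify(LambdaExpression filter)
    {
        if (filter.Body is not MethodCallExpression call || call.Method.Name != "Contains")
        {
            return ContainsKind.NotContains;
        }

        var declaringType = call.Method.DeclaringType;

        // page.Title.Contains("text") binds to the instance method on string.
        if (declaringType == typeof(string))
        {
            return ContainsKind.StringSubstring;
        }

        // array.Contains(page.Title) binds to Enumerable.Contains or, with
        // span-based resolution, to MemoryExtensions.Contains.
        if (declaringType == typeof(Enumerable) || declaringType == typeof(MemoryExtensions))
        {
            return ContainsKind.Collection;
        }

        // Other Contains methods (for example List<T>.Contains) are not classified in this sketch.
        return ContainsKind.NotContains;
    }
}
```

A caller would throw NotSupportedException when the result is Collection and translate the filter only for StringSubstring.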

Related to reviewer feedback on LINQ expression filter validation and C# 14 compatibility. Fixes microsoft#10456
@alzarei alzarei force-pushed the feature-text-search-linq-pr4 branch from 9466ecb to 96a241b Compare October 31, 2025 06:20

alzarei commented Oct 31, 2025

Rebased on top of feature branch to resolve conflicts. Ready for review.