How to wait for Azure Search to finish indexing a document? (For integration testing purposes)

Scenario

I'm building a suite of automated integration tests. Each test pushes data into the Azure Search index before querying it and verifying the expected results.

Problem

Indexing happens asynchronously in the service, and the data isn't immediately available after the indexing call returns successfully. Of course, most of the time the tests execute too quickly for the data to be there yet.

What I've tried

I've tried querying the document until it's found:

// Wait for the indexed document to become available
// (the awaited call must be parenthesized before accessing .Results)
while ((await _index.Documents.SearchAsync("DocumentId")).Results.SingleOrDefault() == null) { }

But oddly enough, a search query issued immediately afterwards usually won't find anything:

// Works 10% of the time, even after the above loop
await _index.Documents.SearchAsync(query.Text);

Using an arbitrary pause works, but it's not guaranteed and I'd like the tests to execute as fast as possible.

Thread.Sleep(3000);

Azure Search documentation:

Finally, the code in the example above delays for two seconds. Indexing happens asynchronously in your Azure Search service, so the sample application needs to wait a short time to ensure that the documents are available for searching. Delays like this are typically only necessary in demos, tests, and sample applications.

Isn't there a solution that doesn't sacrifice test performance?

If your service has multiple search units, there is no way to determine when a document has been fully indexed. This is a deliberate decision to favor increased indexing/query performance over strong consistency guarantees.

If you're running tests against a single unit search service, the approach (keep checking for document existence with a query rather than a lookup) should work.

Note that on a free-tier search service this will not work, as it's hosted on multiple shared resources and does not count as a single unit. You'll see the same brief inconsistency that you would with a dedicated multi-unit service.

Otherwise, one possible improvement would be to use retries along with a smaller sleep time.
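
As a minimal sketch of that idea, the helper below retries an async check with exponentially growing delays and gives up after an overall timeout, instead of always sleeping for a fixed three seconds. The names `Polling` and `WaitUntilAsync` are hypothetical and not part of any Azure SDK. It only shortens the typical wait: on a multi-unit or free-tier service, a successful check still doesn't guarantee that every replica has the document.

```csharp
using System;
using System.Diagnostics;
using System.Threading;
using System.Threading.Tasks;

public static class Polling
{
    // Hypothetical helper: re-evaluates `condition` with exponentially growing delays.
    // Returns true as soon as the condition holds, or false once `timeout` has elapsed.
    public static async Task<bool> WaitUntilAsync(
        Func<Task<bool>> condition,
        TimeSpan timeout,
        TimeSpan initialDelay,
        CancellationToken cancellationToken = default)
    {
        if (initialDelay <= TimeSpan.Zero)
            throw new ArgumentOutOfRangeException(nameof(initialDelay), "Must be positive.");

        var stopwatch = Stopwatch.StartNew();
        var delay = initialDelay;

        while (true)
        {
            if (await condition().ConfigureAwait(false))
                return true;

            var remaining = timeout - stopwatch.Elapsed;
            if (remaining <= TimeSpan.Zero)
                return false;

            // Sleep for the current delay, but never past the overall deadline.
            await Task.Delay(delay < remaining ? delay : remaining, cancellationToken)
                .ConfigureAwait(false);

            // Double the delay for the next attempt (e.g. 25 ms, 50 ms, 100 ms, ...).
            delay = TimeSpan.FromTicks(delay.Ticks * 2);
        }
    }
}
```

In a test, the condition could be something like `async () => (await searchService.FilterForId(id)).Results.Count == 1`, where `searchService` is whatever object exposes such a filter-by-key query. Starting small (say 25 ms) keeps the common case fast, while the doubling keeps the number of requests low when indexing is slower.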

The other answer by @HeatherNakama was very helpful. I want to add to it, but first a paraphrased summary:

There is no way to reliably know a document is ready to be searched on all replicas, so the only way a spin-lock waiting until a document is found could work is to use a single-replica search service. (Note: the free tier search service is not single-replica, and you can't do anything about that.)
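
To make the replica argument concrete, here is a toy model. It is an illustrative assumption, not how Azure Search actually replicates writes or routes queries: each replica applies a write independently, and queries are spread round-robin across replicas. With two replicas, one query can see the document while the next one does not; with a single replica, every query sees the same state.

```csharp
using System;
using System.Collections.Generic;

// Toy model only (an assumption for illustration, not Azure Search internals):
// every replica holds its own copy of the index and applies writes independently,
// and each incoming query is served by the next replica in round-robin order.
public sealed class ToyReplicatedIndex
{
    private readonly List<HashSet<string>> _replicas = new List<HashSet<string>>();
    private int _nextReplica;

    public ToyReplicatedIndex(int replicaCount)
    {
        if (replicaCount < 1) throw new ArgumentOutOfRangeException(nameof(replicaCount));
        for (int i = 0; i < replicaCount; i++)
            _replicas.Add(new HashSet<string>());
    }

    // Simulates the moment a pending write lands on one particular replica.
    public void ApplyToReplica(int replica, string documentId) =>
        _replicas[replica].Add(documentId);

    // Returns whether the replica that happens to serve this query has the document.
    public bool Query(string documentId)
    {
        var replica = _replicas[_nextReplica];
        _nextReplica = (_nextReplica + 1) % _replicas.Count;
        return replica.Contains(documentId);
    }
}
```

With two replicas, applying the write to replica 0 only and then querying twice returns `true` and then `false`, which matches the "found by the first query, missed by the second" behaviour described in the question and the comments.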

With that in mind, I've created a sample repository with Azure Search integration tests that roughly work like this:

using System;
using System.Threading;
using System.Threading.Tasks;
using Microsoft.Azure.Search;
using Microsoft.Azure.Search.Models;

// In the test class (fixture exposes the search service used by the tests):
private void WaitForIndexing(string id)
{
    // For the free tier, or a service with multiple replicas, resort to this:
    // Thread.Sleep(2000);

    // Poll with exponential backoff: 25, 50, 100, ..., 1600 ms
    // (about 3 seconds in total) before giving up.
    var wait = 25;

    while (wait <= 2000)
    {
        Thread.Sleep(wait);

        // FilterForId is async; block on it here because this helper is synchronous.
        var results = fixture.SearchService.FilterForId(id).GetAwaiter().GetResult().Results;
        if (results.Count == 1) return;
        if (results.Count > 1) throw new Exception("Unexpected results");
        wait *= 2;
    }

    throw new Exception("Found nothing after waiting a while");
}

// In the search service class:
private readonly ISearchIndexClient _searchIndexClient;

public async Task<DocumentSearchResult<PersonDto>> FilterForId(string id)
{
    if (string.IsNullOrWhiteSpace(id) || !Guid.TryParse(id, out _))
    {
        throw new ArgumentException("Can only filter for guid-like strings", nameof(id));
    }

    var parameters = new SearchParameters
    {
        Top = 2, // We expect only one, but return max 2 so we can double check for errors
        Skip = 0,
        Facets = new string[] { },
        HighlightFields = new string[] { },
        Filter = $"id eq '{id}'",
        OrderBy = new[] { "search.score() desc", "registeredAtUtc desc" },
    };

    var result = await _searchIndexClient.Documents.SearchAsync<PersonDto>("*", parameters);

    if (result.Results.Count > 1)
    {
        throw new Exception($"Search filtering for id '{id}' unexpectedly returned more than 1 result. Are you sure you searched for an ID, and that it is unique?");
    }

    return result;
}

This might be used like this:

[SerializePropertyNamesAsCamelCase]
public class PersonDto
{
    [Key] [IsFilterable] [IsSearchable]
    public string Id { get; set; } = Guid.NewGuid().ToString();

    [IsSortable] [IsSearchable]
    public string Email { get; set; }

    [IsSortable]
    public DateTimeOffset? RegisteredAtUtc { get; set; }
}

[Theory]
[InlineData(0)]
[InlineData(1)]
[InlineData(2)]
[InlineData(3)]
[InlineData(5)]
[InlineData(10)]
public async Task Can_index_and_then_find_person_many_times_in_a_row(int count)
{
    await fixture.SearchService.RecreateIndex();

    for (int i = 0; i < count; i++)
    {
        var guid = Guid.NewGuid().ToString().Replace("-", "");
        var dto = new PersonDto { Email = $"{guid}@example.org" };
        await fixture.SearchService.IndexAsync(dto);

        WaitForIndexing(dto.Id);

        var searchResult = await fixture.SearchService.Search(dto.Id);

        Assert.Single(searchResult.Results, p => p.Document.Id == dto.Id);
    }
}

I have tested and confirmed that this reliably stays green on a Basic tier search service with 1 replica, and intermittently becomes red on the free tier.

Use a FluentWaitDriver or a similar component to wait in tests, if waiting is only needed for tests. I wouldn't pollute the app with thread delays. The Azure indexer will have an acceptable delay of a few milliseconds to a few seconds, depending on the nature of your search instance.
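
A sketch of such a test-only wait, in the spirit of Selenium's FluentWait (configurable timeout, polling interval and ignored exceptions). `TestWait` and its methods are hypothetical names for illustration, not an existing library; the point is that the polling lives in the test project rather than in application code.

```csharp
using System;
using System.Diagnostics;
using System.Threading.Tasks;

// Hypothetical test-only helper in the spirit of a "fluent wait":
// polls at a fixed interval, optionally ignores one exception type,
// and throws TimeoutException if the awaited value never shows up.
public sealed class TestWait
{
    private TimeSpan _timeout = TimeSpan.FromSeconds(5);
    private TimeSpan _pollingInterval = TimeSpan.FromMilliseconds(100);
    private Type _ignoredException;

    public TestWait WithTimeout(TimeSpan timeout) { _timeout = timeout; return this; }

    public TestWait PollingEvery(TimeSpan interval) { _pollingInterval = interval; return this; }

    public TestWait Ignoring<TException>() where TException : Exception
    {
        _ignoredException = typeof(TException);
        return this;
    }

    // Re-runs `probe` until it returns a non-null value, then returns that value.
    public async Task<T> UntilAsync<T>(Func<Task<T>> probe, string description) where T : class
    {
        var stopwatch = Stopwatch.StartNew();
        Exception lastIgnored = null;

        while (true)
        {
            try
            {
                var value = await probe().ConfigureAwait(false);
                if (value != null) return value;
            }
            catch (Exception ex) when (_ignoredException != null && _ignoredException.IsInstanceOfType(ex))
            {
                lastIgnored = ex; // Treat as transient: remember it and retry.
            }

            if (stopwatch.Elapsed >= _timeout)
                throw new TimeoutException($"Timed out after {_timeout} waiting for {description}.", lastIgnored);

            await Task.Delay(_pollingInterval).ConfigureAwait(false);
        }
    }
}
```

A fixed polling interval bounds the extra latency after the document becomes searchable to roughly one interval, and ignoring only a named exception type lets a probe ride out transient failures without hiding unrelated bugs.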

Comments
  • Additionally, consider changing your tests so that they depend on a common set of data that gets indexed once, in a test class initialization section.
  • I'm using the free pricing tier, where resources are shared. I suppose this counts as a single unit? Is it possible that, just after indexing, a query by document key will find the document but not a more complex query using regular fields?
  • @Sean Saleh, yes, I refactored the tests to populate the index once per test run.
  • @Clément, if it's a free service, it's hosted on multiple shared resources and does not count as a single unit. You'll see the same brief inconsistency that you would with a dedicated multi-unit service.
  • And while it is possible for the first query by document key to succeed and the second query by keyword to fail, it's not a function of the complexity of the query. It's because there is a brief window of inconsistency while the indexing operation completes across all replicas.
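
Populating the index once per test run, as suggested in the comments, can be sketched with a shared `Lazy<Task>`: the first test that needs the data starts the seeding, and every other test awaits the same task. `SharedTestData` and `SeedAsync` are hypothetical names, and the seeding body is a placeholder; in xUnit, a class or collection fixture is the more conventional home for this kind of setup.

```csharp
using System;
using System.Threading;
using System.Threading.Tasks;

// Hypothetical shared seeding helper: the first caller starts the (expensive)
// seeding task, and every later caller awaits that same task.
public static class SharedTestData
{
    private static int _seedCount;

    private static readonly Lazy<Task> Seeding =
        new Lazy<Task>(SeedAsync, LazyThreadSafetyMode.ExecutionAndPublication);

    // How many times seeding actually ran (exposed for illustration only).
    public static int SeedCount => Volatile.Read(ref _seedCount);

    public static Task EnsureSeededAsync() => Seeding.Value;

    private static async Task SeedAsync()
    {
        Interlocked.Increment(ref _seedCount);

        // Placeholder: push the common documents here, then wait once for them
        // to become searchable (a fixed delay or a polling loop).
        await Task.Delay(10).ConfigureAwait(false);
    }
}
```

Because `Lazy<Task>` caches the task itself, a seeding failure is reported to every test in the run rather than being retried, and the unavoidable indexing delay is paid once instead of once per test.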