
Clarify how document data is shared across fetch subphases. #62511

Description

@jtibshirani

Several fetch subphases make use of the same data. For example, fields retrieval and highlight both use the _source, and docvalue_fields and script_fields can both use field data. As much as possible, the fetch phase tries to share this data across the subphases to avoid reloading it.

Although this sharing is important for fetch performance, the logic is complex and not tested:

  • For _source, we load it once at the beginning of the fetch phase and share it through HitContext#sourceLookup.
  • The field data cache is shared through a SearchLookup passed to each subphase. (Note that this lookup is created through QueryShardContext#newFetchLookup, which is a bit confusing.)
  • Some script-based subphases like script_fields use separate _source and field data from QueryShardContext#lookup.
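To make the intended sharing concrete, here is a minimal, self-contained sketch of the "load once, share across subphases" pattern described in the list above. It is not the Elasticsearch implementation: `SharedSource`, `SubPhase`, and `FetchSketch` are hypothetical names invented for illustration, and loading lazily on first access is an assumption of this sketch (the issue says _source is loaded at the beginning of the fetch phase).

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical per-hit holder for the document source.
// This is NOT Elasticsearch's SourceLookup; it only mimics the idea.
final class SharedSource {
    private final Supplier<Map<String, Object>> loader;
    private Map<String, Object> cached;
    private int loadCount;

    SharedSource(Supplier<Map<String, Object>> loader) {
        this.loader = loader;
    }

    // Loads the source on first access and reuses it afterwards.
    Map<String, Object> get() {
        if (cached == null) {
            cached = loader.get();
            loadCount++;
        }
        return cached;
    }

    int loadCount() {
        return loadCount;
    }
}

// Hypothetical subphase contract: every subphase receives the same holder.
interface SubPhase {
    void process(SharedSource source, Map<String, Object> hitOutput);
}

final class FetchSketch {
    // Runs two subphases against one hit and returns how often the source was loaded.
    static int runDemo() {
        SharedSource source = new SharedSource(() -> {
            Map<String, Object> doc = new HashMap<>();
            doc.put("title", "hello world");
            return doc;
        });

        List<SubPhase> subPhases = new ArrayList<>();
        // Stand-in for fields retrieval: copies a value out of the source.
        subPhases.add((src, out) -> out.put("fields.title", src.get().get("title")));
        // Stand-in for highlighting: reads the same source again.
        subPhases.add((src, out) ->
            out.put("highlight.title", "<em>" + src.get().get("title") + "</em>"));

        Map<String, Object> hit = new HashMap<>();
        for (SubPhase phase : subPhases) {
            phase.process(source, hit);
        }
        return source.loadCount();
    }

    public static void main(String[] args) {
        System.out.println("source loads: " + runDemo()); // prints "source loads: 1"
    }
}
```

In this sketch, a subphase that built its own `SharedSource` instead of using the one passed in would trigger a second load, which is the kind of duplication the third bullet describes for script-based subphases. Exposing a load counter is one way such sharing could be covered by a test.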

It would be great to refactor to make the strategy clear and robust. Some functionality we should make sure to preserve:
