Several fetch subphases make use of the same data. For example, fields retrieval and highlight both use the _source, and docvalue_fields and script_fields can both use field data. As much as possible, the fetch phase tries to share this data across the subphases to avoid reloading it.
Although it's important to fetch performance, the sharing logic is complex and not tested:
- For _source, we load it once at the beginning of the fetch phase and share it through HitContext#sourceLookup.
- The field data cache is shared through a SearchLookup passed to each subphase. (Note that this lookup is created through QueryShardContext#newFetchLookup, which is a bit confusing.)
- Some script-based subphases like script_fields use separate _source and field data from QueryShardContext#lookup.
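
To make the intended sharing behavior concrete, here is a minimal sketch of the "load once, share across subphases" idea from the list above. It is not Elasticsearch code: `SharedSourceSketch`, `LazySource`, `SubPhase`, and `loadsForSubPhases` are hypothetical names invented for illustration, and the sketch loads the source on first use, whereas the fetch phase currently loads _source up front.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Map;
import java.util.function.Supplier;

// Hypothetical, simplified model; none of these types are Elasticsearch classes.
final class SharedSourceSketch {

    /** Loads the source at most once and hands the same instance to every caller. */
    static final class LazySource {
        private final Supplier<Map<String, Object>> loader;
        private Map<String, Object> cached;
        private int loadCount;

        LazySource(Supplier<Map<String, Object>> loader) {
            this.loader = loader;
        }

        Map<String, Object> get() {
            if (cached == null) {
                // Only the first caller pays the cost of loading.
                cached = loader.get();
                loadCount++;
            }
            return cached;
        }

        int loadCount() {
            return loadCount;
        }
    }

    /** Stand-in for a fetch subphase that reads from the shared source. */
    interface SubPhase {
        Object process(LazySource source);
    }

    /** Runs the given number of subphases against one hit and returns how often the source was loaded. */
    static int loadsForSubPhases(int subPhaseCount) {
        LazySource source = new LazySource(() -> Map.of("title", "hello"));
        List<SubPhase> phases = new ArrayList<>();
        for (int i = 0; i < subPhaseCount; i++) {
            phases.add(s -> s.get().get("title"));
        }
        for (SubPhase phase : phases) {
            phase.process(source);
        }
        return source.loadCount();
    }

    public static void main(String[] args) {
        // Three subphases share one source, so it is loaded once.
        System.out.println(loadsForSubPhases(3));
    }
}
```

A counter like `loadCount` is also one possible way to test the sharing: a test could assert that running several subphases over a hit triggers a single load.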
It would be great to refactor to make the strategy clear + robust. Some functionality we should make sure to preserve:
- inner_hits. (Avoid reloading _source for every inner hit. #60494)