Followup to #62511
The various Fetch subphases need to access a number of document-specific values: stored fields, docvalue fields, _source, script outputs. This is currently done via a mix of access to a SearchLookup object (for docvalue fields, _source and script) and a special fields visitor for stored fields. The stored fields logic also handles loading source, and is dealt with specially in FetchPhase. Nested documents also handle loading source differently, and some of this logic is in FetchPhase. We then have value fetchers, which are built by passing a QueryShardContext (which refers to a SearchLookup) and then used by passing a SourceLookup. This all makes things very difficult to reason about, as responsibility for loading data, caching across multiple calls, and advancing to new documents is split across several different classes.
I'd like to try consolidating all access to document values behind SearchLookup:
- StoredFieldsLookup should be able to load more than one field at a time (currently we do everything via a SingleFieldVisitor) and should make use of the fast stored fields reader where possible
- SourceLookup should use StoredFieldsLookup, again to reduce the number of calls to IndexReader#document
- SourceLookup should be aware of nested mappings and automatically return the correct source when positioned on nested or root documents
- Adding stored fields to a search hit should be done via a dedicated fetch sub phase that reads data from SearchLookup
- Positioning of the search lookup should be done by whatever owns it - probably the root collector in the query phase, or the FetchPhase main loop in the fetch phase.
- ValueFetchers should load data via a passed-in SearchLookup and not have to worry about setting reader contexts.
I think organising things this way will make it much easier to reason about how data is fetched and which classes are responsible for what.
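
To make the proposed shape a bit more concrete, here is a rough, self-contained sketch of the ownership model. It is not the existing Elasticsearch code: `DocLookupSketch`, the `StoredFieldsReader` interface, and all method names are hypothetical stand-ins chosen for illustration. It only shows three of the ideas above: one reader call per document for all requested stored fields, `_source` served from the same stored fields lookup, and positioning done once by whoever owns the `SearchLookup`. Nested-document handling and docvalues are left out.

```java
import java.util.HashMap;
import java.util.Map;
import java.util.Set;

public class DocLookupSketch {

    /** Hypothetical stand-in for the stored-fields side of an index reader. */
    public interface StoredFieldsReader {
        Map<String, Object> document(int docId, Set<String> fields);
    }

    /** Loads every requested stored field for the current document in a single reader call. */
    public static final class StoredFieldsLookup {
        private final StoredFieldsReader reader;
        private final Set<String> fields;
        private int docId = -1;
        private Map<String, Object> cached;

        StoredFieldsLookup(StoredFieldsReader reader, Set<String> fields) {
            this.reader = reader;
            this.fields = Set.copyOf(fields);
        }

        void setDocument(int newDocId) {
            if (newDocId != docId) {
                docId = newDocId;
                cached = null; // drop values cached for the previous document
            }
        }

        public Object get(String field) {
            if (docId < 0) {
                throw new IllegalStateException("lookup has not been positioned on a document");
            }
            if (cached == null) {
                cached = reader.document(docId, fields); // one call covers all fields
            }
            return cached.get(field);
        }
    }

    /** Reads _source through the stored fields lookup instead of hitting the reader again. */
    public static final class SourceLookup {
        private final StoredFieldsLookup stored;

        SourceLookup(StoredFieldsLookup stored) {
            this.stored = stored;
        }

        public Object source() {
            return stored.get("_source");
        }
    }

    /** Single entry point; only its owner (e.g. the fetch loop) positions it. */
    public static final class SearchLookup {
        private final StoredFieldsLookup stored;
        private final SourceLookup source;

        public SearchLookup(StoredFieldsReader reader, Set<String> fields) {
            this.stored = new StoredFieldsLookup(reader, fields);
            this.source = new SourceLookup(stored);
        }

        public void setDocument(int docId) {
            stored.setDocument(docId);
        }

        public StoredFieldsLookup storedFields() {
            return stored;
        }

        public SourceLookup source() {
            return source;
        }
    }

    public static void main(String[] args) {
        // Fake in-memory reader: every field value is "<field>@<docId>".
        StoredFieldsReader reader = (docId, fields) -> {
            Map<String, Object> doc = new HashMap<>();
            for (String field : fields) {
                doc.put(field, field + "@" + docId);
            }
            return doc;
        };
        SearchLookup lookup = new SearchLookup(reader, Set.of("title", "_source"));
        for (int docId : new int[] {0, 1}) {
            lookup.setDocument(docId); // positioning happens here and nowhere else
            System.out.println(lookup.storedFields().get("title") + " / " + lookup.source().source());
        }
    }
}
```

In this arrangement a sub phase or value fetcher would only ever receive an already positioned `SearchLookup`, so it never needs to know about reader contexts or when to invalidate cached values.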