Fix overflow/underflow in CompositeBytesReference #78893
Today you can construct a `CompositeBytesReference` comprising one or more empty components, but the resulting object is somewhat broken since `getOffsetIndex` won't always return the index of the component containing the index. You can also compose a sequence of bytes that is longer than `Integer.MAX_VALUE` which also yields a pretty broken object (`offsets` is no longer sorted and `length` may be negative). This commit fixes these two corner cases.
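To make the two failure modes concrete, here is a minimal sketch that works on plain component lengths rather than real `BytesReference` objects. The class and method names (`NaiveOffsetsDemo`, `naiveOffsets`, `naiveLength`) are hypothetical, and the arithmetic is only assumed to resemble what the class did before this change.

```java
final class NaiveOffsetsDemo {
    // Hypothetical helper: offsets[i] is the sum of all earlier lengths,
    // computed with plain int arithmetic that wraps silently on overflow.
    static int[] naiveOffsets(int[] lengths) {
        int[] offsets = new int[lengths.length];
        int total = 0;
        for (int i = 0; i < lengths.length; i++) {
            offsets[i] = total;
            total += lengths[i];
        }
        return offsets;
    }

    // Hypothetical helper: total length, again with wrapping int arithmetic.
    static int naiveLength(int[] lengths) {
        int total = 0;
        for (int length : lengths) {
            total += length;
        }
        return total;
    }
}
```

With lengths `{3, 0, 4}` the offsets come out as `{0, 3, 3}`, so a binary search for index 3 can land on the empty component. With lengths `{Integer.MAX_VALUE, 1, 1}` the last offset wraps to `Integer.MIN_VALUE`, so the offsets are no longer sorted and the total length is negative.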
Pinging @elastic/es-core-infra (Team:Core/Infra)
DaveCTurner
left a comment
There's a bunch of ways to fix this issue - this is kinda the most obvious but alternatively:
- we could just forbid it and rely on the caller only adding nonempty refs to the array (but a gap in testing might mean we don't discover this until too late)
- we could actually just shuffle the empty ones to the end of the array since we never call `getOffsetIndex` to find anything there (but this is kinda nonobvious and makes the invariants of the class more complicated; also, is it ok to shuffle the input array or should we work on a copy?)

It looks fine in all call sites today FWIW, maybe this is fine.
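For comparison, the two alternatives listed above could look roughly like this. It is a sketch only, using `byte[]` components in place of `BytesReference`; `EmptyComponentAlternatives`, `requireNonEmpty` and `emptiesLast` are hypothetical names and not part of the PR.

```java
final class EmptyComponentAlternatives {
    // Alternative 1 (hypothetical): forbid empty components outright.
    static void requireNonEmpty(byte[][] components) {
        for (int i = 0; i < components.length; i++) {
            if (components[i].length == 0) {
                throw new IllegalArgumentException("component [" + i + "] is empty");
            }
        }
    }

    // Alternative 2 (hypothetical): move the empty components to the end,
    // keeping the relative order of the others and working on a copy so
    // the caller's array is left untouched.
    static byte[][] emptiesLast(byte[][] components) {
        byte[][] result = new byte[components.length][];
        int next = 0;
        for (byte[] component : components) {
            if (component.length > 0) {
                result[next++] = component;
            }
        }
        for (byte[] component : components) {
            if (component.length == 0) {
                result[next++] = component;
            }
        }
        return result;
    }
}
```

Working on a copy answers the "is it ok to shuffle the input array" question at the cost of one extra array allocation.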
original-brownbear
left a comment
LGTM, I think this is a fine solution in terms of being defensive.
I could only find some cases in the repository tests where we'd actually compose these objects containing empty references, and I think we're fine today in production code. Still, the change seems like a reasonable safeguard :)
Also, merging this would technically allow for an interesting performance improvement in `org.elasticsearch.common.bytes.BytesReferenceStreamInput#moveToNextSlice`, where we could drop the `slice.length` check and loop. When we benchmarked this for aggs, it turned out to be an actual performance improvement. (We do have other implementations, but this is theoretically the only one that could hurt us here today.)
  private final BytesReference[] references;
- private final int[] offsets;
+ private final int[] offsets; // we use the offsets to seek into the right BytesReference for random access and slicing
Random comment, not directly related to this work, but this array is actually one longer than it needs to be, isn't it? It seems to me `offsets[0] == 0` always holds?
Technically yes, but it's really ugly if `offsets` and `references` are misaligned like that: everything is off by one and you have to branch to simulate the missing zero everywhere too.
(I was contemplating a cleverer `slice()` implementation that works by putting something negative in `offsets[0]` instead of slicing the individual refs, thereby letting us share the complete `references` array between slices, but let's not go there for now...)
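A rough illustration of why the aligned layout is convenient: with `offsets[i]` holding the start of `references[i]` (so `offsets[0]` is always 0), random access is one binary search plus one subtraction, with no special case for the first component. This sketch uses `byte[]` in place of `BytesReference` and assumes every component is non-empty; `AlignedOffsetsSketch` is a hypothetical name, not the real class.

```java
import java.util.Arrays;

final class AlignedOffsetsSketch {
    private final byte[][] references;
    private final int[] offsets; // offsets[i] is the start of references[i]; offsets[0] == 0

    // Assumption: callers pass only non-empty components.
    AlignedOffsetsSketch(byte[]... nonEmptyComponents) {
        this.references = nonEmptyComponents;
        this.offsets = new int[nonEmptyComponents.length];
        int total = 0;
        for (int i = 0; i < nonEmptyComponents.length; i++) {
            offsets[i] = total;
            // Math.addExact throws ArithmeticException instead of wrapping.
            total = Math.addExact(total, nonEmptyComponents[i].length);
        }
    }

    byte get(int index) {
        int i = Arrays.binarySearch(offsets, index);
        if (i < 0) {
            // Not an exact start offset: step back from the insertion point
            // to the component that contains the index.
            i = -(i + 1) - 1;
        }
        return references[i][index - offsets[i]];
    }
}
```

Dropping the leading zero would save one `int` per instance, but every lookup would then need a branch for the first component.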
ee7c9e1 to a59efca
DaveCTurner
left a comment
I decided I didn't like the extra allocation that the previous idea needed (effectively 3 arrays when 1 would do), so I'm proposing this instead.
This reverts commit a59efca.
Changed my mind again. We have to genuinely permute the array, we can't just shuffle everything downwards and then fill in the tail with copies of |
💚 Backport successful
Today you can construct a `CompositeBytesReference` comprising one or more empty components, but the resulting object is somewhat broken since `getOffsetIndex` won't always return the index of the component containing the index. You can also compose a sequence of bytes that is longer than `Integer.MAX_VALUE` which also yields a pretty broken object (`offsets` is no longer sorted and `length` may be negative). This commit fixes these two corner cases. Co-authored-by: Elastic Machine <elasticmachine@users.noreply.github.com>