Use fixed size memory allocation in IndicesPermission #77748
This changes the implementation of the Role.authorize method so that data structures can be constructed with a known size. Previously, when authorizing a request with a large number of indices, the HashMaps would need to resize themselves multiple times, at a noticeable performance cost. In order to know how large these maps need to be, we now expand all named resources (indices, aliases, data streams) upfront and then set the initial size of the maps accordingly.
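The sizing idea described above can be sketched as follows. This is a minimal illustration, not the code from this PR: `PresizedMapSketch`, `capacityFor` and `grantAll` are hypothetical names. One caveat worth keeping in mind is that the `java.util.HashMap` constructor argument is an initial capacity rather than an expected entry count, and the map resizes once its size exceeds capacity times the load factor (0.75 by default), so the sketch divides the expected count by the load factor.

```java
import java.util.HashMap;
import java.util.List;
import java.util.Map;

final class PresizedMapSketch {
    // Hypothetical helper: smallest capacity argument that lets a default
    // HashMap (load factor 0.75) hold expectedEntries without resizing.
    static int capacityFor(int expectedEntries) {
        return (int) Math.ceil(expectedEntries / 0.75d);
    }

    // Builds a map whose size is known up front, so it is allocated once.
    static Map<String, Boolean> grantAll(List<String> names) {
        Map<String, Boolean> granted = new HashMap<>(capacityFor(names.size()));
        for (String name : names) {
            granted.put(name, Boolean.TRUE);
        }
        return granted;
    }

    public static void main(String[] args) {
        System.out.println(capacityFor(50_000));
        System.out.println(grantAll(List.of("index1", "index2")).size());
    }
}
```

Over-estimating the count only costs some unused table slots; under-estimating brings back the resize-and-rehash work that the change is trying to avoid.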
Pinging @elastic/es-security (Team:Security)
On a large cluster (250k indices) with a large request (50k indices) this cuts the execution time of
albertzaharovits
left a comment
LGTM. I confirm that the interface has not changed.
ywangd
left a comment
LGTM
Expanding all concrete indices upfront is likely to increase heap usage to some extent. It may not be worth worrying about unless the cluster and request are configured really badly (e.g. tons of aliases pointing to the same indices/data streams). But I wonder whether we could still be better prepared for this situation if we change IndexResource to return concreteIndices as an Iterator. This can be done by retaining a reference to the IndexAbstraction in the create method. The change should(?) be relatively small. But I'll leave it up to you for consideration.
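A rough sketch of the Iterator idea, under stated assumptions: `Abstraction` and `Resource` below are hypothetical stand-ins for IndexAbstraction and IndexResource, not the real Elasticsearch types. The resource keeps a reference to the abstraction and hands out an iterator on demand instead of copying the expanded names into its own collection.

```java
import java.util.Collections;
import java.util.Iterator;
import java.util.List;

final class LazyResourceSketch {
    // Hypothetical stand-in for IndexAbstraction: knows its backing indices.
    interface Abstraction {
        List<String> backingIndices();
    }

    // Hypothetical stand-in for IndexResource.
    static final class Resource {
        private final String name;
        private final Abstraction abstraction; // null when the name is not in cluster state

        Resource(String name, Abstraction abstraction) {
            this.name = name;
            this.abstraction = abstraction;
        }

        String name() {
            return name;
        }

        // No copy is made here; names are read from the abstraction as the caller iterates.
        Iterator<String> concreteIndices() {
            return abstraction == null
                ? Collections.emptyIterator()
                : abstraction.backingIndices().iterator();
        }
    }

    public static void main(String[] args) {
        Resource alias = new Resource("alias1", () -> List.of("index1", "index2"));
        alias.concreteIndices().forEachRemaining(System.out::println);
    }
}
```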
assert concreteIndices.isEmpty() || concreteIndices.contains(name) : "An object of type "
    + type
    + " must reference itself";
Why do we allow both an empty collection and a singleton collection for the concrete Index type? I'd prefer to pick one of them. IIUC, the subtle difference between empty and singleton is that an empty collection indicates the concrete index does not exist in the cluster state yet. But do we care about this difference for authorization?
Because I was trying to avoid introducing behavioural changes, and that's what the existing code does: a missing index is treated like a concrete index with no expanded names.
for (String indexOrAlias : requestedIndicesOrAliases) {
    final IndexResource resource = IndexResource.create(indexOrAlias, lookup.get(indexOrAlias));
    resources.add(resource);
    totalResourceCount += resource.size();
It is possible that concreteIndices of multiple IndexResource overlap. For example:
requestedIndicesOrAliases = Set.of("alias1", "alias2")
where both aliases point to index1.
So the total count of distinct resources is 3 instead of 4.
Yes, that is possible. The important thing here is to avoid reallocating the maps. If we over-size them that's a tiny heap cost with essentially no performance cost.
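The overlap in the example above can be reproduced with a small sketch. Everything here is an assumption for illustration: `ResourceCountSketch` and its `ALIASES` table are hypothetical, and the sketch assumes each requested name counts itself plus the concrete indices it expands to, which is what the 4-versus-3 figures in the comment imply.

```java
import java.util.HashSet;
import java.util.List;
import java.util.Map;
import java.util.Set;

final class ResourceCountSketch {
    // Hypothetical lookup table: both aliases point to the same concrete index.
    static final Map<String, List<String>> ALIASES = Map.of(
        "alias1", List.of("index1"),
        "alias2", List.of("index1"));

    // Sum of per-resource sizes: cheap to compute, but counts shared indices repeatedly.
    static int upperBound(List<String> requested) {
        int total = 0;
        for (String name : requested) {
            total += 1 + ALIASES.getOrDefault(name, List.of()).size();
        }
        return total;
    }

    // Exact number of distinct names, at the cost of building a set first.
    static int distinctCount(List<String> requested) {
        Set<String> names = new HashSet<>();
        for (String name : requested) {
            names.add(name);
            names.addAll(ALIASES.getOrDefault(name, List.of()));
        }
        return names.size();
    }

    public static void main(String[] args) {
        List<String> requested = List.of("alias1", "alias2");
        System.out.println(upperBound(requested) + " vs " + distinctCount(requested));
    }
}
```

Using the upper bound as the map size trades a few unused slots for not having to deduplicate before allocating.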
final Map<String, Set<FieldPermissions>> fieldPermissionsByIndex = new HashMap<>(totalResourceCount);
final Map<String, DocumentLevelPermissions> roleQueriesByIndex = new HashMap<>(totalResourceCount);
Unlike grantedBuilder, these two Maps may not be populated for every concrete index, because they are only calculated if at least one group grants the permission. So we could be allocating more than needed. But when there are indices rejected by all permission groups, maybe that is always the unhappy path (request rejected) and we don't really care that much about a bit more memory consumption?
Yes it does, but given we build multiple maps, each of which has those indices as keys, we're already using a lot of heap for this. We can probably get some good gains in heap usage by having a single Map with a richer object as the value. e.g. I don't know that it's worth it, but if we're worried about heap, that's probably a better place to make savings.
I did consider that, but it felt like it would be more complex than that. I'll try it out and see.
It is more complex, but it works out OK, so I've merged it to this PR.
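For readers following the thread, the "single Map with a richer object as the value" idea might look roughly like the sketch below. This is an assumption-laden illustration, not the code that was merged: `IndexAccess`, `newAccessMap` and `grant` are hypothetical, and plain strings stand in for FieldPermissions and DocumentLevelPermissions.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

final class SingleMapSketch {
    // Hypothetical per-index accumulator replacing several parallel maps.
    static final class IndexAccess {
        boolean granted;
        final Set<String> fieldPermissions = new HashSet<>(); // stand-in for FieldPermissions
        final Set<String> roleQueries = new HashSet<>();      // stand-in for document-level queries
    }

    // One presized map keyed by index name (capacity adjusted for the 0.75 load factor).
    static Map<String, IndexAccess> newAccessMap(int expectedEntries) {
        return new HashMap<>((int) Math.ceil(expectedEntries / 0.75d));
    }

    // Records that one permission group grants access to the given index.
    static void grant(Map<String, IndexAccess> byIndex, String index, String fieldPermission, String roleQuery) {
        IndexAccess access = byIndex.computeIfAbsent(index, k -> new IndexAccess());
        access.granted = true;
        access.fieldPermissions.add(fieldPermission);
        if (roleQuery != null) {
            access.roleQueries.add(roleQuery);
        }
    }

    public static void main(String[] args) {
        Map<String, IndexAccess> byIndex = newAccessMap(2);
        grant(byIndex, "index1", "fields-a", null);
        grant(byIndex, "index1", "fields-b", "query-1");
        System.out.println(byIndex.size() + " " + byIndex.get("index1").fieldPermissions.size());
    }
}
```

With this shape each index name is stored as a key once, rather than once per map.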
@elasticmachine update branch
This changes the implementation of the Role.authorize method so that data structures can be constructed with a known size. Previously, when authorizing a request with a large number of indices, the HashMaps would need to resize themselves multiple times, at a noticeable performance cost. In order to know how large these maps need to be, we now expand all named resources (indices, aliases, data streams) upfront and then set the initial size of the maps accordingly. Co-authored-by: Tim Vernum <tim.vernum@elastic.co>
This changes the implementation of the IndicesPermission.authorize
method so that data structures can be constructed with a known size.
Previously, when authorizing a request with a large number of indices,
the HashMaps would need to resize themselves multiple times, at a
noticeable performance cost.
In order to know how large these maps need to be, we now expand all
named resources (indices, aliases, data streams) upfront and then set
the initial size of the maps accordingly.