Skip to content

[Transform] respect timeout parameters in all API's#79468

Merged
hendrikmuhs merged 5 commits into
elastic:masterfrom
hendrikmuhs:transfrom-timeouts
Oct 25, 2021
Merged

[Transform] respect timeout parameters in all API's#79468
hendrikmuhs merged 5 commits into
elastic:masterfrom
hendrikmuhs:transfrom-timeouts

Conversation

@hendrikmuhs

Copy link
Copy Markdown

make timeout parameter explicit in all transform actions and pass timeout values in sub-calls.

fixes #79268

@hendrikmuhs hendrikmuhs marked this pull request as ready for review October 19, 2021 15:59
@elasticmachine elasticmachine added the Team:ML Meta label for the ML team label Oct 19, 2021
@elasticmachine

Copy link
Copy Markdown
Collaborator

Pinging @elastic/ml-core (Team:ML)

@hendrikmuhs

Copy link
Copy Markdown
Author

Pinging @elastic/es-core-infra

Please have a look at the proposed change to AcknowledgedRequest.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Shouldn't the superclass state be considered for equality (and hashcode)?

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This question gets to the heart of many of the changes in this PR.

The superclass state is basically the master timeout and parent task ID. The superclasses do not currently have equals or hashCode methods, and nor did this class before this PR.

In high level terms this was saying that, say, a create index request would be equal to another create index request if they both had the same arguments related to creating indices but a different timeout, different master node timeout and different parent task ID.

This PR is changing that policy for the transform requests and making it ambiguous for future maintainers about whether timeouts should be considered for equality of requests or not.

Given that timeouts haven't been considered part of equality for years and nobody has complained I think it would be best to:

  1. Remove these new equals and hashCode methods
  2. Don't call them from the transform request equals and hashCode
  3. Add comments to AcknowledgedRequest, MasterNodeRequest and TransportRequest saying it's deliberate that their data members are not considered part of request equality

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

TL;DR I'm ok with the above suggestion by @droberts195.

The decision here somewhat boils down to "what is a sensible default".

Subclasses have access to all the superclasses state, masterNodeTimeout, parentTaskId and remoteAddress, so can choose to consider it in equality or not. Subclasses can always choose to not delegate to the super implementation so can opt-out of the default, or subclasses can choose to always initially delegate to the super implementation and then perform subclass specific additional constraints (blissfully ignorant of the superclass's state).

I don't have enough context here, but maybe it would be worth starting at the bottom (or top, depends on your perspective ;-) ) of the hierarchy to determine reasonable equality semantics, e.g. are two abstract TransportMessage objects considered equal if they have the same remoteAddress? Is this even a reasonable question to ask.

@droberts195 droberts195 Oct 20, 2021

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

It seems there are really two aspects to this PR.

The first is adding timeout options to all the transform API requests. This is a non-controversial and necessary improvement based on real world problems we have seen that are caused by not having a configurable timeout.

Then the second aspect is the can of worms around equality comparisons for the hierarchy of request classes. I think it's best to open a separate discuss issue for that and not mess with it in this PR. The lack of equals and hashCode for the base classes dates back 10 years, for example: https://github.com/elastic/elasticsearch/blob/a8fd2d48b8f3f17d68ed27c3104e2c9e2eb6cc9c/src/main/java/org/elasticsearch/action/support/master/MasterNodeOperationRequest.java

Maybe it is worth revisiting, but I think any changes that are decided on should be separated out from this transforms PR into a core PR. I also don't think any changes to the base request classes should go into 7.16, because the transport client is still supported for 7.16 and uses these classes. The last thing we want to do is cause some subtle problem for someone who's still using the transport client in the very last version where it will work.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If this is controversial I rather remove the added code. As @droberts195 said, the lack of equality and hashCode is all over the place. In other places e.g. https://github.com/elastic/elasticsearch/blob/master/x-pack/plugin/core/src/main/java/org/elasticsearch/xpack/core/transform/action/StopTransformAction.java#L171 I added a workaround. I will use the same workaround if I am asked to remove this.

+1 for doing any bigger re-factorings in a separate PR. Anything else is to risky and shouldn't target 7.16.

Copy link
Copy Markdown
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Comment on lines +90 to +91
// the base class does not implement equals, therefore we need to check timeout ourselves
return Objects.equals(id, other.id) && force == other.force && timeout().equals(other.timeout());

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This is still a discrepancy with every other request class in the Elasticsearch codebase - all the others consider requests to be equal if they have different timeouts but the same arguments related to the actual action. For example, the create index request docs show both timeout and master_timeout are supported. But the create index request code checks neither when considering equality:

@Override
public boolean equals(Object obj) {
if (this == obj) return true;
if (obj == null || getClass() != obj.getClass()) return false;
CreateIndexRequest that = (CreateIndexRequest) obj;
return Objects.equals(cause, that.cause) &&
Objects.equals(index, that.index) &&
Objects.equals(settings, that.settings) &&
Objects.equals(mappings, that.mappings) &&
Objects.equals(aliases, that.aliases) &&
Objects.equals(waitForActiveShards, that.waitForActiveShards) &&
Objects.equals(origin, that.origin);
}

I've searched for another request class that considers timeout part of equality and cannot find one.

You are assuming that's a bug across the whole Elasticsearch codebase and by changing it in transforms you're making a start on fixing it. It may well be, as the pattern for checking request equality dates back to when there was only one person working on the codebase and it may have been a simple mistake that's been propagated repeatedly over the years. But what if it's the intention that timeouts are not considered part of equality for requests? Then you've introduced a bug into transforms. It's a shame there are no comments to indicate.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've searched for another request class that considers timeout part of equality and cannot find one.

Actually I searched a bit harder and found three. ClusterRerouteRequest:


UpdateSettingsRequest:
and DeleteDataFrameAnalyticsAction.Request:

But that's three out of several hundred, so it still deserves wider discussion.

Copy link
Copy Markdown
Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

CreateIndex is an interesting one, if you go back in history:
b3337c3#diff-f919ea00a6a4a7c7c33395ac0a46d492e09a57a8fb827db92e184aa5c62d2513
you see that it originally had timeout but equals and hashCode has not been implemented. I think it has not been important back than.

I therefore think it has not been intentionally left out, but has been forgotten or as said, wasn't important.

It seems to me that equals and hashCode are only indirectly important for the product. It helps us in writing better unit tests and avoids silly - but hard to debug - issues in serialization and other areas. That's why this PR adds a lot of test code that follows up on #25910 by implementing mutateInstance where it makes sense.

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

OK cool. I'll assume it's best practice to consider timeouts in equality then, and the transform requests are now making a start on that.

@hendrikmuhs

Copy link
Copy Markdown
Author

run elasticsearch-ci/part-1

@droberts195 droberts195 left a comment

Copy link
Copy Markdown

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

LGTM

@hendrikmuhs hendrikmuhs merged commit 4202262 into elastic:master Oct 25, 2021
@hendrikmuhs hendrikmuhs deleted the transfrom-timeouts branch October 25, 2021 06:20
hendrikmuhs pushed a commit to hendrikmuhs/elasticsearch that referenced this pull request Oct 25, 2021
make timeout parameter explicit in all transform actions and pass timeout values in sub-calls.

fixes elastic#79268
elasticsearchmachine pushed a commit that referenced this pull request Oct 25, 2021
…79696)

* [Transform] respect timeout parameters in all API's (#79468)

make timeout parameter explicit in all transform actions and pass timeout values in sub-calls.

fixes #79268

* add timeouts for old endpoints
lockewritesdocs pushed a commit to lockewritesdocs/elasticsearch that referenced this pull request Oct 28, 2021
make timeout parameter explicit in all transform actions and pass timeout values in sub-calls.

fixes elastic#79268
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment

Labels

Projects

None yet

Development

Successfully merging this pull request may close these issues.

[Transform] transform start call does not respect timeout parameter

7 participants