[core][1eventx/04] job event: use a separated thread #55395
Summary of Changes
Hello @can-anyscale, I'm Gemini Code Assist! I'm currently reviewing this pull request and will post my feedback shortly. In the meantime, here's a summary to help you and other reviewers quickly get up to speed!
I've refactored the job event recording mechanism to utilize a dedicated thread within the GCS server. This change aims to prevent job event processing from blocking other critical GCS server operations, thereby improving overall system responsiveness and stability. The core idea is to isolate this specific task to ensure it doesn't contend for resources with more time-sensitive operations.
Highlights
- Dedicated Thread for Job Events: The RayJobEventRecorder now operates on its own dedicated I/O context (ray_event_io_context), isolating job event recording from other critical GCS server operations. This is a performance optimization.
- Flexible Event Recorder Initialization: The RayEventRecorderBase class gains a new constructor that allows it to internally manage its EventAggregatorClient and ClientCallManager, simplifying its integration by only requiring an I/O context and a dashboard agent port.
- GCS Server Integration: The GCS server's initialization of RayJobEventRecorder has been updated to leverage this new constructor, ensuring job events are handled on their dedicated thread.
- Build and Test Updates: The build system and unit tests have been adjusted to reflect these architectural changes, ensuring proper compilation and testing of the new event recording mechanism.
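The first highlight moves job-event recording onto its own I/O context so that slow export work cannot stall the rest of the GCS. Ray's `instrumented_io_context` wraps Boost.Asio; to keep this sketch self-contained it uses only the standard library, and the `DedicatedExecutor` class is hypothetical, not code from this PR. It shows the core property: work posted here runs in FIFO order on one dedicated thread, never on the caller's thread.

```cpp
#include <cassert>
#include <condition_variable>
#include <deque>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical stand-in for "a dedicated io_context plus the thread that runs it".
class DedicatedExecutor {
 public:
  DedicatedExecutor() : worker_([this] { Run(); }) {}

  ~DedicatedExecutor() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stopping_ = true;
    }
    cv_.notify_one();
    worker_.join();  // The worker drains any remaining tasks before exiting.
  }

  DedicatedExecutor(const DedicatedExecutor &) = delete;
  DedicatedExecutor &operator=(const DedicatedExecutor &) = delete;

  // Enqueues a task; it runs later on the dedicated thread, not the caller's.
  void Post(std::function<void()> task) {
    {
      std::lock_guard<std::mutex> lock(mu_);
      tasks_.push_back(std::move(task));
    }
    cv_.notify_one();
  }

  std::thread::id thread_id() const { return worker_.get_id(); }

 private:
  void Run() {
    for (;;) {
      std::function<void()> task;
      {
        std::unique_lock<std::mutex> lock(mu_);
        cv_.wait(lock, [this] { return stopping_ || !tasks_.empty(); });
        if (tasks_.empty()) return;  // Stopping and nothing left to run.
        task = std::move(tasks_.front());
        tasks_.pop_front();
      }
      task();  // Runs outside the lock, so Post() never waits on slow work.
    }
  }

  std::mutex mu_;
  std::condition_variable cv_;
  std::deque<std::function<void()>> tasks_;
  bool stopping_ = false;
  std::thread worker_;  // Declared last: starts only after the members it uses exist.
};
```

The design choice this illustrates is isolation: a backlog of export work only delays other export work, while callers on the main loop just enqueue and return.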
Code Review
This pull request refactors the job event recording logic to use a dedicated thread, which is a good improvement for decoupling and performance. The changes are mostly correct, but I've found a critical issue where a member variable is not initialized, which would lead to a crash. I've also pointed out a medium-severity issue with member declaration order that should be fixed to improve code quality and prevent future bugs. Please address these points.
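The review flags an uninitialized member and a member-declaration-order problem without quoting the offending lines here. As background, a hedged sketch of the general C++ rule involved: non-static members are initialized in declaration order, regardless of the order written in the constructor's initializer list, so a member that borrows from another must be declared after its owner. `CallManager`, `Client`, and `Recorder` are hypothetical stand-ins loosely modeled on the `ClientCallManager` / `EventAggregatorClient` pair in this PR.

```cpp
#include <cassert>
#include <memory>
#include <string>

// Hypothetical stand-ins for rpc::ClientCallManager and rpc::EventAggregatorClient.
struct CallManager {
  std::string name = "shared-call-manager";
};

struct Client {
  explicit Client(CallManager &mgr) : manager(mgr) {}
  CallManager &manager;  // Borrowed; must outlive this client.
};

class Recorder {
 public:
  // Members are initialized in *declaration* order (call_manager_, then
  // owned_client_, then client_), not in the order written below. GCC and
  // Clang warn via -Wreorder when the two orders disagree.
  Recorder()
      : call_manager_(std::make_unique<CallManager>()),
        owned_client_(std::make_unique<Client>(*call_manager_)),
        client_(*owned_client_) {}

  Client &client() { return client_; }

 private:
  // Owners first: if client_ were declared above owned_client_, binding it
  // would dereference a still-null unique_ptr (undefined behavior).
  // Destruction runs in reverse, so the client goes away before its manager.
  std::unique_ptr<CallManager> call_manager_;
  std::unique_ptr<Client> owned_client_;
  Client &client_;
};
```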
@can-anyscale, can you provide a little more context on the motivation for this? What is the work that the … The concurrency model for the GCS needs a holistic overhaul, so I want to make sure we move in roughly the right direction with whatever we do here.
@edoakes: great point; my intention was more about correctness than performance. I chatted with @MengjinYan about this too, and I'll make a post on the team channel.
"task_io_context",
"pubsub_io_context",
"ray_syncer_io_context",
"ray_event_io_context"};
Suggested change: "ray_event_io_context"}; → "event_export_io_context"};
We should call it what it is directly; "ray event" is not really meaningful inside the codebase.
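The snippet under review is the tail of a list of dedicated I/O context names. A hedged sketch of how such a compile-time list can be used: each name maps to a fixed index, so a component can look up its own context by name. The `kDedicatedIOContextNames` array and `IndexOf` helper are hypothetical illustrations, not the GCS's actual code, and the last entry uses the reviewer's proposed name.

```cpp
#include <array>
#include <cassert>
#include <cstddef>
#include <string_view>

// Hypothetical list of dedicated I/O context names; each one gets its own thread.
constexpr std::array<std::string_view, 4> kDedicatedIOContextNames{
    "task_io_context", "pubsub_io_context", "ray_syncer_io_context",
    "event_export_io_context"};

// Returns the index of `name`, or the array size if the name is not registered.
constexpr std::size_t IndexOf(std::string_view name) {
  for (std::size_t i = 0; i < kDedicatedIOContextNames.size(); ++i) {
    if (kDedicatedIOContextNames[i] == name) return i;
  }
  return kDedicatedIOContextNames.size();
}

// Lookups can be checked at compile time, so a stale or misspelled name fails the build.
static_assert(IndexOf("event_export_io_context") == 3);
static_assert(IndexOf("ray_event_io_context") == kDedicatedIOContextNames.size());
```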
namespace ray {
namespace telemetry {

template <typename TEventData>
Let's please not call everything RayEvent; it's inside the Ray codebase, so there's no need to call it Ray. Otherwise we should rename everything ("RayMetrics", "RayCoreWorker", "RayGCS", ...).
This already bugs me with the "ray syncer".
I'd also suggest naming the directory something else, since "telemetry" already has a pretty specific meaning (the usage data we collect from Ray clusters by default).
event_export?
Ah, got it: this directory includes the OpenTelemetry stuff as well. Maybe I can just rename it to observability.
protected:
  RayEventRecorderBase(
      std::unique_ptr<rpc::EventAggregatorClient> event_aggregator_client,
      instrumented_io_context &io_service);
Let's always dependency-inject the client instead of adding a private constructor.
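A minimal sketch of the shape the reviewer is asking for, assuming a hypothetical abstract `EventAggregatorClientInterface` with a single `AddEvents` method (the real `rpc::EventAggregatorClient` API is not shown in this thread). The recorder only holds a reference to a client it is handed, so production code can inject the gRPC-backed client and tests can inject a fake; `JobEventRecorder` and `FakeClient` here are illustrative, not the PR's classes.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical client interface; the real RPC signature is an assumption.
class EventAggregatorClientInterface {
 public:
  virtual ~EventAggregatorClientInterface() = default;
  virtual void AddEvents(std::vector<std::string> events) = 0;
};

// The recorder neither constructs nor owns its client: it is injected.
class JobEventRecorder {
 public:
  explicit JobEventRecorder(EventAggregatorClientInterface &client)
      : client_(client) {}

  void Record(std::string event) { buffer_.push_back(std::move(event)); }

  // Sends everything buffered so far as one batch; no-op when empty.
  void Flush() {
    if (buffer_.empty()) return;
    std::vector<std::string> batch;
    batch.swap(buffer_);  // buffer_ is left empty.
    client_.AddEvents(std::move(batch));
  }

 private:
  EventAggregatorClientInterface &client_;  // Must outlive the recorder.
  std::vector<std::string> buffer_;
};

// A test double that records calls instead of issuing RPCs.
class FakeClient : public EventAggregatorClientInterface {
 public:
  void AddEvents(std::vector<std::string> events) override {
    calls.push_back(std::move(events));
  }
  std::vector<std::vector<std::string>> calls;
};
```

Because construction of the real client moves out of the recorder, the recorder's own constructor stays trivial and there is no second, hidden construction path to keep in sync.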
private:
  rpc::EventAggregatorClient &event_aggregator_client_;
  std::unique_ptr<rpc::ClientCallManager> client_call_manager_;
Why do we need a client call manager? This will spin up extra gRPC threads.
It should be reused globally from whatever component we're in.
The logic to do that can live wherever we construct the gRPC client and dependency-inject it.
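The concern is ownership: each client call manager brings its own gRPC threads, so a component should create one and share it among all of its clients. A hedged sketch with hypothetical `CallManager`, `AggregatorClient`, and `Component` types (real constructors take more arguments): the component acts as the composition root, owning the single manager and building every client from it, while the clients only borrow.

```cpp
#include <cassert>

// Hypothetical stand-in for rpc::ClientCallManager. A real one starts gRPC
// threads, so instances are counted to show that only one is ever created.
class CallManager {
 public:
  CallManager() { ++instances; }
  ~CallManager() { --instances; }
  CallManager(const CallManager &) = delete;
  CallManager &operator=(const CallManager &) = delete;
  inline static int instances = 0;
};

// Hypothetical gRPC client: borrows the shared manager, never creates one.
class AggregatorClient {
 public:
  AggregatorClient(int port, CallManager &manager)
      : port_(port), manager_(manager) {}
  int port() const { return port_; }
  CallManager &manager() const { return manager_; }

 private:
  int port_;
  CallManager &manager_;
};

// The component owns the one manager and wires it into every client it builds.
class Component {
 public:
  explicit Component(int agent_port)
      : job_event_client_(agent_port, call_manager_),
        task_event_client_(agent_port, call_manager_) {}

  AggregatorClient &job_event_client() { return job_event_client_; }
  AggregatorClient &task_event_client() { return task_event_client_; }

 private:
  CallManager call_manager_;  // Declared first: built before, destroyed after, the clients.
  AggregatorClient job_event_client_;
  AggregatorClient task_event_client_;
};
```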
template <typename TEventData>
RayEventRecorderBase<TEventData>::RayEventRecorderBase(
Why do we need a base class for this? My understanding is that the event data is all of type rpc::events::RayEventsData,
so we should be able to use a single concrete event export client.
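The argument here is that if every event already travels in one envelope message, a class template parameterized on the event payload buys nothing. A hedged sketch of that shape: `EventType`, `EventEnvelope`, and `EventsBatch` are hypothetical simplifications of a protobuf envelope (not the actual `rpc::events::RayEventsData` definition), and `EventExporter` is one non-template class that accepts any event kind.

```cpp
#include <cassert>
#include <string>
#include <utility>
#include <vector>

// Hypothetical simplification of a protobuf envelope: one message type carries
// every kind of event, distinguished by a tag rather than by a C++ type.
enum class EventType { kJobDefinition, kJobExecution, kTaskDefinition };

struct EventEnvelope {
  EventType type;
  std::string payload;  // Serialized event-specific body.
};

struct EventsBatch {  // Stand-in for the single request message sent upstream.
  std::vector<EventEnvelope> events;
};

// One concrete exporter: no template parameter, because the wire type is fixed.
class EventExporter {
 public:
  void Add(EventEnvelope event) { pending_.events.push_back(std::move(event)); }

  // Returns everything accumulated so far and leaves the buffer empty.
  EventsBatch TakeBatch() {
    EventsBatch out;
    std::swap(out, pending_);
    return out;
  }

 private:
  EventsBatch pending_;
};
```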
Signed-off-by: Cuong Nguyen <[email protected]>
This is part of a series of PRs to support JobEvent in the oneevent framework. The full effort will include adding the JobEvent schema, introducing a generic interface for exporting different types of events to the Event Aggregator, and implementing the necessary integration logic.
In this PR, we improve the isolation of the event exporting job from other GCS components by using separate threads for periodic execution and gRPC request handling.
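The description separates periodic execution from gRPC request handling. A hedged, standard-library-only sketch of the periodic half: a hypothetical `PeriodicRunner` that invokes a callback (for example, a flush of buffered events) every `period` on its own thread, and that stops promptly on destruction instead of sleeping out the remaining interval. The PR itself builds on Ray's I/O context machinery, which is not reproduced here.

```cpp
#include <atomic>
#include <cassert>
#include <chrono>
#include <condition_variable>
#include <functional>
#include <mutex>
#include <thread>
#include <utility>

// Hypothetical periodic runner: calls `fn` every `period` on a dedicated thread.
class PeriodicRunner {
 public:
  PeriodicRunner(std::chrono::milliseconds period, std::function<void()> fn)
      : period_(period), fn_(std::move(fn)), worker_([this] { Loop(); }) {}

  ~PeriodicRunner() {
    {
      std::lock_guard<std::mutex> lock(mu_);
      stop_ = true;
    }
    cv_.notify_one();  // Wake the worker now rather than after the period.
    worker_.join();    // After this, fn_ is guaranteed not to run again.
  }

  PeriodicRunner(const PeriodicRunner &) = delete;
  PeriodicRunner &operator=(const PeriodicRunner &) = delete;

 private:
  void Loop() {
    std::unique_lock<std::mutex> lock(mu_);
    while (!stop_) {
      // Returns true only if stop_ became true before the period elapsed.
      if (cv_.wait_for(lock, period_, [this] { return stop_; })) break;
      lock.unlock();
      fn_();  // Run the periodic work without holding the lock.
      lock.lock();
    }
  }

  const std::chrono::milliseconds period_;
  std::function<void()> fn_;
  std::mutex mu_;
  std::condition_variable cv_;
  bool stop_ = false;
  std::thread worker_;  // Declared last, so everything Loop() touches already exists.
};
```

Keeping the timer loop on its own thread means a slow flush delays only the next flush; request handling would get a separate thread of its own.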
Test: