ref(aci): use enqueue time to query Snuba in delayed workflow #93882

Conversation
🔍 Existing Issues For Review: Your pull request is modifying functions with pre-existing issues. 📄 File: src/sentry/workflow_engine/processors/workflow.py
    Use the latest timestamp for a set of group IDs with the same Snuba query.
    We will query backwards in time from this point.
    """
    if self.timestamp is None or (timestamp is not None and timestamp > self.timestamp):
if timestamp is not None:
    self.timestamp = timestamp if self.timestamp is None else max(timestamp, self.timestamp)
perhaps.
sgtm
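For context, a minimal runnable sketch of the method after the agreed simplification; the class shape is inferred from the diff hunks in this thread, not copied from the PR's final code:

```python
from dataclasses import dataclass, field
from datetime import datetime

GroupId = int  # assumed alias, matching the annotations in the diff


@dataclass
class TimeAndGroups:
    group_ids: set[GroupId] = field(default_factory=set)
    timestamp: datetime | None = None

    def update_timestamp(self, timestamp: datetime | None) -> None:
        """
        Use the latest timestamp for a set of group IDs with the same Snuba query.
        We will query backwards in time from this point.
        """
        if timestamp is not None:
            self.timestamp = (
                timestamp if self.timestamp is None else max(timestamp, self.timestamp)
            )
```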
@dataclass
class TimeAndGroups:
Is BulkQueryParameters a more accurate name? Or perhaps GroupQueryParameters? There's a distinction between the unique queries and what we're associating them with that I'm not sure I'm capturing accurately, but TimeAndGroups strikes me as a bit too literal.
def dcg_to_timestamp(self) -> dict[int, datetime | None]:
    """
    Uses the latest timestamp each DataConditionGroup was enqueued with.
    All groups enqueued for a DataConditionGroup will have the same query, hence the same max timestamp.
I feel like I should know what this means, but I don't.
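A hedged sketch of what the docstring seems to describe: collapse each DataConditionGroup's enqueued timestamps down to the latest one. The input shape here is an assumption for illustration, not the PR's actual code, which reads from the dataclass fields above:

```python
from datetime import datetime


def dcg_to_timestamp(
    enqueued: list[tuple[int, datetime | None]],  # (DCG id, enqueue time) pairs
) -> dict[int, datetime | None]:
    """Map each DataConditionGroup ID to the latest timestamp it was enqueued with."""
    result: dict[int, datetime | None] = {}
    for dcg_id, ts in enqueued:
        existing = result.get(dcg_id)
        # Keep the latest non-None timestamp; stay None only if all are None.
        if existing is None or (ts is not None and ts > existing):
            result[dcg_id] = ts
    return result
```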
handler = unique_condition.handler()
group_ids = time_and_groups.group_ids
Another option is to make groups be a dict[GroupId, datetime | None] and do time = max((ts for ts in groups.values() if ts), default=current_time). Stores a bit more data, but lets the summarizing happen where it is being forced, which has a certain appeal.
i need to refactor dcg_to_timestamp for this; if it comes to it, we can do a refactor
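What that alternative might look like as a sketch; query_end_time, current_time, and the dict shape are assumptions drawn from this thread:

```python
from datetime import datetime, timezone

GroupId = int


def query_end_time(groups: dict[GroupId, datetime | None]) -> datetime:
    current_time = datetime.now(timezone.utc)
    # Latest known enqueue time across the groups; fall back to "now"
    # if no timestamps were recorded.
    return max((ts for ts in groups.values() if ts), default=current_time)
```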
@@ -75,6 +77,7 @@ class DelayedWorkflowItem:
    delayed_conditions: list[DataCondition]
    event: GroupEvent
    source: WorkflowDataConditionGroupType
    timestamp: datetime
worth explaining what this timestamp is and what it should correspond to. Or, rename the field to make the comment pointless.
{
    "event_id": self.event.event_id,
    "occurrence_id": self.event.occurrence_id,
    "timestamp": self.timestamp,
TIL we can dumps a datetime.
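For reference, the stdlib json.dumps raises TypeError on a datetime, so this presumably goes through a datetime-aware serializer in the codebase. A minimal illustration of the usual pattern, with made-up values:

```python
import json
from datetime import datetime, timezone

payload = {
    "event_id": "abc123",  # made-up values for illustration
    "occurrence_id": None,
    "timestamp": datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc),
}

# default= is called for objects json doesn't know how to encode.
encoded = json.dumps(payload, default=lambda o: o.isoformat())
# -> {"event_id": "abc123", "occurrence_id": null, "timestamp": "2024-06-01T12:00:00+00:00"}
```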
@@ -79,6 +79,7 @@
class EventInstance(BaseModel):
    event_id: str
    occurrence_id: str | None = None
    timestamp: datetime | None = None
You should probably add a test that requires us to parse this value correctly from the expected format. I strongly suspect we don't.
Need a test to verify we can parse it and that pydantic won't freak out.
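A sketch of the kind of test being asked for, assuming the EventInstance model from the diff above and pydantic's built-in ISO-8601 coercion (parse_raw is the v1 spelling):

```python
from datetime import datetime, timezone

from pydantic import BaseModel


class EventInstance(BaseModel):
    event_id: str
    occurrence_id: str | None = None
    timestamp: datetime | None = None


def test_timestamp_roundtrip() -> None:
    # Serialized form we expect on the wire: ISO-8601 string.
    raw = '{"event_id": "abc123", "timestamp": "2024-06-01T12:00:00+00:00"}'
    parsed = EventInstance.parse_raw(raw)
    assert parsed.timestamp == datetime(2024, 6, 1, 12, 0, tzinfo=timezone.utc)
```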
Codecov Report
Attention: Patch coverage is
✅ All tests successful. No failed tests found.

Additional details and impacted files

@@            Coverage Diff             @@
##           master   #93882       +/-   ##
===========================================
+ Coverage   37.94%   81.96%   +44.01%
===========================================
  Files        9784    10351      +567
  Lines      553763   598883    +45120
  Branches    23268    23267        -1
===========================================
+ Hits       210143   490884   +280741
+ Misses     343151   106950   -236201
- Partials      469     1049      +580
Force-pushed from b304ccd to fb91cb3 (Compare)
group_ids: set[GroupId] = field(default_factory=set)
timestamp: datetime | None = None

def update_timestamp(self, timestamp: datetime | None) -> None:
Seems simpler and safer to have update(self, group_id: GroupId, timestamp: datetime | None).
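A sketch of that suggestion, assuming the TimeAndGroups shape from the diff above; folding the group-id add and the timestamp update into one call means callers can't do one without the other:

```python
from dataclasses import dataclass, field
from datetime import datetime

GroupId = int


@dataclass
class TimeAndGroups:
    group_ids: set[GroupId] = field(default_factory=set)
    timestamp: datetime | None = None

    def update(self, group_id: GroupId, timestamp: datetime | None) -> None:
        # Record the group and fold in its enqueue time in one step.
        self.group_ids.add(group_id)
        if timestamp is not None and (
            self.timestamp is None or timestamp > self.timestamp
        ):
            self.timestamp = timestamp
```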
@@ -74,6 +76,9 @@ class DelayedWorkflowItem:
    delayed_conditions: list[DataCondition]
    event: GroupEvent
    source: WorkflowDataConditionGroupType
    timestamp: (
        datetime  # time the item was created for enqueue. used in delayed workflow Snuba query
maybe
# Used to pick the end of the time window in snuba querying.
# Should be close to when fast conditions were evaluated to try to be consistent.
What you have is fine, though.
For a slow alert condition (e.g. "> 10 events in the past 1 minute"), we always use the time the query is made as the point to look backwards from. This is not optimal when the delayed workflow query runs some time after the event was enqueued, since the query window can shift past the events that triggered the condition.

A better solution is to use the time the event is enqueued into the delayed workflow buffer.

However, we also batch queries such that all alerts that end up making the same query are grouped together, and we run a single query for multiple groups.

In that case, we need to decide which enqueue time to use. We should use the latest enqueue time across the groups, since querying backwards from it covers all groups in the Snuba query (see the sketch below).
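A small worked sketch of that rule, with made-up group IDs and times: the batched query looks backwards from the latest enqueue time so the window covers every group in the batch.

```python
from datetime import datetime, timedelta, timezone

# Made-up enqueue times for three groups that share one Snuba query.
enqueue_times = {
    101: datetime(2024, 6, 1, 12, 0, 0, tzinfo=timezone.utc),
    102: datetime(2024, 6, 1, 12, 0, 30, tzinfo=timezone.utc),
    103: datetime(2024, 6, 1, 12, 1, 15, tzinfo=timezone.utc),
}

end = max(enqueue_times.values())   # latest enqueue time: 12:01:15
start = end - timedelta(minutes=1)  # window for "> 10 events in the past 1 minute"
```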