
ref(aci): use enqueue time to query Snuba in delayed workflow #93882


Merged: 7 commits merged into master on Jun 24, 2025

Conversation

cathteng (Member):

For a slow alert condition (e.g. "> 10 events in the past 1 minute"), we always use the time the query is made as the point to start looking backwards from. This is not optimal when:

  • A task needs to be rerun
  • There is a backlog in the delayed workflow queue
  • Because we flush the buffer every minute, there is an edge case when the time window is 1 minute: events come in at 11:00:02 AM, are dispatched in a task at ~11:01:00 AM, and the query runs at 11:01:03 AM or later, so those events fall outside the one-minute window and aren't picked up in the result

A better solution would be to use the time the event is enqueued into the delayed workflow buffer!

However, we also batch queries such that all alerts that end up making the same queries are grouped together, and we run a single query for multiple groups. For example:

  • Two events for two different groups come in at different times within the same minute, and both are processed by alert A
  • We will make a single Snuba query for alert A covering both groups

In the above case, we need to decide which enqueue time to use: we take the latest enqueue time across the groups, since querying backwards from the latest timestamp covers the window for every group in the batched Snuba query.
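As a rough illustration (the times and names below are invented for this example, not taken from the PR), the change amounts to ending the query window at the latest enqueue time in the batch instead of at query time:

from datetime import datetime, timedelta

# Illustrative only: enqueue times for two groups batched into one Snuba query.
enqueue_times = [
    datetime(2025, 6, 18, 11, 0, 2),   # group 1's event enqueued
    datetime(2025, 6, 18, 11, 0, 40),  # group 2's event enqueued
]
end = max(enqueue_times)             # latest enqueue time covers every group
start = end - timedelta(minutes=1)   # "> 10 events in the past 1 minute"
# Querying [start, end] still counts the 11:00:02 event even if the task
# itself does not run until 11:01:03 or later.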


sentry-io bot commented Jun 18, 2025

🔍 Existing Issues For Review

Your pull request is modifying functions with the following pre-existing issues:

📄 File: src/sentry/workflow_engine/processors/workflow.py

Function: evaluate_workflows_action_filters
Unhandled Issue: SoftTimeLimitExceeded: SoftTimeLimitExceeded() se...
Event Count: 3


@github-actions github-actions bot added the Scope: Backend Automatically applied to PRs that change backend components label Jun 18, 2025
@cathteng cathteng requested a review from kcons June 18, 2025 22:49
@cathteng cathteng marked this pull request as ready for review June 18, 2025 22:49
@cathteng cathteng requested a review from a team as a code owner June 18, 2025 22:50
"""
Use the latest timestamp for a set of group IDs with the same Snuba query.
We will query backwards in time from this point.
"""
if self.timestamp is None or (timestamp is not None and timestamp > self.timestamp):
kcons (Member):
if timestamp is not None:
    self.timestamp = timestamp if self.timestamp is None else max(timestamp, self.timestamp)

perhaps.

cathteng (Member, Author):
sgtm



@dataclass
class TimeAndGroups:
kcons (Member):
Is BulkQueryParameters a more accurate name? Or perhaps GroupQueryParameters? There's a distinction between the unique queries and what we're associating them with that I'm not sure I'm capturing accurately, but TimeAndGroups strikes me as a bit too literal.

def dcg_to_timestamp(self) -> dict[int, datetime | None]:
    """
    Uses the latest timestamp each DataConditionGroup was enqueued with.
    All groups enqueued for a DataConditionGroup will have the same query, hence the same max timestamp.
kcons (Member):
I feel like I should know what this means, but I don't.
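One plausible reading, sketched here with assumed names (dcg_to_time_and_groups is hypothetical, not from the PR): each DataConditionGroup id maps to its batched query state, whose timestamp already holds the latest enqueue time for its groups.

def dcg_to_timestamp(self) -> dict[int, datetime | None]:
    # Hypothetical sketch: surface the per-DCG max enqueue timestamp.
    return {
        dcg_id: time_and_groups.timestamp
        for dcg_id, time_and_groups in self.dcg_to_time_and_groups.items()
    }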

handler = unique_condition.handler()
group_ids = time_and_groups.group_ids
kcons (Member):
another option is to make groups be a dict[GroupId, datetime | None] and
do time = max(ts for ts in groups.values() if ts, default=current_time).
Stores a bit more data, but lets the summarizing happen where it is being forced, which has a certain appeal.
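A sketch of that alternative, assuming the GroupId alias and dataclass shape used elsewhere in this thread:

from dataclasses import dataclass, field
from datetime import datetime

GroupId = int  # assumption: an int alias, as in the surrounding code

@dataclass
class TimeAndGroups:
    # Per-group enqueue times; summarizing happens only where the value is forced.
    groups: dict[GroupId, datetime | None] = field(default_factory=dict)

    def query_end(self, current_time: datetime) -> datetime:
        # Latest enqueue time across groups, falling back to "now".
        return max((ts for ts in self.groups.values() if ts), default=current_time)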

cathteng (Member, Author):
I need to refactor dcg_to_timestamp for this; if it comes to it, we can do a refactor

@@ -75,6 +77,7 @@ class DelayedWorkflowItem:
    delayed_conditions: list[DataCondition]
    event: GroupEvent
    source: WorkflowDataConditionGroupType
    timestamp: datetime
kcons (Member):
worth explaining what this timestamp is and what it should correspond to. Or, rename the field to make the comment pointless.

{
    "event_id": self.event.event_id,
    "occurrence_id": self.event.occurrence_id,
    "timestamp": self.timestamp,
kcons (Member):
TIL we can dumps a datetime.
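For context (a generic illustration, not Sentry's actual serialization path): json.dumps rejects a raw datetime unless a default hook is supplied, so something along these lines has to happen for this payload to serialize:

import json
from datetime import datetime, timezone

payload = {"timestamp": datetime(2025, 6, 18, tzinfo=timezone.utc)}
# json.dumps(payload) raises TypeError; a default hook stringifies it:
json.dumps(payload, default=str)  # '{"timestamp": "2025-06-18 00:00:00+00:00"}'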

@@ -79,6 +79,7 @@
class EventInstance(BaseModel):
    event_id: str
    occurrence_id: str | None = None
    timestamp: datetime | None = None
kcons (Member):
You should probably add a test that requires us to parse this value correctly from the expected format. I strongly suspect we don't.

kcons (Member) left a comment:
Need test to verify we can parse it and pydantic won't freak out.
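A minimal sketch of the requested test, assuming pydantic's standard coercion of ISO 8601 strings to datetime (the test name and values are illustrative):

from datetime import datetime, timezone
from pydantic import BaseModel

class EventInstance(BaseModel):
    event_id: str
    occurrence_id: str | None = None
    timestamp: datetime | None = None

def test_event_instance_parses_timestamp() -> None:
    # Simulates the serialized form coming back out of the buffer.
    raw = {"event_id": "abc123", "timestamp": "2025-06-18T22:49:00+00:00"}
    parsed = EventInstance(**raw)
    assert parsed.timestamp == datetime(2025, 6, 18, 22, 49, tzinfo=timezone.utc)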


codecov bot commented Jun 18, 2025

Codecov Report

Attention: Patch coverage is 86.11111% with 10 lines in your changes missing coverage. Please review.

✅ All tests successful. No failed tests found.

Files with missing lines:
  ...try/workflow_engine/processors/delayed_workflow.py: Patch 65.51%, 10 lines missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           master   #93882       +/-   ##
===========================================
+ Coverage   37.94%   81.96%   +44.01%     
===========================================
  Files        9784    10351      +567     
  Lines      553763   598883    +45120     
  Branches    23268    23267        -1     
===========================================
+ Hits       210143   490884   +280741     
+ Misses     343151   106950   -236201     
- Partials      469     1049      +580     

cathteng added 4 commits June 23, 2025 13:09

@cathteng cathteng force-pushed the cathy/aci/delayed-workflow-start-time branch from b304ccd to fb91cb3 Compare June 23, 2025 20:09
@cathteng cathteng requested a review from kcons June 23, 2025 20:10
group_ids: set[GroupId] = field(default_factory=set)
timestamp: datetime | None = None

def update_timestamp(self, timestamp: datetime | None) -> None:
kcons (Member):
Seems simpler and safer to have update(self, group_id: GroupId, timestamp: datetime | None).
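A sketch of that combined method, using the fields shown above; folding both updates into one call means a group can't be added without its enqueue time being considered:

def update(self, group_id: GroupId, timestamp: datetime | None) -> None:
    # Record the group and fold its enqueue time into the running maximum.
    self.group_ids.add(group_id)
    if timestamp is not None:
        self.timestamp = timestamp if self.timestamp is None else max(timestamp, self.timestamp)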

@@ -74,6 +76,9 @@ class DelayedWorkflowItem:
    delayed_conditions: list[DataCondition]
    event: GroupEvent
    source: WorkflowDataConditionGroupType
    timestamp: (
        datetime  # time the item was created for enqueue. used in delayed workflow Snuba query
    )
kcons (Member):
maybe

# Used to pick the end of the time window in snuba querying. 
# Should be close to when fast conditions were evaluated to try to be consistent.

What you have is fine, though.

@cathteng cathteng enabled auto-merge (squash) June 24, 2025 17:12
@cathteng cathteng merged commit 203d323 into master Jun 24, 2025
64 checks passed
@cathteng cathteng deleted the cathy/aci/delayed-workflow-start-time branch June 24, 2025 17:14
@github-actions github-actions bot locked and limited conversation to collaborators Jul 10, 2025