Skip initialization if crash was caused by the SDK itself #1418

bruno-garcia · 2021-10-21T17:09:02Z

When SentrySDK.start is called and a crash report is detected, skip initialization altogether in case that crash was caused by code within the SDK itself. The SDK should stay disabled until a new SDK version or app version is installed.

philipphofmann · 2021-10-21T17:15:11Z

We could identify whether a crash comes from the SDK itself by looking at the stack trace, and then send a special type of event to Sentry so we know that we are crashing. Our users could get a special warning in Sentry so that they are aware something is wrong. Then they could ship a patch to downgrade to the latest stable version.
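
A minimal sketch of the stack trace check described above, not taken from the SDK. The `StackFrame` type, the image names, and the `libsystem_` prefix rule for recognizing system images are all assumptions made for illustration; a real implementation would work on the symbolicated frames of the stored crash report.

```c
#include <assert.h>
#include <stdbool.h>
#include <stddef.h>
#include <string.h>

/* Hypothetical frame record: only the binary image name is needed here. */
typedef struct {
    const char *image_name; /* e.g. "Sentry", "MyApp", "libsystem_kernel.dylib" */
} StackFrame;

/* Assumption: system images can be recognized by a name prefix. */
static bool is_system_image(const char *image_name)
{
    return strncmp(image_name, "libsystem_", 10) == 0;
}

/* Returns true when the innermost non-system frame belongs to sdk_image.
 * frames[0] is the frame in which the crash happened. */
static bool crash_originated_in_sdk(const StackFrame *frames, size_t count,
                                    const char *sdk_image)
{
    assert(sdk_image != NULL);
    for (size_t i = 0; i < count; i++) {
        const char *name = frames[i].image_name;
        if (name == NULL || is_system_image(name)) {
            continue; /* cannot be attributed to app or SDK code */
        }
        /* The first attributable frame decides the outcome. */
        return strcmp(name, sdk_image) == 0;
    }
    return false; /* no attributable frame found */
}
```

With this rule, a trace such as `libsystem_kernel.dylib`, `Sentry`, `MyApp` counts as an SDK crash, while `MyApp`, `Sentry` does not, because the app frame is innermost.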

philipphofmann · 2021-10-21T17:25:31Z

SentryCrash has a similar concept of a recrash report; see sentry-cocoa/Sources/SentryCrash/Recording/SentryCrashC.c, lines 101 to 102 in ad3b44b:

    if (monitorContext->crashedDuringCrashHandling) {
        sentrycrashreport_writeRecrashReport(monitorContext, g_lastCrashReportFilePath);

marandaneto · 2021-10-22T09:55:52Z

Let me be a bit defensive here: if the crash actually happens while sending events, raising a new event causes a sort of infinite loop. I'm not a big fan of raising SDK issues because of that.
Also, if we introduce a bug in the check for whether an event comes from the SDK, we might crash everyone.
I'm just trying to give food for thought.

bruno-garcia · 2021-10-26T14:29:10Z

> if the crash is actually on sending events, raising a new event actually causes a sort of infinite loop

Right, the idea is to just turn off the SDK if we detect the crash we're processing was initiated from the SDK itself. Not to try to capture anything.

marandaneto · 2021-10-27T08:53:56Z

Got it. What if we did ship a buggy SDK and then fix and release it? The SDK still won't be able to init by itself, since the crash is still recorded and it bails out. I'm wondering how this would work.

bruno-garcia · 2021-10-27T14:11:19Z

> What if we did ship a buggy SDK and then fix and release it? The SDK still won't be able to init by itself, since the crash is still recorded and it bails out. I'm wondering how this would work.

The SDK will check whether the version changed or the app was reinstalled.

bruno-garcia · 2021-10-27T14:18:15Z

One idea that came up during a discussion, to avoid going completely blind when we turn this off:
Have a simple GET request to Relay with /projectid/sdk.name/version so we can get metrics on the backend about this feature kicking in. This is obviously a rough idea and requires more discussion with other teams.

philipphofmann · 2021-12-30T10:38:14Z

According to @armcknight, this is what they did for Specto:

... we didn’t do anything like inspect a stack trace to make the determination. We simply write marker files at various stages of initialization, and then before starting subsequent initializations, check if we wrote a “init succeeded” marker from the last init. If not, we bail until we see a new sdk/app/OS version.
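
The marker-file strategy quoted above could be sketched roughly as follows. This is not Specto's or Sentry's code: the function names, the single marker file with `started` and `succeeded` states, and the combined version key (SDK, app, and OS version joined into one string without whitespace) are assumptions for illustration only.

```c
#include <assert.h>
#include <stdbool.h>
#include <stdio.h>
#include <string.h>

/* Writes "<version>\n<state>\n" to the marker file, replacing old content.
 * state is "started" before init runs and "succeeded" once it finished. */
static bool write_marker(const char *path, const char *version, const char *state)
{
    assert(path != NULL && version != NULL && state != NULL);
    FILE *f = fopen(path, "w");
    if (f == NULL) {
        return false;
    }
    bool written = fprintf(f, "%s\n%s\n", version, state) > 0;
    return fclose(f) == 0 && written;
}

/* Decides whether the SDK should initialize on this launch.
 * current_version is a hypothetical combined key such as "sdk8|app1|os17". */
static bool should_start_sdk(const char *path, const char *current_version)
{
    assert(path != NULL && current_version != NULL);
    char version[128] = "";
    char state[32] = "";

    FILE *f = fopen(path, "r");
    if (f == NULL) {
        return true; /* no marker yet: first launch or reinstall */
    }
    int fields = fscanf(f, "%127s %31s", version, state);
    fclose(f);

    if (fields != 2) {
        return true; /* unreadable marker: this sketch chooses not to block init */
    }
    if (strcmp(version, current_version) != 0) {
        return true; /* new SDK/app/OS version: give init another chance */
    }
    /* Same version: start only if the previous init ran to completion. */
    return strcmp(state, "succeeded") == 0;
}
```

A launch would call `should_start_sdk` first, write a `started` marker before initializing, and overwrite it with `succeeded` afterwards. If the process crashes in between, the `started` marker is what makes the next launch bail until the version key changes.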

armcknight · 2022-01-26T19:07:40Z

Just remembered we wrote a blog post describing the strategy: https://medium.com/specto/preventing-repeated-crashes-on-launch-in-our-sdk-20cb4cc3e430

philipphofmann · 2023-07-13T13:42:16Z

This would break the SDK crash detection: getsentry/sentry#44342. We want the SDK to send the crash event. If our SDK keeps crashing, we have the report start-up crashes feature (#2220) to ensure the event ends up in Sentry. If the SDK can't even report the start-up crash anymore, I think we shouldn't skip the initialization but instead keep crashing so our customers are aware. I don't think it's worth the effort to plan for this edge case, and as pointed out by @marandaneto, we could keep crashing anyway. Instead, we should ensure with a proper test suite that this doesn't happen. I don't think this is required anymore. Please reopen with a comment if you disagree.

armcknight · 2023-11-27T19:09:21Z

In the case you describe, where we cause an app launch crash and say we should let it happen so customers are aware, I disagree. That is the worst possible UX issue we could cause for end users. They still need the app to work, regardless of whether Sentry is working correctly. What you're proposing implies that Sentry is more important than the app, and I disagree with that.

We could implement something like a heartbeat by which we present a notification in the frontend like "hey, we stopped receiving events from app version X at datetime Y, something may be wrong with your Sentry installation in production, please investigate and push out an update"

bruno-garcia · 2023-11-27T23:48:14Z

Agree with Andrew here, we absolutely shouldn't crash the app. If we do, it shouldn't be more than once (if we can do that). Make sure that for that version of the app/Sentry we skip init again.

philipphofmann · 2023-11-28T13:48:11Z

> In the case you describe where we cause an app launch crash and say we should let it happen so customers are aware, I disagree.

I might have been wrong here. I don't think Sentry is more important than the app. One of the worst things that can happen is that customers think everything is fine although the house is on fire. The worst thing for us is that we repeatedly crash an app. We could come up with a strategy similar to what you described in your blog post to avoid repeated crashes, but we must ensure that the strategy notifies us and tells us that something is broken. We could send a special type of event to Sentry only once after we detected that our SDK repeatedly crashed to avoid using up customer quota. With the SDK crash detection, it would be easy for us to set an alarm.

What made you revisit the decision here, @armcknight? Would you like to reopen the issue?

armcknight · 2023-11-28T20:17:19Z

We could try to send a special type of event to notify ourselves of the situation, but barring that, we could fall back on the heartbeat type of strategy as well.

I think I saw this in an old GitHub notification yesterday that I'd never followed up on 😄 We could reopen it and backlog it; like I said, I'm not sure about prioritization.

philipphofmann · 2023-11-29T15:21:21Z

We'll discuss it in our next sync.

bruno-garcia added the Type: Enhancement label Oct 21, 2021

philipphofmann added Effort: Large labels Oct 29, 2021

philipphofmann added the Status: Backlog label Nov 8, 2021

philipphofmann added the Platform: Cocoa label Dec 22, 2021

philipphofmann added this to Mobile & Cross Platform SDK Dec 23, 2021

philipphofmann moved this to Needs Discussion in Mobile & Cross Platform SDK Dec 23, 2021

philipphofmann moved this from Needs Discussion to Backlog in Mobile & Cross Platform SDK Mar 2, 2022

kahest removed Impact: Large labels Jul 11, 2023

philipphofmann closed this as not planned Jul 13, 2023

github-project-automation bot moved this from Backlog to Done in Mobile & Cross Platform SDK Jul 13, 2023

philipphofmann reopened this Nov 29, 2023

philipphofmann moved this from Done to Needs Discussion in Mobile & Cross Platform SDK Nov 29, 2023
