Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Skip initialization if crash was caused by the SDK itself #1418

Open
bruno-garcia opened this issue Oct 21, 2021 · 15 comments
Open

Skip initialization if crash was caused by the SDK itself #1418

bruno-garcia opened this issue Oct 21, 2021 · 15 comments

Comments

@bruno-garcia
Copy link
Member

bruno-garcia commented Oct 21, 2021

When SentrySDK.start is called and a crash report is detected, skip initialization altogether in case that crash was caused by code within the SDK itself. Until new SDK version / app version it should stay disabled.

@philipphofmann
Copy link
Member

philipphofmann commented Oct 21, 2021

We could identify if a crash is coming from the SDK itself by looking at the stracktrace and sending a special type event to Sentry, so we know that we are crashing. Our users could get a special warning in Sentry so that they are aware something is wrong. Then they could ship a patch do downgrade to the latest stable version.

@philipphofmann
Copy link
Member

SentryCrash has a similar concept of a recrash report, see

if (monitorContext->crashedDuringCrashHandling) {
sentrycrashreport_writeRecrashReport(monitorContext, g_lastCrashReportFilePath);

@marandaneto
Copy link
Contributor

let me play a bit the defensive person here, if the crash is actually on sending events, raising a new event actually causes a sort of infinite loop, I'm not a big fan of raising SDK's issue because of that.
Also, if we introduce a bug on checking if an event is from the SDK or not, might crash everyone.
I'm trying to just give food for thought.

@bruno-garcia
Copy link
Member Author

if the crash is actually on sending events, raising a new event actually causes a sort of infinite loop

Right, the idea is to just turn off the SDK if we detect the crash we're processing was initiated from the SDK itself. Not to try to capture anything.

@marandaneto
Copy link
Contributor

got it, what's about if we indeed shipped a bug'ed SDK, then fix/release it, the SDK is still not going to be able to init by itself since there's a crash there and it bails out, wondering how this would work.

@bruno-garcia
Copy link
Member Author

got it, what's about if we indeed shipped a bug'ed SDK, then fix/release it, the SDK is still not going to be able to init by itself since there's a crash there and it bails out, wondering how this would work.

The SDK will check if version changed or app was reinstalled

@bruno-garcia
Copy link
Member Author

One idea that came up during a discussion, to avoid getting completely blind when we turn this off:
Have a simple GET request to relay with /projectid/sdk.name/version so we can get metrics on the backend about this feature kicking in. This obviously is a rough idea and requires more discussions with other teams.

@philipphofmann
Copy link
Member

According to @armcknight, this is what they did for Specto:

... we didn’t do anything like inspect a stack trace to make the determination. We simply write marker files at various stages of initialization, and then before starting subsequent initializations, check if we wrote a “init succeeded” marker from the last init. If not, we bail until we see a new sdk/app/OS version.

@armcknight
Copy link
Member

just remembered we wrote a blog post describing the strategy: https://medium.com/specto/preventing-repeated-crashes-on-launch-in-our-sdk-20cb4cc3e430

@philipphofmann philipphofmann moved this from Needs Discussion to Backlog in Mobile & Cross Platform SDK Mar 2, 2022
@philipphofmann
Copy link
Member

This would break the SDK crash detection: getsentry/sentry#44342. We want that the SDK sends the crash event. If our SDK keeps crashing, we have the report start-up crashes feature to ensure the event ends up in Sentry #2220. If the SDK can't even report the start-up crash anymore, I think we shouldn't skip the initialization but instead keep crashing so our customers are aware. I don't think it's worth the effort to plan for this edge case, and as pointed out by @marandaneto, we could keep crashing anyways. Instead, we should ensure with a proper test suite that this doesn't happen. I don't think this is required anymore. Please reopen with a comment if you disagree.

@philipphofmann philipphofmann closed this as not planned Won't fix, can't repro, duplicate, stale Jul 13, 2023
@github-project-automation github-project-automation bot moved this from Backlog to Done in Mobile & Cross Platform SDK Jul 13, 2023
@armcknight
Copy link
Member

In the case you describe where we cause an app launch crash and say we should let it happen so customers are aware, I disagree. That is the worst possible UX issue we could cause for end users. They still need the app to work, regardless of whether Sentry is working correctly. What you're proposing is that Sentry is more important than the app, I disagree.

We could implement something like a heartbeat by which we present a notification in the frontend like "hey, we stopped receiving events from app version X at datetime Y, something may be wrong with your Sentry installation in production, please investigate and push out an update"

@bruno-garcia
Copy link
Member Author

Agree with Andrew here, we absolutely shouldn't crash the app. If we do, it shouldn't be more than once (if we can do that). Make sure that for that version of the app/Sentry we skip init again.

@philipphofmann
Copy link
Member

In the case you describe where we cause an app launch crash and say we should let it happen so customers are aware, I disagree.

I might have been wrong here. I don't think Sentry is more important than the app. One of the worst things that can happen is that customers think everything is fine although the house is on fire. The worst thing for us is that we repeatedly crash an app. We could come up with a strategy similar to what you described in your blog post to avoid repeated crashes, but we must ensure that the strategy notifies us and tells us that something is broken. We could send a special type of event to Sentry only once after we detected that our SDK repeatedly crashed to avoid using up customer quota. With the SDK crash detection, it would be easy for us to set an alarm.

What made you revisit the decision here, @armcknight? Would you like to reopen the issue?

@armcknight
Copy link
Member

We could try to send a special type of event to notify ourselves of the situation, but barring that, we could fall back on the heartbeat type of strategy as well.

I think I saw this in an old github notification yesterday I'd never followed up on 😄 We could reopen it and backlog it, like I said I'm not sure about prioritization.

@philipphofmann philipphofmann moved this from Done to Needs Discussion in Mobile & Cross Platform SDK Nov 29, 2023
@philipphofmann
Copy link
Member

We'll discuss it in our next sync.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Projects
Status: Needs Discussion
Development

No branches or pull requests

5 participants