solana-validator crash due to thread 'solRpcNotifyXX' has overflowed its stack #34303
Labels: community (Community contribution)
Comments
Gonna enable ...

I enabled RUST_LOG=debug and RUST_BACKTRACE=full, but the crash does not tell me much, unfortunately.

Another crash.

Actually, these two lines are the same before the crash.

One more crash, but now without ...
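For reference, a minimal sketch of how these variables can be set when launching the validator (the wrapper form here is illustrative, not taken from the report). Note that RUST_BACKTRACE only applies to panics; a stack overflow trips a guard page and aborts the process directly, which is why it produces just the one-line "has overflowed its stack" message and little else.

```sh
# Illustrative launcher: enable verbose logs and full panic backtraces
# before starting the validator.
export RUST_LOG=debug        # env_logger-style log filter
export RUST_BACKTRACE=full   # full backtrace on panic (not on stack overflow)
exec /home/solana/.local/share/solana/install/active_release/bin/solana-validator \
  --ledger /home/solana/ledger \
  --log /home/solana/log/solana-validator.log
  # ...plus the remaining flags from the config below
```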
Problem
Hello team,
We are running into an issue where solana-validator crashes multiple times per day. We updated to v1.16.18/20 and the problem still occurs; we first saw it on v1.14.X. The error is always the same; the only difference is the number in the solRpcNotifyXX thread name. Most of the time the whole validator crashes, but occasionally only the websocket thread dies.

The validator config is just a vanilla non-voting RPC node that serves both HTTP and WS traffic:
```sh
exec /home/solana/.local/share/solana/install/active_release/bin/solana-validator \
  --identity /home/solana/identity.json \
  --entrypoint entrypoint.mainnet-beta.solana.com:8001 \
  --entrypoint entrypoint2.mainnet-beta.solana.com:8001 \
  --entrypoint entrypoint3.mainnet-beta.solana.com:8001 \
  --entrypoint entrypoint4.mainnet-beta.solana.com:8001 \
  --entrypoint entrypoint5.mainnet-beta.solana.com:8001 \
  --ledger /home/solana/ledger \
  --accounts /mnt/accounts/accountsdb \
  --snapshots /mnt/snapshots \
  --log /home/solana/log/solana-validator.log \
  --gossip-port 8001 \
  --rpc-port 8899 \
  --rpc-bind-address 0.0.0.0 \
  --dynamic-port-range 8002-8102 \
  --wal-recovery-mode skip_any_corrupted_record \
  --private-rpc \
  --no-port-check \
  --enable-extended-tx-metadata-storage \
  --enable-rpc-transaction-history \
  --rpc-pubsub-enable-block-subscription \
  --known-validator 7Np41oeYqPefeNQEHSv1UDhYrehxin3NStELsSKCT4K2 \
  --known-validator GdnSyH3YtwcxFvQrVVJMm1JhTS4QVX7MFsX56uJLUfiZ \
  --known-validator DE1bawNcRJB9rVm3buyMVfr8mBEoyyu73NBovf2oXJsJ \
  --known-validator CakcnaRDHka2gXyfbEd2d3xsvkJkqsLw2akB3zsN1D2S \
  --limit-ledger-size 1009504738 \
  --no-voting \
  --only-known-rpc \
  --halt-on-known-validators-accounts-hash-mismatch \
  --account-index program-id spl-token-owner spl-token-mint \
  --full-rpc-api
```
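Since the solRpcNotifyXX threads are RPC PubSub notification workers, one knob worth checking is whether your build exposes a flag to bound that notification thread pool. The flag name below is an assumption to verify, not something confirmed in this report:

```sh
# Check whether the running build supports capping the pubsub notification
# thread pool; the flag name is an assumption to verify against --help.
solana-validator --help | grep -i -A 2 "rpc-pubsub-notification-threads"
```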
The number of WS connections is below 1000 on average. The server hardware is substantial: NVMe drives, 1 TB of RAM, and an AMD EPYC 7443P 24-core processor.

The crashes happen at seemingly random times. I tried to reproduce the issue by running benchmarks with 10,000 different subscriptions, but had no success. I would appreciate any advice and will be happy to provide more logs if needed.
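Since the failure is a thread stack overflow rather than a panic, a possible stopgap while the root cause is investigated is raising the default stack size for spawned threads. This is only a sketch, under the assumption that the affected threads do not request an explicit stack size (Rust's std::thread honors RUST_MIN_STACK only for threads spawned without one):

```sh
# Assumption: the solRpcNotifyXX threads use Rust's default thread stack size,
# which RUST_MIN_STACK (in bytes) overrides for std::thread-spawned threads.
# Thread pools that set an explicit stack size will ignore this variable.
export RUST_MIN_STACK=8388608   # 8 MiB, up from the 2 MiB default
exec /home/solana/.local/share/solana/install/active_release/bin/solana-validator \
  --identity /home/solana/identity.json \
  --ledger /home/solana/ledger \
  --log /home/solana/log/solana-validator.log
  # ...plus the remaining flags from the config above
```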