Congestion control for staked/unstaked nodes #29754
Comments
This should have quite an impact in production, mainly because most RPC providers purely run unstaked RPC nodes. There are only two I'm aware of that re-submit txs from their staked validators (Jito & Triton).
@mschneider @KirillLykov we need some way to prevent the bot armies from spamming the leader, so unstaked nodes need to be capped. Anyone that needs more bandwidth should be using staked nodes to forward, and we should optimize that interface instead. If the unstaked nodes can saturate the line and prevent staked messages from getting through, it's a DOS issue.
I feel like there should be some unstaked node whitelisting option?
In Marinade we built the first version of mTransaction. Users can send transactions to our service, and these transactions are streamed over gRPC to validators who have previously connected to our service. This effectively backs the transactions with the stake of the validators connected to our system. We are now beta testing this, with ~11M SOL of stake connected and a few users (e.g. Triton, who helped us design and build this).
I also want to add that Marinade would love for mTransaction to solve this issue for the network as a whole, and not be a private product but an open solution, more closely integrated into the core and easy to use for anyone that needs it. We'd love to discuss some models and the evolution of the product to make it reach this stage. I see Anatoly mentioning an "interface" being needed to access staked nodes, and I truly believe mTransaction could end up being it.
For unstaked nodes, there are several implications of the slow connection:
My understanding is: for staked nodes, after some discussions I personally have concerns (without any proof at the moment) about whether the implemented approach (stake-based congestion control on top of the quinn CC algorithm, varying the receive window size from 2 to 10 and limiting the number of streams per connection) is the optimal solution. Optimal in the sense of TPS; we might want to optimize other metrics instead, or it might be good enough for the moment.
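For context, a minimal sketch of what such stake-based window sizing could look like; the function name, constants, and linear interpolation are illustrative assumptions, not the validator's actual code:

```rust
// Hypothetical sketch of stake-based receive-window sizing as described in
// this thread: unstaked peers get 1x MTU, staked peers get between 2x and
// 10x MTU in proportion to their share of total stake.
const MTU: u64 = 1232; // assumed packet size, purely illustrative
const UNSTAKED_RATIO: u64 = 1;
const MIN_STAKED_RATIO: u64 = 2;
const MAX_STAKED_RATIO: u64 = 10;

fn receive_window_bytes(peer_stake: u64, total_stake: u64) -> u64 {
    let ratio = if peer_stake == 0 || total_stake == 0 {
        UNSTAKED_RATIO
    } else {
        // Interpolate linearly between the min and max staked ratios.
        MIN_STAKED_RATIO + ((MAX_STAKED_RATIO - MIN_STAKED_RATIO) * peer_stake) / total_stake
    };
    ratio * MTU
}
```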
@KirillLykov can we let validators tag their "trusted" RPCs and inherit the stake? Unstaked RPCs are like public ports; they need rate limiting because it's free to abuse them.
To me, it sounds reasonable. What do @mschneider, @janlegner, @Cerbah think? Sorry if I overcommunicate, I just want to be sure that everyone is on the same page.
While it seems absolutely right to split users into different groups for DOS (and plain overload) protection, I wonder if the receive window size is the best tool for this job. My understanding is that it restricts the amount of data that a client can have "in flight" for a given connection [1]; the Quinn documentation says as much about the receive window setting.
The above means that while the window size does affect the amount of data sent (and received) over the network, it may affect users in a way that I would not call fair, because the round-trip time is part of the equation. Take two clients with the same network speed and the same window size: if the first has twice the round-trip time, it will see half the speed. So if you are closer to the validator, you get better service.

DOS and overload protection is a complex problem, and I have limited experience, but my understanding is that good solutions involve several components. First, one needs a QoS criterion. It could be the number of transactions processed and/or some parameter of the transaction processing time, such as the 95th percentile, or something like that. It could be as simple as the byte count processed for a given user. Next, users are split into categories; this is already done, but adding validated users seems like a good idea. Then a mechanism is used to prioritize user requests based on the QoS. For TCP, it is also fine, and actually much less disruptive, to just not read the received data. I'm still reading the QUIC RFCs, but QUIC does seem to use congestion control mechanisms similar to TCP's. A validator would still need some kind of QoS metric to understand when it should start or stop reading the data.

Implementation-wise, there are some pretty nice algorithms, for example ones using credit systems, that are both efficient and well researched and tested, and that make it possible to push back proportionally to usage. Such a system would still be relatively simplistic as far as DOS systems go; in particular, it requires an explicit maximum limit to be specified. Also, if the validator load is not linear enough relative to the network load, it might not be able to protect users well enough.
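To make the round-trip-time argument concrete, here is a tiny, purely illustrative calculation of the window/RTT throughput bound (made-up numbers; the real limit also depends on the congestion controller and loss):

```rust
// Illustration of the fairness concern: with a fixed receive window, the
// per-connection throughput is roughly bounded by window / RTT, so a client
// with twice the round-trip time sees about half the throughput.
fn max_throughput_bytes_per_sec(window_bytes: f64, rtt_sec: f64) -> f64 {
    window_bytes / rtt_sec
}

fn main() {
    let window = 10.0 * 1232.0; // a 10x-MTU receive window, in bytes
    let near = max_throughput_bytes_per_sec(window, 0.020); // 20 ms RTT
    let far = max_throughput_bytes_per_sec(window, 0.040); // 40 ms RTT
    println!("near peer: {near:.0} B/s, far peer: {far:.0} B/s");
    // prints: near peer: 616000 B/s, far peer: 308000 B/s
}
```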
@ilya-bobyr thanks for the input.
@ilya-bobyr great input!
Agree with the receive window discriminating by round-trip time; this will effectively create a stronger validator concentration:
I would favor KISS; byte/packet count is probably good enough, given that we have a real fee-charging execution environment for the actual TXs. Deep introspection into the program execution stack can be very difficult to replicate upstream when implementing a secondary QoS level in a separate environment; examples would be proxy services like m-transaction or special hardware like a sig-verify ASIC.
This seems preferable to the current receive window solution. I'm also not sure that the receive window "slow-down" can't be gamed by hacking at the protocol level and simply trying to keep more packets in flight on a higher-latency connection.
I would favor a solution that does not bundle different users into a handful of categories but actually tracks the granularity of stake owners, possibly extended to users validated by stake owners. That makes the system resilient to scenarios where an individual (a stake whale) tries to overload the QoS system exactly at the time when a lot of profit can be made. We have seen this behavior multiple times on every resource that can be hogged, and we should simply design with this in mind. From a bandwidth consumer's perspective, having a clear relationship between stake and bandwidth would be ideal:
Thank you!
I meant categories at the top level of client separation. There has to be some decision on how the bandwidth should be split between unstaked and staked clients. Within a given category, such as between all staked clients, it is possible to have uneven weights based on the stake amount. At the same time, all unstaked clients are probably treated equally. I also wonder if there should be 3 top-level categories: validators, non-validator clients with stake allocated to a predetermined program, and non-validator clients with no stake.
It would indeed be great to have these kinds of properties. In any case, if bandwidth is a good QoS metric, I think this kind of system could improve the client experience.
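One way to picture the proposed top-level split (hypothetical types, just to make the categories concrete):

```rust
// Hypothetical classification mirroring the three proposed top-level
// categories; bandwidth would first be divided between categories, and only
// then weighted by stake within a category.
enum ClientClass {
    /// A node whose identity carries validator stake.
    Validator { stake: u64 },
    /// A non-validator client with stake allocated to a predetermined
    /// (hypothetical) program specifically for bandwidth.
    StakedClient { delegated_stake: u64 },
    /// Everyone else, treated equally and rate limited as a group.
    Unstaked,
}
```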
I would assume that in general delegation should only allow one to slice one's own piece, not to get more pieces. So there's only staked (including delegated) bandwidth and unstaked bandwidth. I personally don't even think that we need to include any unstaked bandwidth; the network seems to be decentralized enough that most unstaked participants could switch to staked relays on very short notice. But if that is not possible, agreed that having a top-level split is a good solution.

Bandwidth allocation could be updated at the epoch boundary (or on any other schedule longer than a few blocks) without restricting users too much; most bandwidth consumers should have a way to respond to account changes within a few seconds. Over-subscription in low-load periods with back-pressure down to an individually guaranteed minimum would be ideal. Bandwidth in terms of number of transactions or bytes both work for me, with a slight preference for number of transactions per block; that seems like the most likely unit of accounting that all participants would agree on.
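As a rough illustration of the credit-style accounting discussed above, here is a sketch of a per-identity, stake-weighted transaction allowance refreshed on a schedule such as the epoch boundary. All names are hypothetical, and over-subscription during low load would need an additional shared pool that is omitted here for brevity:

```rust
use std::collections::HashMap;

// Sketch of a per-identity, stake-weighted credit scheme: each staked
// identity is granted transactions per period in proportion to its stake.
// This is not the validator's actual QoS code.
struct StakeWeightedLimiter {
    /// Remaining credits (transactions) per identity for the current period.
    credits: HashMap<String, u64>,
}

impl StakeWeightedLimiter {
    /// Recompute allowances from the stake map, e.g. once per epoch.
    fn refresh(stakes: &HashMap<String, u64>, txs_per_period: u64) -> Self {
        let total: u64 = stakes.values().sum();
        let credits = stakes
            .iter()
            .map(|(id, stake)| (id.clone(), txs_per_period * stake / total.max(1)))
            .collect();
        StakeWeightedLimiter { credits }
    }

    /// Returns true if the identity still has credit; decrements if so.
    fn try_consume(&mut self, id: &str) -> bool {
        match self.credits.get_mut(id) {
            Some(c) if *c > 0 => {
                *c -= 1;
                true
            }
            _ => false, // unstaked or exhausted: defer or drop
        }
    }
}
```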
I think this issue still exists. |
Problem
If a client sends txs to a validator and the network latency is large, TPS is very low (see table below).
By contrast, if we place client and server in one datacenter, TPS is ~100k higher.
Problem setup
Single node private network:
Looks like it is due to the QUIC receive window size set here. As pointed out by @lijunwangs, it is adjusted by the constant QUIC_UNSTAKED_RECEIVE_WINDOW_RATIO (see quic.rs:25), which is 1 by default. Setting this parameter to 10 (the max value for staked) improves the situation but still cannot beat the UDP results.
For the sake of experiment, I also tried QUIC_MAX_UNSTAKED_CONCURRENT_STREAMS = 1024 (default 128) together with QUIC_UNSTAKED_RECEIVE_WINDOW_RATIO = 10: TPS = 1478.
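For reference, the two knobs varied in this experiment roughly correspond to quinn's transport configuration; a sketch (quinn API details vary by version, and the values here mirror the experiment rather than the validator's defaults):

```rust
use quinn::{TransportConfig, VarInt};
use std::sync::Arc;

// Sketch: a connection-level receive window of 10x an assumed ~1232-byte
// packet and 1024 concurrent unidirectional streams, mirroring the
// experiment above.
fn experimental_transport_config() -> Arc<TransportConfig> {
    let mut transport = TransportConfig::default();
    transport.receive_window(VarInt::from(10u32 * 1232));
    transport.max_concurrent_uni_streams(VarInt::from(1024u32));
    Arc::new(transport)
}
```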
Consequence
If a client application uses TPUClient to send txs, the variance of TPS is high. This happens because TPUClient will try sending to the next leaders, so TPS is strongly correlated with the network latency. By contrast, if we pick ThinClient and always send to the closest validator, TPS will be higher, because that validator will forward txs to the leader using the staked QUIC configuration.
Current implementation summary
Unstaked nodes: the window size is set to the minimum (1 * MTU), which leads to low PPS on networks with packet drops.
Staked nodes: the window size depends only on the stake and varies in the range 2..10 (times MTU).
Proposed solution
Atm, discuss a congestion control model that maximizes utilization of the network while taking validator load into account. Preparing a setup for reproducible testing is an important sub-task.
Discussed a bit with @lijunwangs and @ilya-bobyr.