On my list of things to look at is implementing a jail / removing badly behaving nodes. Wanted to run it past everyone and see what you all think.
Current problem:
• We have had a number of validators go down in the past, and sometimes it takes them weeks (or months) to fix their node issues. This results in sporadic block times and, in turn, lower throughput.
• Based on past votes, we have a very small voter turnout, usually <25%, which isn’t ideal and isn’t very decentralized.
How do we fix this:
Need a mechanism to remove badly behaving nodes from the pool and give them time to fix issues before we allow them back.
Proposed solution:
Intend to allow two mechanisms: voting and auto removal.
Auto removal - This may come after voting, but the two share a little logic. The consensus will be changed to count the number of blocks mined by each node per cycle. At the end of the cycle it will compare the number validated to the expected number (floor(cycle duration/numvals)); if the number validated falls below X% of the expected number, the validator will be moved from the pending list into the new “jailed list”.
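A minimal sketch of the auto-removal check, assuming the cycle duration is measured in blocks and that the per-cycle counts include every active validator (even those with 0 blocks). The function names are hypothetical, and X% is left as a parameter because the proposal does not fix it:

```python
def expected_blocks(cycle_duration_blocks: int, num_validators: int) -> int:
    """floor(cycle duration / numvals), as written in the proposal."""
    return cycle_duration_blocks // num_validators


def validators_to_jail(blocks_validated: dict[str, int],
                       cycle_duration_blocks: int,
                       threshold_percent: int) -> list[str]:
    """Return the validators that fell below X% of their expected block count.

    blocks_validated must contain every active validator for the cycle,
    including those that validated 0 blocks.
    """
    expected = expected_blocks(cycle_duration_blocks, len(blocks_validated))
    # Integer comparison avoids floating-point rounding exactly at the threshold.
    return [addr for addr, n in blocks_validated.items()
            if n * 100 < threshold_percent * expected]


counts = {"0xA": 100, "0xB": 30, "0xC": 95}
print(validators_to_jail(counts, cycle_duration_blocks=300, threshold_percent=70))
# ['0xB']
```

With 3 validators and a 300-block cycle each is expected to validate 100 blocks, so with X = 70 only the validator that produced 30 blocks is moved to the jailed list.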
Voting - Existing validators will be able to propose a vote to push bad actors to the jail list. We need to make sure this can’t be abused, so some validator KPIs, such as uptime and voting turnout, need to be checked before a vote can be opened.
Strike system - I propose the following strikes:
1st strike - 1 cycle min jail time
2nd strike - 2 cycle min jail time
3rd strike - 3 cycle min jail time
Whether we want to go all the way to a permanent ban is up for debate.
Released from jail - Jailed validators will remain in jail until they signal they are ready to be released. They will then be released on the next cycle or once their jail time is up (whichever comes last).
Node maintenance - If an operator foresees downtime (a server migration, for example), they will be able to signal to be moved into the jail list from the start of the next cycle, and will need to signal when they are ready to start again.
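The strike list and the release rule can be combined into one small calculation. This is a sketch under assumed conventions: cycles are numbered with integers, `jailed_from` is the first cycle spent in jail, and the function names are hypothetical. What happens after a 3rd strike is still open, so the sketch refuses to guess:

```python
def min_jail_cycles(strikes: int) -> int:
    """1st strike -> 1 cycle, 2nd -> 2, 3rd -> 3, per the strike list."""
    if not 1 <= strikes <= 3:
        # Beyond the 3rd strike (e.g. a permanent ban) is still up for debate.
        raise ValueError(f"no rule defined for strike {strikes}")
    return strikes


def release_cycle(jailed_from: int, strikes: int, ready_signal_cycle: int) -> int:
    """First cycle in which a jailed validator is active again.

    jailed_from: first cycle the validator spends in jail (assumed indexing).
    ready_signal_cycle: cycle in which the operator signals they are ready.
    """
    jail_over = jailed_from + min_jail_cycles(strikes)
    # Released on the cycle after signalling, or when the jail time is up,
    # whichever comes last.
    return max(ready_signal_cycle + 1, jail_over)


print(release_cycle(jailed_from=10, strikes=2, ready_signal_cycle=10))  # 12
print(release_cycle(jailed_from=10, strikes=2, ready_signal_cycle=15))  # 16
```

A validator on its 2nd strike that signals immediately still waits out both cycles; one that signals late is held until the cycle after its signal.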
I agree with auto jailing if a node falls below a certain % uptime. This could be worked out from 30 mins per cycle? That should give enough time for anyone to reboot/do maintenance without being jailed.
Not sure about voting validators out. I think the code should handle this, not individuals. The possibility for collusion is there - ie jailing validators you don’t like. I’m also not sure if there are legal consequences for individuals stopping others from validating and therefore earning an income. Could people get sued?
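Turning “30 mins per cycle” into an uptime percentage depends on the cycle length, which the thread does not state. A minimal sketch, assuming a hypothetical 2-day cycle purely for illustration:

```python
def min_uptime_fraction(allowed_downtime_s: float, cycle_length_s: float) -> float:
    """Smallest uptime fraction that still fits within the downtime allowance."""
    if not 0 <= allowed_downtime_s <= cycle_length_s:
        raise ValueError("allowance must be between 0 and the cycle length")
    return 1 - allowed_downtime_s / cycle_length_s


# Assumed 2-day cycle; not a confirmed network value.
HYPOTHETICAL_CYCLE_S = 2 * 24 * 60 * 60
print(f"{min_uptime_fraction(30 * 60, HYPOTHETICAL_CYCLE_S):.2%}")  # 98.96%
```

A longer cycle makes the same 30-minute allowance a smaller percentage, so the threshold would need to be recomputed if the cycle length changes.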
If the validator is a bad actor, then the code/infrastructure should be 1) secure enough to stop their activities 2) monitoring pre-defined KPIs to auto jail them.
Should release from jail be automatic when their node is running correctly (meeting the minimum KPIs)?
Maintenance - enough time should be included in the % allowance above; otherwise they could manually enter the jail, but why would they if it is automatic anyway (when their % drops)?
Thinking about the severity of jailing, this is about removing node running privileges, not stopping someone earning rewards.
With jailing, they will lose delegators and voting rights, but they could restake the majority of their fuse to another node and still earn rewards (albeit minus a 15% fee).
So it’s just about removing their ability to run a node, which is correct if they are not able to run a node to meet the KPIs.
Not sure about voting validators out. I think the code should handle this, not individuals. The possibility for collusion is there - ie jailing validators you don’t like. I’m also not sure if there are legal consequences for individuals stopping others from validating and therefore earning an income. Could people get sued?
• The idea with voting is that a vote can only be opened against a validator if their node is a bad actor (not meeting uptime or consistently not voting on proposals). So it won’t be possible to vote out a “good” node.
Should release from jail be automatic when their node is running correctly (meeting the minimum KPIs)?
• This is near impossible to implement: when they are not active validators, the consensus has no insight into the node (i.e. it has no knowledge of when it is up again).
Maintenance - enough time should be included in the % allowance above; otherwise they could manually enter the jail, but why would they if it is automatic anyway (when their % drops)?
• The benefit of actively flagging maintenance is to avoid a strike and to avoid their uptime being affected
Thinking about the severity of jailing, this is about removing node running privileges, not stopping someone earning rewards.
With jailing, they will lose delegators and voting rights, but they could restake the majority of their fuse to another node and still earn rewards (albeit minus a 15% fee).
So it’s just about removing their ability to run a node, which is correct if they are not able to run a node to meet the KPIs.
Correct, it is more of a way to stabilize the network than a punishment. We could easily add a lock that stops a jailed node from pulling its stake (but thinking about it, you could get round this by just delegating rather than self-staking).
I’m not sure why we would have to vote them out when the code should do it automatically - check uptime and voting participation? My preference is that validators vote for network parameter changes, not be responsible for policing validator behaviour. I think it can open up a load of problems. It’s much better that this is all handled in the code.
I’m not sure how the monitoring and jailing will be implemented technically, but I was thinking that there must be a way to monitor all nodes (as health.fuse.io does) and then draw the jailing list from that. Anyone not meeting KPIs for that cycle is put on the jail list. When nodes are active again, their KPIs from the last full cycle determine whether they are on, or off, the jail list.
The list is dynamic and based on each node’s performance in the last cycle.
Fair enough. The strike system could also look at KPIs over the same length of time as the jail time, i.e. 2nd strike: needs 2 cycles of good performance before being removed from the jail list.
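The “n strikes need n good cycles” idea, combined with the dynamic list driven by the last full cycle, can be sketched as a sliding window. This assumes an external monitor (along the lines of health.fuse.io) records whether each cycle met the KPIs, since the consensus itself cannot see a jailed node; the function name is hypothetical:

```python
def can_leave_jail(cycle_ok: list[bool], strikes: int) -> bool:
    """cycle_ok: oldest-first record of whether each completed cycle met the KPIs.

    Strike n requires the last n cycles to be good; with 0 strikes the last
    full cycle alone decides, as in the dynamic-list idea.
    """
    window = max(strikes, 1)
    recent = cycle_ok[-window:]
    # Too little history counts as "not yet proven good".
    return len(recent) == window and all(recent)


history = [True, False, True, True]
print(can_leave_jail(history, strikes=2))  # True: last 2 cycles were good
print(can_leave_jail(history, strikes=3))  # False: one of the last 3 failed
```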
Locking a node would just mean they remove everything from it and start again, which is also quicker than possibly waiting for 3 cycles etc. So it’s not really worth doing. Also, I’m not sure how legal it is to ‘retain’ someone’s funds…
Great suggestion Andy! That proposal is very important, and I think breaking it down into smaller chunks and voting on them makes sense.
How would I need to misbehave to get the 1st strike?
Has anyone researched how other PoS chains handle this?
I agree that we should avoid voting, though it’s not possible to avoid it completely.
2.1 Regarding uptime: we could add a ping method to the validator code that checks whether other validators are online, and each validator would put that info into a smart contract. So it is almost like voting, but it would require long-term collusion (i.e. faking ping results for a long time), so an affected validator would have some time to respond and detect the collusion.
2.2 Regarding proposal participation: if it is on-chain voting, then it could be measured on-chain. Otherwise, why is voting participation so important? I mean, it’s important, but I don’t think you should be punished for not voting. What is the threshold for a vote to pass?
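The ping idea in 2.1 can be sketched as an aggregation over peer reports. This is a hypothetical illustration, not an existing contract or validator API: it assumes each report is simply the set of peers a validator saw online during the cycle, and it flags a validator only when fewer than half of the *other* reporters saw it. An offline validator cannot submit its own report, so the full validator set is passed in separately:

```python
def flag_offline(reports: dict[str, set[str]], validators: set[str],
                 min_seen_fraction: float = 0.5) -> set[str]:
    """reports maps each reporting validator to the set of peers it saw online.

    A validator is flagged when fewer than min_seen_fraction of the other
    reporters saw it online. With at least two other reporters, one false
    report alone cannot flag a validator that everyone else saw online.
    """
    flagged = set()
    for validator in validators:
        others = [r for r in reports if r != validator]
        if not others:
            continue  # nobody else reported, so there is no evidence either way
        seen = sum(1 for r in others if validator in reports[r])
        if seen / len(others) < min_seen_fraction:
            flagged.add(validator)
    return flagged


reports = {"a": {"b", "c"}, "b": {"a", "c"}, "c": {"a", "b"}}
print(flag_offline(reports, {"a", "b", "c", "d"}))  # {'d'}: nobody saw d
```

Colluding reporters could still outvote honest ones within a cycle, which is why the post suggests looking at results over a longer period.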
What about misbehaving nodes/collusion for double spending or rewriting history? @marksmargon, can you shed some light on how the current consensus works? Would honest nodes detect that?
The consensus has been modified to count the blocks validated by each active validator per cycle. At the end of each cycle the consensus checks whether a validator has validated at least 70% of their expected number of blocks; if not, they are taken off the pending validators list, moved to the jailed list, and a strike is added. The jail time is set to (endOftheNewCycle + (numberOfBlocksPerCycle * strike count)).
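The end-of-cycle rule described above, restated as a small Python function for one validator. This is a sketch of the described behaviour, not the consensus code itself: `ValidatorState`, `jailed_until`, and `on_cycle_end` are hypothetical names, block numbers are plain integers, and moving the validator between lists is left to the caller:

```python
from dataclasses import dataclass
from typing import Optional

MIN_PERCENT = 70  # "at least 70% of their expected number of blocks"


@dataclass
class ValidatorState:
    strikes: int = 0
    jailed_until: Optional[int] = None  # block number; hypothetical field name


def on_cycle_end(state: ValidatorState, validated: int, expected: int,
                 end_of_new_cycle: int, blocks_per_cycle: int) -> bool:
    """Apply the end-of-cycle check; return True if the validator is jailed."""
    if validated * 100 >= MIN_PERCENT * expected:
        return False  # met the 70% bar, stays on the pending list
    state.strikes += 1
    # jail time = endOftheNewCycle + numberOfBlocksPerCycle * strike count
    state.jailed_until = end_of_new_cycle + blocks_per_cycle * state.strikes
    return True


v = ValidatorState()
on_cycle_end(v, validated=60, expected=100, end_of_new_cycle=2000, blocks_per_cycle=1000)
print(v)  # ValidatorState(strikes=1, jailed_until=3000)
```

Because the strike count multiplies the cycle length, each repeat offence pushes the jail time one further cycle out.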
Once jailed, a validator can call the new unJail method, which checks whether the unJailing block is earlier than the end of the current cycle. If so, it pulls them out of the jailed list and moves them back to pending (assuming their stake is still >100k).
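A sketch of the unJail check as described, assuming the “unJailing block” is the jail time set at the end of the cycle and that the 100k stake figure is in FUSE. The function and parameter names are hypothetical, and the actual contract may compare the stake with `>=` rather than `>`:

```python
MIN_STAKE = 100_000  # "still >100k"; unit assumed to be FUSE


def un_jail(addr: str, jailed: list[str], pending: list[str],
            unjail_block: dict[str, int], stake: dict[str, int],
            current_cycle_end: int) -> bool:
    """Move addr from jailed back to pending if its jail time has passed.

    Returns True if the validator was moved, False if it must keep waiting
    or no longer has enough stake.
    """
    if addr not in jailed:
        raise ValueError(f"{addr} is not jailed")
    if unjail_block[addr] >= current_cycle_end:
        return False  # jail time not yet before the end of the current cycle
    if stake[addr] <= MIN_STAKE:
        return False  # stake no longer above the 100k minimum
    jailed.remove(addr)
    pending.append(addr)
    return True
```

For example, a validator with `unjail_block` 2500 calling during a cycle that ends at block 3000, with 150k staked, is moved back to pending.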
Have also added a new call, maintenance(), which moves a validator from the pending list to the jailed list for one cycle without incrementing the strike counter. This allows validators to perform maintenance without getting strikes and without impacting the network.
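A sketch of what maintenance() does to the validator lists, under stated assumptions: the `Registry` class and its fields are hypothetical, and setting the release block to the end of the current cycle is an assumed way of getting “one cycle” out under an unJail rule that requires the release block to be before the end of the cycle in which unJail is called. The key point from the post is that the strike counter is left untouched:

```python
from dataclasses import dataclass, field


@dataclass
class Registry:
    pending: list[str] = field(default_factory=list)
    jailed: list[str] = field(default_factory=list)
    strikes: dict[str, int] = field(default_factory=dict)
    jailed_until: dict[str, int] = field(default_factory=dict)


def maintenance(reg: Registry, addr: str, current_cycle_end: int) -> None:
    """Voluntarily move addr from pending to jailed without adding a strike."""
    if addr not in reg.pending:
        raise ValueError(f"{addr} is not a pending validator")
    reg.pending.remove(addr)
    reg.jailed.append(addr)
    # Assumption: release block = end of the current cycle, so the validator
    # sits out the next cycle and can unJail during it.
    reg.jailed_until[addr] = current_cycle_end
    # reg.strikes is deliberately not modified.


reg = Registry(pending=["a", "b"], strikes={"a": 1})
maintenance(reg, "a", current_cycle_end=3000)
print(reg.pending, reg.jailed, reg.strikes)  # ['b'] ['a'] {'a': 1}
```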