Work allocation between client and nodes

From Pluggable consensus and rollbacks - #2 by dirvine - Features - Autonomi Forum (Safe)

we have a consensus where the work is done by the proposer (it means spammers etc. have a hard time and the network always has an easy time).

Is this currently true? I believe it is but want to look at it both in some detail and later in broad strokes.

It seems to me this would require a call to threshold_crypto combine_signatures by the client, since elders would be meant to each do 1 signature each and the client meant to do the hard work of combining N signatures.

To confirm if this is the case, I see in the frontend:
No instances of combine_signatures in sn_api
No instances in sn_client
No instances in sn_messaging
No instances in sn_data_types
It is in sn_transfers, inputs are credit_sig_shares and debit_sig_shares so it seems only payment-related, and it bubbles up to the surface as sn_api ā† sn_client ā† sn_transfers

And I see in the backend:
combine_signatures does appear in sn_node but only as part of network genesis (which seems correct, since nodes overall should not do the work of combining, clients should, and genesis is one of the few node operations that does not involve clients).
It also appears in sn_routing in create_first_proof

So for the zoomed in details, does the current workload implement a suitable distribution of work between nodes vs clients?

And for the broader zoomed out point of this topic, what should we be aiming for with the distribution of work? How can we ensure that clients must do more work than nodes?

Iā€™m just looking to start a broad conversation / brainstorming here, not trying to poke holes or try to find bugs or that sort of thing. Hopefully itā€™s as simple as ā€˜yes it works correctly alreadyā€™ but maybe thereā€™s more to it, Iā€™m not sure yet.

5 Likes

Yep, transfers is where itā€™s at just now. It works well there. The actor (be it client or node) has to fire off requests and get them back to be able to proceed.

Down the lin weā€™d apply this logic to data that needed consensus or order too.

The main idea weā€™re rolling with is ā€œwhoever wants to do thing should drive the processā€. Here adults are just signing that theyā€™ve seen a thing and itā€™s valid. That can be short circuited for anythign out of order, so we should avoid doing too much validation in that manner.

4 Likes

Donā€™t want to distract anyone, but I think some diagrams of this in a short series of blog posts would be a great resource. Or could be part of the primer, but this might get a bit lost unless we also have it in a way that can be linked.

I donā€™t know if this approach is completely new, but it is groundbreaking and something we can use to show the elegance and efficiency of the main data types. David summarised the two main consensus models, so something as simple as that plus a diagram and short explanation would be great. From there we can link to the primer (when updated) for those wanting to read more deeply.

Canā€™t wait for the testnet and get a feel for this. Thank you to everyone working on this!

3 Likes

Given all data mutations require a transaction to be made, I think this does give broad coverage.

Moreover, standalone transfers can still be no fee, as to spam the network would put more load on the client. Assuming, ofc, that the actions of the client do outweigh those of the network.

1 Like

Looking at the details a bit further, it seems to me that elders do more work than clients, which is the opposite of what we want.

A quick summary of the basic process weā€™re looking at

  • Number of elders is 5.
  • Majority is 3.
  • The client needs 3 signatures from 5 elders to make a payment.
  • The client does 2 requests to their connected section elders (see Reliable Message Delivery (RMD), using at least 1/3 of the elders, ie 2 elders).
  • This is routed to the destination with RMD.
  • The destination elders all verify and sign the payment if valid (ie all 5 elders).
  • The individual elder signatures are routed back to the client.
  • The client combines 3 of the 5 responses.
  • The client sends the combined proof as the final payment.

We can look at the workload on two fronts - the bandwidth and the computation.

Looking at bandwidth, due to RMD it seems that client and nodes each require an equal amount of bandwidth both on sending and receiving. Iā€™m not sure whether or not we should say the network does more work than the client here, since each individual node does an equal amount of bandwidth work, but overall the network as a whole does a lot more work than the client. This is a bit of a semantic argument but weā€™ll need to be clear about it if we want to communicate the security model properly. Also worth noting that routing needs 2 nodes per hop but signing needs 5 nodes so thereā€™s another imbalance there favoring the client.

Looking at computation, a quick test script shows the elders do more work than the client. The test has elders creating their individual signatures and the client combining 3 of the 5. The elders each took ~2.5ms to generate a signature and the client took ~0.9ms to combine 3 of the 5. So creating signatures takes much more work than combining signatures, which is a little counter-intuitive. Itā€™s true that combining many signatures takes longer than creating them (in my test combining 6 or more signatures takes longer than creating a single signature), but simply increasing the number of signature shares is not a good way to approach the security model.

Verification of signatures has the same characteristics as bandwidth, since all nodes do the same amount of verification. Maybe thereā€™s even room to drop some verification steps, eg if the client tries to combine signatures without verifying each signature beforehand, the combine operation will fail anyway, acting as the verification for the client. The necessity (or not) of the signature verification step is actually pretty interesting to look at, but is getting off topic.

Iā€™m trying to make a broader point here that if thereā€™s a ratio of 1:5 client:elders we canā€™t assume it means the client does 5x the work of the elders. We need to measure the actual work done by each. And we should also consider whether we want to be measuring the client:node work ratio or the client:network ratio.

My feeling is we might end up including some spam reduction mechanism for the client like hashcash, except signature-based rather than hash-based. If the client is going to route via X nodes and Y destination elders, impacting a total of (X+Y) nodes, the client should provide a signature showing they did more work than the X+Y nodes end up doing (similar to how hashcash does). Maybe the client:network work ratio is actually not the security parameter we need to focus on, maybe itā€™s something else like current load? I wonā€™t go too far into details here, but it looks to me like the current spam prevention is not going to work as intended.

Maybe Iā€™ve got some details of the process wrong, but Iā€™m confident the combine signatures process is much faster than the create signature process, which we probably need to account for in our analysis of the node vs client workload.

5 Likes

Iā€™m not sure thatā€™s a given. I think atm weā€™re looking at mutations (up to a size limit) being priced into initial PUT, so updates do not need transfers.

That seems fair aye :+1:

No, I think youā€™re right here.

1 Like

Oh? That sounds like a fairly fundamental. It would be interesting to hear the rational on that. Is this related to pushing more of the required effort onto the client to prevent spamming?

1 Like

I think itā€™s something weā€™ll be looking at ongoing, so not something definite per se. Atm just to keep things simpler. cc @dirvine who can perhaps elaborate a bit more here.

3 Likes

I had a bit of a look into threshold_crypto signatures today because I noticed they do not span the full 768 bit range from 0-2768. Signatures always seemed to have leading character between 8-b, never 0-7 or c-f.

This would affect any sort of hashcash type threshold, so I looked at what the min/max values for a signature are. Thereā€™s certainly a mathematical answer but for me to dive that deep seemed overkill when we can just do lots of signatures and see what range we get, to approximate the result.

use rand;
use threshold_crypto::SecretKey;

fn main() {
    let mut minsig = [255_u8; 96];
    let mut maxsig = [0_u8; 96];
    for i in 0..u64::MAX {
        let sk = SecretKey::random();
        let msg: Vec<u8> = (0..32).map(|_| { rand::random::<u8>() }).collect();
        let sig = sk.sign(msg);
        let sigbytes = sig.to_bytes();
        if sigbytes < minsig {
            minsig = sigbytes;
        }
        if sigbytes > maxsig {
            maxsig = sigbytes;
        }
        if sigbytes <= minsig || sigbytes >= maxsig{
            println!("iteration {}", i);
            println!("{}", hex::encode(minsig));
            println!("{}", hex::encode(maxsig));
        }
    }
}

gives output after running for about 6 hours 3 days

iteration 0
a05f5ec7a102842bf2d9309a035b70...
a05f5ec7a102842bf2d9309a035b70...
iteration 1
85f12cd3e2f1f393dbe40edfea1465...
a05f5ec7a102842bf2d9309a035b70...
iteration 2
85f12cd3e2f1f393dbe40edfea1465...
b2db0964768bf7a6326b676df2ed9d...
iteration 19
85f12cd3e2f1f393dbe40edfea1465...
b9998d72515b406eb6f98a81730392...
iteration 23
83e37db905f93dfd52adbf438fd84d...
b9998d72515b406eb6f98a81730392...
iteration 60
82a9f81a487711dc8e724429372de7...
b9998d72515b406eb6f98a81730392...
iteration 74
82a9f81a487711dc8e724429372de7...
b99cde2d4c9a52b0e2789f162bc4b8...
iteration 75
808a94ee740ba9873b98b8be67ae0e...
b99cde2d4c9a52b0e2789f162bc4b8...
iteration 150
80494b57fe8daddb778abca430419e...
b99cde2d4c9a52b0e2789f162bc4b8...
iteration 220
80494b57fe8daddb778abca430419e...
b9f993f77450605378724e65ce0929...
iteration 571
8042255361a7f757c0b0b394a58e2b...
b9f993f77450605378724e65ce0929...
iteration 638
8026aa76e633ea7e12d7b5e6bb8c84...
b9f993f77450605378724e65ce0929...
iteration 1341
8026aa76e633ea7e12d7b5e6bb8c84...
b9fde1a24a6d2014051b815ac99e01...
iteration 1673
80090d39409278fa06ff41360bd47b...
b9fde1a24a6d2014051b815ac99e01...
iteration 2409
8003e6a362d19c490563b1072b6c02...
b9fde1a24a6d2014051b815ac99e01...
iteration 4860
8001e72e6ea2df28371a3a000db987...
b9fde1a24a6d2014051b815ac99e01...
iteration 5593
8001e72e6ea2df28371a3a000db987...
b9ff5091b69d56512d04ec86844b71...
iteration 12584
8001e72e6ea2df28371a3a000db987...
b9ff8e6d4a7be510e71dcafa01ac81...
iteration 14197
8001e72e6ea2df28371a3a000db987...
b9fff0412ac94734921ddf0413c46f...
iteration 14587
8000f5cae0fe38efd232e40538ac8e...
b9fff0412ac94734921ddf0413c46f...
iteration 21214
8000f5cae0fe38efd232e40538ac8e...
ba004fa22774b25c2e5bc41ba8e6e4...
iteration 31543
8000f5cae0fe38efd232e40538ac8e...
ba005381c890aa750148d883c1fd3b...
iteration 34612
8000e3081757053f0988033adc1b05...
ba005381c890aa750148d883c1fd3b...
iteration 39031
80007e98fd5f0b919cf4b8a360f5cf...
ba005381c890aa750148d883c1fd3b...
iteration 51564
8000023ca3267a08c15112ad5e070d...
ba005381c890aa750148d883c1fd3b...
iteration 63120
8000023ca3267a08c15112ad5e070d...
ba006acae66f4e04c8a1e722ea0e2d...
iteration 99100
8000023ca3267a08c15112ad5e070d...
ba00fc64486e7fddbeeb49b2ba2ee0...
iteration 514734
8000023ca3267a08c15112ad5e070d...
ba010122628878a312894348ab6eec...
iteration 735125
8000023ca3267a08c15112ad5e070d...
ba0102a6a68fdffc2401c959661bb2...
iteration 888108
8000023ca3267a08c15112ad5e070d...
ba01037c216f6c0f4b078891a045be...
iteration 914378
8000023ca3267a08c15112ad5e070d...
ba010b861c6eaff200d567d43acc80...
iteration 922239
8000023ca3267a08c15112ad5e070d...
ba010d42b5d79ec70c05f2a57acb95...
iteration 1378538
80000231218d9753913787831a8fb1...
ba010d42b5d79ec70c05f2a57acb95...
iteration 1395686
80000231218d9753913787831a8fb1...
ba010eada2448c41dfd34408f7a714...
iteration 1519745
80000231218d9753913787831a8fb1...
ba0111cb0d92ae0f7d394832647981...
iteration 3016717
8000007ce08858d0ecb16139fb38b5...
ba0111cb0d92ae0f7d394832647981...
<edit: appending results from a couple more days of running>
iteration 9301317
80000054134a327c43c45d627c9a1c9...
ba0111cb0d92ae0f7d394832647981d...
iteration 12380545                       ...
80000035d50ba2435b64bd794a5f84a...
ba0111cb0d92ae0f7d394832647981d...
iteration 15753594                       ...
80000024ca039c3c828b66337e4b32f...
ba0111cb0d92ae0f7d394832647981d...
iteration 22404643                       ...
80000024ca039c3c828b66337e4b32f...
ba0111e9aec99f6622f341c0e7884cd...
iteration 31771713                       ...
8000001f723cf8b8c52312249d2d1f0...
ba0111e9aec99f6622f341c0e7884cd...
iteration 54071723                       ...
80000005c7542b99c528ad5763e180a...
ba0111e9aec99f6622f341c0e7884cd...

So it looks like the smallest possible value for threshold_cignature signature is probably 80000... and the maximum is something slightly larger than ba0111cb0...

Not sure if this ever going to be useful information but may as well put it somewhere public.

4 Likes