Thanks @danda. The high level goals are very helpful – I’ve added them to the OP (this link goes directly to the heading: High Level Goals) so we can maintain and update them there, and maybe add more detail as things clarify.
Wrt local-first, I think I now have an understanding of how the Sequence CRDT works (but maybe @bochaco can confirm). I summarised it in this post, but the part which I think differentiates it from local-first is as follows:
- local-first: à la Martin Kleppmann (link), this entails each author/client having their own complete copy of the CRDT (content and CRDT metadata). Updates are made locally, and periodically the changes (metadata) are shared with others, who also have their own complete copy, possibly including different changes. When a client receives changes it can merge them into its own copy, and any two copies will converge on the same state once they have incorporated the same changes.
- Sequence CRDT: implemented so that only the vaults hold a reference copy of the Sequence (content and CRDT metadata). When a client/author inserts or deletes, it must first obtain a local copy (which is cached in memory to reduce fetches) and then apply the mutation locally. As soon as possible it also sends the mutation (CRDT operation) to all the vaults looking after the reference copy, and as the vaults merge the changes, every copy converges on the same final state (see the sketch below).
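To make the contrast concrete, here’s a minimal sketch of where the authoritative CRDT state lives and how ops flow in each model. The types and function names are hypothetical, not the real client/vault API:

```rust
// Hypothetical sketch (not the real client/vault API) contrasting where the
// authoritative CRDT state lives in the two models.

#[derive(Clone, PartialEq)]
struct Op(String); // stand-in for a CRDT operation

// local-first: every author keeps a full replica and merges peers' changes.
struct LocalFirstReplica {
    ops: Vec<Op>, // complete copy of the CRDT metadata, held by this author
}

impl LocalFirstReplica {
    fn apply_local(&mut self, op: Op) {
        self.ops.push(op); // edit immediately; being offline is fine
    }
    fn merge(&mut self, remote_ops: &[Op]) {
        // Incorporate another author's changes; any two replicas that have
        // seen the same set of ops converge on the same state.
        for op in remote_ops {
            if !self.ops.contains(op) {
                self.ops.push(op.clone());
            }
        }
    }
}

// Sequence CRDT (as described above): the reference copy lives on the vaults.
struct Vault {
    reference_copy: Vec<Op>,
}

impl Vault {
    fn apply(&mut self, op: Op) {
        if !self.reference_copy.contains(&op) {
            self.reference_copy.push(op); // vaults merge ops and converge
        }
    }
}

// The client fetches/caches a copy, mutates locally, then forwards the op.
struct SequenceClient {
    cached: Vec<Op>, // local working copy obtained from the vaults
}

impl SequenceClient {
    fn mutate(&mut self, op: Op, vaults: &mut [Vault]) {
        self.cached.push(op.clone()); // apply locally first
        for vault in vaults.iter_mut() {
            vault.apply(op.clone()); // send the CRDT op to the vaults ASAP
        }
    }
}
```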
I think there are merits in both approaches, particularly for a filesystem. We’ll need to think about the trade-offs, which will vary depending on the application context.
Separately, I’ve been wondering about using a single FileTree CRDT analogous to a FilesContainer, but implemented as I’ve described for the Sequence CRDT. Also, whether it is feasible to support nested FileTree CRDTs (I think it is) and whether that helps. I suspect that is a thing for later, but worth thinking about now even if the intention is to start simple (which I think would be wise!).
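Purely as a thought experiment (none of these types exist, and the real crdt-tree metadata is omitted), nesting could look something like a tree whose entries point either at file content or at the address of another FileTree CRDT:

```rust
use std::collections::BTreeMap;

// Hypothetical sketch only: a placeholder for an XOR address on the network.
type XorAddress = [u8; 32];

// An entry maps a path component either to file content or to a nested
// FileTree CRDT that is fetched and merged independently of its parent.
enum TreeEntry {
    File { content: XorAddress },
    SubTree { tree: XorAddress },
}

struct FileTree {
    // In a real tree CRDT this map would also carry the metadata needed to
    // resolve concurrent moves/renames; here it just shows the shape.
    entries: BTreeMap<String, TreeEntry>,
}
```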
I think it may be too late, but at least we suffer the same madness
Excellent news. Well done man. Phew
I haven’t come across path-based FUSE so I’ll read your links. Thanks. My instinct is to stick with nodes unless we have good reason, but I’ll read up and we can butt heads as necessary.
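For reference while I read up, here’s how I currently picture the difference, as a hypothetical sketch rather than the actual fuse-mt / fuse-rs signatures: a path-based layer hands each callback a full path, while an inode-based one hands it a node id and leaves name resolution (lookup) to the filesystem, which maps more directly onto a tree of nodes.

```rust
use std::path::Path;

// Path-based style: the FUSE layer resolves names, callbacks receive paths.
trait PathBasedFs {
    fn read(&self, path: &Path, offset: u64, size: u32) -> Result<Vec<u8>, i32>;
}

// Inode-based style: callbacks receive node ids; the filesystem maintains
// the name -> inode mapping itself via lookup.
type Inode = u64;

trait InodeBasedFs {
    fn lookup(&self, parent: Inode, name: &str) -> Result<Inode, i32>;
    fn read(&self, ino: Inode, offset: u64, size: u32) -> Result<Vec<u8>, i32>;
}
```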
The following look sensible. Let’s return to them once I’ve read and thought a bit more.
danda:
Sounds great! As you’ll see from the brainstorming doc, I have kind of a high level design in mind, but haven’t come up with finer details such as an API specification, though I think we can kind of use the fuse API and C lib filesystem calls as a starting point for the needed functions. For the time being, I’m just trying to get the crdt-tree working, so we can start to get some hands-on experience with that and see where it leads us. If you’d like to stub out a fuse impl, that could be a starting point on the other end. We could probably go with path based (fuse-mt) for now.

It would also be helpful to get some benchmark numbers against the current filesystem API using the mock network, so we can later show improvement (or not). This is a bit tricky though, because it serializes to/from the fake_vault_data.json file, and very very quickly the serialization dominates the CPU as that json file rapidly grows. That issue would need to be dealt with somehow, so maybe more trouble than it’s worth for now.