Over a year ago there was an idea to use only symmetric keys for NFS encryption. Asymmetric public and private keys are redundant in the current implementation, as files and directories are encrypted only to be used by their owners.
In addition to that, symmetric keys allow more granular access to directories: if a user wants to share read access, they only need to give someone the unique symmetric key for that directory. To grant write access, a user just adds another person’s public signing key to the list of directory owners.
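As a rough sketch of this access model (the types and method names below are assumptions for illustration, not the actual NFS code), read access is shared by handing over the directory’s unique symmetric key, while write access is granted by adding another signing key to the owners list:

```rust
// Hypothetical types for illustration only -- not the actual NFS structures.
type SymmetricKey = [u8; 32];   // unique per-directory encryption key
type SignPublicKey = [u8; 32];  // a user's public signing key

struct Directory {
    enc_key: SymmetricKey,       // whoever holds this key can decrypt (read) the directory
    owners: Vec<SignPublicKey>,  // whoever is listed here can sign changes (write)
    // ... metadata and contents omitted
}

impl Directory {
    /// Share read access: hand out the directory's unique symmetric key.
    fn share_read(&self) -> SymmetricKey {
        self.enc_key
    }

    /// Grant write access: add the other person's public signing key to the owners list.
    fn grant_write(&mut self, key: SignPublicKey) {
        self.owners.push(key);
    }
}

fn main() {
    let mut dir = Directory { enc_key: [0; 32], owners: vec![[1; 32]] };
    let _read_key = dir.share_read(); // give this key to someone for read-only access
    dir.grant_write([2; 32]);         // add a second owner so they can modify the directory
}
```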
We’ve decided to implement these changes this week, and it looks like they will affect the file system design in a significant way. We want to discuss these changes with the community.
Current state of NFS
The current design of NFS allows users to have versioned directories and files. These are used when a user wants to restore deleted files or roll back changes. However, if we allow shared access to directories and use unique symmetric keys per directory for encryption, this feature starts to get complicated.
First of all, each time we modify a file or a directory, the parent directory’s metadata has to be updated too, since a directory’s metadata includes, among other things, its name, timestamps, and references to its files and sub-directories.
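For illustration, the kind of per-directory metadata being discussed could look roughly like the sketch below (the struct and field names are assumptions for this post, not the actual NFS definitions):

```rust
// Illustrative only: struct and field names are assumptions, not the real NFS definitions.
struct DirectoryMetadata {
    name: String,
    created: u64,                       // timestamps, e.g. seconds since the epoch
    modified: u64,
    version: u64,                       // bumped on every change if the directory is versioned
    sub_directories: Vec<DirectoryRef>, // references used to locate and fetch child directories
    files: Vec<FileMetadata>,
}

struct DirectoryRef {
    name: String,
    locator: [u8; 32], // network address of the child directory's data
}

struct FileMetadata {
    name: String,
    size: u64,
    modified: u64,
}
```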
Now, consider this hierarchy:
root/
|- dir-a
| `- dir-b
`- file-0
Here are some of the problems and questions we’ve come across:
- If `root` is versioned and later `dir-b` was removed from `dir-a`, would we want this to reflect on `root` as well?
- If we restore something to version `v` and start making changes there, should it branch off at that point?
- If we change the name of `dir-b`, its metadata entry in `dir-a` has to be updated, which creates a new version of `dir-a`, `v0 -> v1`. Now suppose files are added and other operations are performed on `dir-b` that do not affect `dir-a`, and a user then chooses to restore `dir-a` to version `v0`. That version shows the metadata of `dir-b` as it was when `dir-a` was at `v0`; however, if we fetched `dir-b` using the metadata in `dir-a`, we would get the latest `dir-b`. How should we design this so that the `dir-b` corresponding to the time when `dir-a` was at `v0` is fetched, and how would this work recursively if `dir-b` had children too?
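To make the last problem concrete, here is a minimal sketch, assuming hypothetical types in which a directory entry stores only a network locator and no child version: restoring `dir-a` to `v0` brings back its old entry for `dir-b`, but resolving that entry still yields the latest `dir-b`.

```rust
// Hypothetical types for illustration only -- not the actual NFS structures.

/// What dir-a's metadata stores about a child: a name and a network locator,
/// but no particular version of the child.
struct DirEntry {
    name: String,
    locator: [u8; 32],
}

struct DirVersion {
    version: u64,
    entries: Vec<DirEntry>,
}

/// Restoring dir-a to v0 gives us back its old list of entries...
fn restore(history: &[DirVersion], version: u64) -> Option<&DirVersion> {
    history.iter().find(|v| v.version == version)
}

// ...but resolving an entry's locator on the network still returns the *latest*
// data stored at that address, so the restored dir-a "sees" the newest dir-b
// unless the entry is somehow pinned to a specific child version as well.
```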
Proposed solutions
- One of the options is to get rid of versioned directories, allowing only files to be versioned. This approach solves many of the raised problems while also allowing files to be restored one by one. The proposed structure can be found in this Gist: Versioned and unversioned directories · GitHub (you can see the context and discussion there, but we’re aiming for the topmost update-1).

  Advantages: this approach simplifies things a lot conceptually. For example, if versioning a directory is seen as restoring the entire tree structure, then it is better handled by operations similar to restore-point creation (a snapshot of the entire tree) than by using versioning in directories to keep track of that. Directory versioning is also very wasteful: if a directory has 100 files, a new version of the directory is created whenever any one file is modified (even if the other 99 haven’t changed).

  Disadvantages: versioned directories might still be useful for some users and developers. Something to consider: if this is not a wide use case, we already provide tools (the low-level APIs) for those who want to code a custom-made versioning system, so anything more complex might fall into the app developers’ realm.
- Another option is to use a flat hierarchy to store files, akin to cloud object storage like AWS S3. Using this scheme we have `Bucket`s instead of `Dir`s and `Object`s instead of `File`s. Buckets don’t have any pointers to sub-directories or parents, and objects don’t have a modification time in their metadata. The proposed design can be found here: buckets.md · GitHub.

  Advantages: it simplifies the NFS architecture a lot and makes it easier to understand. We can retain versioning of directories/buckets in some form. The file system can still be organised as a hierarchical tree, e.g. by following Amazon’s approach, where the file tree is derived from the object names (an object named “a/b/file” can be represented as the tree “dir a => dir b => file”); see the sketch after this list.

  Disadvantages: a large number of objects in a bucket might carry a bigger performance cost than the hierarchical FS.
- A third option is to redesign the file system to make it similar to Git or some other distributed version control system. Instead of applying versioning on a per-file or per-directory basis, we can store versions for the entire file system tree, so that every change is reflected only in the root version. Shared directories can be implemented as separate trees inside the root file system (similar to Git submodules).

  Advantages: this approach resolves the confusion about versioning of directories and their metadata. Users can snapshot and then roll back the state of the entire file system.

  Disadvantages: it’s too complex to implement in a reasonable time, so we’re not really considering this option now. Another disadvantage would be the performance cost, given our view of the network: depending on how it is implemented (pointers or delta storage), it might fall flat due to performance hits. Further, we can stick to the principle of providing the tools (the low-level API) so that anyone who wants to build such a structure can do it with those, instead of MaidSafe investing effort into it.
- Possibly some other approach can be used here. If you have ideas, we’d love to hear them!
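To illustrate the name-derived hierarchy mentioned in the second option, here is a minimal sketch (the `Node` type and `build_tree` function are assumptions for this post, not part of any proposed API) that groups flat object names such as “a/b/file” into a nested tree by splitting on `/`:

```rust
use std::collections::BTreeMap;

// Illustrative only: a tree node derived purely from flat object names.
#[derive(Default, Debug)]
struct Node {
    children: BTreeMap<String, Node>, // sub-"directories" and files by name
}

// Build a hierarchy from flat names like "a/b/file" by splitting on '/'.
fn build_tree(object_names: &[&str]) -> Node {
    let mut root = Node::default();
    for name in object_names {
        let mut current = &mut root;
        for part in name.split('/') {
            current = current.children.entry(part.to_string()).or_default();
        }
    }
    root
}

fn main() {
    // "a/b/file" is shown as the tree a => b => file, as in the S3-style option.
    let tree = build_tree(&["a/b/file", "a/b/other", "a/c"]);
    println!("{:#?}", tree);
}
```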
We need help from the community to answer the following questions:
- What NFS features are important to you?
- In your opinion, what parts of the NFS API should we prioritise?
- What option would you prefer?
Shared write access
There’s another open question about shared write access: currently vaults check that there’s consensus on changes, and for that the data has to be signed with at least half of the owner keys. This means that, for example, in order to modify shared data you have to ask the other owners to sign the `StructuredData` that you constructed, or your changes will be rejected by vaults.
This is something that must be changed at the vaults’ end somehow: either invent a type tag that behaves differently, or change the way these checks work by adding some kind of `weight` field to `StructuredData` that indicates how much weight each owner’s signature carries in a modification. In this case the weight would be 100% for each owner, so that any single owner adding their signature is enough for the change to be accepted by vaults.
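As a rough sketch of the weighted-ownership idea (the types, field names, and threshold below are assumptions for this post, not the actual vault logic): each owner carries a weight, and a mutation is accepted once the combined weight of the verified signatures reaches a threshold. With every owner weighted at 100%, a single signature is enough, while the current behaviour roughly corresponds to equal weights and a majority threshold.

```rust
// Illustrative only: not the actual vault validation logic or StructuredData layout.
type SignPublicKey = [u8; 32];

struct Owner {
    key: SignPublicKey,
    weight: u32, // percentage of the total weight this owner's signature carries
}

struct Mutation<'a> {
    owners: &'a [Owner],
    signed_by: Vec<SignPublicKey>, // keys whose signatures were verified
}

/// Accept the mutation if the combined weight of the signing owners
/// reaches the required threshold (in percent).
fn is_accepted(m: &Mutation, threshold: u32) -> bool {
    let signed_weight: u32 = m
        .owners
        .iter()
        .filter(|o| m.signed_by.contains(&o.key))
        .map(|o| o.weight)
        .sum();
    signed_weight >= threshold
}

fn main() {
    let owners = [
        Owner { key: [1; 32], weight: 100 }, // every owner weighted at 100%...
        Owner { key: [2; 32], weight: 100 },
    ];
    let mutation = Mutation { owners: &owners, signed_by: vec![[2; 32]] };
    // ...so a single owner's signature is enough for the change to be accepted.
    assert!(is_accepted(&mutation, 100));
}
```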