Does read(offset, end) need to fetch the entire file?

I have a general question on the usage of ImmutableData read(), or for that matter any read from the network… The numerous read methods in the Node API documentation allow you to set an ‘offset’ and an ‘end’ to control how much data you want to read.

So for example… If I have used ImmutableData.write() to write a large file (gigabytes) to the network and then, on another computer that hasn’t cached or touched that data before, try to read bytes in the ‘middle’ of the file, what happens? Does the entire file need to be downloaded/fetched to read those middle bytes? Or are things magical enough that it fetches only the chunks necessary to perform that read operation? I am assuming that things work similarly for NFS emulation etc. The way the data map works with self encryption leads me to believe that it is just that magical… I would like a concrete answer though, from someone who knows what they are talking about. Thanks! :grinning:


Hi, sorry for the delayed response,

Yes, you are correct with the assumption that it is magical :slight_smile:

NFS is a high-level implementation in SAFE Client Libs and it doesn’t care about the underlying chunks: it just uses the API provided by the self_encryption library.

self_encryption, however, does have a notion of chunks, and it handles them transparently for the user. All chunks are mapped in a data map, and from the pre-encryption chunk sizes (which are stored along with the data map) we can derive which chunks we need to retrieve from the network to read the data at the requested offset and length.
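To make the idea concrete, here is a minimal sketch of how a list of pre-encryption chunk sizes is enough to work out which chunks overlap a requested byte range. This is not the self_encryption code: the function name `chunks_for_range` and the example sizes are made up for illustration, and encryption is ignored entirely.

```python
from itertools import accumulate


def chunks_for_range(chunk_sizes, offset, length):
    """Return the indices of the chunks overlapping bytes [offset, offset + length).

    `chunk_sizes` is a hypothetical stand-in for the pre-encryption sizes
    kept alongside a data map; it is not the real data map structure.
    """
    if length <= 0:
        return []
    end = offset + length
    needed = []
    start = 0
    # accumulate() yields the exclusive end position of each chunk in the file.
    for index, chunk_end in enumerate(accumulate(chunk_sizes)):
        # A chunk is needed if its byte span [start, chunk_end) overlaps the request.
        if start < end and chunk_end > offset:
            needed.append(index)
        start = chunk_end
    return needed


sizes = [1024, 1024, 1024, 512]            # illustrative sizes only
print(chunks_for_range(sizes, 1500, 100))   # [1]
print(chunks_for_range(sizes, 1000, 1100))  # [0, 1, 2]
```

A read that falls inside one chunk touches only that chunk, and a read that straddles boundaries touches only the chunks it spans, regardless of how large the whole file is.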

The actual calculation is performed by prepare_window_for_reading, which also abstracts away the underlying storage implementation (that’s why it’s a generic function) - so, technically, self_encryption does not know anything about SAFE Network: the chunks themselves can be stored anywhere. The back-end storage implementation can be found in SAFE Client Libs.
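As a rough illustration of that separation (again a sketch, not the real API), the reader below only needs a storage object with a `get` method, so an in-memory back end works just as well as a network one. The names `InMemoryStorage` and `read_range`, and the `(chunk_name, size)` layout of `data_map`, are assumptions for this example; chunks are stored unencrypted to keep it short.

```python
class InMemoryStorage:
    """Hypothetical back end: keeps chunks in a dict and records every fetch."""

    def __init__(self):
        self.chunks = {}
        self.fetched = []

    def put(self, name, data):
        self.chunks[name] = data

    def get(self, name):
        self.fetched.append(name)
        return self.chunks[name]


def read_range(storage, data_map, offset, length):
    """Read `length` bytes starting at `offset`, fetching only overlapping chunks.

    `data_map` is assumed to be a list of (chunk_name, size) pairs in file order.
    """
    if length <= 0:
        return b""
    end = offset + length
    out = bytearray()
    start = 0
    for name, size in data_map:
        chunk_end = start + size
        if start < end and chunk_end > offset:
            data = storage.get(name)
            # Slice out just the part of this chunk that lies inside the request.
            lo = max(offset, start) - start
            hi = min(end, chunk_end) - start
            out += data[lo:hi]
        start = chunk_end
    return bytes(out)


# Store a 1024-byte payload as four 256-byte chunks.
storage = InMemoryStorage()
payload = bytes(range(256)) * 4
data_map = []
for i in range(0, len(payload), 256):
    name = f"chunk-{i // 256}"
    storage.put(name, payload[i:i + 256])
    data_map.append((name, 256))

result = read_range(storage, data_map, 300, 100)
print(result == payload[300:400])  # True
print(storage.fetched)             # ['chunk-1']
```

Because the reader never touches the network directly, swapping the back end for a test double like this one is all it takes to exercise the read logic in isolation.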

Hope that answers your question, but please let me know if I can help with understanding it better.


No worries!

Wow! That was the little piece of information I needed. I didn’t realise the DataMap contains the sizes too… Genius!

The software engineer in me really likes this approach… Can think of many more scenarios where self_encryption could be useful!

Yeah that was a fabulous explanation, thank you for clearing that up for me. :grinning:


Indeed! It was definitely created with this use case in mind.
And as a side effect, this approach also helps with unit tests a lot.
