[RFC Discussion]: XOR-URLs

bochaco · May 6, 2019, 4:41pm

It’s in the RFC as well: https://github.com/maidsafe/rfcs/blob/master/text/0053-xor-urls/0053-xor-urls.md#xor-urls-specification

It’s just the CID and, as you already know, we use the multicodec-content-type part for the mime types as suggested.

As you can see I worked on a PR against the multicodec repo, which hasn’t been merged yet; they suggested some minor changes that I/we will need to work on to presumably get it in there.

Now, the Python implementation is perhaps not using the master list of codecs as it should, which from my understanding is the one in the multicodec repo: multicodec/table.csv at master · multiformats/multicodec · GitHub. This is why you and I had to patch the table to have them in there until they are approved and become part of the master list. Our SAFE experimental API uses my patch to the table: GitHub - bochaco/js-multicodec at mime-types-as-codecs, which is used from GitHub - bochaco/js-cid at temp-use-bochaco-multicodec, which in turn is a dependency of safe_app_nodejs.

Therefore the CID implementation (in any language) should follow the spec from GitHub - multiformats/cid: Self-describing content-addressed identifiers for distributed systems. The issues you have had so far all seem to be due to some tiny difference in the CID implementations and/or the encodings used within them. Do I know whether the JS one is correct and the Python one is wrong? No, I don’t, since the browser and tools all use the same implementation. In any case, if we use CID and multiformats, we should be able to work on PRs to be sent to those implementations; in fact, the problems you are seeing could be a good issue to report in the Python implementation repo.

riddim · May 6, 2019, 5:22pm

Ah - sorry for my impatience =D… Thx I’ll dig a bit further why my link doesn’t generate correctly as soon as I have some time again - you don’t happen to be able to generate a working png link to

@bochaco - that would be super nice and might speed up finding my mistake

bochaco · May 6, 2019, 7:54pm

These are the XOR-URLs I’m manually generating with JS:

For 2d50dd18645fe99a98b89271b44d10c612ca2cb168eba51e4ea3ce7909689583:
- image/png: safe://hygkdrftyfiep4gdrm9w3igfa1ja5eueoaajcwmftpdi4k81qwx881nme1sbo
- image/jpeg: safe://hygjurftyfiep4gdrm9w3igfa1ja5eueoaajcwmftpdi4k81qwx881nme1sbo
For 30a3fcb0130310087d0890da0143a594b40dd96b7536de15892d78ee264a6813:
- image/png: safe://hygkdrftygnt93cyuyceyo9ee1dpyno7f114y5smmqw5phfcjfihqhj1kpyjo
- image/jpeg: safe://hygjurftygnt93cyuyceyo9ee1dpyno7f114y5smmqw5phfcjfihqhj1kpyjo

I cannot fetch the files with any of them, and the safe-URL Analyser is decoding them correctly to the correct xorname and mime type (you can try them). Perhaps because they are private, i.e. encrypted, are they?
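For anyone who wants to check such a link without the Analyser, here is a minimal Python sketch of the decoding, assuming the standard z-base32 alphabet behind the multibase prefix `h` and LEB128-style varints as in the CID spec. The function names are made up for this example; it is not the code of the Analyser or of any CID library.

```python
# z-base32 alphabet (multibase prefix "h")
ZBASE32 = "ybndrfg8ejkmcpqxot1uwisza345h769"


def zbase32_decode(text):
    """Decode unpadded z-base32 text, dropping the trailing pad bits."""
    acc = 0      # bit accumulator
    nbits = 0    # number of bits currently held in acc
    out = bytearray()
    for ch in text:
        acc = (acc << 5) | ZBASE32.index(ch)
        nbits += 5
        if nbits >= 8:
            nbits -= 8
            out.append((acc >> nbits) & 0xFF)
            acc &= (1 << nbits) - 1
    return bytes(out)


def read_varint(data, pos):
    """Read an unsigned varint; return (value, next position)."""
    value = 0
    shift = 0
    while True:
        byte = data[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def decode_xor_url(url):
    """Split an XOR-URL (without path) into its CID fields."""
    cid = url[len("safe://"):]
    if cid[0] != "h":
        raise ValueError("expected multibase prefix 'h' (base32z)")
    raw = zbase32_decode(cid[1:])
    version, pos = read_varint(raw, 0)
    codec, pos = read_varint(raw, pos)
    hash_fn, pos = read_varint(raw, pos)
    size, pos = read_varint(raw, pos)
    return {
        "version": version,
        "codec": codec,
        "hash_fn": hash_fn,
        "xorname": raw[pos:pos + size].hex(),
    }


info = decode_xor_url(
    "safe://hygkdrftyfiep4gdrm9w3igfa1ja5eueoaajcwmftpdi4k81qwx881nme1sbo"
)
print(info)
```

For the first image/png link this gives CID version 1, content-type code 0x1914 (presumably the experimental image/png entry from the patched table), hash function code 0x16 (sha3-256 in the multihash table) and the xorname 2d50dd18...9583.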

I just pushed a commit you can use to generate ImmD XOR-URLs with different mime types, as long as you didn’t encrypt the targeted ImmD files (we are missing an API which allows you to generate the XOR-URL of an ImmD without trying to fetch it, as it fails to decrypt it if not owned).

Just clone the repo GitHub - bochaco/safe-tools: Some basic tools for SAFE Network users and developers, then npm i && npm start, then open it with the browser at localhost:1357, you’ll see a new tab for obtaining ImmD XOR-URLs.

riddim · May 6, 2019, 8:30pm

Yapp - it’s encrypted - that’s correct

I’ll test it asap (sorry - not possible before tomorrow) =)

riddim · May 7, 2019, 4:23pm

oh you mean owned by the app and not owned by my account - aren’t you? and the browser doesn’t own my png …?

ps: oh nice - and when i don’t accidentally damage the link before re-encoding it to base32z i get exactly your link! @bochaco

bochaco · May 7, 2019, 6:37pm

No, I just mean that if the ImmD you are trying to fetch is encrypted by another app, the call to app.immutableData.fetch(iDataAddress) fails because it cannot decrypt the ImmutableData, therefore you don’t get the Reader object on which you could call getXorUrl(mimeType). So I’m thinking we should have an API where you can call app.immutableData.genXorUrl(iDataAddress, mimeType), which doesn’t try to fetch the data but just generates the XOR-URL for you.

That’s really good news!! so to clarify, are you saying that there is no issue then between the Python implementation and JS one for CID as we suspected before?

riddim · May 8, 2019, 5:09am

so you mean yes i wasn’t aware that encrypted means only possible to decrypt by the app that created it Oo … how can that be …? how do i get the decryption key? do i always need to re-upload an immutable as unencrypted if i want to share it? Oo

No absolutely not - there is an issue and it’s all down in the ‘standard libs for cid’ - I use the character table I found in the JS implementation of base32z to re-encode base32 to JS-base32z ‘by hand’ - the ‘regular python functions’ to encode to base32z leads to different strings…
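The ‘by hand’ re-encoding can be sketched roughly like this in Python: encode with the standard library’s RFC 4648 base32, drop the `=` padding, and swap every character for the one at the same position in the z-base32 alphabet. The helper name `to_base32z` is made up for this sketch, and the five header bytes `01 94 32 16 20` are simply what the image/png link posted earlier in the thread decodes to, not values taken from any spec.

```python
import base64

# Same 5-bit values, different characters: position i in RFC4648
# corresponds to position i in ZBASE32.
RFC4648 = "abcdefghijklmnopqrstuvwxyz234567"
ZBASE32 = "ybndrfg8ejkmcpqxot1uwisza345h769"
TO_Z = str.maketrans(RFC4648, ZBASE32)


def to_base32z(data):
    """Return multibase text: prefix 'h' + unpadded z-base32 of data."""
    std = base64.b32encode(data).decode("ascii").lower().rstrip("=")
    return "h" + std.translate(TO_Z)


# CID header bytes (assumed, see above) followed by the 32-byte xorname
cid_bytes = bytes.fromhex(
    "0194321620"
    "2d50dd18645fe99a98b89271b44d10c612ca2cb168eba51e4ea3ce7909689583"
)
print("safe://" + to_base32z(cid_bytes))
```

With these inputs the result is the same string as the first image/png link in this thread.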

(my opinion regarding using base32z or not in your api didn’t change … - I think you should definitely go with the standard base32 because that comes with less trouble in all languages … the skipping of certain ambiguous characters might be something one could consider … but this weird re-ordering to match ‘more frequent characters’ to easier to identify characters doesn’t make sense at all … since we are talking about immutable data identifiers => some hash of encrypted data => (at least pseudo-)random data by design and for mutables only not randomly generated mutables will not be randomly distributed (standard containers and stuff using that as starting point will again be randomly distributed because of the account being a random one…) … while i would prefer even more the use of base16…)

…and that i (again) made a mistake with the base32z-encoding thing so that it took me longer to come up with the correct way of doing it again shows why it would be better to just use the standardized base32 … i lost hours in finding the correct way to re-encode data, i lost hours for reverse-engineering base32z to base32 encoded safe-links and then see where the difference is/how the correct code for the image-link you used is, you lost hours in reading my posts and answering my questions … and in the end you and me both are slower in working for the SAFE Network …

riddim · May 8, 2019, 10:23am

And this absolutely! We’re getting there can’t wait to see some really powerful apps on safe

bochaco · May 8, 2019, 12:33pm

Nice, good finding, I think you should send a PR to their repo to fix it.

Not trying to defend any base encoding, in fact I think it’s not that important for two reasons:

- any base encoding will be decodable (that’s the point of the CID spec), so if some people use base32, base64, etc. they will all work fine and can fetch the same content. I think the one that most people use would become the standard.
- remember that both the JS and your Python implementation won’t be needed (and would go away) if this becomes natively supported by SAFE libs, note the following from our previous discussions:

bochaco · May 8, 2019, 1:09pm

And now, in an attempt to bring back the main discussion: yesterday we had a talk with @joshuef about considering the type tag to be encoded in the CID itself rather than being the port number of the URL. This would not only solve issues we already face with some browsers or libs not supporting port numbers larger than 65535 (even though the spec doesn’t mention such a limit IIRC), but also, if in the future we need additional parameters for our data types, we can embed them in the CID string, i.e. in the <multihash-content-address> of the CID spec. After all, the type tag is part of our addressing system and it would still make sense for it to be part of the content-address part. Just bringing it up for discussion and feedback about this alternative.
The only problem is that strictly speaking that wouldn’t be just a hash but a hash+number, which wouldn’t be a valid CID anymore…?..
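As an illustration of the port limit, here is a small Python sketch using `urllib.parse` from the standard library as one example of such a lib. The host name and the two type tag values are arbitrary placeholders chosen for this example, not real SAFE addresses.

```python
from urllib.parse import urlsplit


def type_tag_from_port(url):
    """Read the type tag from the port part of a URL."""
    return urlsplit(url).port


# A type tag that fits into 16 bits parses fine
print(type_tag_from_port("safe://example-xor-name:15001/"))

# A larger type tag is rejected by the URL parser itself
try:
    type_tag_from_port("safe://example-xor-name:1100000/")
except ValueError as err:
    print("rejected:", err)
```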

happybeing · May 8, 2019, 1:47pm

Could the multicodec <key> be used for the tag_type in order to leave the hash as is (see here)?

bochaco · May 8, 2019, 1:52pm

This is the CID string:
<cidv1> ::= <multibase-prefix><cid-version><multicodec-content-type><multihash-content-address>

And the <multihash-content-address> is:
<varint hash function code><varint digest size in bytes><hash function output>

So do you mean to use a multicodec in the <hash function output> ? or where exactly?
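To make the layout concrete, a minimal Python sketch that assembles the binary part of a CIDv1 (everything after the multibase prefix, before text encoding) from its pieces. The function names are hypothetical, and 0x1914 is only the content-type value that the image/png link earlier in this thread decodes to, i.e. an experimental code rather than an approved multicodec entry.

```python
def encode_varint(value):
    """Encode an unsigned integer as a varint (7 bits per byte)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)   # more bytes follow
        else:
            out.append(byte)
            return bytes(out)


def build_cidv1(codec, hash_fn, digest):
    """<cid-version><multicodec-content-type><multihash-content-address>"""
    multihash = encode_varint(hash_fn) + encode_varint(len(digest)) + digest
    return encode_varint(1) + encode_varint(codec) + multihash


xorname = bytes.fromhex(
    "2d50dd18645fe99a98b89271b44d10c612ca2cb168eba51e4ea3ce7909689583"
)
cid = build_cidv1(codec=0x1914, hash_fn=0x16, digest=xorname)
print(cid.hex())
```

Note that in this sketch the xorname is placed directly in the <hash function output> slot, so any extra field such as a type tag would have to go either inside that slot or into a new part of the string.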

riddim · May 8, 2019, 2:22pm

valid CID as in ‘a cid is a multiformat-encoded hash’ and we would be using it as ‘multiformat encoded hash+additional data’ - or did i misunderstand you?

+1

maybe not as intended by the creators but as we don’t only use the hash for data discovery/recovery and it works well with the cid format that makes great sense imho

bochaco · May 8, 2019, 2:26pm

Like the following definition, which includes clear definitions of what each of these parts exactly are and encode:

If we change any of that we cannot say our URL contains a CID

riddim · May 8, 2019, 2:34pm

if we’d hash all the bytes of xor-name and typetag it would be a valid cid - wouldn’t it? what would speak against this? - as it indeed would be simpler than splitting up a link into cid + type-tag … +it would benefit from the properties of a cid …? (what would be the motivation to exclude something from the encoding algorithm?)

bochaco · May 8, 2019, 2:42pm

Yes, that’s an option, but just trying to be very critical and strict (not saying it’s correct): can hashing xorname+typetag strictly be considered a SAFE address? Probably yes, but in SAFE currently the xorname is the address, that’s the only thing routing knows; the type tag is something else, still needed to locate the data… so that’s all I mean
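A rough Python sketch of the ‘hash xorname + type tag’ option, only to show what it would involve. The preimage layout (xorname followed by the type tag as 8 big-endian bytes) and the function name are assumptions made up for this example; nothing like this is defined in the RFC.

```python
import hashlib


def address_multihash(xorname, type_tag):
    """Multihash (sha3-256, code 0x16) over xorname + type tag."""
    # Assumed layout: 32-byte xorname, then the tag as 8 big-endian bytes
    preimage = xorname + type_tag.to_bytes(8, "big")
    digest = hashlib.sha3_256(preimage).digest()
    return bytes([0x16, len(digest)]) + digest


xorname = bytes.fromhex(
    "2d50dd18645fe99a98b89271b44d10c612ca2cb168eba51e4ea3ce7909689583"
)
# Arbitrary example tags: each one gives a different content address
print(address_multihash(xorname, 15001).hex())
print(address_multihash(xorname, 15002).hex())
```

The result would formally be a valid multihash, but the digest is one-way: neither the xorname nor the type tag can be read back from it, so something else would still have to tell routing where the data lives.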

happybeing · May 8, 2019, 3:42pm

Sorry, ignore me (often best), I misread the page I linked. I see it is not part of CID but shows the generic case of which the CID <mc><hash> is an implementation. Dang it!

riddim · May 8, 2019, 3:59pm

Okay - since we’re talking now - what’s speaking against adding a small checksum and a systematic way to share private data that comes with keys?

Or which other alternatives can you think of (what are their upsides?)

bochaco · May 8, 2019, 5:40pm

I’m not sure, I don’t see the need for a checksum: if you cannot fetch the content, the URL is probably invalid, and even if the XOR-URL passed a checksum but you couldn’t fetch the content, what is it that you can get/conclude out of it?

Having decryption keys in the URL to a private content… hold on… isn’t that contradictory? If you do that, then the decryption keys become just like “an additional encoding” of your URL, as everyone with such a URL can see the content, which is therefore not private. In fact, some tools out there already handle sharing data by just providing a difficult-to-guess URL, but if you have the URL you have access, so it’s pseudo private and shared.

I think encryption keys or any key needed to decrypt/fetch a piece of data needs to be out of band, with some other type of sharing mechanism at the application layer (I wouldn’t disagree at all we should provide those utilities though)

Edit: an example of application layer solution is safe://<XOR-URL>?key=<keys to decrypt>

riddim · May 8, 2019, 6:48pm

True - but you need Internet connection for validating this (and need to wait for the response ‘not available’) so only online possible and slower
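A tiny Python sketch of what such an offline check could look like. The format is purely hypothetical (four hex characters of a SHA-256 over the CID text, appended after a dash) and is not part of the CID spec or of any SAFE API; it only shows that a mistyped link can be caught without a network round trip.

```python
import hashlib


def add_checksum(cid_text):
    """Append a short checksum (hypothetical format: '<cid>-<4 hex chars>')."""
    check = hashlib.sha256(cid_text.encode("ascii")).hexdigest()[:4]
    return cid_text + "-" + check


def checksum_ok(text):
    """Verify the appended checksum without touching the network."""
    cid_text, sep, check = text.rpartition("-")
    if not sep:
        return False
    expected = hashlib.sha256(cid_text.encode("ascii")).hexdigest()[:4]
    return check == expected


tagged = add_checksum(
    "hygkdrftyfiep4gdrm9w3igfa1ja5eueoaajcwmftpdi4k81qwx881nme1sbo"
)
print(tagged, checksum_ok(tagged))
```

A check this short catches most typos but not all of them, and it says nothing about whether the data actually exists on the network.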

Well - simple sharing of encrypted data with a group? (for example my holiday pictures with my grandma (who can click a link in a messenger but for sure can’t operate many programs) and family)

safe://<XOR-URL>?key=<keys to decrypt>

how is this different except for the missing upsides that come with cids?