[RFC Discussion]: XOR-URLs

first of all - thanks for your fast feedback :wink:

true

@bochaco @joshuef is it possible that you both are suggesting to use the JS bindings in python? Oo

sorry - i might not have explained well enough what i wanted to do - but this is getting a bit frustrating now…

i thought the whole point of xor-urls is that i can locate something in the network …

i have exactly 160 bytes of MDataInfo that enable me to find my mutable data from any computer where i put them on

somehow i thought that i would need to e.g. take the xor-name (32bytes), take the hex representation (or some widely used encoding)


put a safe:// in front of it - maybe a :777 at the end for the type_tag


and just view my uploaded mutable data in the safe-browser Oo …
…if i don't know how the xor-url is defined - what i need to do with which bytes - i cannot share data uploaded from python in a simple way (and i don't want to be rude - but the only thing i want to know is instructions on 'doing what with what' to generate a link to my data coming from the MDataInfo-object - i thought i wouldn't be asking for a lot here … all in all we should be talking about a maximum of 160 bytes …)

1 Like

Briefly, the reason we don't simply encode (with baseX) a XOR name/addr and use just that is that we are trying to encode some additional information in the URL and also account for any future changes in some of that information. This is what the multiformats protocol provides: e.g. if in the future we want/need to use any other base encoding we can do it, and the format already specifies which base we are using. The same goes for other info we are trying to include, like the content type of the data being referenced by such a XOR-URL.

No, that's not exactly what we suggest.

We are proposing to make the XOR-URL a standard type of URL for SAFE, which means our APIs should allow you to generate them and make use of them; therefore you wouldn't be generating them yourself in any language but would just use the APIs provided. This would need to be provided in the lower Rust lib so it can then be exposed in any language binding, like the Python binding. At the moment the PoC has it implemented at the JS layer only, but that's just the PoC and not where the core implementation of such an API should live; that should be in Rust instead.

Since you seem to be trying to implement this in Python already, unfortunately you would need to implement it yourself, or wait till it's available in the safe_app FFI API and just create the binding functions.

Now, if you want to implement it yourself in Python right now, you will have to implement the same as is done here for MD XOR-URLs: safe_app_nodejs/src/api/mutable.js at master · maidsafe-archive/safe_app_nodejs · GitHub, which is what is specced out here in the RFC: rfcs/text/0000-xor-urls/0000-xor-urls.md at 357384147ae005e4061079b27a30f43cf379fda5 · maidsafe/rfcs · GitHub. The same goes for ImmD; you can see how it's implemented here: safe_app_nodejs/src/api/immutable.js at master · maidsafe-archive/safe_app_nodejs · GitHub

As you can see in the code, we use the CID/multihash implementations already available for JS, i.e. we don't even deal with base32 encoding ourselves. So you could do the same by using the implementations available for Python: GitHub - ipld/py-cid: Self-describing content-addressed identifiers for distributed systems implementation in Python and GitHub - ivilata/pymultihash: Python implementation of the multihash specification

5 Likes

oooh - thank you :smiley: well that was very helpful now =)

then i may be just too impatient - kk
i somehow didn't expect it to be something this fancy :wink: … (i'll read something about it later on … thanks for the clarification =)

3 Likes

haHAAAAA

safe://bafybmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777

thank you very very much @bochaco - awesome xD

ps: side note: the encodings/hashes used are the following: (thank you for the comment with the hint further up in the JS code =)

image

okay - that's cool with me then :smiley: just was super confused that it didn't look like a 'standard procedure' or a 'standardized way' of doing it … but since it is used by IPFS as well it seems to be a thing (i didn't have a lot of trouble reproducing it)

but just to mention it - it's not z-base32 you are using but standard base32 as far as i can tell

2 Likes

Nice!
We do use base32z, you can see consts.CID_BASE_ENCODING set to base32z here: safe_app_nodejs/src/consts.js at master · maidsafe-archive/safe_app_nodejs · GitHub, but I'm not sure why the one you are generating with base32z is not finding the content. I'll debug it tomorrow. I can only think that perhaps the content type dag-pb could be causing some problem, not sure, but if you look at the JS code we set raw as the content type for MDs. This is the equivalent base32z we generate for the same XOR name (0x8f7fdd831e0ef35eb7f46965b8b0915e636522a0ffda8c73261983b50188289d):

safe://hyfktcerxx9qag8oq6pxmx7djcshmbrk6cp11fe895kg8gjo3oq4odnbeuw:777

Anyhow, this already demonstrates that the CID format we use here allows you to encode it with another base (base32), and because that info is part of the URL you generated (safe://bafybmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777), the browser is able to decode it and find the content. So even while we are still trying to decide which base we want to use as the standard encoding, people could generate URLs with another encoding and the address would still be decoded from the URL by the browser and API.
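To make the "self-describing" part concrete, here is a minimal Python sketch of how such a URL could be taken apart again. It is not the browser's or the API's actual code. It assumes the layout discussed in this thread (multibase prefix character, then CID version, content-type code, hash-function code, length, name) and only handles the `f` (hex) and `b` (standard base32) prefixes. The helper names `read_varint`, `multibase_decode` and `parse_xor_url` are made up for illustration.

```python
import base64


def read_varint(buf, pos):
    """Read an unsigned varint from buf starting at pos; return (value, next_pos)."""
    value, shift = 0, 0
    while True:
        byte = buf[pos]
        pos += 1
        value |= (byte & 0x7F) << shift
        if not byte & 0x80:
            return value, pos
        shift += 7


def multibase_decode(text):
    """Decode the two multibase flavours handled by this sketch: 'f' (hex) and 'b' (base32)."""
    prefix, payload = text[0], text[1:]
    if prefix == "f":
        return bytes.fromhex(payload)
    if prefix == "b":
        # multibase base32 drops the '=' padding, the stdlib decoder wants it back
        padded = payload + "=" * (-len(payload) % 8)
        return base64.b32decode(padded, casefold=True)
    raise ValueError("multibase prefix not handled by this sketch: " + prefix)


def parse_xor_url(url):
    """Split a safe:// XOR-URL into (version, codec, hash_code, name, type_tag)."""
    if not url.startswith("safe://"):
        raise ValueError("not a safe:// URL")
    cid_text, _, tag_text = url[len("safe://"):].partition(":")
    raw = multibase_decode(cid_text)
    version, pos = read_varint(raw, 0)
    codec, pos = read_varint(raw, pos)
    hash_code, pos = read_varint(raw, pos)
    length, pos = read_varint(raw, pos)
    name = raw[pos:pos + length]
    type_tag = int(tag_text) if tag_text else None
    return version, codec, hash_code, name, type_tag


hex_url = "safe://f017016208f7fdd831e0ef35eb7f46965b8b0915e636522a0ffda8c73261983b50188289d:777"
b32_url = "safe://bafybmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777"
print(parse_xor_url(hex_url)[3].hex())
print(parse_xor_url(b32_url)[3].hex())
```

Both URLs from this thread come back with the same 32-byte name, because the first character says which base to undo before the rest is interpreted.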

4 Likes

Oooooh - beautiful! :flushed: :dizzy_face: :hugs:

Ps and sorry for not simply adding the other address as text for copying…

3 Likes

TL;DR: the current cid solution looks nice and is a very elegant way of doing it, but is imho not very flexible, and since we need to provide only 24 bytes of data to discover a piece of data (160 for including all the keys) this solution is overly complicated for a super simple task


current implementation/suggestion is this:

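import cid        # CID implementation for Python (e.g. py-cid)
import multihash  # multihash implementation for Python

def getXorAddresOfMutable(data, ffi):
    # raw xor-name bytes taken from the MDataInfo struct
    xorName_asBytes = ffi.buffer(data.name)[:]
    # wrap the name in a multihash tagged as 'sha3-256'
    myHash = multihash.encode(xorName_asBytes,'sha3-256')
    # build a CIDv1 with the 'dag-pb' content type
    myCid = cid.make_cid(1,'dag-pb',myHash)
    encodedAddress = myCid.encode('base32z').decode()
    return 'safe://' + encodedAddress + ':' + str(data.type_tag)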

we sha3-256 the 24 bytes of the xor-name, then we twist it somehow into a cid and in the end we encode it into base32z-ish

if we look at the bytes we can see that the difference between the hashed and unhashed value is not large - and if we look closer then we see that the sha-3'd values are just patched with (hex) 1618 from 24 to 26 bytes

then some magic with cid happens and we can somehow revert that to get the xor-name back.

if we want to be case-insensitive and use a base32-encoding we are only slightly shorter than just using the hex-value. - do we plan on integrating additional information into the cid? or is it just a fancy way to do it …? if the second … then why would we do this instead of the simple hex-value that is easy and well-known/understood and not calculation-intense …?

in addition to this i saw somewhere the question/idea (i think by @happybeing) that we could even make xor-urls for private data; while the argument against it was that we cannot encode the additional keys in the cid …

if we just take the hex value of the MDataInfo we have 160 bytes in hex-representation that describe our piece of data perfectly (the following piece of data is unencrypted so all the keys are 0); yes nobody wants to type those by hand - but as a clickable link or qr-code that's an absolutely valid solution imho

if we want to give someone read-only access to a file we mask the key that enables modification of the file and only provide the decryption key … simple as that …

and since the MDataInfo is exactly what a program needs to handle a piece of data; and hex representation is super easy to implement and clearly defined (and easy to adapt in case something changes with appendable data/any future data format change) i really really really don't see why we would go for something unnecessarily complicated like cids …

ps: and in addition to this the currently selected encoding base32z is definitely not a super-widely used encoding; it has a different implementation in e.g. python than in Javascript, the python-generated base32z-links don't work with the browser and i cannot decode a link that i get from the browser/JS API in python (!)… (the good thing is that "standard encoded" links do work in the browser too - but this means again that we have redundantly defined many links to the same location in the network - while the redundancy doesn't provide any additional benefit because it's not like check-bits that show you at least whether the link is correct or wrong; it's just randomly spread links that suddenly end up in the same location …)


so i guess my suggestion would be to just use the hex-values of the name or the whole Info*-object as xor-urls where one can mask the properties which should not be shared … simple to integrate for private data too and in the 'easy case' only 5 characters longer than the current proposal

(and for e.g. the type-tag 18446744073709551615 (largest possible if iā€™m not mistaken) the difference is [including separator] exactly 1 character vs. included in the hex string:

)

2 Likes

pps: okay - and if i really missed something about the multiformats-thing that would make it very useful and cool for the future…

…you don't want base64 because of upper/lowercase and special characters … you don't want base32 because of similar looking characters … why not simply choose base16 (hex) as default because it's standardized and well known …?

(and if someone just wants to generate a link in a random language that doesn't implement cids yet [maybe e.g. rust for exposing this through the client libs? :roll_eyes:] he can just hex the xor-name, put e.g. 'safe://f01701620' (description: sha3-encoded + first patch-bytes because the xor-address is shorter than the checksum, bytes then hex-encoded) in front of it and have a working link to the newly generated mutable:

safe://f017016202976d0fbd38b8d2d29bde345379b4d541b4c76c4df33920171ef20fa70f33ac8:777

because of the self-describing nature of the multicodec-thing this already works anyway - it's just not obvious to someone wanting to do it … or if you want to have your address keccak-512-encoded you put 'f01701d20' in front of it and magically still end up in the same place

safe://f01701d202976d0fbd38b8d2d29bde345379b4d541b4c76c4df33920171ef20fa70f33ac8:777

(the trick with the cids seems to be that they choose a hashing function to patch the data to a working size - and then they put the information about the used hashing function + the used encoding for the following string at the front of the string of the output)
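For readers who want to reproduce the prefix trick without a CID library, here is a small Python sketch. It assumes the layout the links in this thread suggest: CID version 1, content type `dag-pb` (0x70), hash-function code `sha3-256` (0x16), a length byte, then the raw xor-name, all behind a multibase prefix character (`f` for hex, `b` for standard base32). The function names are made up for illustration and this is not the code of the official API.

```python
import base64

CID_V1 = 0x01
DAG_PB = 0x70      # content-type code used in the python snippet earlier in this thread
SHA3_256 = 0x16    # hash-function code; the name bytes are wrapped, not re-hashed


def write_varint(value):
    """Encode a non-negative integer as an unsigned varint."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def cid_bytes(name, codec=DAG_PB, hash_code=SHA3_256):
    """version + content type + hash code + length + name."""
    return (write_varint(CID_V1) + write_varint(codec)
            + write_varint(hash_code) + write_varint(len(name)) + name)


def xor_url(name, type_tag, base="base32"):
    raw = cid_bytes(name)
    if base == "base16":
        text = "f" + raw.hex()
    elif base == "base32":
        # multibase base32 is lower case and unpadded
        text = "b" + base64.b32encode(raw).decode("ascii").lower().rstrip("=")
    else:
        raise ValueError("base not handled by this sketch: " + base)
    return "safe://" + text + ":" + str(type_tag)


name = bytes.fromhex("8f7fdd831e0ef35eb7f46965b8b0915e636522a0ffda8c73261983b50188289d")
print(xor_url(name, 777, "base16"))
print(xor_url(name, 777, "base32"))
```

For a 32-byte name the four prefix bytes are 01 70 16 20, which is where the hex prefix 'f01701620' comes from. Swapping the hash code to 0x1d (keccak-512) gives 'f01701d20'.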

if you personally prefer your base32z encoded strings because they are a bit shorter and easier to identify/type in then you can still generate them on demand and the browser will accept them and show you the location you want it to show …?

but please don't use a non-standard encoding as default behaviour in your api …


yes it's nice that you can choose the representation of your liking for the data you encode:

safe://mAXAWII9/3YMeDvNet/RpZbiwkV5jZSKg/9qMcyYZg7UBiCid:777
safe://bafybmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
safe://zdjCkrCPzAj3i5AoLGmfsjvpoz1tU9LXEinMkPrf2QzD4t9cg:777
safe://f017016208f7fdd831e0ef35eb7f46965b8b0915e636522a0ffda8c73261983b50188289d:777

some of them are implemented in JS, some not … but i don't see the value in it as of now … would be nice to be able to use base58btc because of the length … but that again isn't implemented in JS as of now … and if you choose to do it then i would tend to just append the type-tag in encoded form instead of doing the :777 thing … i don't know where the real value is there [to encode it base10 and introduce a separator]

other examples for working links for the interested reader - all ends up at the same place
bin::          safe://bafkrmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
base1::        safe://baearmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
base8::        safe://baedrmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
base10::       safe://baeermiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
cbor::         safe://bafirmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
protobuf::     safe://bafibmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
rlp::          safe://bafqbmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
bencode::      safe://bafrrmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
multicodec::   safe://baeybmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
multihash::    safe://baeyrmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
multiaddr::    safe://baezbmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
multibase::    safe://baezrmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
sha1::         safe://baeirmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
sha2-256::     safe://baejbmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
sha2-512::     safe://baejrmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
dbl-sha2-256:: safe://baflbmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
sha3-224::     safe://baelrmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
sha3-256::     safe://baelbmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
sha3-384::     safe://baekrmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
sha3-512::     safe://baekbmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
shake-128::    safe://baembmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
shake-256::    safe://baemrmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
keccak-224::   safe://baenbmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
keccak-256::   safe://baenrmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
keccak-384::   safe://baeobmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
keccak-512::   safe://baeormiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
murmur3::      safe://baerbmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
blake2b-8::    safe://baga6iaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-16::   safe://bagboiaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-24::   safe://bagb6iaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-32::   safe://bagcoiaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-40::   safe://bagc6iaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-48::   safe://bagdoiaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-56::   safe://bagd6iaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-64::   safe://bageoiaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-72::   safe://bage6iaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-80::   safe://bagfoiaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-88::   safe://bagf6iaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-96::   safe://baggoiaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-104::  safe://bagg6iaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-112::  safe://baghoiaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-120::  safe://bagh6iaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-128::  safe://bagioiaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-136::  safe://bagi6iaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-144::  safe://bagjoiaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-152::  safe://bagj6iaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-160::  safe://bagkoiaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-368::  safe://bagxoiaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-376::  safe://bagx6iaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-384::  safe://bagyoiaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-392::  safe://bagy6iaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-400::  safe://bagzoiaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-408::  safe://bagz6iaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-416::  safe://bag2oiaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-424::  safe://bag26iaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-432::  safe://bag3oiaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-440::  safe://bag36iaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-448::  safe://bag4oiaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
blake2b-456::  safe://bag46iaqwechx7xmddyhpgxvx6ruwlofqsfpggzjcud75vddteymyhnibrauj2:777
ipfs::         safe://bagsqgfrar5753ay6b3zv5n7unfs3rmerlzrwkiva77niy4zgdgb3kamifcoq:777
http::         safe://bahqagfrar5753ay6b3zv5n7unfs3rmerlzrwkiva77niy4zgdgb3kamifcoq:777
https::        safe://bag5qgfrar5753ay6b3zv5n7unfs3rmerlzrwkiva77niy4zgdgb3kamifcoq:777
quic::         safe://bahgagfrar5753ay6b3zv5n7unfs3rmerlzrwkiva77niy4zgdgb3kamifcoq:777
ws::           safe://bahoqgfrar5753ay6b3zv5n7unfs3rmerlzrwkiva77niy4zgdgb3kamifcoq:777
onion::        safe://bag6agfrar5753ay6b3zv5n7unfs3rmerlzrwkiva77niy4zgdgb3kamifcoq:777
p2p-circuit::  safe://bagraefrar5753ay6b3zv5n7unfs3rmerlzrwkiva77niy4zgdgb3kamifcoq:777
dag-pb::       safe://bafybmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
dag-cbor::     safe://bafyrmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
git-raw::      safe://baf4bmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777
eth-block::    safe://bagiacfrar5753ay6b3zv5n7unfs3rmerlzrwkiva77niy4zgdgb3kamifcoq:777
eth-block-list::safe://bagiqcfrar5753ay6b3zv5n7unfs3rmerlzrwkiva77niy4zgdgb3kamifcoq:777

ps:

if we would append a checksum instead of the encoding-information-stuff we could not only check for 'incorrect characters in the encoded string' but we could do offline-checks for typos …

as of now e.g. both of the strings are valid cids … while the last one has the last character mis-typed if you want to end up at the address of the mutable …

3 Likes

The key and main reason is simply to allow us to evolve without breaking things, as an example, your next statement:

Let's imagine in the future we decide to change the encoding, or even the hash function we use for our immutable data XOR addrs; we will have to make sure that whatever new format we adopt doesn't break backward compatibility with older URLs.

Remember we are after the perpetual web; we don't want to break old URLs just because we are moving away from one encoding to another, or even from one hash function to another. So using the hex-encoded XOR addr wouldn't be enough if we want to accomplish that. We need some other way to make sure that if I give you a URL to an immutable data on SAFE, it's immutable and perpetual regardless of what the most used encoding is at any moment.

I get your point, although just FYI Rust seems to be covered already: GitHub - multiformats/cid: Self-describing content-addressed identifiers for distributed systems

This is just another type of CID you are creating. What they are trying to achieve with CID is to have something standard that can be used to encode additional information alongside the content address. Where is the Rust implementation for that CID you are creating, or the golang one :slight_smile: just kidding ofc, I hope you understand what I mean

2 Likes

So we patch the data we want to encode with a couple of extra bytes for the possibility that we might at some point randomly decide to change the format…? Hex is the 1:1 representation of the bytes (the information we want to encode); altering it (except for changing the base) will always be less efficient… Why would we want to become less efficient?
(and hex has been around since the beginning of computing - hard to believe it won't be understood or will be hard to handle at any point in time)

1 Like

And imho the multiformat thing just creates the impression of a future-proof format…

… It can handle different encodings and different patching algorithms (hash functions)… But can it handle it if we decide to add 2 additional leading bytes as a checksum for offline validation? How does it handle it if we decide to move from cid:typeTag to [cidWithTypeTagCodedInForNotHavingASeparator]? How does it handle it if we expand the address space from 32 bytes to 64 bytes?

All those cases cannot simply be coded into the cid; we would need to extract the bytes from the cid and then do a case decision on the length of the bytes… Just as we would do without cid… Only that with cid we need to extract the bytes from it before we can use them… (so one additional step with cid) wasted resources in my opinion…

Let me say it differently.

CIDs are an elegant way of encoding arbitrary data bytes into one data encoding of your choosing (taking care of the issues that arise if the data you want to encode doesn't fit the alignment of the encoding you want to use… you will always get back the exact byte string length you wanted to encode initially) [the chosen hashing function is just a random property of the cid to identify the length of the encoded byte string and to patch/unpatch it in the process …] they don't take care of any data format changes

So… Unless you plan on changing the base for encoding the xor URL, cids can only solve a problem they create themselves… That's why I'm against cids for xor urls! We know the length of our xor URL, we don't need to solve issues we cannot have - cids are answering the wrong question and are not the proper tool for this… Why would you add complexity you don't need? But we could add some additional bits as a checksum to verify the validity of a xor URL [and maybe add a byte to describe the encoding if you want… So we'd have self-description again if we at some point randomly would think that it makes way more sense to move from base32 to base16 or base58] (!)

Ps:

oh well - you know what… I don't care anymore… If you love your cids that much then go with them… Imho it's a bad decision because it makes it overly complicated - but I don't want to waste more of my lifetime on this issue that doesn't matter anyway in the long run… just please don't use the non-standardised base32z encoding in your official api that possibly creates incompatibilities (or at least tell me which characters I need to replace by which others to get back to standard base32 encoding to be able to decode urls you created)! (and why not append a checksum after the address too for typo-recognition…? Even the IBAN comes with check digits nowadays… But please (!) not a complex solution this time but just e.g. counting ones in the bytes and taking the last 1 or 2 digits or so… )
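The "count the ones" idea could look roughly like the Python sketch below. This is purely hypothetical and not part of the RFC or any API. It assumes a single check byte (the number of 1-bits modulo 256) appended after the address bytes, and the function names are invented for illustration.

```python
def popcount_checksum(data):
    """Count the 1-bits in data and keep the count modulo 256 as one check byte."""
    ones = sum(bin(byte).count("1") for byte in data)
    return ones % 256


def append_checksum(data):
    """Return data with the hypothetical check byte appended."""
    return data + bytes([popcount_checksum(data)])


def verify(data_with_checksum):
    """Offline check: recompute the bit count and compare it with the last byte."""
    payload, check = data_with_checksum[:-1], data_with_checksum[-1]
    return popcount_checksum(payload) == check


name = bytes.fromhex("8f7fdd831e0ef35eb7f46965b8b0915e636522a0ffda8c73261983b50188289d")
tagged = append_checksum(name)

# a single flipped bit changes the count by one, so it is always caught
flipped = bytes([tagged[0] ^ 0x01]) + tagged[1:]

# swapping two bytes leaves the count unchanged, so this kind of typo slips through
swapped = tagged[1:2] + tagged[0:1] + tagged[2:]

print(verify(tagged), verify(flipped), verify(swapped))
```

Such a check is cheap, but it is weak: any typo that keeps the number of 1-bits the same goes unnoticed, so it trades detection strength for simplicity.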

2 Likes

Hey @riddim, I don't think everyone is really in love with CID and/or completely sold on it; this was just a proposal and one way of achieving the goal. I think there are valid points in your critique, and they are not being ignored, if that's what you feel. I'm personally waiting for others to also chime in here with their opinions and perspectives; I'm aware some other people are trying to catch up with these discussions. I'm trying to explain what the decisions were and the reasoning behind what has been done, but it's good we are looking at them and reviewing them from different angles.

7 Likes

Aye, @riddim don't be disheartened please.

It's awesome you're raising your concerns. This is how we progress on this front :+1:

I'd been of the opinion that flexibility and future proofing are worthwhile additions. But you raise some good points. I need to digest and re-read some of your posts above before I can opine something more though :slight_smile:

But aye, please don't mistake a lack of immediate response for a lack of interest in your posts/points. :bowing_man:

8 Likes

All good just mega busy

tbh I'm not 100% against cid… Probably the point where I got a bit upset was when I realised that cid is just a way of representing bytes and that you chose a non-standard encoding for doing so as default behaviour … I don't care that much about a couple of bytes more or less… But I would rather discuss whether it would make sense to include a check byte for offline typo recognition and to include the type tag just as bytes instead of the :typeTag thing that looks a bit pointless to me tbh…

Ps: and since I found out that indeed I can just hex the name and put something in front of it I had the impression you don't know what cid precisely is and think it's a carefree package for all data (but it's not - it's just one way to represent it - actually a smart way because you first say how the following data will be structured - but it is really no more than just a representation of bytes… that can even be base8 or base2 as pure zeros and ones… If you exclude the type tag from the cid you make it complicated to transfer a xor address in such an environment instead of simple… Same goes for an environment where someone in the future wants to use pure base64 or base128 data… )

1 Like

Side note:

Note: base64 or base128 would for example be possible if someone used 6 or 7 parallel data transfer channels (just going away from the visual data representation layer and looking at the technical level)

So indeed there might be future use cases with different encodings where cids then could be natively at home and one wouldnā€™t need to decode and re-encode the data by hand (if the type tag is included in the xor address cid - otherwise you need to split it up again and treat the type tag different from the rest and need to re-encode it)

Finally had the chance to scroll briefly through the new primer…

That's something I didn't pay attention to earlier…

Plus while making screenshots for this here I got a bit confused…

Immutable data Name: 32byte array

I thought for mutable data it would be 32 bytes too (24 byte name + 8 byte type tag); did this change…?

Anyway - but to sum it up the current proposal is to do:

  • cid(name):typeTag for mutable public
  • cid(name+mime type) for immutable public
  • David said 32byte array for safecoin (which is just data)
  • format unclear for private data

… Looks like a bunch of different formats arising…

I know I sound a bit like a broken record now - but I would vote for cid(relevantBytes+checksum) for simply everything…

You can just return all three of base58, base32 and base16 through the api and people can decide for themselves which one they want to use (the browser should be able to decode at least base32 and base16)

So it would be

  • cid(name+mimeType+checksum) for immutables
  • cid(name+checksum) for safecoin
  • cid(name+typeTag+checksum) for public mutable
  • cid(name+keys+typeTag+checksum) for private mutable

… Still many different lengths of data but a bit more homogeneous…


[i cannot post more than 3 times in a row - so here an EDIT]


… Okay more on encoding and the differences between base32 and base32z… Just so that everyone knows what we are talking about

I had a look at the JS implementation of those 2 and can make a base32 string from a base32z-js-string now…

So this here

safe://hyfabcerxx9qag8oq6pxmx7djcshmbrk6cp11fe895kg8gjo3oq4odnbeuw:777

is as base32

safe://bafybmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu:777

As we can see the base32z leaves out l, v and 2

IMG_20190415_122813_546

And re-sorts all other characters :thinking:
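As a sketch of that conversion in Python: both encodings cut the bytes into the same 5-bit groups and only differ in which character stands for which value, so a plain character-for-character translation is enough. The two alphabets below are the RFC 4648 base32 alphabet (lower-cased) and the z-base-32 alphabet. The multibase prefix characters `b` (base32) and `h` (base32z) match the links in this thread. The function names are made up for illustration.

```python
BASE32_ALPHABET = "abcdefghijklmnopqrstuvwxyz234567"
ZBASE32_ALPHABET = "ybndrfg8ejkmcpqxot1uwisza345h769"

TO_Z = str.maketrans(BASE32_ALPHABET, ZBASE32_ALPHABET)
FROM_Z = str.maketrans(ZBASE32_ALPHABET, BASE32_ALPHABET)


def base32_to_base32z(cid_text):
    """Turn a multibase base32 string ('b...') into its base32z form ('h...')."""
    if not cid_text.startswith("b"):
        raise ValueError("expected multibase prefix 'b'")
    return "h" + cid_text[1:].translate(TO_Z)


def base32z_to_base32(cid_text):
    """Turn a multibase base32z string ('h...') into its base32 form ('b...')."""
    if not cid_text.startswith("h"):
        raise ValueError("expected multibase prefix 'h'")
    return "b" + cid_text[1:].translate(FROM_Z)


b32 = "bafybmiepp7oyghqo6nplp5djmw4lbek6mnssfih73kghgjqzqo2qdcbitu"
print(base32_to_base32z(b32))
print(base32z_to_base32(base32_to_base32z(b32)))
```

Only the part before the `:777` type tag gets translated; the `safe://` scheme and the tag stay as they are.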

1 Like

Hey @riddim, thanks for your valuable comments! Iā€™m still reading through this discussion and will chime in soon. For now Iā€™ve got just a quick remark:

we never had 24 byte XorNames - for Mutable Data it's a 32 byte XorName + an 8 byte (64 bit) type tag:
https://docs.rs/safe_core/0.32.0/safe_core/client/mdata_info/struct.MDataInfo.html

2 Likes

Then I need to check this in pySafe! Thx!

ps: all good - don't know why i thought it would be 24 bytes …

2 Likes

Okay - playing with immutables now …

and here again there is the question how xor-links are supposed to work for me

i uploaded a jpg to this xor-name (hex)

30a3fcb0130310087d0890da0143a594b40dd96b7536de15892d78ee264a6813

and the same file as png to this xor-name (hex)

2d50dd18645fe99a98b89271b44d10c612ca2cb168eba51e4ea3ce7909689583

i know that it's really there because i downloaded it on a different pc and both downloads succeeded without error

IMG_20190505_160452_162

since the prefix you used for your png link at safe://hygjdkftyx3k7kr51q9mxapy418zk3stdsss8suyqcim3b56jcten8d4j9emo is not in the python implementation of multicodec i "added it manually" to my local version of it (just smuggled it into the source code)

then i used the safe://toolbox.dapp to analyse the picture link you provided to extract the xor-name - i can download the lamp and get the data …

IMG_20190505_160851_510

when i convert it to a cid i get as base32:

safe://bagkdefraipltiobuvllajcbkfl34dbiphitkor5jz2r6lqwdptkzpqbgvz2a

which seems to be fine (page loads - toolbox analyses)

for base32z then suddenly i only get 'roughly' what your link is (pay attention to the 2 additional y's) and safe://hygkdrftyexmueqbwimmyjnbkfm5hdbex8eukqt7j34t6mosdxuk3xobgi34y then analyses fine again with the toolbox + loads the picture (so it's definitely not 'just base32z encoded' but somehow there are additional characters that were not there before [and imo are not supposed to be there - since it's 2 additional characters that obviously don't contain any information … otherwise the base32 encoded data wouldn't analyse and work…])

if i try the same with my uploaded png i get:

safe://bagkdefrafvin2gdel7uzvgfysjy3itiqyyjmulfrndv2khsouphhscliswbq

which doesn't let me view the png in the browser and doesn't analyse with the toolbox,

base32z
'safe://hgkdrftyfiep4gdrm9w3igfa1ja5eueoaajcwmftpdi4k81qwx881nme1sbo'
fails too - and with the 2 additional y's (as in the example link)
'safe://hygkdrftyexmueqbwimmyjnbkfm5hdbex8eukqt7j34t6mosdxuk3xobgi34y'
it fails as well …

so what am i doing wrong with my png?

If i messed something up: what is the precise specification of the xor-url of an immutable? why are there those 2 additional y's? (:face_with_raised_eyebrow:) and why don't we just append the mime type to the bytes and encode it just the same way we did before …?

As it is now for me in python - I need to manually patch the multicodecs implementation to have the required mime types (not sure how standardised that is - and how widely used… The last update of the hash constants on github for python was 2015… where do those codec-numbers come from anyway? I didn't see them in the iana link from the github issue and the codec used for the png is not the x1910 mentioned in the issue but x1914…? May we suddenly run into collisions with the definitions? ) then I can generate a link (which only works in some cases as it seems)…
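To show where such a codec number ends up in the link, here is a Python sketch that builds the base32 form of an immutable-data URL by hand, without patching any multicodec table. It takes the value x1914 for png from the post above as given (an assumption, not a confirmed constant) and assumes the same layout as for mutable data, just without a type tag. The function names are invented for illustration.

```python
import base64

CID_V1 = 0x01
PNG_CODEC = 0x1914   # value quoted in the post above; treated as an assumption here
SHA3_256 = 0x16


def write_varint(value):
    """Encode a non-negative integer as an unsigned varint (7 bits per byte)."""
    out = bytearray()
    while True:
        byte = value & 0x7F
        value >>= 7
        if value:
            out.append(byte | 0x80)
        else:
            out.append(byte)
            return bytes(out)


def immutable_xor_url(name, codec=PNG_CODEC):
    """version + content-type codec + hash code + length + name, as unpadded base32."""
    raw = (write_varint(CID_V1) + write_varint(codec)
           + write_varint(SHA3_256) + write_varint(len(name)) + name)
    text = base64.b32encode(raw).decode("ascii").lower().rstrip("=")
    return "safe://b" + text


png_name = bytes.fromhex("2d50dd18645fe99a98b89271b44d10c612ca2cb168eba51e4ea3ce7909689583")
print(write_varint(PNG_CODEC).hex())
print(immutable_xor_url(png_name))
```

Because the codec is larger than 127 it takes two varint bytes (94 32), which is why these links start with 'bagkd' instead of 'bafy'. Under these assumptions the sketch reproduces the base32 link quoted above for the uploaded png, so that link encodes the xor-name consistently. It does not explain why the browser fails to load it.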

Ps: oh sorry! My mistake with the y! I think I made a copy&paste error with the base32z declaration dict… On second view the lengths of the links looked fishy :roll_eyes: :thinking:

Then your base32z link is perfect - it's just that I fail at generating the right link to my uploaded png

2 Likes