-
Notifications
You must be signed in to change notification settings - Fork 236
docs: adds spec sections for mode and mtime metadata (Unixfs v1.5) #226
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
Conversation
I didn't include the pros and cons of the other options because I don't think they should be included in the spec. That is, if I'm implementing this, I read the spec to understand what I should implement and how it should behave, I don't want to read a bunch of exposition on how we got to where we are. The decision log is important though, should it go in `ipfs/notes`? Or maybe we could have accompanying `*-notes.md` files? Follows on from/supersedes #223
ee58754
to
172846d
Compare
optional uint64 hashType = 5; | ||
optional uint64 fanout = 6; | ||
optional uint32 mode = 7; | ||
optional int64 mtime = 8; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
These are here and not in the Metadata
message because it's deprecated so let's stay well away from it.
Cc @ribasushi @ianopolous @rvagg - I can't add you to the review box for some reason. |
UNIXFS.md
Outdated
|
||
As a further optimization, if the `File` node's serialized size is small, it may be inlined into it's v1 [CID] by using the [`identity`](https://github.com/multiformats/multicodec/blob/master/table.csv) [multihash]. | ||
|
||
Such [CID]s must consist of 23 bytes or fewer in order for them to fit inside the 63 character limit for a DNS label when encoded in base32 (see [RFC1035 Section 2.3.1](https://tools.ietf.org/html/rfc1035#section-2.3.1)). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
- Could you do a quick check to see if this is actually possible?
- In terms of domains, we really need to generalize to multiple subdomains (different issue).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so, as long as we don't get to the extremes of any number values. For 'reasonable' values it should work:
const CID = require('cids')
const multihashing = require('multihashing-async')
const UnixFS = require('ipfs-unixfs')
async function main () {
const data = new UnixFS('file')
data.addBlockSize(256)
const multihash = await multihashing(data.marshal(), 'identity')
const cid = new CID(1, 'dag-pb', multihash)
console.info(cid.toBaseEncodedString('base32').length)
// prints '21'
}
main()
For the following configurations:
const data = new UnixFS('file')
data.addBlockSize(256)
data.fanout = 4294967295 // e.g. max uint32
// prints '30'
const data = new UnixFS('file')
data.addBlockSize(256)
data.fanout = 4294967295 // e.g. max uint32
data.hashType = 9223372036854775807 // e.g. max uint64
// prints '48'
JS rounds max uint64 to 9223372036854776000
because lol javascript numbers. I'd use BigInt in the test but it's so new nothing in our stack supports it.
Anyway, I think we're ok, as long as we're not trying to inline HAMT
shards with really large fanout
or hashType
values that are made in the grim darkness of the far future and have all the permissions set.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Nice!
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should probably 23 + a single link to a block.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've pulled this out too because the generated identity CID is too long to be used in DNS labels after all:
const CID = require('cids')
const multihashing = require('multihashing-async')
const UnixFS = require('ipfs-unixfs')
const dagPB = require('ipld-dag-pb')
const { DAGLink, DAGNode } = dagPB
async function main () {
// store some file data
const fileData = Buffer.from([0, 1, 2, 3, 4])
const fileDataHash = await multihashing(fileData, 'sha2-256')
const fileDataCid = new CID(1, 'raw', fileDataHash)
// create the UnixFS node that points to the file data
const file = new UnixFS('file')
file.addBlockSize(5)
const node = new DAGNode(file.marshal(), [
new DAGLink('', 5, fileDataCid)
])
// inline the UnixFS node into it's CID
const multihash = await multihashing(dagPB.util.serialize(node), 'identity')
const cid = new CID(1, 'dag-pb', multihash)
console.info(cid.toBaseEncodedString('base32').length)
// prints '91'
}
main()
These need to be included somewhere. |
See the spec Raul did for Eth2.0's use of libp2p: https://github.com/ethereum/eth2.0-specs/blob/dev/specs/networking/p2p-interface.md#design-decision-rationale -- does a really good job capturing design rationale in the spec itself, which I think is warranted here given how foundational these decisions are |
@Stebalien @momack2 Raul's is a nice format - I've added the decision rationale at the bottom in a similar style as suggested. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I like the context!
UNIXFS.md
Outdated
|
||
UnixFS currently supports two optional metadata fields: | ||
|
||
* `mode` -- The `mode` is for persisting the [file permissions in numeric notation](https://en.wikipedia.org/wiki/File_system_permissions#Numeric_notation) \[[spec](https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_stat.h.html)\]. If unspecified this defaults to `0755` for directories/HAMT shards and `0644` for all other types where applicable. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Let's note that, for now, only the ugo-rwx
bits are defined.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've added that. Also setuid, setgid and the sticky bit for completeness.
UNIXFS.md
Outdated
|
||
### Inheritance | ||
|
||
When traversing down through a UnixFSv1.5 directory, child entries without metadata fields will inherit those of their direct ascendants. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this really what we want?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I think so? Maybe the wording is wrong - you can have perms set at the root of a tree, which will cascade to all descendants, then you can specialise sections of the tree by setting perms on intermediate directories, and/or you can set perms on individual files for granular control. You then save space by not having to set the perms on every node.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
If you copied a file that had inherited perms from one UnixFS directory into another, presumably you'd want the option of preserving those perms, so you'd resolve them from the parent and apply them to the resulting node which would change its CID. I think that's ok. Bit weird, but ok.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This has the same problems as putting the permissions inside directories instead of inside files: if I give you a CID, it'll have different permissions/timestamps.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I had the same thought while travelling today. I’ll remove this section.
For example many files are under the 256KiB block size limit, so we tend to inline them into the describing UnixFS `File` node. This would not be possible with an intermediate `Metadata` node. | ||
|
||
1. The `File` node already contains some metadata (e.g. the file size) so you'd end up with metadata stored in multiple places | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
also note: this would have had forwards-compatibility issues that would have been annoying.
UNIXFS.md
Outdated
@@ -59,6 +75,31 @@ For files that are comprised of more than a single block, the 'Type' field will | |||
|
|||
This data is serialized and placed inside the 'Data' field of the outer merkledag protobuf, which also contains the actual links to the child nodes of this object. | |||
|
|||
For files comprised of a single block, the 'Type' field will be set to 'File', 'filesize' will be set to the total number of bytes in the file and the file data will be stored in the 'Data' field. | |||
|
|||
The serialized size of a UnixFS node must not exceed 256KiB in order to work will with the [Bitswap] protocol. |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not quite: It can be up to 1MiB (bitswap will transfer 2 (4?) MiB blocks).
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I've removed this last paragraph, we can always add it in later when the bitswap spec is more fleshed out.
UNIXFS.md
Outdated
|
||
As a further optimization, if the `File` node's serialized size is small, it may be inlined into it's v1 [CID] by using the [`identity`](https://github.com/multiformats/multicodec/blob/master/table.csv) [multihash]. | ||
|
||
Such [CID]s must consist of 23 bytes or fewer in order for them to fit inside the 63 character limit for a DNS label when encoded in base32 (see [RFC1035 Section 2.3.1](https://tools.ietf.org/html/rfc1035#section-2.3.1)). |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
nit: should probably 23 + a single link to a block.
🚀 |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
🎉 🥰 AWESOME!
Following our Redux Redux meeting, here's what we decided (as I recall). It largely swings back towards #220 with added sections on deduping and inlining and is more specific about the sizes of things.
The spec is updated with metadata implementation details and a decision rationale section has been added at the end.
Follows on from/supersedes #223