Skip to content

Conversation

achingbrain
Copy link
Member

@achingbrain achingbrain commented Nov 14, 2019

Following our Redux Redux meeting, here's what we decided (as I recall). It largely swings back towards #220 with added sections on deduping and inlining and is more specific about the sizes of things.

The spec is updated with metadata implementation details and a decision rationale section has been added at the end.

Follows on from/supersedes #223

I didn't include the pros and cons of the other options because I
don't think they should be included in the spec.  That is, if I'm
implementing this, I read the spec to understand what I should
implement and how it should behave, I don't want to read a bunch
of exposition on how we got to where we are.

The decision log is important though, should it go in
`ipfs/notes`?  Or maybe we could have accompanying `*-notes.md`
files?

Follows on from/supersedes #223
@achingbrain achingbrain force-pushed the unixfs-1.5-metadata-in-files branch from ee58754 to 172846d Compare November 14, 2019 17:26
optional uint64 hashType = 5;
optional uint64 fanout = 6;
optional uint32 mode = 7;
optional int64 mtime = 8;
Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

These are here and not in the Metadata message because it's deprecated so let's stay well away from it.

@achingbrain
Copy link
Member Author

Cc @ribasushi @ianopolous @rvagg - I can't add you to the review box for some reason.

UNIXFS.md Outdated

As a further optimization, if the `File` node's serialized size is small, it may be inlined into it's v1 [CID] by using the [`identity`](https://github.com/multiformats/multicodec/blob/master/table.csv) [multihash].

Such [CID]s must consist of 23 bytes or fewer in order for them to fit inside the 63 character limit for a DNS label when encoded in base32 (see [RFC1035 Section 2.3.1](https://tools.ietf.org/html/rfc1035#section-2.3.1)).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

  • Could you do a quick check to see if this is actually possible?
  • In terms of domains, we really need to generalize to multiple subdomains (different issue).

Copy link
Member Author

@achingbrain achingbrain Nov 15, 2019

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so, as long as we don't get to the extremes of any number values. For 'reasonable' values it should work:

const CID = require('cids')
const multihashing = require('multihashing-async')
const UnixFS = require('ipfs-unixfs')

async function main () {
  const data = new UnixFS('file')
  data.addBlockSize(256)

  const multihash = await multihashing(data.marshal(), 'identity')
  const cid = new CID(1, 'dag-pb', multihash)

  console.info(cid.toBaseEncodedString('base32').length)
  // prints '21'
}

main()

For the following configurations:

const data = new UnixFS('file')
data.addBlockSize(256)
data.fanout = 4294967295  // e.g. max uint32

// prints '30'
const data = new UnixFS('file')
data.addBlockSize(256)
data.fanout = 4294967295  // e.g. max uint32
data.hashType = 9223372036854775807 // e.g. max uint64

// prints '48'

JS rounds max uint64 to 9223372036854776000 because lol javascript numbers. I'd use BigInt in the test but it's so new nothing in our stack supports it.

Anyway, I think we're ok, as long as we're not trying to inline HAMT shards with really large fanout or hashType values that are made in the grim darkness of the far future and have all the permissions set.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Nice!

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should probably 23 + a single link to a block.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've pulled this out too because the generated identity CID is too long to be used in DNS labels after all:

const CID = require('cids')
const multihashing = require('multihashing-async')
const UnixFS = require('ipfs-unixfs')
const dagPB = require('ipld-dag-pb')
const { DAGLink, DAGNode } = dagPB

async function main () {
  // store some file data
  const fileData = Buffer.from([0, 1, 2, 3, 4])
  const fileDataHash = await multihashing(fileData, 'sha2-256')
  const fileDataCid = new CID(1, 'raw', fileDataHash)

  // create the UnixFS node that points to the file data
  const file = new UnixFS('file')
  file.addBlockSize(5)
  const node = new DAGNode(file.marshal(), [
    new DAGLink('', 5, fileDataCid)
  ])

  // inline the UnixFS node into it's CID
  const multihash = await multihashing(dagPB.util.serialize(node), 'identity')
  const cid = new CID(1, 'dag-pb', multihash)

  console.info(cid.toBaseEncodedString('base32').length)
  // prints '91'
}

main()

@Stebalien
Copy link
Member

I didn't include the pros and cons of the other options because I don't think they should be included in the spec. That is, if I'm implementing this, I read the spec to understand what I should implement and how it should behave, I don't want to read a bunch of exposition on how we got to where we are.

These need to be included somewhere.

@momack2
Copy link
Contributor

momack2 commented Nov 14, 2019

See the spec Raul did for Eth2.0's use of libp2p: https://github.com/ethereum/eth2.0-specs/blob/dev/specs/networking/p2p-interface.md#design-decision-rationale -- does a really good job capturing design rationale in the spec itself, which I think is warranted here given how foundational these decisions are

@achingbrain
Copy link
Member Author

@Stebalien @momack2 Raul's is a nice format - I've added the decision rationale at the bottom in a similar style as suggested.

Copy link
Member

@Stebalien Stebalien left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I like the context!

UNIXFS.md Outdated

UnixFS currently supports two optional metadata fields:

* `mode` -- The `mode` is for persisting the [file permissions in numeric notation](https://en.wikipedia.org/wiki/File_system_permissions#Numeric_notation) \[[spec](https://pubs.opengroup.org/onlinepubs/9699919799/basedefs/sys_stat.h.html)\]. If unspecified this defaults to `0755` for directories/HAMT shards and `0644` for all other types where applicable.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Let's note that, for now, only the ugo-rwx bits are defined.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've added that. Also setuid, setgid and the sticky bit for completeness.

UNIXFS.md Outdated

### Inheritance

When traversing down through a UnixFSv1.5 directory, child entries without metadata fields will inherit those of their direct ascendants.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this really what we want?

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I think so? Maybe the wording is wrong - you can have perms set at the root of a tree, which will cascade to all descendants, then you can specialise sections of the tree by setting perms on intermediate directories, and/or you can set perms on individual files for granular control. You then save space by not having to set the perms on every node.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

If you copied a file that had inherited perms from one UnixFS directory into another, presumably you'd want the option of preserving those perms, so you'd resolve them from the parent and apply them to the resulting node which would change its CID. I think that's ok. Bit weird, but ok.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This has the same problems as putting the permissions inside directories instead of inside files: if I give you a CID, it'll have different permissions/timestamps.

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I had the same thought while travelling today. I’ll remove this section.

For example many files are under the 256KiB block size limit, so we tend to inline them into the describing UnixFS `File` node. This would not be possible with an intermediate `Metadata` node.

1. The `File` node already contains some metadata (e.g. the file size) so you'd end up with metadata stored in multiple places

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

also note: this would have had forwards-compatibility issues that would have been annoying.

UNIXFS.md Outdated
@@ -59,6 +75,31 @@ For files that are comprised of more than a single block, the 'Type' field will

This data is serialized and placed inside the 'Data' field of the outer merkledag protobuf, which also contains the actual links to the child nodes of this object.

For files comprised of a single block, the 'Type' field will be set to 'File', 'filesize' will be set to the total number of bytes in the file and the file data will be stored in the 'Data' field.

The serialized size of a UnixFS node must not exceed 256KiB in order to work will with the [Bitswap] protocol.
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not quite: It can be up to 1MiB (bitswap will transfer 2 (4?) MiB blocks).

Copy link
Member Author

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I've removed this last paragraph, we can always add it in later when the bitswap spec is more fleshed out.

UNIXFS.md Outdated

As a further optimization, if the `File` node's serialized size is small, it may be inlined into it's v1 [CID] by using the [`identity`](https://github.com/multiformats/multicodec/blob/master/table.csv) [multihash].

Such [CID]s must consist of 23 bytes or fewer in order for them to fit inside the 63 character limit for a DNS label when encoded in base32 (see [RFC1035 Section 2.3.1](https://tools.ietf.org/html/rfc1035#section-2.3.1)).
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

nit: should probably 23 + a single link to a block.

@Stebalien Stebalien merged commit 7b87248 into master Nov 17, 2019
@Stebalien
Copy link
Member

🚀

@Stebalien Stebalien deleted the unixfs-1.5-metadata-in-files branch November 17, 2019 23:25
@daviddias daviddias changed the title docs: adds spec sections for mode and mtime metadata docs: adds spec sections for mode and mtime metadata (Unixfs v1.5) Nov 18, 2019
Copy link
Member

@alanshaw alanshaw left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

🎉 🥰 AWESOME!

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

4 participants