An on-chain index of all data stored and distributed in the system, with associated information about ownership and what providers are tasked with storing and providing bandwidth.
This subsystem retains an index of all static assets, including who owns them and which actors are currently tasked with storing and distributing them to end users. It has no awareness of the underlying content or purpose represented by such an object, as it is used by different parts of the system. It can represent assets as diverse as:
- Video media, with a particular resolution and encoding, living in the content directory.
- Image media used for the avatar of a member, an election candidate or for the cover of a channel in the content directory.
- A data attachment to a blog post, proposal or role application.
In the future it will be extended with autonomous interactive auditing features for storage providers, and with payments between service providers to incentivize replication.
The data directory account is a module account for holding funds in the data directory, and it has an account id denoted by
A data object represents a single static data asset, like an image or video media, and it is defined by the following information
id: A unique immutable non-negative integer identifying an individual data object, automatically assigned by the blockchain upon creation.
accepted: Whether the object has been verified as correctly uploaded to an initial storage provider. The storage provider making such a confirmation for a given object is referred to as the liaison for the object.
deletion_prize: An amount of funds locked up as a state bloat bond for the object.
size: The claimed size of the object, as stipulated during creation by the owner, and implicitly understood to be verified by the liaison.
hash: The IPFS CID of the object, specifically SS58 format of Multihash with blake3 hashing algorithm.
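As a minimal sketch of the record described above, the fields can be modelled as a plain data structure. The Python types, the `accept` helper, and the example values are assumptions made for illustration; they are not taken from the runtime.

```python
from dataclasses import dataclass, replace


@dataclass(frozen=True)
class DataObject:
    id: int              # unique, immutable, assigned by the chain on creation
    accepted: bool       # True once a liaison has confirmed the upload
    deletion_prize: int  # funds locked up as a state bloat bond
    size: int            # size claimed by the owner at creation
    hash: str            # content identifier of the object


def accept(obj: DataObject) -> DataObject:
    """Hypothetical helper: return a copy marked as confirmed by the liaison."""
    return replace(obj, accepted=True)
```

The record is frozen here only to reflect that the id never changes; acceptance is modelled as producing an updated copy.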
A Bag Id is a value which can identify a specific bag, see section on Bag for further elaboration, and it takes one of the following varieties
static: a static bag id identifies one of the built in bags in the system, and it comes in one of the following subvarieties
council: identifies the bag reserved for the council to manage through its proposal system.
membership: identifies the bag for the membership working group.
storage: identifies the bag for the storage working group.
bandwidth: identifies the bag for the bandwidth working group.
content: identifies the bag for the content directory working group.
forum: identifies the bag for the forum working group.
operations_beta...: each identifies the bag for the corresponding operations working group.
dynamic: a dynamic bag id identifies one of the dynamic (that is, not built-in) bags in the system, and it comes in one of the following subvarieties:
member: has associated membership id and identifies bag for this member.
channel: has associated channel id, and identifies the bag for this channel.
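The varieties above form a tagged union, which might be sketched as follows. The class names are hypothetical, and the operations working group bags are left out because the source elides their names.

```python
from dataclasses import dataclass
from typing import Union

# Static varieties listed above; operations working groups are omitted here.
STATIC_BAG_NAMES = frozenset(
    {"council", "membership", "storage", "bandwidth", "content", "forum"}
)
DYNAMIC_BAG_KINDS = frozenset({"member", "channel"})


@dataclass(frozen=True)
class StaticBagId:
    name: str

    def __post_init__(self):
        if self.name not in STATIC_BAG_NAMES:
            raise ValueError(f"unknown static bag: {self.name}")


@dataclass(frozen=True)
class DynamicBagId:
    kind: str    # "member" or "channel"
    ref_id: int  # membership id or channel id, depending on kind

    def __post_init__(self):
        if self.kind not in DYNAMIC_BAG_KINDS:
            raise ValueError(f"unknown dynamic bag kind: {self.kind}")
        if self.ref_id < 0:
            raise ValueError("ids are non-negative integers")


BagId = Union[StaticBagId, DynamicBagId]


def is_dynamic(bag_id: BagId) -> bool:
    """True for member and channel bags, False for built-in bags."""
    return isinstance(bag_id, DynamicBagId)
```

Because both classes are frozen, bag ids are hashable and can serve as dictionary keys, which suits their role as small immutable identifiers.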
A data object bag, or bag for short, is a dynamic collection of data objects which can be treated as one subject in the system. Each bag has an owner, which is established when the bag is created. A data object lives in exactly one bag, but may be moved across bags by the owner of the bag. Only the owner can create new data objects in a bag, or opt into absorbing objects from another bag. The purpose of the concept of bags is to limit the on-chain transactional footprint of administrating multiple objects which should be treated the same way. This is achieved by establishing a small immutable identifier for these objects. The canonical example would be assets that will be consumed together, such as the cover photo and different video media encodings of a single piece of video content. Storage and distribution nodes have commitments to bags, not individual data objects.
A bag is defined by the following information
object_size: cumulative size of all data objects in the bag.
object_count: cumulative number of objects in the bag.
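To illustrate how the two counters track the contents of a bag, here is a small sketch of adding objects and moving them between bags. The `Bag` class, its `objects` mapping, and `move_objects` are hypothetical names, and ownership checks are left out.

```python
from dataclasses import dataclass, field


@dataclass
class Bag:
    objects: dict = field(default_factory=dict)  # object id -> object size
    object_size: int = 0   # cumulative size of all objects in the bag
    object_count: int = 0  # number of objects in the bag

    def add(self, object_id: int, size: int) -> None:
        if object_id in self.objects:
            raise ValueError(f"object {object_id} is already in this bag")
        self.objects[object_id] = size
        self.object_size += size
        self.object_count += 1

    def remove(self, object_id: int) -> int:
        size = self.objects.pop(object_id)  # raises KeyError if absent
        self.object_size -= size
        self.object_count -= 1
        return size


def move_objects(src: Bag, dst: Bag, object_ids: list) -> None:
    """Move objects between bags, validating everything before mutating."""
    missing = [i for i in object_ids if i not in src.objects]
    if missing:
        raise KeyError(f"objects not in source bag: {missing}")
    for object_id in object_ids:
        dst.add(object_id, src.remove(object_id))
```

Validating before mutating keeps both bags consistent if the request names an object the source bag does not hold.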
A storage bucket represents a commitment to hold some set of bags for long term storage. A bucket may have a bucket operator, which is a single worker in the storage working group. There is distinct bucket operator metadata associated with each, which describes things such as how to resolve the host. The operator of a bucket may change over time. As previously described, when new dynamic bags are created, they are allocated to one or more such buckets, unless the bucket has been temporarily disabled from accepting new bags.
id: a unique immutable non-negative integer identifying an individual storage bucket, automatically assigned by the blockchain upon creation.
operator_status: status of bucket operator, is one of the following varieties
missing: when there is no operator.
invited: with associated worker id in storage working group, when the lead has invited the given worker to operate the bucket.
accepting_new_bags: whether this bucket is an acceptable destination for additional bags.
total_size_limit: upper bound on cumulative size of all data objects in bucket.
object_count_limit: upper bound on cumulative number of all data objects in bucket.
total_size: cumulative size of all data objects in bucket.
object_count: cumulative number of all data objects in bucket.
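The limits and running totals above suggest a simple capacity check when a bag is allocated to a bucket. The sketch below is one possible reading: the method names are hypothetical, and treating the limits as inclusive is an assumption.

```python
from dataclasses import dataclass


@dataclass
class StorageBucket:
    id: int
    accepting_new_bags: bool
    total_size_limit: int    # upper bound on cumulative object size
    object_count_limit: int  # upper bound on cumulative object count
    total_size: int = 0
    object_count: int = 0

    def can_accept_bag(self, bag_size: int, bag_objects: int) -> bool:
        """Check the flag and both limits (limits assumed inclusive)."""
        return (
            self.accepting_new_bags
            and self.total_size + bag_size <= self.total_size_limit
            and self.object_count + bag_objects <= self.object_count_limit
        )

    def assign_bag(self, bag_size: int, bag_objects: int) -> None:
        if not self.can_accept_bag(bag_size, bag_objects):
            raise ValueError(f"bucket {self.id} cannot accept this bag")
        self.total_size += bag_size
        self.object_count += bag_objects
```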
A distribution bucket represents a commitment to distribute a set of bags to end users. A bucket may have multiple bucket operators, each being a worker in the distribution working group. The same metadata concept applies here as well, and additionally covers whether the operator is live or not. Bags are assigned to buckets when being uploaded, or later by the lead by manual intervention.
id: a unique immutable non-negative integer identifying an individual distribution bucket, automatically assigned by the blockchain upon creation.
accepting_new_bags: whether this bucket is an acceptable destination for additional bags.
distributing: whether assigned operators are servicing this bucket at the moment.
pending_invitations: set of worker ids from the bandwidth working group for workers who have been invited and can join as operators.
operators: set of worker ids from the bandwidth working group for workers currently acting as operators.
assigned_bags: the number of bags currently assigned to this bucket.
Buckets are partitioned into so-called distribution bucket families. A family groups buckets that are interchangeable from a distribution point of view, and the purpose of the grouping is to allow sharding over the bag space for a given service level when creating new bags. For example, a subset of families could represent the countries of East Asia, with each family corresponding to a specific country. The buckets in a family, say the family for Mongolia, will be operated by infrastructure which can provide sufficiently low latency guarantees w.r.t. the corresponding country. The bag for a channel known to be particularly popular in this area could be set up to use these buckets disproportionately.
id: a unique immutable non-negative integer identifying an individual distribution bucket family, automatically assigned by the blockchain upon creation.
distribution_buckets: a map which sends a Distribution Bucket id to the corresponding bucket, and holds all buckets that are part of this family.
A dynamic bag creation policy holds parameter values impacting how exactly the creation of a new dynamic bag occurs. There is one such policy for each type of dynamic bag, so two: one for member and one for channel. It describes how many storage buckets should store the bag, and from what subset of distribution bucket families to select a given number of distribution buckets, specifically
number_of_storage_buckets: number of storage buckets which should replicate the new bag.
families: map of Distribution Bucket Family id to the number of distribution buckets in the given family one must assign to a new bag for distribution when subject to this policy.
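One way to picture how such a policy could drive distribution bucket selection is sketched below. The random choice, the failure when a family has too few accepting buckets, and all names are assumptions for illustration; the on-chain selection may behave differently.

```python
import random
from dataclasses import dataclass, field


@dataclass
class DynamicBagCreationPolicy:
    number_of_storage_buckets: int
    families: dict = field(default_factory=dict)  # family id -> buckets required


def pick_distribution_buckets(policy, family_buckets, rng):
    """Pick the required number of accepting buckets from each family.

    family_buckets maps family id -> list of (bucket id, accepting_new_bags).
    Returns a list of (family id, bucket id) pairs.
    """
    chosen = []
    for family_id, needed in policy.families.items():
        candidates = [
            bucket_id
            for bucket_id, accepting in family_buckets.get(family_id, [])
            if accepting
        ]
        if len(candidates) < needed:
            raise ValueError(f"family {family_id} has too few accepting buckets")
        chosen.extend((family_id, b) for b in rng.sample(candidates, needed))
    return chosen
```

Passing the random number generator in explicitly, for example `random.Random(0)`, keeps the sketch deterministic under test.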
The blacklist is a collection of hashes, managed by the lead, which are not allowed for future introductions of data objects in the directory.
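A blacklist of this kind can be sketched as a bounded set that is consulted before a new object is admitted. The class and method names are hypothetical, and the size limit mirrors the maximum blacklist size constant listed later in this section.

```python
class Blacklist:
    """Bounded set of hashes that may not be introduced as new data objects."""

    def __init__(self, size_limit: int):
        self.size_limit = size_limit
        self.hashes = set()

    def add(self, content_hash: str) -> None:
        # Re-adding a known hash is a no-op and does not count against the limit.
        if content_hash not in self.hashes and len(self.hashes) >= self.size_limit:
            raise ValueError("blacklist size limit reached")
        self.hashes.add(content_hash)

    def remove(self, content_hash: str) -> None:
        self.hashes.discard(content_hash)

    def ensure_allowed(self, content_hash: str) -> None:
        if content_hash in self.hashes:
            raise ValueError(f"hash is blacklisted: {content_hash}")
```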
The following overview summarizes the main relationships between the primary concepts.
The following mutable parameters are part of the system.
Whether all new uploads are blocked.
Size based pricing of new objects uploaded.
The following set of methods can be invoked from within the blockchain itself by other systems. It is the way that different subsystems unlock the ability to have end users interact with the storage and bandwidth system, for example allowing channel owners to publish video media into this infrastructure.
Validates upload parameters and conditions (like global uploading block). Validates voucher usage for affected buckets.
Upload new data objects.
Validates moving objects parameters, voucher usage for affected buckets.
Move data objects to a new bag.
Validates delete_data_objects parameters, voucher usage for affected buckets.
Delete storage objects. Transfer deletion prize to the provided account.
Delete dynamic bag. Updates related storage bucket vouchers.
Validates delete_dynamic_bag parameters and conditions.
Creates dynamic bag. The caller should provide the BagId.
Validates create_dynamic_bag parameters and conditions.
Checks if a bag exists and returns it. Static bags always exist.
Gets the ids of all objects in a bag, without checking the bag's existence.
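Several of the methods above come in pairs: one validates parameters and conditions, the other performs the change. A rough sketch of that validate-then-apply pattern for uploads follows. The names `UploadState`, `can_upload` and `upload`, the error strings, and the single bucket voucher are assumptions for illustration and do not reproduce the runtime interface.

```python
from dataclasses import dataclass


@dataclass
class UploadState:
    uploading_blocked: bool  # global switch blocking all new uploads
    max_object_size: int     # maximum data object size in bytes
    bucket_size_limit: int   # voucher: cumulative size limit
    bucket_count_limit: int  # voucher: object count limit
    bucket_size: int = 0
    bucket_count: int = 0


def can_upload(state: UploadState, sizes: list) -> list:
    """Validate an upload without changing state; an empty list means valid."""
    errors = []
    if state.uploading_blocked:
        errors.append("uploading blocked")
    if not sizes:
        errors.append("no objects")
    if any(s <= 0 or s > state.max_object_size for s in sizes):
        errors.append("invalid object size")
    if state.bucket_size + sum(sizes) > state.bucket_size_limit:
        errors.append("size voucher exceeded")
    if state.bucket_count + len(sizes) > state.bucket_count_limit:
        errors.append("object count voucher exceeded")
    return errors


def upload(state: UploadState, sizes: list) -> None:
    """Apply the upload only if validation passes."""
    errors = can_upload(state, sizes)
    if errors:
        raise ValueError("; ".join(errors))
    state.bucket_size += sum(sizes)
    state.bucket_count += len(sizes)
```

Separating the check from the mutation lets a calling subsystem confirm that an upload would succeed before committing any of its own state changes.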
A prize for a data object deletion.
Maximum size of the "hash blacklist" collection.
A prize for a data object .
"Storage buckets per bag" value constraint.
"Distribution buckets per bag" value constraint.
The default dynamic bag creation policy for members (storage bucket number).
The default dynamic bag creation policy for channels (storage bucket number).
Max random iteration number (e.g. when picking the storage buckets).
Max allowed distribution bucket family number.
Max allowed distribution bucket number per family.
Max number of pending invitations per distribution bucket.
Max data object size in bytes.