Data Directory
An on-chain index of all data stored and distributed in the system, with associated information about ownership and about which providers are tasked with storing the data and providing bandwidth for it.
Introduction
This subsystem maintains an index of all static assets, including who owns them and which actors are currently tasked with storing and distributing them to end users. It has no awareness of the underlying content or purpose of such an object, because it is used by different parts of the system. It can represent assets as diverse as:
Video media, with a particular resolution and encoding, living in the content directory.
Image media used for the avatar of a member, an election candidate or for the cover of a channel in the content directory.
A data attachment to a blog post, proposal or role application.
In the future it will be extended with autonomous interactive auditing features for storage providers, and with payments between service providers to incentivize replication.
Concepts
Data Directory Account
The data directory account is a module account for holding funds in the data directory, and it has an account id denoted by DATA_DIRECTORY_ACCOUNT_ID.
Data Object
A data object represents a single static data asset, like an image or video media, and it is defined by the following information:
id: A unique immutable non-negative integer identifying an individual data object, automatically assigned by the blockchain upon creation.
accepted: Whether the object has been verified as correctly uploaded to an initial storage provider. The storage provider making such a confirmation for a given object is referred to as the liaison for the object.
deletion_prize: An amount of funds locked up as a state bloat bond for the object.
size: The claimed size of the object, as stipulated during creation by the owner, and implicitly understood to be verified by the liaison.
hash: The IPFS CID of the object, specifically SS58 format of Multihash with blake3 hashing algorithm.
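As an illustration only, the fields above can be sketched as a plain record. The class and helper names (`DataObject`, `create_data_object`, `accept`) are hypothetical and are not the names used by the runtime; balances are modelled as plain integers.

```python
from dataclasses import dataclass


@dataclass
class DataObject:
    id: int              # assigned by the chain on creation, never changes
    accepted: bool       # set once a liaison confirms the upload
    deletion_prize: int  # state bloat bond (integer balance, an assumption)
    size: int            # claimed size, as stated by the owner
    hash: str            # content identifier string supplied at creation


def create_data_object(next_id: int, size: int, content_hash: str,
                       deletion_prize: int) -> DataObject:
    """Build a new, not yet accepted object (hypothetical helper)."""
    if next_id < 0:
        raise ValueError("id must be a non-negative integer")
    return DataObject(next_id, False, deletion_prize, size, content_hash)


def accept(obj: DataObject) -> DataObject:
    """Mark the object as verified by its liaison."""
    obj.accepted = True
    return obj
```

A freshly created object starts with `accepted` set to false and only flips once a storage provider confirms the upload.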
Bag Id
A Bag Id is a value that identifies a specific bag (see the Bag section for further elaboration). It takes one of the following varieties:
static: a static bag id identifies one of the built-in bags in the system, and it comes in one of the following subvarieties:
  council: identifies the bag reserved for the council to manage through its proposal system.
  membership: identifies the bag for the membership working group.
  storage: identifies the bag for the storage working group.
  bandwidth: identifies the bag for the bandwidth working group.
  content: identifies the bag for the content directory working group.
  forum: identifies the bag for the forum working group.
  operations_alpha, operations_beta, ...: each identifies the bag for the corresponding operations working group.
dynamic: a dynamic bag id identifies one of the dynamic (not built-in) bags in the system, and it comes in one of the following subvarieties:
  member: has an associated membership id and identifies the bag for this member.
  channel: has an associated channel id and identifies the bag for this channel.
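The two varieties can be pictured as a small tagged union. This is a sketch under assumptions: the type names are hypothetical, and the set of static names lists only the ones spelled out above (the source elides further operations groups with "...").

```python
from dataclasses import dataclass
from typing import Union

# Static bag names listed in the text; the source indicates there may be
# more operations groups than the two named here.
STATIC_BAG_NAMES = {
    "council", "membership", "storage", "bandwidth",
    "content", "forum", "operations_alpha", "operations_beta",
}
DYNAMIC_BAG_KINDS = {"member", "channel"}


@dataclass(frozen=True)
class StaticBagId:
    name: str

    def __post_init__(self):
        if self.name not in STATIC_BAG_NAMES:
            raise ValueError(f"unknown static bag: {self.name}")


@dataclass(frozen=True)
class DynamicBagId:
    kind: str       # "member" or "channel"
    entity_id: int  # membership id or channel id

    def __post_init__(self):
        if self.kind not in DYNAMIC_BAG_KINDS:
            raise ValueError(f"unknown dynamic bag kind: {self.kind}")


BagId = Union[StaticBagId, DynamicBagId]


def is_dynamic(bag_id: BagId) -> bool:
    """True for member and channel bags, False for built-in bags."""
    return isinstance(bag_id, DynamicBagId)
```

Because both classes are frozen, a bag id is hashable and can be used directly as a dictionary key, which matches its role as a small immutable identifier.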
Bag
A data object bag, or bag for short, is a dynamic collection of data objects which can be treated as one subject in the system. Each bag has an owner, which is established when the bag is created. A data object lives in exactly one bag, but may be moved across bags by the owner of the bag. Only the owner can create new data objects in a bag, or opt into absorbing objects from another bag. The purpose of the concept of bags is to limit the on-chain transactional footprint of administrating multiple objects which should be treated the same way. This is achieved by establishing a small immutable identifier for these objects. The canonical example would be assets that will be consumed together, such as the cover photo and different video media encodings of a single piece of video content. Storage and distribution nodes have commitments to bags, not individual data objects.
A bag is defined by the following information:
id: an id of type Bag Id for this bag.
stored_by: set of ids for the Storage Buckets tasked with storing the objects in the bag.
distributed_by: set of ids for the Distribution Buckets tasked with distributing the objects in the bag.
deletion_prize: amount of money placed in the Data Directory Account as a staking bond for cleaning up the bag once it is unused.
object_size: cumulative size of all data objects in the bag.
object_count: cumulative number of objects in the bag.
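The bookkeeping implied by `object_size` and `object_count` can be sketched as follows. This is a simplified model, not the runtime logic: the `Bag` class, its `objects` map and `move_objects` are hypothetical names, and ownership checks are left out.

```python
from dataclasses import dataclass, field
from typing import Dict, Iterable


@dataclass
class Bag:
    id: str
    objects: Dict[int, int] = field(default_factory=dict)  # object id -> size
    object_size: int = 0   # cumulative size of all objects in the bag
    object_count: int = 0  # number of objects in the bag

    def add(self, object_id: int, size: int) -> None:
        if object_id in self.objects:
            raise ValueError("object already in bag")
        self.objects[object_id] = size
        self.object_size += size
        self.object_count += 1

    def remove(self, object_id: int) -> int:
        size = self.objects.pop(object_id)  # KeyError if absent
        self.object_size -= size
        self.object_count -= 1
        return size


def move_objects(src: Bag, dst: Bag, object_ids: Iterable[int]) -> None:
    """Move objects between bags, validating everything before mutating."""
    ids = set(object_ids)
    missing = [i for i in ids if i not in src.objects]
    if missing:
        raise KeyError(f"objects not in source bag: {missing}")
    for i in ids:
        dst.add(i, src.remove(i))
```

Since an object lives in exactly one bag, a move always decrements the totals of the source bag by the same amount it adds to the destination bag.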
Storage Bucket
A storage bucket represents a commitment to hold some set of bags for long term storage. A bucket may have a bucket operator, which is a single worker in the storage working group. There is distinct bucket operator metadata associated with each, which describes things such as how to resolve the host. The operator of a bucket may change over time. As previously described, when new dynamic bags are created, they are allocated to one or more such buckets, unless the bucket has been temporarily disabled from accepting new bags.
id: a unique immutable non-negative integer identifying an individual storage bucket, automatically assigned by the blockchain upon creation.
operator_status: status of the bucket operator, which is one of the following varieties:
  missing: when there is no operator.
  invited: with an associated worker id in the storage working group, when the lead has invited the given worker to operate the bucket.
accepting_new_bags: whether this bucket is an acceptable destination for additional bags.
total_size_limit: upper bound on the cumulative size of all data objects in the bucket.
object_count_limit: upper bound on the cumulative number of all data objects in the bucket.
total_size: cumulative size of all data objects in the bucket.
object_count: cumulative number of all data objects in the bucket.
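The limits and running totals above suggest a capacity check of roughly the following shape. This is a sketch, assuming the limits are inclusive upper bounds; the function names `can_accept_bag` and `assign_bag` are hypothetical.

```python
from dataclasses import dataclass


@dataclass
class StorageBucket:
    id: int
    accepting_new_bags: bool
    total_size_limit: int
    object_count_limit: int
    total_size: int = 0
    object_count: int = 0


def can_accept_bag(bucket: StorageBucket, bag_size: int,
                   bag_object_count: int) -> bool:
    """Check the accepting flag and both limits (assumed inclusive)."""
    if not bucket.accepting_new_bags:
        return False
    fits_size = bucket.total_size + bag_size <= bucket.total_size_limit
    fits_count = (bucket.object_count + bag_object_count
                  <= bucket.object_count_limit)
    return fits_size and fits_count


def assign_bag(bucket: StorageBucket, bag_size: int,
               bag_object_count: int) -> None:
    """Account for a newly assigned bag in the bucket totals."""
    if not can_accept_bag(bucket, bag_size, bag_object_count):
        raise ValueError("bucket cannot accept this bag")
    bucket.total_size += bag_size
    bucket.object_count += bag_object_count
```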
Distribution Bucket
A distribution bucket represents a commitment to distribute a set of bags to end users. A bucket may have multiple bucket operators, each being a worker in the distribution working group. The same metadata concept applies here as well, and additionally covers whether the operator is live or not. Bags are assigned to buckets when being uploaded, or later by the lead by manual intervention.
id: a unique immutable non-negative integer identifying an individual distribution bucket, automatically assigned by the blockchain upon creation.
accepting_new_bags: whether this bucket is an acceptable destination for additional bags.
distributing: whether assigned operators are servicing this bucket at the moment.
pending_invitations: set of worker ids from the bandwidth working group for workers who have been invited and can join as operators.
operators: set of worker ids from the bandwidth working group for workers currently acting as operators.
assigned_bags: the number of bags currently assigned to this bucket.
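The relationship between `pending_invitations` and `operators` can be sketched as a two-step flow: the lead invites a worker, and the worker then accepts. The class and method names here are hypothetical, and the cap on pending invitations is passed in as a plain parameter standing in for a runtime constant.

```python
class DistributionBucket:
    """Minimal model of operator invitations for one bucket (a sketch)."""

    def __init__(self, bucket_id: int, max_pending_invitations: int):
        self.id = bucket_id
        self.max_pending_invitations = max_pending_invitations
        self.pending_invitations = set()  # invited worker ids
        self.operators = set()            # worker ids acting as operators

    def invite(self, worker_id: int) -> None:
        if worker_id in self.operators:
            raise ValueError("worker is already an operator")
        if worker_id in self.pending_invitations:
            raise ValueError("worker is already invited")
        if len(self.pending_invitations) >= self.max_pending_invitations:
            raise ValueError("too many pending invitations")
        self.pending_invitations.add(worker_id)

    def accept_invitation(self, worker_id: int) -> None:
        if worker_id not in self.pending_invitations:
            raise ValueError("no pending invitation for this worker")
        self.pending_invitations.remove(worker_id)
        self.operators.add(worker_id)
```

Unlike a storage bucket, a distribution bucket can end up with several operators at once, so both collections are sets.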
Distribution Bucket Family
Buckets are partitioned into so-called distribution bucket families. A family groups buckets that are interchangeable from a distribution point of view, and the purpose of the grouping is to allow sharding over the bag space for a given service level when creating new bags. For example, a subset of families could represent the countries of East Asia, with each family corresponding to a specific country. The buckets in a family, say the family for Mongolia, would be operated by infrastructure that can provide sufficiently low latency for the corresponding country. The bag for a channel known to be particularly popular in this area could then be set up to use these buckets disproportionately.
id: a unique immutable non-negative integer identifying an individual distribution bucket family, automatically assigned by the blockchain upon creation.
distribution_buckets: a map from Distribution Bucket id to the corresponding bucket, holding all buckets that are part of this family.
Dynamic Bag Creation Policy
A dynamic bag creation policy holds parameter values that determine how the creation of a new dynamic bag occurs. There is one such policy for each type of dynamic bag, so two in total: one for member and one for channel. It describes how many storage buckets should store the bag, and from which subset of distribution bucket families (described above) to select a given number of distribution buckets, specifically:
number_of_storage_buckets: number of storage buckets which should replicate the new bag.
families: map of Distribution Bucket Family id to the number of distribution buckets in the given family one must assign to a new bag for distribution when subject to this policy.
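To make the policy concrete, the sketch below applies one to a set of candidate buckets. It assumes buckets are picked uniformly at random and that creation fails when too few candidates exist; neither behaviour is stated in this section, and `select_buckets` is a hypothetical name.

```python
import random
from dataclasses import dataclass, field
from typing import Dict, List, Tuple


@dataclass
class DynamicBagCreationPolicy:
    number_of_storage_buckets: int
    # family id -> number of distribution buckets to pick from that family
    families: Dict[int, int] = field(default_factory=dict)


def select_buckets(
    policy: DynamicBagCreationPolicy,
    storage_bucket_ids: List[int],
    family_buckets: Dict[int, List[int]],
    rng: random.Random,
) -> Tuple[List[int], Dict[int, List[int]]]:
    """Pick storage and distribution buckets for a new dynamic bag."""
    if len(storage_bucket_ids) < policy.number_of_storage_buckets:
        raise ValueError("not enough storage buckets")
    storage = rng.sample(storage_bucket_ids, policy.number_of_storage_buckets)

    distribution: Dict[int, List[int]] = {}
    for family_id, count in policy.families.items():
        candidates = family_buckets.get(family_id, [])
        if len(candidates) < count:
            raise ValueError(f"not enough buckets in family {family_id}")
        distribution[family_id] = rng.sample(candidates, count)
    return storage, distribution
```

Passing the random source in explicitly keeps the selection reproducible in tests; an on-chain implementation would draw its randomness differently.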
Blacklist
The blacklist is a collection of hashes, managed by the lead, which are not allowed for future introductions of data objects in the directory.
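A minimal sketch of such a blacklist follows. The size cap mirrors the idea of a maximum blacklist size, but the class, its `update` signature (remove some hashes, add others) and the rejection behaviour are assumptions made for illustration.

```python
from typing import Iterable


class Blacklist:
    """Bounded set of content hashes that may not be uploaded (a sketch)."""

    def __init__(self, size_limit: int):
        self.size_limit = size_limit
        self.hashes = set()

    def update(self, remove: Iterable[str], add: Iterable[str]) -> None:
        """Apply removals and additions, rejecting the change if too large."""
        updated = (self.hashes - set(remove)) | set(add)
        if len(updated) > self.size_limit:
            raise ValueError("blacklist size limit exceeded")
        self.hashes = updated

    def is_allowed(self, content_hash: str) -> bool:
        return content_hash not in self.hashes
```

The blacklist only constrains future uploads, so a check like `is_allowed` would run when new data objects are introduced, not against objects already in the directory.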
Figures
Overview
The following overview summarizes the main relationships between the primary concepts.
Parameters
The following mutable parameters are part of the system.
uploading_blocked (Bool): Whether all new uploads are blocked.
data_object_per_mega_byte_fee (Balance): Size-based pricing of new objects uploaded.
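As a worked example of how these two parameters might interact, the sketch below computes a size-based fee for a batch of uploads. Two details are assumptions rather than facts from this page: one megabyte is taken as 1,000,000 bytes, and the total size is rounded up to whole megabytes. `upload_fee` is a hypothetical helper.

```python
from typing import Iterable

BYTES_PER_MEGABYTE = 1_000_000  # assumption: decimal megabytes


def upload_fee(object_sizes: Iterable[int],
               data_object_per_mega_byte_fee: int,
               uploading_blocked: bool) -> int:
    """Return the size-based fee for uploading the given objects."""
    if uploading_blocked:
        raise RuntimeError("uploading is currently blocked")
    total_bytes = sum(object_sizes)
    # Ceiling division: a partially used megabyte is charged in full
    # (rounding rule assumed for this sketch).
    megabytes = -(-total_bytes // BYTES_PER_MEGABYTE)
    return megabytes * data_object_per_mega_byte_fee
```

With a fee of 3 per megabyte, a single 1,500,000-byte object is charged for two megabytes under these assumptions.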
Internal Methods
The following set of methods can be invoked from within the blockchain itself by other subsystems. This is how those subsystems let end users interact with the storage and bandwidth system, for example by allowing channel owners to publish video media into this infrastructure.
can_upload_data_objects
Validates upload parameters and conditions (like global uploading block). Validates voucher usage for affected buckets.
upload_data_objects
Upload new data objects.
can_move_data_objects
Validates move_data_objects parameters and voucher usage for affected buckets.
move_data_objects
Move data objects to a new bag.
can_delete_data_objects
Validates delete_data_objects parameters and voucher usage for affected buckets.
delete_data_objects
Delete storage objects. Transfer deletion prize to the provided account.
delete_dynamic_bag
Delete dynamic bag. Updates related storage bucket vouchers.
can_delete_dynamic_bag
Validates delete_dynamic_bag parameters and conditions.
create_dynamic_bag
Creates a dynamic bag. The caller should provide the BagId.
can_create_dynamic_bag
Validates create_dynamic_bag parameters and conditions.
ensure_bag_exists
Checks whether a bag exists and returns it. Static bags always exist.
get_data_objects_id
Gets the ids of all objects in a bag, without checking that the bag exists.
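Most of these methods come in pairs: a `can_*` check that validates without changing state, and the operation itself. The sketch below shows that pattern for deleting data objects and refunding their deletion prizes. The `Directory` class, its storage layout and the `DirectoryError` type are hypothetical, and voucher accounting is omitted.

```python
from typing import Dict, Iterable


class DirectoryError(Exception):
    """Raised when a validation check fails (hypothetical error type)."""


class Directory:
    def __init__(self):
        # bag id -> {object id -> deletion prize}
        self.bags: Dict[str, Dict[int, int]] = {}
        self.balances: Dict[str, int] = {}

    def can_delete_data_objects(self, bag_id: str,
                                object_ids: Iterable[int]) -> None:
        """Validate only; never mutates state."""
        bag = self.bags.get(bag_id)
        if bag is None:
            raise DirectoryError("bag does not exist")
        missing = [i for i in set(object_ids) if i not in bag]
        if missing:
            raise DirectoryError(f"objects not found: {missing}")

    def delete_data_objects(self, account: str, bag_id: str,
                            object_ids: Iterable[int]) -> int:
        """Delete objects and transfer their deletion prizes to account."""
        ids = set(object_ids)
        self.can_delete_data_objects(bag_id, ids)
        bag = self.bags[bag_id]
        refund = sum(bag.pop(i) for i in ids)
        self.balances[account] = self.balances.get(account, 0) + refund
        return refund
```

Exposing the check separately lets a calling subsystem confirm that an operation will succeed before it commits any of its own state changes.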
Constants
DataObjectDeletionPrize
A prize for a data object deletion.
BlacklistSizeLimit
Maximum size of the "hash blacklist" collection.
DATA_DIRECTORY_ACCOUNT_ID
The account id of the data directory module account, which holds funds in the data directory.
StorageBucketsPerBagValueConstraint
"Storage buckets per bag" value constraint.
DistributionBucketsPerBagValueConstraint
"Distribution buckets per bag" value constraint.
DefaultMemberDynamicBagNumberOfStorageBuckets
The default dynamic bag creation policy for members (storage bucket number).
DefaultChannelDynamicBagNumberOfStorageBuckets
The default dynamic bag creation policy for channels (storage bucket number).
MaxRandomIterationNumber
Max random iteration number (e.g. when picking the storage buckets).
MaxDistributionBucketFamilyNumber
Max allowed distribution bucket family number.
MaxDistributionBucketNumberPerFamily
Max allowed distribution bucket number per family.
MaxNumberOfPendingInvitationsPerDistributionBucket
Max number of pending invitations per distribution bucket.
MaxDataObjectSize
Max data object size in bytes.
Extrinsics
create_storage_bucket
WIP.
update_storage_buckets_for_bag
WIP.
delete_storage_bucket
WIP.
invite_storage_bucket_operator
WIP.
cancel_storage_bucket_operator_invite
WIP.
remove_storage_bucket_operator
WIP.
update_uploading_blocked_status
WIP.
update_storage_buckets_per_bag_limit
WIP.
update_storage_buckets_voucher_max_limits
WIP.
update_number_of_storage_buckets_in_dynamic_bag_creation_policy
WIP.
update_blacklist
WIP.
set_storage_bucket_voucher_limits
WIP.
accept_storage_bucket_invitation
WIP.
set_storage_operator_metadata
WIP.
accept_pending_data_objects
WIP.
create_distribution_bucket_family
WIP.
delete_distribution_bucket_family
WIP.
create_distribution_bucket
WIP.
delete_distribution_bucket
WIP.
update_distribution_bucket_status
WIP.
update_distribution_buckets_for_bag
WIP.
distribution_buckets_per_bag_limit
WIP.
update_families_in_dynamic_bag_creation_policy
WIP.
cancel_distribution_bucket_operator_invite
WIP.
remove_distribution_bucket_operator
WIP.
set_distribution_bucket_family_metadata
WIP.
accept_distribution_bucket_invitation
WIP.
set_distribution_operator_metadata
WIP.