# File Upload And Merkle Root Calculation
Consider an allocation with an EC ratio of `n:p`, where `n` is the number of data-blobbers and `p` is the number of parity-blobbers.
Following are some of the constants/definitions:
- BlockSize (currently aka chunkSize) --> 64 KB
- Number of merkle leaves --> 1024
- Merkle leaf's block size (aka merkle-chunk-size) --> 64 Bytes
- Fragment --> a unit of `n * 64KB` of data
A file of size S is divided into multiple fragments. The last fragment, if short of `n * 64KB`, is padded with null bytes to make it up to `n * 64KB`.
Each fragment is then split into 64KB pieces, resulting in `n` such blocks.
Now, from these `n` data-shards, `p` parity-shards are calculated using Reed-Solomon erasure encoding.
After the parity-shards are calculated, the respective shards are sent to the respective blobbers, so there are `n + p` upload requests. Each blobber creates a new file in its temporary directory.
For the next fragment, the same process is followed; the blobber appends to the existing file. And so on.
Consider there were `m` fragments. Each blobber thus receives `m * 64KB` of data.
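The fragmentation and padding step can be sketched as follows. This is an illustrative sketch, not the production code: the function name is hypothetical, and the Reed-Solomon parity calculation that follows in the real pipeline is omitted.

```go
package main

import "fmt"

const blockSize = 64 * 1024 // 64 KB

// fragmentFile splits data into fragments of n*blockSize bytes, padding the
// last fragment with null bytes, then splits each fragment into n blocks of
// blockSize each. In the real pipeline, p parity blocks would then be
// computed per fragment with Reed-Solomon erasure encoding.
func fragmentFile(data []byte, n int) [][][]byte {
	fragSize := n * blockSize
	// Pad the tail so the total length is a multiple of n*64KB.
	if rem := len(data) % fragSize; rem != 0 {
		data = append(data, make([]byte, fragSize-rem)...)
	}
	var fragments [][][]byte
	for off := 0; off < len(data); off += fragSize {
		frag := data[off : off+fragSize]
		blocks := make([][]byte, n)
		for i := 0; i < n; i++ {
			blocks[i] = frag[i*blockSize : (i+1)*blockSize]
		}
		fragments = append(fragments, blocks)
	}
	return fragments
}

func main() {
	// A 100 KB file with n = 2 data-blobbers: one padded fragment of
	// 2*64KB = 128 KB, split into 2 blocks of 64 KB each.
	frags := fragmentFile(make([]byte, 100*1024), 2)
	fmt.Println(len(frags), len(frags[0]), len(frags[0][0])) // 1 2 65536
}
```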
For a blobber, the merkle root is calculated as follows:
- Divide each 64KB into blocks of 64 Bytes, resulting in 1024 blocks indexed 0...1023.
- For some index `i`, there are `m` blocks of 64 Bytes, which are treated as continuous data (each index thus has `m * 64 Bytes` of data as its leaf).
- The hash of this continuous `m * 64 Bytes` is thus the leaf for that index.
- The merkle tree, and hence the merkle root, is calculated from these 1024 leaves.
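The leaf construction above can be sketched in Go. This is a minimal sketch, not the production implementation: SHA-256 stands in for whatever hash the actual code uses, and the function name is hypothetical.

```go
package main

import (
	"crypto/sha256"
	"fmt"
)

const (
	merkleChunkSize = 64   // Bytes per merkle-chunk
	numLeaves       = 1024 // 64 KB / 64 Bytes
)

// merkleRoot computes a blobber-side root over m 64KB chunks: the leaf for
// index i is the hash of the m 64-Byte pieces at index i of every chunk,
// concatenated in order. The 1024 leaves are then pairwise-hashed level by
// level until a single root remains.
func merkleRoot(chunks [][]byte) [32]byte {
	leaves := make([][32]byte, numLeaves)
	for i := 0; i < numLeaves; i++ {
		h := sha256.New()
		for _, chunk := range chunks {
			h.Write(chunk[i*merkleChunkSize : (i+1)*merkleChunkSize])
		}
		copy(leaves[i][:], h.Sum(nil))
	}
	for len(leaves) > 1 {
		next := make([][32]byte, len(leaves)/2)
		for i := range next {
			next[i] = sha256.Sum256(append(leaves[2*i][:], leaves[2*i+1][:]...))
		}
		leaves = next
	}
	return leaves[0]
}

func main() {
	// Two 64 KB chunks received by a blobber (m = 2).
	chunks := [][]byte{make([]byte, 64*1024), make([]byte, 64*1024)}
	fmt.Printf("%x\n", merkleRoot(chunks))
}
```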
 
Obviously, the merkle root of a file will be different for each blobber.
When a file is uploaded, it has a `num_of_blocks` field, which is the number of 64KB blocks the blobber received. A directory also has `num_of_blocks`, which is simply the sum of the `num_of_blocks` of its children. So the root directory `/` holds the `total_num_of_blocks` of an allocation.
The network poses the blobber a `random_seed`, which is used to calculate a random `block_num` of the allocation. So `block_num` is basically calculated as: `block_num = rand.New(rand.NewSource(random_seed)).Int63n(total_num_of_blocks)`.
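Because both the blobber and the validators derive `block_num` from the same seed, they always agree on which block is challenged. A minimal sketch (function and variable names are illustrative):

```go
package main

import (
	"fmt"
	"math/rand"
)

// blockNum derives the challenged block number from the network-issued seed.
// Note that rand.NewSource must be wrapped in rand.New to get Int63n; the
// same seed always yields the same block number.
func blockNum(randomSeed, totalNumOfBlocks int64) int64 {
	return rand.New(rand.NewSource(randomSeed)).Int63n(totalNumOfBlocks)
}

func main() {
	// For an allocation holding 1,000,000 blocks, the same seed always
	// selects the same block.
	fmt.Println(blockNum(42, 1_000_000) == blockNum(42, 1_000_000)) // true
}
```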
Using `GetObjectPath`, which has the signature `func GetObjectPath(ctx context.Context, allocationID string, blockNum int64) (*ObjectPath, error)`, we get an object path that provides: the file/ref related to the above `block_num`, the list of all refs (with hierarchy maintained) that were retrieved from the database while fetching that ref, the `root_hash` of the allocation, and a few other details.
A challenge to a blobber is issued in the following manner:
- The mining network posts a transaction that has a list of eligible validators, a random seed to determine which data block the blobber must submit, and the latest writemarker.
- The blobber checks the network to see if any challenges have been issued.
- It then uses the random seed to calculate the block number.
- The blobber finds the file which stores this block of data (in blobber code, using `GetObjectPath`).
- The file is split into blocks of 64KB.
- Each 64KB block is split into blocks of 64 Bytes, resulting in 1024 blocks.
- The 64-Byte blocks at the same index are treated as continuous data, which forms the leaf for that index.
- The merkle tree is calculated from the 1024 leaves.
- A random index in 0...1023 is also selected using the same random seed above.
- The merkle tree, the data at some index `i`, the `object_path`, and a list of writemarkers are sent to the validator. Note: these are the writemarkers that fall in the range between the `allocation_root` of the writemarker issued by the blockchain for the challenge and the latest writemarker's `allocation_root` of the allocation.
- The validator uses the same mechanism to calculate the random block and verify the file metadata from the `object_path`. Validation of the `object_path` is crucial here, as it also verifies that the `merkle_root` of the file is a valid hash. With this `merkle_root`, the `merkle_tree` along with the responded data is validated.
- The validator issues a ticket regarding the validity of the challenge. Upon reaching consensus among validators, a transaction is written to the blockchain which transfers some tokens from the challenge pool to the blobber's account.
 
Note: @cnlangzi has a different implementation of the leaf hash in gosdk. His implementation calculates the hash at each index as the merkle root of the `m` 64-Byte pieces taken as leaves. This is not an issue, as it is used by the blobber and validators alike to re-calculate the merkle tree for the challenge hash.
BlockSize, the number of merkle leaves, and the merkle-chunk-size are always fixed at 64 KB, 1024, and 64 Bytes respectively. These sizes may not be optimal in all cases. As an example, we could have the following configurations:
For files smaller than 1KB, we could either set BlockSize to 1KB along with a merkle-chunk-size of 1 Byte instead of 64 Bytes, thus conserving the 1024 merkle leaves, or decrease the number of merkle leaves and increase the merkle-chunk-size.
For files greater than 1GB, we could use a BlockSize of 50 MB along with a merkle-chunk-size of 50 KB instead of 64 Bytes, thus conserving the 1024 merkle leaves.
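When the leaf count is held at 1024, the merkle-chunk-size follows directly from the BlockSize. A quick check of the example configurations (the helper name is illustrative):

```go
package main

import "fmt"

const numLeaves = 1024 // the number of merkle leaves is held fixed

// merkleChunkSize returns the merkle-chunk-size implied by a given BlockSize
// when the leaf count stays at 1024.
func merkleChunkSize(blockSize int) int {
	return blockSize / numLeaves
}

func main() {
	// 1 KB -> 1 Byte chunks, 64 KB -> 64 Byte chunks, 50 MB -> 50 KB chunks.
	for _, bs := range []int{1 * 1024, 64 * 1024, 50 * 1024 * 1024} {
		fmt.Printf("BlockSize %8d B -> merkle-chunk-size %5d B\n",
			bs, merkleChunkSize(bs))
	}
}
```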
We could also make these sizes dynamic, based on the number of data-shards an allocation has.
Note: The above example is simply an observation. To implement it, we would need to test its effect on the token economy, on computational efficiency, and on the blobber's convenience in storing small files.