Version
1
DataCap Applicant
Zhao Hongwei
Project ID
80
Data Owner Name
LAMOST
Data Owner Country/Region
China
Data Owner Industry
Environment
Website
http://www.lamost.org/dr9/
Social Media Handle
http://www.lamost.org/dr9/
Social Media Type
Other
What is your role related to the dataset
Data Preparer
Total amount of DataCap being requested
20PiB
Expected size of single dataset (one copy)
2.5PiB
Number of replicas to store
8
Weekly allocation of DataCap requested
1000TiB
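The figures above can be cross-checked with a little arithmetic. The sketch below is illustrative only: the function names are hypothetical, and the week count assumes the full weekly allocation is actually granted every week, so it is best read as a lower bound.

```python
import math

PIB_IN_TIB = 1024  # 1 PiB = 1024 TiB


def total_datacap_tib(single_copy_pib: float, replicas: int) -> float:
    """Total DataCap in TiB: size of one copy times the number of replicas."""
    return single_copy_pib * replicas * PIB_IN_TIB


def weeks_to_allocate(total_tib: float, weekly_tib: float) -> int:
    """Whole weeks needed if the weekly allocation is granted in full each week."""
    return math.ceil(total_tib / weekly_tib)


total = total_datacap_tib(2.5, 8)          # 2.5 PiB per copy, 8 replicas
print(total / PIB_IN_TIB)                  # 20.0 (PiB), matching the request
print(weeks_to_allocate(total, 1000))      # 21 weeks at 1000 TiB per week
```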
On-chain address for first allocation
f1srtc34se2lgufylq35mgfkegpgdltdiyziw5diq
Data Type of Application
Public, Open Dataset (Research/Non-Profit)
Custom multisig
Identifier
No response
Share a brief history of your project and organization
LAMOST data includes three major types:
Type (I):
Raw Data: All original data as well as original provenance information (for example, the observing log files, calibration files, software versions used, etc.), and the batch reduced two-dimensional spectra.
Type (II):
1D Spectral Data: One-dimensional spectra of observed objects, reduced through standardized reduction pipelines. Some provenance information is included with the 1D spectra, including the input catalog information, selection criteria, and observing information (such as exposure time, observation quality, seeing, weather conditions, and so on).
Type (III):
Catalog Data: Objective physical quantities with errors, derived from the spectral data and input catalog. The catalog includes the coordinates, magnitudes, radial velocities, effective temperature, surface gravity, elemental abundances, warning flags and so on.
Is this project associated with other projects/ecosystem stakeholders?
No
If answered yes, what are the other projects/ecosystem stakeholders
Describe the data being stored onto Filecoin
LAMOST data includes three major types:
Type (I):
Raw Data: All original data as well as original provenance information (for example, the observing log files, calibration files, software versions used, etc.), and the batch reduced two-dimensional spectra.
Type (II):
1D Spectral Data: One-dimensional spectra of observed objects, reduced through standardized reduction pipelines. Some provenance information is included with the 1D spectra, including the input catalog information, selection criteria, and observing information (such as exposure time, observation quality, seeing, weather conditions, and so on).
Type (III):
Catalog Data: Objective physical quantities with errors, derived from the spectral data and input catalog. The catalog includes the coordinates, magnitudes, radial velocities, effective temperature, surface gravity, elemental abundances, warning flags and so on.
Where was the data currently stored in this dataset sourced from
AWS Cloud
If you answered "Other" in the previous question, enter the details here
If you are a data preparer. What is your location (Country/Region)
None
If you are a data preparer, how will the data be prepared? Please include tooling used and technical details?
To store the LAMOST DR9 dataset on the Filecoin network, the first step is to download all the data and verify that it is complete and correct. The data is then prepared: formats are converted, files are compressed, and metadata is extracted so the data is easier to search later. Next, the data is split into chunks sized to fit Filecoin sectors, each chunk is packaged as a CAR file, and a PieceCID is calculated for each CAR file. Storage Providers are selected based on their location, the retrieval protocols they support, and their overall reputation. Storage deals are then proposed to the selected providers. Once a provider accepts a deal, it seals the data into a sector, which involves generating zero-knowledge proofs, and submits the PoRep on-chain. After that, the provider must keep submitting WindowPoSts to prove the data is still stored and accessible. Finally, anyone can retrieve the stored data using the CID or other retrieval methods. Throughout this process, it is important to set up good data governance and incentives for participation, and to comply with the relevant rules and standards, so that the scientific data stays safe and shareable in the long term.
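The chunking step described above can be sketched as a simple greedy grouping of files by size. This is a minimal illustration, not the tooling actually used: `plan_chunks` is a hypothetical helper, 32 GiB sectors are an assumption (the application does not state the sector size), and CAR framing overhead is ignored, so a real run would want some headroom below the limit.

```python
from typing import Iterable, List, Tuple

SECTOR_BYTES = 32 * 1024**3  # assumed 32 GiB sector size
# Fr32 padding stores 127 payload bytes in every 128 bytes of a piece,
# so the largest unpadded payload for a full-sector piece is 127/128 of it.
MAX_PIECE_PAYLOAD = SECTOR_BYTES // 128 * 127


def plan_chunks(
    files: Iterable[Tuple[str, int]], limit: int = MAX_PIECE_PAYLOAD
) -> List[List[str]]:
    """Group (name, size_in_bytes) pairs, in order, into chunks of at most `limit` bytes."""
    chunks: List[List[str]] = []
    current: List[str] = []
    used = 0
    for name, size in files:
        if size > limit:
            # A single oversized file would have to be split before packing.
            raise ValueError(f"{name} ({size} bytes) exceeds the chunk limit")
        if current and used + size > limit:
            # Close the current chunk and start a new one.
            chunks.append(current)
            current, used = [], 0
        current.append(name)
        used += size
    if current:
        chunks.append(current)
    return chunks


# Toy example with a 10-byte limit; the file names are made up.
print(plan_chunks([("a.fits", 6), ("b.fits", 5), ("c.fits", 4)], limit=10))
# [['a.fits'], ['b.fits', 'c.fits']]
```

Each resulting group would then be packaged as one CAR file and have its PieceCID computed by the deal-making tool in use.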
If you are not preparing the data, who will prepare the data? (Provide name and business)
Has this dataset been stored on the Filecoin network before? If so, please explain and make the case why you would like to store this dataset again to the network. Provide details on preparation and/or SP distribution.
Yes, but only a small portion of the LAMOST DR9 dataset has been stored on the Filecoin network so far. Delays in receiving the DataCap allocation prevented further storage of the DR9 dataset. The complete LAMOST DR9 dataset contains a vast amount of spectroscopic data that is crucial for astronomical research, and storing the entire dataset intact would maximize its scientific value.
Storing the complete dataset makes the full breadth of data available for research and improves the long-term preservation and accessibility of this information for the global scientific community. Despite the initial setback with the DataCap allocation, continuing to store the complete dataset is essential for supporting extensive and detailed astronomical studies.
Please share a sample of the data
http://www.lamost.org/dr9/
Confirm that this is a public dataset that can be retrieved by anyone on the Network
If you chose not to confirm, what was the reason
What is the expected retrieval frequency for this data
Sporadic
For how long do you plan to keep this dataset stored on Filecoin
1.5 to 2 years
In which geographies do you plan on making storage deals
Greater China, Asia other than Greater China, North America, Europe, Australia (continent)
How will you be distributing your data to storage providers
HTTP or FTP server, Shipping hard drives
How did you find your storage providers
Slack, Filmine, Big Data Exchange
If you answered "Others" in the previous question, what is the tool or platform you used
Please list the provider IDs and location of the storage providers you will be working with.
f01081419
f01084149
f01708981
f02825281
f02825675
f02826602
f02826762
f02827010
f02827109
f02827135
f02827843
f02827953
f03528948
f03528979
f03529412
f03558501
f03602120
f03610683
How do you plan to make deals to your storage providers
Boost client, Lotus client, Singularity
If you answered "Others/custom tool" in the previous question, enter the details here
Can you confirm that you will follow the Fil+ guideline
Yes