
[DataCap Application] <DR 9> #87

@zhaohongwei201109

Description


Version

1

DataCap Applicant

Zhao Hongwei

Project ID

80

Data Owner Name

LAMOST

Data Owner Country/Region

China

Data Owner Industry

Environment

Website

http://www.lamost.org/dr9/

Social Media Handle

http://www.lamost.org/dr9/

Social Media Type

Other

What is your role related to the dataset

Data Preparer

Total amount of DataCap being requested

20PiB

Expected size of single dataset (one copy)

2.5PiB

Number of replicas to store

8

Weekly allocation of DataCap requested

1000TiB
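The requested figures are internally consistent. Eight replicas of a 2.5 PiB copy come to the 20 PiB requested, and at 1000 TiB per week the full amount would take about 20.5 weekly allocations. The short Python check below reproduces that arithmetic. It assumes binary units (1 PiB = 1024 TiB) and that each weekly allocation is granted in full.

```python
# Sanity check of the DataCap figures stated in this application.
# Assumes binary units throughout: 1 PiB = 1024 TiB.
TIB_PER_PIB = 1024

dataset_size_pib = 2.5        # expected size of a single copy
replicas = 8                  # number of replicas to store
weekly_allocation_tib = 1000  # weekly DataCap requested

total_pib = dataset_size_pib * replicas  # DataCap needed for all copies
total_tib = total_pib * TIB_PER_PIB
weeks_to_allocate = total_tib / weekly_allocation_tib

print(f"Total DataCap needed: {total_pib:g} PiB ({total_tib:g} TiB)")
print(f"Weeks at {weekly_allocation_tib} TiB/week: {weeks_to_allocate:.2f}")
```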

On-chain address for first allocation

f1srtc34se2lgufylq35mgfkegpgdltdiyziw5diq

Data Type of Application

Public, Open Dataset (Research/Non-Profit)

Custom multisig

  • Use Custom Multisig

Identifier

No response

Share a brief history of your project and organization

LAMOST data comprise three major types:
Type (I):
Raw Data: All original data as well as original provenance information (for example, observing log files, calibration files, and the software versions used), and the batch-reduced two-dimensional spectra.
Type (II):
1D Spectral Data: One-dimensional spectra of observed objects, reduced through standardized reduction pipelines. Some provenance information accompanies the 1D spectra, including the input catalog information, selection criteria, and observing information such as exposure time, observation quality, seeing, and weather conditions.
Type (III):
Catalog Data: Objective physical quantities with errors, derived from the spectral data and the input catalog. The catalog includes coordinates, magnitudes, radial velocities, effective temperatures, surface gravities, elemental abundances, warning flags, and so on.

Is this project associated with other projects/ecosystem stakeholders?

No

If answered yes, what are the other projects/ecosystem stakeholders


Describe the data being stored onto Filecoin

LAMOST data comprise three major types:
Type (I):
Raw Data: All original data as well as original provenance information (for example, observing log files, calibration files, and the software versions used), and the batch-reduced two-dimensional spectra.
Type (II):
1D Spectral Data: One-dimensional spectra of observed objects, reduced through standardized reduction pipelines. Some provenance information accompanies the 1D spectra, including the input catalog information, selection criteria, and observing information such as exposure time, observation quality, seeing, and weather conditions.
Type (III):
Catalog Data: Objective physical quantities with errors, derived from the spectral data and the input catalog. The catalog includes coordinates, magnitudes, radial velocities, effective temperatures, surface gravities, elemental abundances, warning flags, and so on.

Where was the data currently stored in this dataset sourced from

AWS Cloud

If you answered "Other" in the previous question, enter the details here


If you are a data preparer, what is your location (Country/Region)?

None

If you are a data preparer, how will the data be prepared? Please include tooling used and technical details?

To get the LAMOST DR9 dataset onto the Filecoin network, the first step is to download all of the data and verify that it is complete and uncorrupted. The data is then prepared: formats are converted, files are compressed, and metadata is extracted to make the data easier to search later on. Next, the data is split into chunks sized to fit Filecoin sectors, packed into CAR files, and a PieceCID is calculated for each CAR file. Choosing the right storage providers is crucial here; the main criteria are their location, the retrieval protocols they support, and their overall reputation. Deals are then proposed to the selected storage providers, which publish them on-chain. Once a storage provider accepts a deal, it seals the data into a sector, which involves generating zero-knowledge proofs, and submits the PoRep on-chain. From then on it must keep submitting WindowPoSts to prove that the data is still stored and accessible. Finally, anyone can retrieve the stored data using its CID or through other retrieval methods. Throughout the whole process, it is important to set up good data governance and incentives for participation, and to ensure compliance with relevant rules and standards, so that the scientific data stays safe and shareable over the long term.
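The chunking step can be sketched in Python as follows. This is a hypothetical illustration, not the applicant's tooling (Singularity handles packing in practice): it assumes 32 GiB sectors, ignores CAR header and block overhead, and uses made-up helper names (`padded_piece_size`, `plan_batches`). It relies on Fr32 padding storing 127 bytes of payload in every 128 bytes, so padded piece sizes are powers of two and a 32 GiB sector holds at most 32 GiB × 127/128 of payload.

```python
import math

# Assumption: 32 GiB sectors (Filecoin also supports 64 GiB sectors).
SECTOR_SIZE = 32 * 1024**3
# Fr32 padding stores 127 bytes of payload in every 128 bytes, so this is
# the largest unpadded payload that fits in one sector.
MAX_UNPADDED = SECTOR_SIZE // 128 * 127


def padded_piece_size(payload_bytes: int) -> int:
    """Smallest power-of-two padded piece size whose unpadded capacity
    (127/128 of the padded size) can hold payload_bytes."""
    size = 128
    while size // 128 * 127 < payload_bytes:
        size *= 2
    return size


def plan_batches(files, capacity=MAX_UNPADDED):
    """Pack (name, size) pairs into batches of at most `capacity` bytes.

    Files are taken in order and split across batch boundaries when they
    do not fit (next-fit). Each batch is a list of (name, offset, length)
    ranges that would become one CAR file. CAR overhead is ignored.
    """
    batches, current, used = [], [], 0
    for name, size in files:
        offset = 0
        while offset < size:
            take = min(size - offset, capacity - used)
            current.append((name, offset, take))
            used += take
            offset += take
            if used == capacity:  # batch is full: start a new one
                batches.append(current)
                current, used = [], 0
    if current:
        batches.append(current)
    return batches


if __name__ == "__main__":
    demo = [("a.fits", 50), ("b.fits", 100), ("c.fits", 30)]
    for batch in plan_batches(demo, capacity=64):
        payload = sum(length for _, _, length in batch)
        print(batch, "->", padded_piece_size(payload), "byte piece")
    # Rough count of full 32 GiB pieces needed for one 2.5 PiB copy.
    print(math.ceil(2.5 * 1024**5 / MAX_UNPADDED))
```

With the toy 64-byte capacity, the three demo files are split into three batches, with `b.fits` spanning all of them. The last line prints 82566, meaning one 2.5 PiB copy would need on the order of 82,600 full 32 GiB pieces before CAR overhead is counted.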

If you are not preparing the data, who will prepare the data? (Provide name and business)


Has this dataset been stored on the Filecoin network before? If so, please explain and make the case why you would like to store this dataset again to the network. Provide details on preparation and/or SP distribution.

Yes, but only a small portion of the LAMOST DR9 dataset has been stored on the Filecoin network so far. Storage of the rest of the DR9 dataset was held up by delays in receiving the DataCap allocation. Although some data is already stored, the complete LAMOST DR9 dataset contains a vast amount of spectroscopic data that is crucial for astronomical research, and storing the dataset in its entirety would maximize its scientific value.

Storing the complete dataset not only makes its full breadth available for research but also improves the long-term preservation and accessibility of this critical information for the global scientific community. Despite the initial setback with the DataCap allocation, storing the complete dataset remains essential for supporting extensive and detailed astronomical studies.

Please share a sample of the data

http://www.lamost.org/dr9/

Confirm that this is a public dataset that can be retrieved by anyone on the Network

  • I confirm

If you chose not to confirm, what was the reason


What is the expected retrieval frequency for this data

Sporadic

For how long do you plan to keep this dataset stored on Filecoin

1.5 to 2 years
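Deal durations are specified in chain epochs rather than years. Below is a small sketch for converting the planned 1.5 to 2 year retention period. It assumes 30-second epochs and 365-day years, and the helper name `years_to_epochs` is hypothetical.

```python
# Filecoin produces one epoch every 30 seconds, i.e. 2880 epochs per day.
EPOCH_SECONDS = 30
EPOCHS_PER_DAY = 24 * 60 * 60 // EPOCH_SECONDS


def years_to_epochs(years: float, days_per_year: int = 365) -> int:
    """Convert a retention period in years to chain epochs.

    Assumes 365-day years; pass a different days_per_year if another
    convention is used.
    """
    return round(years * days_per_year * EPOCHS_PER_DAY)


for years in (1.5, 2):
    print(f"{years} years ≈ {years_to_epochs(years):,} epochs")
```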

In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China, North America, Europe, Australia (continent)

How will you be distributing your data to storage providers

HTTP or FTP server, Shipping hard drives

How did you find your storage providers

Slack, Filmine, Big Data Exchange

If you answered "Others" in the previous question, what is the tool or platform you used


Please list the provider IDs and location of the storage providers you will be working with.

f01081419
f01084149
f01708981
f02825281
f02825675
f02826602
f02826762
f02827010
f02827109
f02827135
f02827843
f02827953
f03528948
f03528979
f03529412
f03558501
f03602120
f03610683
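One way to spread 8 replicas across the 18 providers listed above is a simple rotation, sketched below in Python. This assignment rule and the helper name `assign_replicas` are hypothetical and not the applicant's stated plan. Because the list gives no locations, the sketch does not check geographic diversity. The rotation places each piece on 8 distinct providers, and over any 18 consecutive pieces it gives every provider exactly 8 replicas.

```python
from collections import Counter

# Storage provider IDs listed in this application.
PROVIDERS = [
    "f01081419", "f01084149", "f01708981", "f02825281", "f02825675",
    "f02826602", "f02826762", "f02827010", "f02827109", "f02827135",
    "f02827843", "f02827953", "f03528948", "f03528979", "f03529412",
    "f03558501", "f03602120", "f03610683",
]
REPLICAS = 8


def assign_replicas(piece_index, providers=PROVIDERS, replicas=REPLICAS):
    """Hypothetical rotation: piece i goes to `replicas` consecutive
    providers starting at position i * replicas (mod len(providers))."""
    n = len(providers)
    if replicas > n:
        raise ValueError("need at least as many providers as replicas")
    start = (piece_index * replicas) % n
    return [providers[(start + j) % n] for j in range(replicas)]


if __name__ == "__main__":
    # Tally how many replicas each provider receives over 18 pieces.
    load = Counter()
    for i in range(len(PROVIDERS)):
        load.update(assign_replicas(i))
    print(assign_replicas(0))
    print(sorted(set(load.values())))
```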

How do you plan to make deals to your storage providers

Boost client, Lotus client, Singularity

If you answered "Others/custom tool" in the previous question, enter the details here


Can you confirm that you will follow the Fil+ guideline

Yes
