[DataCap Application] <DR 9>

### Version

1

### DataCap Applicant

Zhao Hongwei

### Project ID

80

### Data Owner Name

LAMOST

### Data Owner Country/Region

China

### Data Owner Industry

Environment

### Website

http://www.lamost.org/dr9/

### Social Media Handle

http://www.lamost.org/dr9/

### Social Media Type

Other

### What is your role related to the dataset

Data Preparer

### Total amount of DataCap being requested

20PiB

### Expected size of single dataset (one copy)

2.5PiB

### Number of replicas to store

8

### Weekly allocation of DataCap requested

1000TiB
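
As a quick consistency check on the figures above, the short sketch below multiplies the single-copy size by the replica count and estimates how many weeks the requested weekly allocation would take to cover the total. The variable names are illustrative, and the calculation assumes binary units (1 PiB = 1024 TiB).

```python
# Back-of-the-envelope check of the figures stated in this application.
# Assumption: binary units, so 1 PiB = 1024 TiB.
TIB_PER_PIB = 1024

single_copy_pib = 2.5          # "Expected size of single dataset (one copy)"
replicas = 8                   # "Number of replicas to store"
weekly_allocation_tib = 1000   # "Weekly allocation of DataCap requested"

total_pib = single_copy_pib * replicas
total_tib = total_pib * TIB_PER_PIB
weeks_needed = total_tib / weekly_allocation_tib

print(f"total DataCap: {total_pib:g} PiB ({total_tib:g} TiB)")
print(f"weeks at the requested rate: {weeks_needed:.2f}")
```

Under these assumptions the total matches the 20PiB requested, and the weekly rate would cover it in roughly 20 and a half weeks.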

### On-chain address for first allocation

f1srtc34se2lgufylq35mgfkegpgdltdiyziw5diq

### Data Type of Application

Public, Open Dataset (Research/Non-Profit)

### Custom multisig

- [ ] Use Custom Multisig

### Identifier

_No response_

### Share a brief history of your project and organization

```text
LAMOST data comprises three major types:
Type (I):
Raw Data: All original data together with its provenance information (for example, observing log files, calibration files, and software versions used), plus the batch-reduced two-dimensional spectra.
Type (II):
1D Spectral Data: One-dimensional spectra of observed objects, reduced through standardized reduction pipelines. Some provenance information is included with the 1D spectra, including the input catalog information, selection criteria, and observing information (such as exposure time, observation quality, seeing, and weather conditions).
Type (III):
Catalog Data: Objective physical quantities with errors, derived from the spectral data and input catalog. The catalog includes coordinates, magnitudes, radial velocities, effective temperature, surface gravity, elemental abundances, warning flags, and so on.
```

### Is this project associated with other projects/ecosystem stakeholders?

No

### If answered yes, what are the other projects/ecosystem stakeholders

```text

```

### Describe the data being stored onto Filecoin

```text
LAMOST data comprises three major types:
Type (I):
Raw Data: All original data together with its provenance information (for example, observing log files, calibration files, and software versions used), plus the batch-reduced two-dimensional spectra.
Type (II):
1D Spectral Data: One-dimensional spectra of observed objects, reduced through standardized reduction pipelines. Some provenance information is included with the 1D spectra, including the input catalog information, selection criteria, and observing information (such as exposure time, observation quality, seeing, and weather conditions).
Type (III):
Catalog Data: Objective physical quantities with errors, derived from the spectral data and input catalog. The catalog includes coordinates, magnitudes, radial velocities, effective temperature, surface gravity, elemental abundances, warning flags, and so on.
```

### Where was the data currently stored in this dataset sourced from

AWS Cloud

### If you answered "Other" in the previous question, enter the details here

```text

```

### If you are a data preparer. What is your location (Country/Region)

None

### If you are a data preparer, how will the data be prepared? Please include tooling used and technical details?

```
The LAMOST DR9 dataset will be brought onto the Filecoin network in the following steps:
1. Download the full dataset and verify that it is complete and correct.
2. Prepare the data: convert formats, compress files, and extract metadata to make later searching easier.
3. Split the data into chunks that fit Filecoin sectors, package them as CAR files, and calculate a PieceCID for each file.
4. Select Storage Providers based on their location, the retrieval protocols they support, and their overall reputation.
5. Propose the storage deals to the selected providers. Once a provider accepts a deal and it is published on-chain, the provider seals the data into sectors, which involves generating zero-knowledge proofs, and submits the PoRep on-chain.
6. The storage provider then keeps submitting WindowPoSts to prove that the data is still stored.
7. Anyone can retrieve the stored data using the CID or through other retrieval methods.
Throughout this process, we will pay attention to data governance, incentives for participation, and compliance with relevant rules and standards, so that the scientific data stays safe and shareable in the long term.

```
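
The chunking step described above can be sketched as a simple packing problem. The following is a minimal illustration, not the preparer's actual tooling: it assumes 32 GiB sectors, assumes that Fr32 padding leaves 127/128 of the sector for payload, ignores CAR framing overhead, and uses hypothetical file names and sizes. `pack_files` is a hypothetical helper that groups files with a first-fit-decreasing heuristic.

```python
from typing import Dict, List

GIB = 1024 ** 3
# Assumption: pieces target 32 GiB sectors. Fr32 padding leaves 127/128 of the
# sector for payload; CAR framing overhead is ignored in this sketch.
MAX_PAYLOAD_BYTES = 32 * GIB * 127 // 128


def pack_files(file_sizes: Dict[str, int],
               capacity: int = MAX_PAYLOAD_BYTES) -> List[List[str]]:
    """Group files into batches no larger than `capacity` (first-fit decreasing)."""
    batches: List[List[str]] = []
    free: List[int] = []  # remaining bytes in each batch
    by_size = sorted(file_sizes.items(), key=lambda kv: kv[1], reverse=True)
    for name, size in by_size:
        if size > capacity:
            raise ValueError(f"{name} ({size} bytes) must be split before packing")
        for i, room in enumerate(free):
            if size <= room:
                batches[i].append(name)
                free[i] -= size
                break
        else:
            # No existing batch has room, so open a new one.
            batches.append([name])
            free.append(capacity - size)
    return batches


# Hypothetical inputs, for illustration only.
example = {
    "spectra_a.fits": 20 * GIB,
    "spectra_b.fits": 15 * GIB,
    "catalog.csv": 10 * GIB,
    "logs.tar": 2 * GIB,
}
batches = pack_files(example)
for number, batch in enumerate(batches):
    print(number, batch)
```

Each resulting batch would then be wrapped as one CAR file by the chosen preparation tool. Files larger than the capacity need to be split first, which this sketch reports as an error instead of handling.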

### If you are not preparing the data, who will prepare the data?  (Provide name and business)

```text

```

### Has this dataset been stored on the Filecoin network before? If so, please explain and make the case why you would like to store this dataset again to the network. Provide details on preparation and/or SP distribution.

```
Yes, but only a small portion of the LAMOST DR9 dataset has been stored on the Filecoin network so far. Delays in receiving the DataCap allocation held up further storage. The complete DR9 dataset contains a vast amount of spectroscopic data that is crucial for astronomical research, and storing it in full would maximize its scientific value.

Storing the complete dataset makes the full breadth of data available for research and improves long-term preservation and accessibility for the global scientific community. Despite the initial setback with the DataCap allocation, continuing to store the complete dataset is essential for supporting extensive and detailed astronomical studies.

```

### Please share a sample of the data

```text
http://www.lamost.org/dr9/
```

### Confirm that this is a public dataset that can be retrieved by anyone on the Network

- [x] I confirm

### If you chose not to confirm, what was the reason

```text

```

### What is the expected retrieval frequency for this data

Sporadic

### For how long do you plan to keep this dataset stored on Filecoin

1.5 to 2 years

### In which geographies do you plan on making storage deals

Greater China, Asia other than Greater China, North America, Europe, Australia (continent)

### How will you be distributing your data to storage providers

HTTP or FTP server, Shipping hard drives

### How did you find your storage providers

Slack, Filmine, Big Data Exchange

### If you answered "Others" in the previous question, what is the tool or platform you used

```text

```

### Please list the provider IDs and location of the storage providers you will be working with.

```
f01081419
f01084149
f01708981
f02825281
f02825675
f02826602
f02826762
f02827010
f02827109
f02827135
f02827843
f02827953
f03528948
f03528979
f03529412
f03558501
f03602120
f03610683
```
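
One way to spread the 8 replicas over the 18 providers listed above is a rotating assignment, sketched below. This is only an illustration under stated assumptions: `assign_replicas` and the piece identifiers are hypothetical, the scheme ignores provider location and capacity, and it is not necessarily how deals will actually be distributed.

```python
from collections import Counter
from typing import Dict, List

# Provider IDs as listed in this application.
PROVIDERS = [
    "f01081419", "f01084149", "f01708981", "f02825281", "f02825675",
    "f02826602", "f02826762", "f02827010", "f02827109", "f02827135",
    "f02827843", "f02827953", "f03528948", "f03528979", "f03529412",
    "f03558501", "f03602120", "f03610683",
]
REPLICAS = 8  # "Number of replicas to store"


def assign_replicas(piece_ids: List[str],
                    providers: List[str] = PROVIDERS,
                    replicas: int = REPLICAS) -> Dict[str, List[str]]:
    """Give each piece `replicas` distinct providers, rotating the start point."""
    if replicas > len(providers):
        raise ValueError("not enough providers for the requested replica count")
    plan: Dict[str, List[str]] = {}
    for index, piece in enumerate(piece_ids):
        start = (index * replicas) % len(providers)
        plan[piece] = [providers[(start + k) % len(providers)]
                       for k in range(replicas)]
    return plan


# Hypothetical piece identifiers, for illustration only.
pieces = [f"piece-{n}" for n in range(9)]
plan = assign_replicas(pieces)
load = Counter(sp for sps in plan.values() for sp in sps)
print(load)
```

Because each piece takes a consecutive window of the provider list, no provider receives the same piece twice, and the load evens out as the number of pieces grows.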

### How do you plan to make deals to your storage providers

Boost client, Lotus client, Singularity

### If you answered "Others/custom tool" in the previous question, enter the details here

```text

```

### Can you confirm that you will follow the Fil+ guideline

Yes

[DataCap Application] <DR 9> #87
