diff --git a/.gitbook.yaml b/.gitbook.yaml index 1fc14649b..fdf7a4bb7 100644 --- a/.gitbook.yaml +++ b/.gitbook.yaml @@ -79,7 +79,7 @@ redirects: data-science/ocean.py/publish-flow: data-scientists/ocean.py/publish-flow.md data-science/ocean.py/remote-setup: data-scientists/ocean.py/remote-setup.md data-science/ocean.py/technical-details: data-scientists/ocean.py/technical-details.md - "developers/ocean.py": data-scientists/ocean.py/README.md + developers/ocean.py: data-scientists/ocean.py/README.md developers/ocean.py/compute-flow: data-scientists/ocean.py/compute-flow.md developers/ocean.py/consume-flow: data-scientists/ocean.py/consume-flow.md developers/ocean.py/datatoken-interface-tech-details: data-scientists/ocean.py/datatoken-interface-tech-details.md diff --git a/.gitbook/assets/c2d/Free Flow for Compute to Data - Part 1.png b/.gitbook/assets/c2d/Free Flow for Compute to Data - Part 1.png new file mode 100644 index 000000000..7a4803a01 Binary files /dev/null and b/.gitbook/assets/c2d/Free Flow for Compute to Data - Part 1.png differ diff --git a/.gitbook/assets/c2d/Free Flow for Compute to Data - Part 2.png b/.gitbook/assets/c2d/Free Flow for Compute to Data - Part 2.png new file mode 100644 index 000000000..1086e2745 Binary files /dev/null and b/.gitbook/assets/c2d/Free Flow for Compute to Data - Part 2.png differ diff --git a/.gitbook/assets/c2d/Paid Flow for Compute to Data - Part 1.png b/.gitbook/assets/c2d/Paid Flow for Compute to Data - Part 1.png new file mode 100644 index 000000000..3134e2e53 Binary files /dev/null and b/.gitbook/assets/c2d/Paid Flow for Compute to Data - Part 1.png differ diff --git a/.gitbook/assets/c2d/Paid Flow for Compute to Data - Part 2.png b/.gitbook/assets/c2d/Paid Flow for Compute to Data - Part 2.png new file mode 100644 index 000000000..be4e3e3b5 Binary files /dev/null and b/.gitbook/assets/c2d/Paid Flow for Compute to Data - Part 2.png differ diff --git a/.gitbook/assets/c2d/free-compute-flow-1.puml b/.gitbook/assets/c2d/free-compute-flow-1.puml new file mode 100644 index 000000000..f34dac4b8 --- /dev/null +++ b/.gitbook/assets/c2d/free-compute-flow-1.puml @@ -0,0 +1,110 @@ +@startuml "Free Flow for Compute to Data - Part 1" +title "Free Flow for Compute to Data - Part 1" + +skinparam sequenceArrowThickness 2 +skinparam roundcorner 10 +skinparam maxmessagesize 85 +skinparam sequenceParticipant underline + +actor "End User" as end_user +participant "Consumer\n(Ocean CLI)" as consumer +participant "Ocean.js" as ocean_js +participant "Ocean Node" as ocean_node +database "Ocean Node's Database\n(SQLiteCompute DB)" as db +participant "Smart Contracts" as smart_contracts + +note over ocean_node +When deploying Ocean Node, +make sure to export +**DOCKER_COMPUTE_ENVIRONMENTS**. +For a quickstart node with c2d, please +check the script **ocean-node-update.sh** +from the ocean-node GitHub repo. +end note + +group Select compute environment with free resources + + end_user -> consumer: Requests compute environments + consumer -> ocean_js + ocean_js -> ocean_node: **GET /computeEnvironments** + note over ocean_node + Filtering by chainId is optional. + end note + ocean_node -> ocean_node: Parses engine's exported compute environments. + note over ocean_node + Currently, only the Docker + engine is supported. + The fee token for payment is included + in DOCKER_COMPUTE_ENVIRONMENTS, + default is the OCEAN token. + end note + ocean_node --> ocean_js: Returns environments. + ocean_js --> consumer: Forwards the environments.
+ consumer --> end_user: Displays compute environments containing free + paid resources. + +end group + +end_user -> consumer: Fills in maxJobDuration and free available resources. +end_user -> consumer: Triggers start compute. + +group Start compute job + consumer -> ocean_js: Calls freeStartCompute with resources from env. + ocean_js -> ocean_node: **POST /freeCompute** + ocean_node -> ocean_node: Checks nonce, creates job ID. + alt Nonce and signature are invalid + ocean_node --> ocean_js: Returns 500, error 'Invalid nonce or signature, unable to proceed.' + ocean_js --> consumer: Returns error + consumer --> end_user + end + ocean_node -> ocean_node: Checks if the assets are orderable + have granted access to run compute jobs + group Credentials check + alt Policy server configured + ocean_node -> policy_server: Requests credentials validation using **startCompute** command + policy_server --> ocean_node: Success/failure response + else Policy server not configured + ocean_node -> ocean_node: Checks allow & deny lists of addresses or access lists + end + alt Validation response failure + ocean_node --> ocean_js: 403 - Consumer address not authorized + ocean_js --> consumer + consumer --> end_user + end + end group + group Monitor compute job + ocean_node -> ocean_node: Tries to create docker environment within Docker engine class. + alt Job created successfully + ocean_node -> db: Saves job. + db --> ocean_node + ocean_node --> ocean_js: Returns job ID. + ocean_js --> consumer + consumer --> end_user: Displays job ID + else Job not created successfully - cleanup + group Cleanup job + alt Algorithm finished successfully + ocean_node -> ocean_node: Writes algorithm logs to the configured temporary folder path. + end + ocean_node -> ocean_node: Kills container, removes volumes + ocean_node -> ocean_node: Removes temporary folders with algorithm and datasets + end group + end + alt Algorithm execution exceeds specified maxJobDuration + ocean_node -> ocean_node: Stops docker container immediately + ocean_node -> db: Updates job status from **Running** to **Publishing Results** + db --> ocean_node + ocean_node -> ocean_node: Cleanup job started + note over ocean_node + Check group **Cleanup job** + end note + else Algorithm finishes its execution in time + ocean_node -> db: Updates job status from **Running** to **Publishing Results** + db --> ocean_node + ocean_node -> ocean_node: Cleanup job started - **claimLock** case + note over ocean_node + Check group **Cleanup job** + end note + end + + end group +end group + +@enduml \ No newline at end of file diff --git a/.gitbook/assets/c2d/free-compute-flow-2.puml b/.gitbook/assets/c2d/free-compute-flow-2.puml new file mode 100644 index 000000000..b6d6ebd80 --- /dev/null +++ b/.gitbook/assets/c2d/free-compute-flow-2.puml @@ -0,0 +1,37 @@ +@startuml "Free Flow for Compute to Data - Part 2" +title "Free Flow for Compute to Data - Part 2" + +skinparam sequenceArrowThickness 2 +skinparam roundcorner 10 +skinparam maxmessagesize 85 +skinparam sequenceParticipant underline + +actor "End User" as end_user +participant "Consumer\n(Ocean CLI)" as consumer +participant "Ocean.js" as ocean_js +participant "Ocean Node" as ocean_node +database "Ocean Node's Database\n(SQLiteCompute DB)" as db +participant "Smart Contracts" as smart_contracts + +group Get compute job status + consumer -> ocean_js: Calls computeStatus + ocean_js -> ocean_node: **GET /compute** + ocean_node -> db: Requests job data from specific C2D engine.
+ db --> ocean_node: Returns job data from specific C2D engine. + ocean_node --> ocean_js: Returns status. + ocean_js --> consumer + consumer --> end_user: Displays the progress of the job. +end group + +group Retrieve compute job results + end_user -> consumer: Requests results, provides path. + consumer -> ocean_js: Calls computeResult. + ocean_js -> ocean_node: **GET /computeResult** + ocean_node -> db: Requests job from specific C2D engine. + db --> ocean_node: Returns job from specific C2D engine. + ocean_node -> ocean_js: Returns streams for results files. + ocean_js -> consumer: Returns job results + consumer -> end_user: Downloads results to the path. +end group + +@enduml \ No newline at end of file diff --git a/.gitbook/assets/c2d/paid-compute-flow-1.puml b/.gitbook/assets/c2d/paid-compute-flow-1.puml new file mode 100644 index 000000000..ccee1a98a --- /dev/null +++ b/.gitbook/assets/c2d/paid-compute-flow-1.puml @@ -0,0 +1,165 @@ +@startuml "Paid Flow for Compute to Data - Part 1" +title "Paid Flow for Compute to Data - Part 1" + +skinparam sequenceArrowThickness 2 +skinparam roundcorner 10 +skinparam maxmessagesize 85 +skinparam sequenceParticipant underline + +actor "End User" as end_user +participant "Consumer\n(Ocean CLI)" as consumer +participant "Ocean.js" as ocean_js +participant "Ocean Node" as ocean_node +participant "Policy Server" as policy_server +database "Ocean Node's Database\n(SQLiteCompute DB)" as db +participant "Smart Contracts" as smart_contracts + +legend top left +Assuming Ocean Node is running, +dataset and algorithm assets +are already published +and resolved by **Indexer**. +For more details regarding +the publishing flow, check the +ocean-cli **publish** flow. + +end legend + +note over ocean_node +When deploying Ocean Node, +make sure to export +**DOCKER_COMPUTE_ENVIRONMENTS**. +For a quickstart node with c2d, please +check the script **ocean-node-update.sh** +from the ocean-node GitHub repo. +end note + +group Select compute environment + + end_user -> consumer: Requests compute environments + consumer -> ocean_js + ocean_js -> ocean_node: **GET /computeEnvironments** + note over ocean_node + Filtering by chainId is optional. + end note + ocean_node -> ocean_node: Parses engine's exported compute environments. + note over ocean_node + Currently, only the Docker + engine is supported. + The fee token for payment is included + in DOCKER_COMPUTE_ENVIRONMENTS, + default is the OCEAN token. + end note + ocean_node --> ocean_js: Returns environments. + ocean_js --> consumer: Forwards the environments. + consumer --> end_user: Displays compute environments containing free + paid resources. + note over end_user + For this scenario, the user selects paid + resources from the environment. + end note + +end group + +end_user -> consumer: Fills in maxJobDuration and resources. +end_user -> consumer: Triggers initialize compute. +group Initialize Compute + consumer -> ocean_js: Calls initialize compute method from Provider class.
+ ocean_js -> ocean_node: **POST /initializeCompute** + ocean_node -> ocean_node: Checks if the assets are orderable + have granted access to run compute jobs + group Credentials check + alt Policy server configured + ocean_node -> policy_server: Requests credentials validation using **initialize** command + policy_server --> ocean_node: Success/failure response + else Policy server not configured + ocean_node -> ocean_node: Checks allow & deny lists of addresses or access lists + end + alt Validation response failure + ocean_node --> ocean_js: 403 - Consumer address not authorized + ocean_js --> consumer + consumer --> end_user + else Validation response success + note over ocean_node + Continue with provider fees. + end note + end + end group + loop For each asset (datasets + algorithm) + alt New order, new provider fees + ocean_node --> ocean_js: Returns validOrder = **false**, providerFees, payment. + ocean_js --> consumer + else Existing order, valid provider fees + ocean_node --> ocean_js: Returns validOrder = **orderTxId**, payment. + ocean_js --> consumer + else Expired order, new provider fees + ocean_node --> ocean_js: Returns validOrder = **false**, providerFees, payment. + ocean_js --> consumer + end + end loop + consumer --> end_user + note over end_user + Consults the price after the + initialize response. + end note +end group +group Add funds + note over end_user + Make sure that the end user's account + has enough of the chain's native token + (e.g. ETH) for gas and payment + token/fee token. + Also, the consumerAddress of the selected + environment must have ETH for + gas and fee token -> **transfer the** + **necessary funds to consumerAddress.** + The amount is calculated per resource. + end note + +end group + +group Escrow funding + note over ocean_js + Checks beforehand if + funds exist in Escrow and if + consumerAddress is already + authorized within limits. + end note + consumer -> ocean_js: Checks for funds in escrow (interacting with ocean.js). + note over ocean_js + Ocean.js checks balances for native chain + token, payment token, necessary allowances. + end note + ocean_js -> ocean_js: Uses Datatoken class for payment token & approves escrow contract address. + ocean_js --> consumer + alt No funds & no auths + consumer -> ocean_js: Deposits amount from user into Escrow + ocean_js -> smart_contracts: Calls deposit smart contract function + ocean_js -> ocean_js: Waits for tx confirmation. + consumer -> ocean_js: Authorizes consumerAddress in Escrow. + ocean_js -> smart_contracts: Calls authorize smart contract function with maxLockedAmount, maxLockCounts, maxLockSeconds. + ocean_js -> ocean_js: Waits for tx confirmation. + else Funds already deposited & consumerAddress authorized + note over consumer + Jump to start order using provider fees + end note + end +end group +group Start order using provider fees + loop For each asset (datasets + algorithm) + consumer -> ocean_js: Calls handleComputeOrder + ocean_js -> ocean_js: Checks whether to call smart contracts **startOrder** or **reuseOrder**. + alt 1. New order, new provider fees\n-> startOrder + ocean_js -> smart_contracts: Asset datatoken invokes startOrder. + smart_contracts --> ocean_js: Transaction hash as proof of ordering. + ocean_js --> consumer: Returns transaction hash. + else 2. Existing order, valid provider fees\n-> returns existing valid order tx ID. + ocean_js -> consumer: Returns validOrder from initialize response. + else 3.
Expired order, new provider fees\n-> reuseOrder + ocean_js -> smart_contracts: Asset datatoken invokes reuseOrder. + smart_contracts --> ocean_js: Transaction hash as proof of ordering. + ocean_js --> consumer: Returns transaction hash. + end + end loop +end group + +@enduml \ No newline at end of file diff --git a/.gitbook/assets/c2d/paid-compute-flow-2.puml b/.gitbook/assets/c2d/paid-compute-flow-2.puml new file mode 100644 index 000000000..6586054d6 --- /dev/null +++ b/.gitbook/assets/c2d/paid-compute-flow-2.puml @@ -0,0 +1,106 @@ +@startuml "Paid Flow for Compute to Data - Part 2" +title "Paid Flow for Compute to Data - Part 2" + +skinparam sequenceArrowThickness 2 +skinparam roundcorner 10 +skinparam maxmessagesize 85 +skinparam sequenceParticipant underline + +actor "End User" as end_user +participant "Consumer\n(Ocean CLI)" as consumer +participant "Ocean.js" as ocean_js +participant "Ocean Node" as ocean_node +database "Ocean Node's Database\n(SQLiteCompute DB)" as db +participant "Smart Contracts" as smart_contracts + +group Start compute job + consumer -> ocean_js: Calls startCompute with resources from env. + ocean_js -> ocean_node: **POST /compute** + alt Nonce and signature are invalid + ocean_node --> ocean_js: Returns 500, error 'Invalid nonce or signature, unable to proceed.' + ocean_js --> consumer: Returns error + consumer --> end_user + end + ocean_node -> ocean_node: Checks if the assets are orderable + have granted access to run compute jobs + group Credentials check + alt Policy server configured + ocean_node -> policy_server: Requests credentials validation using **startCompute** command + policy_server --> ocean_node: Success/failure response + else Policy server not configured + ocean_node -> ocean_node: Checks allow & deny lists of addresses or access lists + end + alt Validation response failure + ocean_node --> ocean_js: 403 - Consumer address not authorized + ocean_js --> consumer + consumer --> end_user + else Validation response success + note over ocean_node + Continue with provider fees. + end note + end + end group + ocean_node -> ocean_node: Calculates price per specified resources, creates job ID. + ocean_node -> smart_contracts: Creates lock in Escrow contract with maxLockedAmount. + smart_contracts --> ocean_node: Returns agreementId. + group Monitor compute job + ocean_node -> ocean_node: Tries to create docker environment within Docker engine class. + alt Job created successfully + ocean_node -> db: Saves job. + db --> ocean_node + ocean_node --> ocean_js: Returns job ID. + ocean_js --> consumer + consumer --> end_user: Displays job ID + else Job not created successfully - cleanup + group Cleanup job + alt Algorithm runTime > 0 + ocean_node -> smart_contracts: Calls claimLock with user as node owner getting paid. + smart_contracts --> ocean_node: Returns tx ID. + else Algorithm runTime = 0 + ocean_node -> smart_contracts: Calls cancelExpiredLocks without user as node owner getting paid. + smart_contracts --> ocean_node: Returns tx ID.
+ end + end group + end + alt Algorithm execution exceeds specified maxJobDuration + ocean_node -> ocean_node: Stops docker container immediately + ocean_node -> db: Updates job status from **Running** to **Publishing Results** + db --> ocean_node + ocean_node -> ocean_node: Cleanup job started + note over ocean_node + Check group **Cleanup job** + end note + else Algorithm finishes its execution in time + ocean_node -> db: Updates job status from **Running** to **Publishing Results** + db --> ocean_node + ocean_node -> ocean_node: Cleanup job started - **claimLock** case + note over ocean_node + Check group **Cleanup job** + end note + end + + end group +end group + + +group Get compute job status + consumer -> ocean_js: Calls computeStatus + ocean_js -> ocean_node: **GET /compute** + ocean_node -> db: Requests job data from specific C2D engine. + db --> ocean_node: Returns job data from specific C2D engine. + ocean_node --> ocean_js: Returns status. + ocean_js --> consumer + consumer --> end_user: Displays the progress of the job. +end group + +group Retrieve compute job results + end_user -> consumer: Requests results, provides path. + consumer -> ocean_js: Calls computeResult. + ocean_js -> ocean_node: **GET /computeResult** + ocean_node -> db: Requests job from specific C2D engine. + db --> ocean_node: Returns job from specific C2D engine. + ocean_node -> ocean_js: Returns streams for results files. + ocean_js -> consumer: Returns job results + consumer -> end_user: Downloads results to the path. +end group + +@enduml \ No newline at end of file diff --git a/.gitbook/assets/vscode/setup.png b/.gitbook/assets/vscode/setup.png new file mode 100644 index 000000000..458678680 Binary files /dev/null and b/.gitbook/assets/vscode/setup.png differ diff --git a/developers/compute-to-data-v2.0/README.md b/developers/compute-to-data-v2.0/README.md new file mode 100644 index 000000000..4e73bf52f --- /dev/null +++ b/developers/compute-to-data-v2.0/README.md @@ -0,0 +1,36 @@ +--- +description: Compute to data version 2 (C2Dv2) +--- + +# Compute to data + +### Introduction + +Certain datasets, such as health records and personal information, are too sensitive to be directly sold. However, Compute-to-Data offers a solution that allows you to monetize these datasets while keeping the data private. Instead of selling the raw data itself, you can offer compute access to the private data. This means you have control over which algorithms can be run on your dataset. For instance, if you possess sensitive health records, you can permit an algorithm to calculate the average age of patients without revealing any other details. + +Compute-to-Data effectively resolves the tradeoff between leveraging the benefits of private data and mitigating the risks associated with data exposure. It enables the data to remain on-premise while granting third parties the ability to perform specific compute tasks on it, yielding valuable results like statistical analysis or AI model development. + +Private data holds immense value as it can significantly enhance research and business outcomes. However, concerns regarding privacy and control often impede its accessibility. Compute-to-Data addresses this challenge by granting specific access to the private data without directly sharing it. This approach finds utility in various domains, including scientific research, technological advancements, and marketplaces where private data can be securely sold while preserving privacy.
Companies can seize the opportunity to monetize their data assets while ensuring the utmost protection of sensitive information. + +Private data has the potential to drive groundbreaking discoveries in science and technology, with increased data improving the predictive accuracy of modern AI models. Due to its scarcity and the challenges associated with accessing it, private data is often regarded as the most valuable. By utilizing private data through Compute-to-Data, significant rewards can be reaped, leading to transformative advancements and innovative breakthroughs. + +Ocean Protocol provides a compute environment that you can access at the following [address](https://1.c2d.nodes.oceanprotocol.com:8000/). Feel free to explore and utilize this platform for your needs. + +We suggest reading these guides to get an understanding of how compute-to-data works: + +### Architecture & Overview Guides + +* [Architecture](compute-to-data-architecture.md) +* [Datasets & Algorithms](compute-to-data-datasets-algorithms.md) +* [Writing Algorithms](compute-to-data-algorithms.md) +* [Compute options](compute-options.md) +* [Free Start Compute flow](free-compute-to-data-flow.md) +* [Paid Start Compute flow](paid-compute-to-data-flow.md) + +### Developer Guides + +* [How to use compute to data with ocean.js](../ocean.js/cod-asset.md) +* [How to use compute to data with ocean.py](../../data-scientists/ocean.py) +* [How to run free compute jobs with VSCode extension](../vscode/README.md) +* [How to run free and paid compute jobs with Ocean CLI](../ocean-cli/run-c2d.md) + diff --git a/developers/compute-to-data/compute-options.md b/developers/compute-to-data-v2.0/compute-options.md similarity index 100% rename from developers/compute-to-data/compute-options.md rename to developers/compute-to-data-v2.0/compute-options.md diff --git a/developers/compute-to-data/compute-to-data-algorithms.md b/developers/compute-to-data-v2.0/compute-to-data-algorithms.md similarity index 100% rename from developers/compute-to-data/compute-to-data-algorithms.md rename to developers/compute-to-data-v2.0/compute-to-data-algorithms.md diff --git a/developers/compute-to-data-v2.0/compute-to-data-architecture.md b/developers/compute-to-data-v2.0/compute-to-data-architecture.md new file mode 100644 index 000000000..19f9f8d04 --- /dev/null +++ b/developers/compute-to-data-v2.0/compute-to-data-architecture.md @@ -0,0 +1,85 @@ +--- +title: Compute-to-Data +description: Architecture overview +--- + +# Architecture + +Compute-to-Data (C2D) is a cutting-edge data processing paradigm that enables secure and privacy-preserving computation on sensitive datasets. + +In the C2D workflow, the following steps are performed: + +1. The consumer initiates a compute-to-data job by selecting the desired data asset and algorithm; the orders are then validated via the dApp used. +2. A dedicated and isolated execution container is created for the C2D job. +3. The execution container loads the specified algorithm into its environment. +4. The execution container securely loads the selected dataset for processing. +5. The algorithm is executed on the loaded dataset within the isolated execution container. +6. The results and logs generated by the algorithm are securely returned to the user, authenticated via nonce and signature. +7. The execution container deletes the dataset, algorithm, and itself to ensure data privacy and security. + +
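+Seen from the consumer side through ocean.js, these steps can be sketched as follows. This is a simplified outline rather than a drop-in script: the method names (`getComputeEnvironments`, `computeStart`, `computeStatus`) appear throughout these guides, but the argument lists are abbreviated here and may differ from the exact ocean.js signatures:
+
+```typescript
+import { ProviderInstance } from '@oceanprotocol/lib'
+
+async function runComputeJob(nodeUrl: string, consumerAccount: any) {
+  // Step 1: pick an environment exposed by the node
+  const envs = await ProviderInstance.getComputeEnvironments(nodeUrl)
+
+  // Steps 2-5: the node provisions the container, loads the
+  // algorithm and dataset, and runs the computation
+  const jobs = await ProviderInstance.computeStart(
+    nodeUrl,
+    consumerAccount,
+    envs[0].id
+    // ...assets, algorithm, duration, payment details
+  )
+
+  // Step 6: poll until results are published
+  return ProviderInstance.computeStatus(
+    nodeUrl,
+    await consumerAccount.getAddress(),
+    jobs[0].jobId
+  )
+}
+```
+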

Compute architecture overview

+ +The interaction between the Consumer and the Ocean Node follows a specific workflow. To initiate the process, the Consumer contacts the Ocean Node by invoking the `POST /compute(did, algorithm, additionalDIDs)` function with parameters such as the data identifier (DID), algorithm, and additional DIDs if required. Upon receiving this request, the Ocean Node generates a unique job identifier (`XXXX`) and returns it to the Consumer. The Ocean Node then assumes the responsibility of overseeing the remaining steps. + +Throughout the computation process, the Consumer can check the status of the job by querying the Ocean Node's `GET /compute(XXXX)` endpoint, providing the job identifier (`XXXX`) as a reference. + + +You have the option to initiate a compute job using one or more data assets. You can explore this functionality by utilizing the [ocean.py](../../data-scientists/ocean.py) and [ocean.js](../ocean.js) libraries. + +Here are the actors/components: + +* Consumers - The end users who consume the compute services offered by the data Publisher. +* Ocean Node - The monolithic API that handles the compute requests. + +Before the flow can begin, these pre-conditions must be met: + +* The Asset DDO has a `compute` service. +* The Asset DDO compute service must permit algorithms to run on it. +* The Asset DDO must specify an Ocean Provider endpoint exposed by the Publisher. + +### Access Control using Ocean Node + +Similar to the `access service`, the `compute service` within Ocean Protocol relies on the [Ocean Node](../ocean-node/README.md), which is a crucial component managed by the asset Publishers. The role of the Ocean Node is to facilitate interactions with users and handle the fundamental aspects of a Publisher's infrastructure, enabling seamless integration into the Ocean Protocol ecosystem. It serves as the primary interface for direct interaction with the infrastructure where the data is located. + + +The [Ocean Node](../ocean-node/README.md) encompasses the necessary credentials to establish secure and authorized interactions with the underlying infrastructure. Initially, this infrastructure may be hosted in cloud providers, although it also has the flexibility to extend to on-premise environments if required. By encompassing the necessary credentials, the Ocean Node ensures smooth and controlled access to the infrastructure, allowing Publishers to effectively leverage the compute service within Ocean Protocol. + +The entire Compute-to-Data functionality is embedded in Ocean Node, which includes handlers for operations that can be called via **HTTP** and **P2P** protocols (a minimal HTTP sketch follows this list): + +- `GetComputeEnvironments` - returns the list of environments that can be selected to run the algorithm on +- `InitializeCompute` - generates the provider fees necessary for ordering the assets +- `FreeStartCompute` - runs algorithms without necessarily publishing the assets on-chain (dataset and algorithm), using free resources from the selected environment +- `PaidStartCompute` - runs algorithms with on-chain assets (dataset and algorithm), using paid resources from the selected environment. Payment is requested at every start-compute call and is handled by the `Escrow` contract. +- `ComputeGetStatus` - retrieves the compute job status. +- `ComputeStop` - stops compute job execution when the job is `Running`. +- `ComputeGetResult` - returns compute job results when the job is `Finished`.
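+
+As an illustration, the HTTP variants of two of these handlers can be called directly. This is a minimal sketch: the endpoint paths come from the Ocean Node API specification in this section, while the node URL and the `jobId` query parameter are assumptions made for the example:
+
+```typescript
+// Placeholder node URL; point this at your own Ocean Node.
+const NODE_URL = 'http://localhost:8000'
+
+async function querySketch() {
+  // GetComputeEnvironments over HTTP (the chainId filter is optional)
+  const envs = await fetch(
+    `${NODE_URL}/api/services/computeEnvironments?chainId=8996`
+  ).then((res) => res.json())
+  console.log(`environments: ${envs.length}`)
+
+  // ComputeGetStatus over HTTP; passing the job ID as a query
+  // parameter is an assumption for this sketch
+  const jobId = '4c4e56df-f134-4a9d-9bb9-5f328cf170a4' // example ID
+  const jobs = await fetch(
+    `${NODE_URL}/api/services/compute?jobId=${jobId}`
+  ).then((res) => res.json())
+  console.log(jobs[0]?.statusText)
+}
+```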
+ + +### C2D Engine + +The **C2D Engine** class within Ocean Node is in charge of orchestrating the compute infrastructure using `Docker` as backend, where each compute job runs in an isolated Docker container. The **C2D Engine** can be extended to run on multiple engine types, such as `Kubernetes` (where each compute job runs in an isolated [Kubernetes Pod](https://kubernetes.io/docs/concepts/workloads/pods/)), `Bacalhau` and many more; currently it supports only **Docker**. + +When a free or paid compute handler from Ocean Node is called, the C2D Engine processes the job details (e.g. `id`, `payment`, `duration`) and manages the infrastructure necessary to complete the execution of the compute workflows: + +* Configures the setup by downloading the compute job dependencies (datasets and algorithms). +* Creates the container including the algorithm to execute. +* Pulls the Docker image for the algorithm container. + + +#### Configuration Phase + +One of its responsibilities is fetching and preparing the required assets and files, ensuring a smooth and seamless execution of the job. By meticulously handling the environment configuration, the **C2D Engine** guarantees that all necessary components are in place, setting the stage for a successful job execution. + +1. **Fetching Dataset Assets**: It downloads the files corresponding to datasets and saves them in the location `/data/inputs/DID/`. The files are named based on their array index ranging from 0 to X, depending on the total number of files associated with the dataset. +Datasets can be provided in the following supported formats: `did`, `url`, `arweave`, `ipfs`. +2. **Fetching Algorithm Files**: It retrieves the algorithm files and stores them in the `/data/transformations/` directory. The first file is named 'algorithm', and the subsequent files are indexed from 1 to X, based on the number of files present for the algorithm. Algorithms can be provided in the following supported formats: `did`, `url`, `arweave`, `ipfs`. +3. **Fetching DDOs**: Additionally, it fetches the DDOs (the assets' metadata documents) and saves them to disk at the location `/data/ddos/`. +4. **Error Handling**: In case of any provisioning failures, whether during data fetching or algorithm processing, the job status is updated in a SQLite database and the relevant error messages are logged. + +#### Publishing Phase + +The publishing phase takes care of processing, logging, and uploading compute job outputs. The C2D Engine streamlines the compute job management process, enabling easy and reliable handling of output data generated during computation tasks. +The outputs are published to the `/data/outputs` folder, where they remain available for retrieval via `computeResult` once the job execution status is updated to `Finished` +in the SQLite database. + +* The C2D Engine does not provide storage capabilities; all state information is stored directly in the dedicated algorithm container diff --git a/developers/compute-to-data-v2.0/free-compute-to-data-flow.md b/developers/compute-to-data-v2.0/free-compute-to-data-flow.md new file mode 100644 index 000000000..26d9ee4c3 --- /dev/null +++ b/developers/compute-to-data-v2.0/free-compute-to-data-flow.md @@ -0,0 +1,101 @@ +# Free Compute Flow + +The free compute flow is designed to allow end users to run their public algorithms on the free resources of environments available from the Ocean Nodes network.
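+
+To make "public algorithm" concrete, below is a minimal sketch of an algorithm that follows the container layout described in the [architecture](compute-to-data-architecture.md) section (datasets under `/data/inputs/<DID>/`, results written to `/data/outputs`). The line-counting logic is purely illustrative:
+
+```typescript
+// algo.ts - executed inside the job container (e.g. via a "node $ALGO"
+// entrypoint after compilation, or adapted to plain JavaScript)
+import * as fs from 'fs'
+import * as path from 'path'
+
+const inputsRoot = '/data/inputs'   // one folder per dataset DID
+const outputsRoot = '/data/outputs' // anything here is published as results
+
+let totalLines = 0
+for (const did of fs.readdirSync(inputsRoot)) {
+  for (const file of fs.readdirSync(path.join(inputsRoot, did))) {
+    const text = fs.readFileSync(path.join(inputsRoot, did, file), 'utf8')
+    totalLines += text.split('\n').length
+  }
+}
+
+fs.writeFileSync(
+  path.join(outputsRoot, 'result.json'),
+  JSON.stringify({ totalLines })
+)
+```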
+ +## Prerequisites + +The prerequisite for this flow is the algorithm code, which can be supplied through the consumer components: [Ocean CLI](../ocean-cli/run-c2d.md) and [VSCode Extension](../vscode/README.md); the flow is also open for integration with other systems (e.g. Ocean Enterprise Marketplace). + +## Flow Illustration + +

Sequence Diagram for Free Compute - Part 1

+ +

Sequence Diagram for Free Compute - Part 2

+ +## Flow Description + +### Setup +To run free compute jobs, the end user can deploy a node on its host of infrastructure setup or can use a hosted node from the Nodes network. +1. When end user deploys by itself the node, assure that `DOCKER_COMPUTE_ENVIRONMENTS` environment variable is exported before running the node. Those environments will be returned when `getComputeEnvironments` request is triggered in the consumer tool. +2. When end user uses an already hosted node, call `status` command on that node and see if `c2dClusters` are available or calling directly `getComputeEnvironments` command. + +As consumer tools for running free C2D, Ocean Protocol proposes the following: +- [VSCode extension](../vscode/README.md) +- [Ocean CLI](../ocean-cli/run-c2d.md) + +**Observation**: VSCode extension uses a dedicated OPF node URL. + +### Select compute environment +Each environment has details regarding ID, consumer address +of the environment operating system, architecture of the +operating system, total running jobs, fees for paid compute +resources and resources `min`, `max`, `inUse` for **paid** and **free** compute. + +In this scenario, `free` resources will be +selected by the user within available limits `min` and `max`. + +The consumer tool makes a request to the node **GET /computeEnvironments** and the node returns to the consumer tool the environments exported at the node startup from `DOCKER_COMPUTE_ENVIRONMENTS` variable. Consumer tool dispalys then to the end user the list of environemnts. + +The end user selects the free resources and fills in the consumer tool together with job duration that the user considers is needed +for the algorithm execution. + +### Free start compute +#### Nonce & signature check +Consumer tool calls through ocean.js `freeStartCompute` which requests in Ocean Node **POST /freeCompute**. Within Ocean Node, +nonce and signature provided by ocean.js is checked. In case of invalid nonce or signature, node returns __500, 'Invalid nonce or signature, unable to proceed.'__ + +#### Credentials check +If the node has configured `POLICY_SERVER_URL` and ddo contains credentials, the credentials check is performed in `Policy Server`, otherwise node performs credentials check for consumer address. + +Credentials checks are performed once at the DDO level, but also for services credentials within the DDO object. + +In case of failure, node returns to ocean.js __403, 'Error: Access to asset ${ddo.id} was denied'__ which will be passed back to the end user. + +After these checks are performed and are successful, job id is generated and C2D engine is called for the actual algorithm execution. + +#### C2D Engine + +The only supported engine for start compute (free and paid) is the one for Docker. +The following steps executed by C2D Docker engine class are: + +1. C2D Engine validates the Docker image by checking manifest operating system and operating system architecture with the ones from environment platform. The manifest from the Docker image is retrieved from the `tag` or `image digest hash` using Docker SDK. +If validation is failed, Node throws error withtin the engine: +__Unable to validate docker image__ and creation of the job stops. + +2. Creates the folders for datasets, algorithm and result folder of algorithm execution, such as `/data/inputs`, `/data/transformations`, `/data/ddos`, `/data/outputs`. + +3. Saves the job structure into `SQLite` database. + +4. 
Starts monitoring the job execution and journals the lifecycle status of the job in the `SQLite` database. + +Whenever a job has started, an internal loop which monitors all the new jobs is triggered. The loop determines the lifecycle of a compute job execution. +**Lifecycle of a job according to statuses:** + +`JobStarted` -> `PullImage` or `PullImageFailed` -> `ConfiguringVolumes` or `VolumeCreationFailed` -> `Provisioning` or `ContainerCreationFailed` -> `RunningAlgorithm` or `AlgorithmFailed` -> `PublishingResults` or `ResultsUploadFailed` + +**Sequence of steps for internal loop:** +1. Pulls the docker image for the algorithm - on failure, throws an error back to the consumer tool and updates the job status in the `SQLite` database to `PullImageFailed`. +2. Configures volumes for the dedicated algorithm container - on failure, throws an error back to the consumer tool and updates the job status in the `SQLite` database to `VolumeCreationFailed`. +3. Creates the Docker container for the algorithm - on failure, throws an error back to the consumer tool and updates the job status in the `SQLite` database to `ContainerCreationFailed`. +4. Triggers algorithm execution on the dedicated container - on failure from the algorithm, throws an error back to the consumer tool and updates the job status in the `SQLite` database to `AlgorithmFailed`. +5. Publishes results to `/data/outputs`, whether or not the algorithm execution was successful - on failure, throws an error back to the consumer tool and updates the job status in the `SQLite` database to `ResultsUploadFailed`. +If the publishing-results step executes successfully, the container and volumes are deleted together with the folders +for datasets, algorithms and results. + +**Observation**: If a job exceeds its specified duration, the C2D Engine internal loop terminates the container and volumes allocated for the algorithm's execution and sets the job to `PublishingResults`, i.e. it performs a forced cleanup of the job setup. + +### Get job status + +To display the progress to the end user, the consumer tool requests the job status from the node at **GET /compute** with the job ID, through the ocean.js method `computeStatus`. + +In case of request failure on the node side, the error is returned to the consumer tool and displayed to the end user. + +### Retrieve compute job results + +If the compute job status is `PublishingResults`, the consumer tool will +call the ocean.js `computeResult` method, which requests from the node +the endpoint `GET /computeResult`. The node returns the results content to ocean.js, and ocean.js generates a downloadable URL to pass on to the consumer tools. + +In case of request failure on the node side, the error is returned to the consumer tool and displayed to the end user. +The consumer tool receives the downloadable URL, fetches the BLOB content from it, and stores it in the end user's specified results folder path. + diff --git a/developers/compute-to-data-v2.0/ocean-node-API.md b/developers/compute-to-data-v2.0/ocean-node-API.md new file mode 100644 index 000000000..9feb3eea9 --- /dev/null +++ b/developers/compute-to-data-v2.0/ocean-node-API.md @@ -0,0 +1,301 @@ +--- +description: Compute to data version 2 (C2Dv2) Node API +--- + +# Ocean Node API Specifications for C2D + +## GET /api/services/computeEnvironments + +Request payload: + +```json +{ + "command": "getComputeEnvironments", + "chainId": "8996", // optional, if not provided, the command will fetch the environments for all node supported chains + "node": "nodeId" // optional, if not provided, the command will fetch the environments from the current node +} +``` + +Response: + +```json +[ + { + "id":"0x4170292f983ab0ca9fcc09630d61f5c30b313a0b1a9f3708254159154cdc27fe-0xe01b9c2d93fce9b07291803394b71f948330f192c53237828406cc84e83fb1cb", + "runningJobs":1, + "consumerAddress":"0x7D973DAbc9a81D3faAD1c3dD3EF6dF67631C85E0", + "platform":{ + "architecture":"x86_64", + "os":"Ubuntu 24.04.2 LTS" + }, + "fees":{ + "8996":[ + { + "feeToken":"0x2473f4F7bf40ed9310838edFCA6262C17A59DF64", + "prices":[ + { + "id":"cpu", + "price":1 + } + ] + } + ] + }, + "storageExpiry":604800, + "maxJobDuration":3600, + "resources":[ + { + "id":"cpu", + "total":4, + "max":4, + "min":1, + "inUse":1 + }, + { + "id":"ram", + "total":16766418944, + "max":16766418944, + "min":1000000000, + "inUse":1000000000 + }, + { + "id":"disk", + "total":1000000000, + "max":1000000000, + "min":0, + "inUse":0 + } + ], + "free":{ + "maxJobDuration":60, + "maxJobs":3, + "resources":[ + { + "id":"cpu", + "max":1, + "inUse":1 + }, + { + "id":"ram", + "max":1000000000, + "inUse":1000000000 + }, + { + "id":"disk", + "max":1000000000, + "inUse":0 + } + ] + }, + "runningfreeJobs":1 + } +] +``` + +## POST /api/services/freeCompute + +Request payload: + +```json +{ + "command": "freeStartCompute", + "node": "nodeId", // optional, if not provided, the command will fetch the environments from the current node + "consumerAddress": "0x", // the consumer wallet address + "signature": "hash", // we use the nonce as signature message and it is signed by the consumer account + "environment": "0x4170292f983ab0ca9fcc09630d61f5c30b313a0b1a9f3708254159154cdc27fe-0xe01b9c2d93fce9b07291803394b71f948330f192c53237828406cc84e83fb1cb", // selected env with hash + "algorithm": { + "meta": { + "rawcode": "algorithmContent", // algorithm code can be passed as a string here + "container": { + "entrypoint": "python $ALGO", + "image": "oceanprotocol/c2d_examples", + "tag": "py-general", + "checksum": "sha:256:aasdd" // checksum of the docker image + } + } + }, + "datasets": [ // optional, free start compute does not require datasets, it can be triggered only with the algorithm + { + "documentId": "did:op:df45" + } + ], + "resources": [ // optional, the node will fall back to available free resources if they are not specified + { + "id": "cpu", + "amount": 2 + } + ], + "maxJobDuration": 60, // optional, the node will fall back to the free environment maxJobDuration if not specified + "policyServer": { // optional, for session ID validation done by the enterprise policy server + "sessionId": "abcd-89fe" + }, + "output": { // optional + "publishAlgorithmLog": true, + "publishOutput": true + } +} +``` + +Response: + +```json +{ + "owner": "consumerAddress", + "jobId": "jobId", + "dateCreated": "698765", // UNIX timestamp in milliseconds + "dateFinished": null, // it just started + "status": 0, + "statusText": "Job started", + "results": [], + "maxJobDuration": 60,
+ "environment": "0x4170292f983ab0ca9fcc09630d61f5c30b313a0b1a9f3708254159154cdc27fe-0xe01b9c2d93fce9b07291803394b71f948330f192c53237828406cc84e83fb1cb" +} +``` + +## POST /api/services/initializeCompute +Request payload + + +Response + +```json +{ + "algorithm":{ + "validOrder":false, + "did":"did:op:7e2d9943c58e1960d1b04ec1e62ef060f6051668ceb81415e9894fd29b38fb67", + "serviceId":"did:op:7e2d9943c58e1960d1b04ec1e62ef060f6051668ceb81415e9894fd29b38fb67", + "datatoken":"0x7F0960586391A2Db440f1c0571503B3a1c19fD0A", + "chainId":8996, + "consumerAddress":"0xe08A1dAe983BC701D05E492DB80e0144f8f4b909", + "providerFee":{ + "providerFeeAddress":"0xe08A1dAe983BC701D05E492DB80e0144f8f4b909", + "providerFeeToken":"0x2473f4F7bf40ed9310838edFCA6262C17A59DF64", + "providerFeeAmount":"0", + "providerData":"0x7b226474223a22307837463039363035383633393141324462343430663163303537313530334233613163313966443041222c226964223a2264623136346331623938316534643239373465393065363162646131323135313265363930396331303335633930386436383933336165346366616261366230227d", + "v":27, + "r":"0x9a047f125b9180d8adc1d56e42389075a8bd6726c44383aad03e8e85e5058c61", + "s":"0x692d2ccb234b057c66e91217692884dc2417c3382ce96e76716e069bcd3568b0", + "validUntil":86400 + } + }, + "datasets":[ + { + "validOrder":false, + "did":"did:op:4e325bb46657ced0fbb37b80d475b54da0af3101ed161d5ae616450a35d3daa5", + "serviceId":"did:op:4e325bb46657ced0fbb37b80d475b54da0af3101ed161d5ae616450a35d3daa5", + "datatoken":"0x9145eAF48e8052aC112bc0cFea03df1DBB16F42B", + "chainId":8996, + "consumerAddress":"0xe08A1dAe983BC701D05E492DB80e0144f8f4b909", + "providerFee":{ + "providerFeeAddress":"0xe08A1dAe983BC701D05E492DB80e0144f8f4b909", + "providerFeeToken":"0x2473f4F7bf40ed9310838edFCA6262C17A59DF64", + "providerFeeAmount":"0", + "providerData":"0x7b226474223a22307839313435654146343865383035326143313132626330634665613033646631444242313646343242222c226964223a2263636233393863353064366162643562343536653864373234326264383536613137363761383930623533376332663863313062613862386131306536303235227d", + "v":28, + "r":"0xdfe25cbf9bad89598efeb49d64d74c6fd19419d60b77c0881b22bfbcf3ddf320", + "s":"0x7acaf7a51b799514c44c61f0e21508e9a74bd3abb5faef2021501d37db3e8742", + "validUntil":86400 + } + } + ], + "payment":{ + "escrowAddress":"0xE4f7c64C52085A6df2c7c2972466EEf3ba3aD081", + "payee":"0xe08A1dAe983BC701D05E492DB80e0144f8f4b909", + "chainId":8996, + "minLockSeconds":1500, + "token":"0x2473f4F7bf40ed9310838edFCA6262C17A59DF64", + "amount":"15000000000000000000" + } +} +``` + +## POST /api/services/compute + +Payload example: + +```json +"datasets": [ + { + "documentId": "did:op:9a0ada50a883e7e8af61fa313ff835ddf1416103ecf237eaa5bf9e8c5bfdc0d2", + "serviceId": "ccb398c50d6abd5b456e8d7242bd856a1767a890b537c2f8c10ba8b8a10e6025", + "transferTxId": "0xa9b907717cfb516590ae97c3dc203d06abaeb609568380bff4f0e75bc0a810be" + } +], +"algorithm": { + "documentId": "did:op:e7487c1eaa91015c833d0b0ae0b766d7a14e64dd2fd039b8708aef9775aa77e2", + "serviceId": "db164c1b981e4d2974e90e61bda121512e6909c1035c908d68933ae4cfaba6b0", + "meta": { + "language": "", + "version": "0.1", + "container": { + "entrypoint": "node $ALGO", + "image": "node", + "tag": "latest", + "checksum": "sha256:1155995dda741e93afe4b1c6ced2d01734a6ec69865cc0997daf1f4db7259a36" + } + }, + "transferTxId": "0x052aafc6961114216d82bd9e618109a40a5172a3d870ee0ed13d9a207b247689" +} + +``` + +Response: + +```json +[ + { + "owner": "0x529043886F21D9bc1AE0feDb751e34265a246e47", + "did": null, + "jobId": "4c4e56df-f134-4a9d-9bb9-5f328cf170a4", + 
"dateCreated": "1748863401.699", + "dateFinished": null, + "status": 40, + "statusText": "Running algorithm", + "results": [], + "inputDID": null, + "algoDID": null, + "agreementId": null, + "environment": "0x27bb418eca824bdc1fe31a946fcc094297282bc11fb6405612b40108f5d5c5e3-0x269eb044c415a04b372e807b5014b22a2775c07effb60d9c65b47c0569c7dce3", + "resources": [ + { "id": "cpu", "amount": 1 }, + { "id": "ram","amount": 1000000000 }, + { "id": "disk", "amount": 0 } + ], + "isFree": false, + "algoStartTimestamp": "1748863415.506", + "algoStopTimestamp": "0", + "maxJobDuration": 900 + } +] +``` + +## GET /api/services/compute + +Response: + +```json +[ + { + "owner": "0x529043886F21D9bc1AE0feDb751e34265a246e47", + "did": null, + "jobId": "4c4e56df-f134-4a9d-9bb9-5f328cf170a4", + "dateCreated": "1748863401.699", + "dateFinished": null, + "status": 40, + "statusText": "Running algorithm", + "results": [], + "inputDID": null, + "algoDID": null, + "agreementId": null, + "environment": "0x27bb418eca824bdc1fe31a946fcc094297282bc11fb6405612b40108f5d5c5e3-0x269eb044c415a04b372e807b5014b22a2775c07effb60d9c65b47c0569c7dce3", + "resources": [ + { "id": "cpu", "amount": 1 }, + { "id": "ram","amount": 1000000000 }, + { "id": "disk", "amount": 0 } + ], + "isFree": false, + "algoStartTimestamp": "1748863415.506", + "algoStopTimestamp": "0", + "maxJobDuration": 900 + } +] +``` diff --git a/developers/compute-to-data-v2.0/paid-compute-to-data-flow.md b/developers/compute-to-data-v2.0/paid-compute-to-data-flow.md new file mode 100644 index 000000000..ed9589082 --- /dev/null +++ b/developers/compute-to-data-v2.0/paid-compute-to-data-flow.md @@ -0,0 +1,168 @@ +# Paid Compute Flow + + +## Prerequisites + +## Flow Illustration + +

Sequence Diagram for Paid Compute - Part 1

+ +

Sequence Diagram for Paid Compute - Part 2

+ +## Flow Description + +### Setup +To run paid compute jobs, the end user can deploy a node on its host of infrastructure setup or can use a hosted node from the Nodes network. +1. When end user deploys by itself the node, assure that `DOCKER_COMPUTE_ENVIRONMENTS` environment variable is exported before running the node. Those environments will be returned when `getComputeEnvironments` request is triggered in the consumer tool. +2. When end user uses an already hosted node, call `status` command on that node and see if `c2dClusters` are available or calling directly `getComputeEnvironments` command. + +As consumer tools for running free C2D, Ocean Protocol proposes the following: +- [Ocean CLI](../ocean-cli/run-c2d.md) + +### Select compute environment +Each environment has details regarding ID, consumer address +of the environment operating system, architecture of the +operating system, total running jobs, fees for paid compute +resources and resources `min`, `max`, `inUse` for **paid** and **free** compute. + +In this scenario, paid resources (the resources which are **not** marked with `free`) will be +selected by the user within available limits `min` and `max`. + +The consumer tool makes a request to the node **GET /computeEnvironments** and the node returns to the consumer tool the environments exported at the node startup from `DOCKER_COMPUTE_ENVIRONMENTS` variable. Consumer tool dispalys then to the end user the list of environemnts. + +The end user selects the free resources and fills in the consumer tool together with job duration that the user considers is needed +for the algorithm execution. + +### Initialize compute + +The end user calls method `initializeCompute` from the consumer tool for provider fees generation, needed for assets ordering and payment details according to **maximum job duration** input from the end user and to the **prefered resources**. + +The consumer tool calls ocean.js dedicated method for initialize, followed by +request to **POST /initializeCompute** from Node to return the provider fees and payment details. + +#### Environment check +Ocean Node verifies if the passed environment id is valid from ocean.js, in case of failure, node returns to ocean.js the following error: __500, 'Invalid C2D Environment'__ propagated to consumer tool and displayed to end user. + +#### Resources check +Ocean Node does not allow input resources to be outside limits +`min` and `max`, therefore it will throw and error with status code **500** and a message with `Not enough resources`. + +#### Token payment check for environment +Ocean Node performs a validation if the token passed from ocean.js is available for the selected environment. In case of failure, node returns to ocean.js the following error: __500, 'This compute env does not accept payments on chain'__ propagated to consumer tool and displayed to end user. + +#### Escrow support for chain ID check +Ocean Node validates if for the specified chain ID, payments in escrow are supported. In case of failure, node returns to ocean.js the following error: __500, 'Cannot handle payments on chainId'__ propagated to consumer tool and displayed to end user. + +#### Credentials check +If the node has configured `POLICY_SERVER_URL` and ddo contains credentials, the credentials check is performed in `Policy Server`, otherwise node performs credentials check for consumer address. Credentials checks are performed once at the DDO level, but also for services credentials within the DDO object. 
+ +In case of failure, the node returns to ocean.js __403, 'Error: Access to asset ${ddo.id} was denied'__, which will be passed back to the end user. + + +#### Provider fees check +For each **orderable asset** (datasets and algorithm), Ocean Node validates provider fees availability according to these 3 scenarios: + +**1. New order, new provider fees** +Ocean Node returns to ocean.js `validOrder` false, because no valid order was previously executed and new provider fees need to be returned all the way to the end user, including payment details with the calculated cost per specified resources and job duration. + +**2. Existing order, valid provider fees** +Ocean Node returns to ocean.js as `validOrder` the existing order transaction ID, because the order still has valid provider fees. Only the escrow payment is returned to ocean.js and further to the consumer tool and end user. + +**3. Expired order, new provider fees** +Ocean Node returns to ocean.js `validOrder` false, because the existing provider fees have expired and new provider fees need to be returned all the way to the end user, including payment details. + +After these checks are performed and are successful, the provider fees and the escrow payment object are returned to ocean.js, then to the consumer tool, and displayed to the end user, who can review the payment amount for compute power usage. + +### Funds check and Escrow Payment +The consumer tool calls a method available in ocean.js, `verifyFundsForEscrowPayment`, which checks the balances and allowances of the token required for the Escrow payment, and of the native chain token used for gas fees, for both the environment's consumer address and the end user's address. + +Moreover, the function `verifyFundsForEscrowPayment` from ocean.js calls the Escrow smart contract to deposit the amount retrieved from the initialize response into the Escrow contract and to authorize the consumer address of the environment, if this wasn't already done before. + + +### Ordering assets +The consumer tool calls the ocean.js method `handleComputeOrder` for each asset involved in the compute job (datasets and algorithm). +This method triggers the Datatoken smart contract internally for `startOrder` or `reuseOrder`, according to the availability of the provider fees returned from the initialize response. + +**1. New order, new provider fees -> startOrder** +The `startOrder` transaction generates the order transaction ID used further when starting compute as proof of ordering the asset. + +**2. Existing order, valid provider fees -> returns existing valid order tx ID** +No smart contract call is needed; the existing valid order transaction ID is returned. + +**3. Expired order, new provider fees -> reuseOrder** +The `reuseOrder` transaction generates a new order transaction ID used further when starting compute as proof of ordering the asset. + +### Start compute +#### Nonce & signature check +The consumer tool calls through ocean.js `startCompute`, which requests **POST /compute** on Ocean Node. Within Ocean Node, +the nonce and signature provided by ocean.js are checked. In case of an invalid nonce or signature, the node returns to ocean.js __500, 'Invalid nonce or signature, unable to proceed.'__, which will be passed back to the end user. + + +#### Credentials check +If the node has configured `POLICY_SERVER_URL` and the DDO contains credentials, the credentials check is performed in the `Policy Server`; otherwise the node performs the credentials check for the consumer address. In case of failure, the node returns to ocean.js __403, 'Error: Access to asset ${ddo.id} was denied'__, which will be passed back to the end user.
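+
+For reference, here is a minimal sketch of producing the signature checked in the nonce & signature step above. It assumes, per the `freeCompute` payload notes in the Node API specification, that the nonce is used as the signature message; the exact message formatting inside ocean.js may differ:
+
+```typescript
+import { Wallet } from 'ethers'
+
+// Hypothetical helper: sign the nonce with the consumer account so the
+// node can recover and authorize the consumer address.
+async function signNonce(consumer: Wallet, nonce: string): Promise<string> {
+  return consumer.signMessage(nonce)
+}
+
+const consumer = new Wallet(process.env.PRIVATE_KEY as string)
+const signature = await signNonce(consumer, Date.now().toString())
+```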
+ + +#### Validation of the algorithm for the dataset +Ocean Node checks if the dataset has defined `publisherTrustedAlgorithms` or `publisherTrustedAlgorithmPublishers` lists within its `compute` service, against 3 conditions: +- if these are empty, then the validation is **successful**. +- if `publisherTrustedAlgorithms` contains the algorithm DID, then the validation is **successful**, otherwise validation **fails**. +- if `publisherTrustedAlgorithmPublishers` contains the algorithm `nftAddress`, then the validation is **successful**, otherwise validation **fails**. + +#### Valid order check for asset +Ocean Node checks the order transaction ID of each asset as proof of ordering the asset. In case of failure, the node returns to ocean.js __500, 'TxId Service ${elem.transferTxId} is not valid for DDO'__, which will be passed back to the end user. + +After these checks are performed and are successful, the job ID is generated and the C2D engine is called for the actual algorithm execution. + +#### Create lock in Escrow +With the deposited funds, for each startCompute request, Ocean Node calls the Escrow smart contract to create a lock, applying the calculated cost per resources and per the `maxJobDuration` preferred by the end user, for that particular compute job. `createLock` generates a transaction ID which is represented as the **agreement ID**, used as proof for paying the compute power usage. + +In case of failure, the node returns to ocean.js a smart contract error with status code `400`. + + +#### C2D Engine + +The only supported engine for start compute (free and paid) is the Docker one. +The steps executed by the C2D Docker engine class are: + +1. The C2D Engine validates the Docker image by checking the manifest operating system and operating system architecture against the ones from the environment platform. The manifest of the Docker image is retrieved from the `tag` or `image digest hash` using the Docker SDK. +If validation fails, the Node throws an error within the engine: +__Unable to validate docker image__ and creation of the job stops. + +2. Creates the folders for datasets, algorithm and the result folder of the algorithm execution, such as `/data/inputs`, `/data/transformations`, `/data/ddos`, `/data/outputs`. + +3. Saves the job structure into the `SQLite` database. + +4.
Triggers algorithm execution on the dedicated container - on failure from the algorithm, throws an error back to the consumer tool and updates the job status in the `SQLite` database to `AlgorithmFailed`. +5. Publishes results to `/data/outputs`, whether or not the algorithm execution was successful - on failure, throws an error back to the consumer tool and updates the job status in the `SQLite` database to `ResultsUploadFailed`. +If the publishing-results step executes successfully, the container and volumes are deleted together with the folders +for datasets, algorithms and results. +For paid compute jobs, before cleaning up the job, all the expired locks are cancelled by calling the **Escrow Smart Contract**. + +**Observation**: If a job exceeds its specified duration, the C2D Engine internal loop terminates the container and volumes allocated for the algorithm's execution and sets the job to `PublishingResults`, i.e. it performs a forced cleanup of the job setup. + +### Get job status + +To display the progress to the end user, the consumer tool requests the job status from the node at **GET /compute** with the job ID, through the ocean.js method `computeStatus`. + +In case of request failure on the node side, the error is returned to the consumer tool and displayed to the end user. + +### Retrieve compute job results + +If the compute job status is `PublishingResults`, the consumer tool will +call the ocean.js `computeResult` method, which requests from the node +the endpoint `GET /computeResult`. The node returns the results content to ocean.js, and ocean.js generates a downloadable URL to pass on to the consumer tools. + +In case of request failure on the node side, the error is returned to the consumer tool and displayed to the end user. +The consumer tool receives the downloadable URL, fetches the BLOB content from it, and stores it in the end user's specified results folder path. + diff --git a/developers/contracts/escrow.md b/developers/contracts/escrow.md new file mode 100644 index 000000000..93105375b --- /dev/null +++ b/developers/contracts/escrow.md @@ -0,0 +1,21 @@ +# C2D Payment - Escrow + +## What is the purpose of Escrow? +The Escrow smart contract mediates between a payer, being the end user, and a payee, being the application or component that performs a paid task. + +Applied to the new version of Ocean Protocol Compute-to-Data, the Escrow smart contract facilitates the payment for paid compute by locking the amount available in the contract for algorithm execution at each compute job creation. + +## What can the payer do with Escrow? +The `payer` flow looks like: +- the payer deposits funds in a payment token accepted by the `payee` +- the payer authorizes the `payee` by setting a max amount and max process time for the service. + +## What can the payee do with Escrow? +The `payee` flow looks like: +- the payer requests a service (such as `compute`) off-chain +- the payee computes the maximum amount and locks that amount in the Escrow contract +- the payee performs the service +- the payee takes the actual amount from the lock and releases back the remainder. + +## Appendix +For more details regarding the paid compute flow and how it is integrated with the [Escrow contract](https://github.com/oceanprotocol/contracts/blob/main/contracts/escrow/Escrow.sol), kindly consult [this dedicated section](../compute-to-data-v2.0/paid-compute-to-data-flow.md).
\ No newline at end of file
diff --git a/developers/ocean.js/cod-asset.md b/developers/ocean.js/cod-asset.md
index a1853c8e7..22b2e43c1 100644
--- a/developers/ocean.js/cod-asset.md
+++ b/developers/ocean.js/cod-asset.md
@@ -25,9 +25,8 @@ Please note that the implementation details of Compute-to-Data can vary dependin

 * [Install the dependencies](configuration.md#setup-dependencies)
 * [Create a configuration file](configuration.md#create-a-configuration-file)

-{% hint style="info" %}
-The variable **AQUARIUS\_URL** and **PROVIDER\_URL** should be set correctly in `.env` file
-{% endhint %}
+
+The variable **NODE\_URL** should be set correctly in the `.env` file

 #### Create a script that starts compute to data using an already published dataset and algorithm

@@ -52,57 +51,74 @@ const algorithmDid = "did:op:a419f07306d71f3357f8df74807d5d12bddd6bcd738eb0b4614

 const algorithm = await oceanConfig.aquarius.resolve(algorithmDid);

 // Let's fetch the compute environments and choose the free one
-  const computeEnv = computeEnvs[resolvedDatasetDdo.chainId].find(
-    (ce) => ce.priceMin === 0
-  )
-
-  // Request five minutes of compute access
-  const mytime = new Date()
-  const computeMinutes = 5
-  mytime.setMinutes(mytime.getMinutes() + computeMinutes)
-  const computeValidUntil = Math.floor(mytime.getTime() / 1000
-
-  // Let's initialize the provider for the compute job
-  const asset: ComputeAsset[] = {
-    documentId: dataset.id,
-    serviceId: dataset.services[0].id
-  }
-
-  const algo: ComputeAlgorithm = {
-    documentId: algorithm.id,
-    serviceId: algorithm.services[0].id
-  }
-
-  const providerInitializeComputeResults = await ProviderInstance.initializeCompute(
-    assets,
-    algo,
-    computeEnv.id,
-    computeValidUntil,
-    providerUrl,
-    await consumerAccount.getAddress()
-  )
-
-  await approve(
-    consumerAccount,
-    config,
-    await consumerAccount.getAddress(),
-    addresses.Ocean,
-    datasetFreAddress,
-    '100'
-  )
-
-  await approve(
-    consumerAccount,
-    config,
-    await consumerAccount.getAddress(),
-    addresses.Ocean,
-    algoFreAddress,
-    '100'
-  )
+  computeEnvs = await ProviderInstance.getComputeEnvironments(providerUrl)
+  const computeEnv = computeEnvs[0] // there is only one environment, with both paid and free resources

-  const fixedRate = new FixedRateExchange(fixedRateExchangeAddress, consumerAccount)
-  const buyDatasetTx = await fixedRate.buyDatatokens(datasetFreAddress, '1', '2')
-  const buyAlgoTx = await fixedRate.buyDatatokens(algoFreAddress, '1', '2')
+  // Let's choose paid available resources
+  const resources: ComputeResourceRequest[] = [
+    {
+      id: 'cpu',
+      amount: 2
+    },
+    {
+      id: 'ram',
+      amount: 1000000000
+    },
+    {
+      id: 'disk',
+      amount: 0
+    }
+  ]
+  const assets: ComputeAsset[] = [
+    {
+      documentId: dataset.id,
+      serviceId: dataset.services[0].id
+    }
+  ]
+  const dtAddressArray = [dataset.services[0].datatokenAddress]
+  const algo: ComputeAlgorithm = {
+    documentId: algorithm.id,
+    serviceId: algorithm.services[0].id
+  }
+
+  // Request five minutes of compute access
+  const mytime = new Date()
+  const computeMinutes = 5
+  mytime.setMinutes(mytime.getMinutes() + computeMinutes)
+  const computeValidUntil = Math.floor(mytime.getTime() / 1000)
+
+  // Let's initialize the provider fees and escrow payment for the compute job
+  const providerInitializeComputeResults = await ProviderInstance.initializeCompute(
+    assets,
+    algo,
+    computeEnv.id,
+    paymentToken,
+    computeValidUntil,
+    providerUrl,
+    consumerAccount,
+    resources
+  )
+
+  // Escrow adding funds for paid compute
+  const escrow = new EscrowContract(
+    ethers.utils.getAddress(providerInitializeComputeResults.payment.escrowAddress),
+    consumerAccount
+  )
+
+  const amountToDeposit = (
+    providerInitializeComputeResults.payment.amount * 2 // make it double
+  ).toString()
+
+  const chainId = (await consumerAccount.provider.getNetwork()).chainId
+  // Verifying funds
+  await escrow.verifyFundsForEscrowPayment(
+    computeEnv.fees[chainId][0].feeToken,
+    computeEnv.consumerAddress,
+    await unitsToAmount(consumerAccount, paymentToken, amountToDeposit),
+    providerInitializeComputeResults.payment.amount.toString(),
+    providerInitializeComputeResults.payment.minLockSeconds.toString(),
+    '10'
+  )

 // We now order both the dataset and the algorithm

@@ -124,12 +140,16 @@

 // Start the compute job for the given dataset and algorithm
 const computeJobs = await ProviderInstance.computeStart(
-    providerUrl,
-    consumerAccount,
-    computeEnv.id,
-    assets[0],
-    algo
-  )
+    providerUrl,
+    consumerAccount,
+    computeEnv.id,
+    assets,
+    algo,
+    computeJobDuration,
+    paymentToken,
+    computeEnv.resources,
+    chainId
+  )

 return computeJobs[0].jobId

diff --git a/developers/compute-to-data/README.md b/developers/old-infrastructure/compute-to-data/README.md
similarity index 100%
rename from developers/compute-to-data/README.md
rename to developers/old-infrastructure/compute-to-data/README.md
diff --git a/developers/compute-to-data/compute-to-data-architecture.md b/developers/old-infrastructure/compute-to-data/compute-to-data-architecture.md
similarity index 88%
rename from developers/compute-to-data/compute-to-data-architecture.md
rename to developers/old-infrastructure/compute-to-data/compute-to-data-architecture.md
index 67c109d89..7acaee79a 100644
--- a/developers/compute-to-data/compute-to-data-architecture.md
+++ b/developers/old-infrastructure/compute-to-data/compute-to-data-architecture.md
@@ -17,14 +17,14 @@ In the C2D workflow, the following steps are performed:

 6. The results and logs generated by the algorithm are securely returned to the user.
 7. The execution pod deletes the dataset, algorithm, and itself to ensure data privacy and security.

-<figure><img src="…" alt=""><figcaption><p>Compute architecture overview</p></figcaption></figure>
+<figure><img src="…" alt=""><figcaption><p>Compute architecture overview</p></figcaption></figure>
The interaction between the Consumer and the Provider follows a specific workflow. To initiate the process, the Consumer contacts the Provider by invoking the `start(did, algorithm, additionalDIDs)` function with parameters such as the data identifier (DID), algorithm, and additional DIDs if required. Upon receiving this request, the Provider generates a unique job identifier (`XXXX`) and returns it to the Consumer. The Provider then assumes the responsibility of overseeing the remaining steps. Throughout the computation process, the Consumer has the ability to check the status of the job by making a query to the Provider using the `getJobDetails(XXXX)` function, providing the job identifier (`XXXX`) as a reference. {% hint style="info" %} -You have the option to initiate a compute job using one or more data assets. You can explore this functionality by utilizing the [ocean.py](../../data-scientists/ocean.py) and [ocean.js](../ocean.js) libraries. +You have the option to initiate a compute job using one or more data assets. You can explore this functionality by utilizing the [ocean.py](../../../data-scientists/ocean.py/compute-flow.md) and [ocean.js](../../ocean.js/) libraries. {% endhint %} Now, let's delve into the inner workings of the Provider. Initially, it verifies whether the Consumer has sent the appropriate datatokens to gain access to the desired data. Once validated, the Provider interacts with the Operator-Service, a microservice responsible for coordinating the job execution. The Provider submits a request to the Operator-Service, which subsequently forwards the request to the Operator-Engine, the actual compute system in operation. @@ -46,9 +46,9 @@ Before the flow can begin, these pre-conditions must be met: ### Access Control using Ocean Provider -Similar to the `access service`, the `compute service` within Ocean Protocol relies on the [Ocean Provider](../old-infrastructure/provider/), which is a crucial component managed by the asset Publishers. The role of the Ocean Provider is to facilitate interactions with users and handle the fundamental aspects of a Publisher's infrastructure, enabling seamless integration into the Ocean Protocol ecosystem. It serves as the primary interface for direct interaction with the infrastructure where the data is located. +Similar to the `access service`, the `compute service` within Ocean Protocol relies on the [Ocean Provider](../provider/), which is a crucial component managed by the asset Publishers. The role of the Ocean Provider is to facilitate interactions with users and handle the fundamental aspects of a Publisher's infrastructure, enabling seamless integration into the Ocean Protocol ecosystem. It serves as the primary interface for direct interaction with the infrastructure where the data is located. -The [Ocean Provider](../old-infrastructure/provider/) encompasses the necessary credentials to establish secure and authorized interactions with the underlying infrastructure. Initially, this infrastructure may be hosted in cloud providers, although it also has the flexibility to extend to on-premise environments if required. By encompassing the necessary credentials, the Ocean Provider ensures the smooth and controlled access to the infrastructure, allowing Publishers to effectively leverage the compute service within Ocean Protocol. +The [Ocean Provider](../provider/) encompasses the necessary credentials to establish secure and authorized interactions with the underlying infrastructure. 
Initially, this infrastructure may be hosted in cloud providers, although it also has the flexibility to extend to on-premise environments if required. By encompassing the necessary credentials, the Ocean Provider ensures the smooth and controlled access to the infrastructure, allowing Publishers to effectively leverage the compute service within Ocean Protocol. ### Operator Service diff --git a/developers/compute-to-data/compute-to-data-datasets-algorithms.md b/developers/old-infrastructure/compute-to-data/compute-to-data-datasets-algorithms.md similarity index 100% rename from developers/compute-to-data/compute-to-data-datasets-algorithms.md rename to developers/old-infrastructure/compute-to-data/compute-to-data-datasets-algorithms.md diff --git a/developers/compute-to-data/compute-workflow.md b/developers/old-infrastructure/compute-to-data/compute-workflow.md similarity index 100% rename from developers/compute-to-data/compute-workflow.md rename to developers/old-infrastructure/compute-to-data/compute-workflow.md diff --git a/developers/vscode/README.md b/developers/vscode/README.md index 9a137b4c1..7942bd358 100644 --- a/developers/vscode/README.md +++ b/developers/vscode/README.md @@ -35,13 +35,14 @@ VS Code 1.96.0 or higher - Custom Compute Node: Enter your own node URL or use the default Ocean Protocol node - Wallet Integration: Use auto-generated wallet or enter private key for your own wallet -- Custom Docker Images. If you need a custom environment with your own dependencies installed, you can use a custom docker image. Default is oceanprotocol/algo_dockers (Python) or node (JavaScript) +- Custom Docker Images. If you need a custom environment with your own dependencies installed, you can use a custom docker image. Default is `oceanprotocol/c2d_examples` (Python - tag: `py-general`) or node (JavaScript - tag: `js-general`) - Docker Tags: Specify version tags for your docker image (like python-branin or latest) - Algorithm: The vscode extension automatically detects open JavaScript or Python files. Or alternatively you can specify the algorithm file manually here. -- Dataset: Optional JSON file for input data +Algorithms can be provided as the following supported formats: `did`, `url`, `arweave`, `ipfs`, `rawcode`. +- Dataset: Optional JSON file for input data. Datasets can be provided as the following supported formats: `did`, `url`, `arweave`, `ipfs`. - Results Folder: Where computation results will be saved -
<figure><img src="…" alt="Ocean Protocol VSCode Extension Optional Setup"><figcaption><p>Optional Setup Configuration</p></figcaption></figure>
+<figure><img src="…" alt="Ocean Protocol VSCode Extension Optional Setup"><figcaption><p>Optional Setup Configuration</p></figcaption></figure>

## Contributing