netCDF File Processing

Overview

netCDF files are processed separately from other file types. They go through compliance checking only and are stored in a dedicated netcdf_files table. They do NOT go through:

  • SAGAR-QC quality control tests
  • Superset dashboard generation
  • RAG embedding generation

Processing Flow

  1. File Detection: Files with .nc or .nc4 extensions are identified as netCDF files
  2. Metadata Extraction: Basic metadata (dimensions, variables, coordinates, attributes) is extracted using xarray
  3. Compliance Checking: IOOS Compliance Checker runs CF and ACDD standard checks
  4. Report Generation: Compliance report is generated in the same format as QC reports
  5. Storage: The netCDF file is uploaded to Supabase storage as-is, with no conversion to Parquet and no encryption
  6. Database Storage: Metadata and the compliance report are stored in the netcdf_files table
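
As a rough illustration of steps 1 and 2, the sketch below detects netCDF files by extension and collects the four metadata groups with xarray. The function names, the case-insensitive match, and the lazy import are assumptions made for this example, not the project's actual code.

```python
from pathlib import Path

# Extensions treated as netCDF, per the processing flow above.
NETCDF_EXTENSIONS = {".nc", ".nc4"}


def is_netcdf(filename: str) -> bool:
    """Return True when the final extension is .nc or .nc4.

    Matching is case-insensitive here; that is an assumption.
    """
    return Path(filename).suffix.lower() in NETCDF_EXTENSIONS


def extract_metadata(path: str) -> dict:
    """Collect basic metadata from a netCDF file (hypothetical helper)."""
    import xarray as xr  # imported lazily so detection works without xarray

    with xr.open_dataset(path) as ds:
        return {
            "dimensions": dict(ds.sizes),      # dimension name -> length
            "variables": list(ds.data_vars),   # data variable names
            "coordinates": list(ds.coords),    # coordinate names
            "attributes": dict(ds.attrs),      # global attributes
        }
```

Global attributes can hold NumPy scalars or arrays, so a real implementation would likely convert them to plain Python types before storing the result as JSON.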

Storage

netCDF files are stored unencrypted in Supabase storage, similar to unencrypted Parquet files. This allows:

  • Direct access by netCDF-compatible tools
  • No decryption overhead
  • Standard netCDF file format preservation

Unlike other file types that have both encrypted and unencrypted versions, netCDF files are stored in a single unencrypted format.

Database Schema

netcdf_files Table

CREATE TABLE netcdf_files (
    id BIGSERIAL PRIMARY KEY,
    original_filename TEXT NOT NULL,
    file_location TEXT NOT NULL,
    status TEXT NOT NULL DEFAULT 'pending',
    file_type TEXT DEFAULT 'netcdf',
    compliance_report JSONB,
    metadata_payload JSONB,
    created_at TIMESTAMPTZ DEFAULT NOW(),
    updated_at TIMESTAMPTZ DEFAULT NOW()
);
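
A minimal sketch of a row payload for this table, assuming the caller leaves id, created_at, and updated_at to the column defaults. The helper name build_netcdf_row is hypothetical, and the only status value documented here is the default 'pending'.

```python
import json
from typing import Optional


def build_netcdf_row(
    original_filename: str,
    file_location: str,
    compliance_report: Optional[dict] = None,
    metadata: Optional[dict] = None,
    status: str = "pending",
) -> dict:
    """Build an insertable row for netcdf_files (hypothetical helper)."""
    row = {
        "original_filename": original_filename,
        "file_location": file_location,
        "status": status,
        "file_type": "netcdf",
        "compliance_report": compliance_report,  # stored as JSONB
        "metadata_payload": metadata,            # stored as JSONB
    }
    # Fail early if either payload cannot be serialized to JSON.
    json.dumps(row)
    return row
```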

Compliance Report Structure

The compliance report follows the same structure as QC reports for consistent PDF generation:

{
  "file_name": "example.nc",
  "report_generated_at": "2024-01-01T00:00:00",
  "qc_version": "5.4.0",
  "report_type": "compliance",
  "summary": {
    "compliance_status": "GOOD|FAIR|POOR|CRITICAL",
    "overall_quality_score": 85.5,
    "overall_passed": true,
    "total_checkers": 2,
    "passed_checkers": 2,
    "failed_checkers": 0,
    "checkers_run": ["acdd", "cf"]
  },
  "detailed_metrics": {
    "overall_quality_score": 85.5,
    "criteria_used": "normal",
    "checker_scores": {
      "acdd": 90.0,
      "cf": 81.0
    }
  },
  "tests_executed": ["acdd", "cf"],
  "test_rationale": {
    "acdd": "ACDD compliance check explanation...",
    "cf": "CF compliance check explanation...",
    "criteria": "Compliance criteria level explanation..."
  },
  "test_results": {
    "acdd": {
      "test_name": "acdd",
      "scored_points": 45,
      "possible_points": 50,
      "score_percentage": 90.0,
      "passed": true,
      "errors": [],
      "warnings": [...],
      "info": [...]
    },
    "cf": {
      "test_name": "cf",
      "scored_points": 81,
      "possible_points": 100,
      "score_percentage": 81.0,
      "passed": true,
      "errors": [],
      "warnings": [...],
      "info": [...]
    }
  },
  "recommendations": [
    "File passes all compliance checks...",
    "Consider addressing warnings..."
  ]
}
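
The numeric fields in the example are internally consistent: each score_percentage is scored_points / possible_points × 100, and the overall score of 85.5 equals the unweighted mean of 90.0 and 81.0. The sketch below reproduces that arithmetic. Treating the overall score as an unweighted mean is an inference from the example values, not a documented rule, and the function names are hypothetical. The thresholds behind compliance_status are not documented here, so that field is left out.

```python
from statistics import mean


def score_percentage(scored_points: int, possible_points: int) -> float:
    """Percentage of points earned by one checker, rounded to one decimal."""
    if possible_points == 0:
        return 0.0
    return round(100.0 * scored_points / possible_points, 1)


def summarize_checkers(test_results: dict) -> dict:
    """Derive summary fields from per-checker results (assumed logic)."""
    scores = {
        name: score_percentage(r["scored_points"], r["possible_points"])
        for name, r in test_results.items()
    }
    passed = [name for name, r in test_results.items() if r["passed"]]
    return {
        # Unweighted mean across checkers (assumption, see lead-in).
        "overall_quality_score": round(mean(list(scores.values())), 1),
        "overall_passed": len(passed) == len(test_results),
        "total_checkers": len(test_results),
        "passed_checkers": len(passed),
        "failed_checkers": len(test_results) - len(passed),
        "checkers_run": sorted(test_results),
        "checker_scores": scores,
    }
```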

API Response

When a netCDF file is processed, the API returns:

{
  "status": "success",
  "processed_file": "example.nc",
  "file_type": "netcdf",
  "metadata": {
    "dimensions": {...},
    "variables": [...],
    "coordinates": [...],
    "attributes": {...}
  },
  "compliance_report": {
    // Full compliance report as shown above
  },
  "upload_result": {...}
}

PDF Generation

The compliance report uses the same PDF generator as QC reports. The frontend can detect report_type: "compliance" to customize the title (e.g., "Compliance Assessment Report" instead of "Data Quality Assessment Report"), but the structure is identical for consistent formatting.
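
The title switch could look like the sketch below. It is written in Python for consistency with the other examples, although the real frontend may use a different language; the function name report_title is hypothetical.

```python
def report_title(report: dict) -> str:
    """Pick the PDF title from the report_type field."""
    if report.get("report_type") == "compliance":
        return "Compliance Assessment Report"
    # Reports without the compliance marker are treated as QC reports.
    return "Data Quality Assessment Report"
```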

Compliance Standards

  • ACDD (Attribute Convention for Data Discovery): Validates metadata attributes for data discovery
  • CF (Climate and Forecast): Validates adherence to CF conventions for climate and forecast data

Both standards are checked at the "normal" criteria level by default. The IOOS Compliance Checker also supports "lenient" and "strict" levels.
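
For reference, an equivalent check can be run by hand with the compliance-checker command-line tool. The sketch below only assembles the argument list. Whether this pipeline calls the CLI or the Python API is not stated here, and the helper name and default output path are hypothetical.

```python
ALLOWED_CRITERIA = ("lenient", "normal", "strict")


def build_checker_command(
    path: str,
    criteria: str = "normal",
    output: str = "compliance.json",
) -> list:
    """Build a compliance-checker invocation for the CF and ACDD suites."""
    if criteria not in ALLOWED_CRITERIA:
        raise ValueError(f"criteria must be one of {ALLOWED_CRITERIA}")
    return [
        "compliance-checker",
        "--test", "cf",
        "--test", "acdd",
        "--criteria", criteria,
        "--format", "json",
        "--output", output,
        path,
    ]
```

The resulting list can be passed to subprocess.run on a machine where the compliance-checker package is installed.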