GRIB source feature for NWP emulator #3

Open
wants to merge 3 commits into base: develop

@cducher cducher commented Feb 5, 2025

This PR does not contain a script to run the emulator; it only consists of the class that an emulator can then use. For reference, this is what using it can look like (tested in a parallel environment):

int main(int argc, char** argv) {
  atlas::initialize(argc, argv);

  // ---- mpi ----
  size_t root = 0;
  size_t nprocs = eckit::mpi::comm().size();
  size_t rank = eckit::mpi::comm().rank();

  GRIBFileReader freader(argv[1], rank, root);

  // ---------------- grid  -------------------
  atlas::Grid grid(freader.getGridName());
  // Use the default partitioner for the given grid type
  atlas::grid::Distribution distribution(
    grid, atlas::util::Config("type", grid.partitioner().getString("type")) |
            atlas::util::Config("bands", nprocs));

  // ----------- function space ---------------
  atlas::functionspace::StructuredColumns fs_ = atlas::functionspace::StructuredColumns(
    grid, distribution, atlas::util::Config("levels", 1));

  // ---------------- fields ------------------
  std::vector<atlas::Field> fields;
  std::vector<atlas::Field> glb_fields;
  for (size_t i = 0; i < freader.getParams().size(); ++i) {
    fields.push_back(createEmptyField(freader.getParams()[i], fs_, 1));
    glb_fields.push_back(fs_.createField(
      atlas::option::name(fields[i].name() + "_glb") | atlas::option::datatype(fields[i].datatype()) |
      atlas::option::levels(fields[i].levels()) | atlas::option::variables(fields[i].variables()) |
      atlas::option::global(root)
    ));
  }

  // -------------- model steps ---------------
  while (!freader.hasReadAll()) {
    freader.readNextStep(fields, glb_fields);
    // Do something with your fields
    eckit::mpi::comm().barrier();
  }
  eckit::mpi::finaliseAllComms();
  return 0;
}

Example results from the extreme event detection plugin run on storm Christian data from ERA5:
[Image: process3]

@FussyDuck
FussyDuck commented Feb 5, 2025

CLA assistant check
All committers have signed the CLA.

@codecov-commenter
codecov-commenter commented Feb 5, 2025

Codecov Report

Attention: Patch coverage is 57.82313% with 372 lines in your changes missing coverage. Please review.

Project coverage is 48.90%. Comparing base (0355966) to head (e90687f).

| Files with missing lines | Patch % | Lines |
|---|---|---|
| src/nwp_emulator/grib_file_reader.cc | 0.00% | 215 Missing ⚠️ |
| src/nwp_emulator/config_reader.cc | 72.37% | 71 Missing ⚠️ |
| src/nwp_emulator/nwp_data_provider.cc | 66.97% | 36 Missing ⚠️ |
| src/nwp_emulator/config_reader_funcs.cc | 82.87% | 31 Missing ⚠️ |
| src/nwp_emulator/nwp_emulator.cc | 85.71% | 10 Missing ⚠️ |
| tests/nwp_emulator/nwp_emulator_plugin.h | 70.00% | 3 Missing ⚠️ |
| src/nwp_emulator/data_reader.h | 71.42% | 2 Missing ⚠️ |
| src/nwp_emulator/grib_file_reader.h | 0.00% | 2 Missing ⚠️ |
| src/nwp_emulator/config_reader.h | 93.75% | 1 Missing ⚠️ |
| tests/nwp_emulator/nwp_emulator_plugin.cc | 92.30% | 1 Missing ⚠️ |
Additional details and impacted files
@@             Coverage Diff              @@
##           develop       #3       +/-   ##
============================================
+ Coverage    36.59%   48.90%   +12.31%     
============================================
  Files           44       55       +11     
  Lines         1585     2476      +891     
  Branches        67      241      +174     
============================================
+ Hits           580     1211      +631     
- Misses        1005     1265      +260     


@cducher cducher requested a review from wdeconinck February 6, 2025 08:52
@cducher cducher changed the base branch from feature/nwp-emulator to develop February 14, 2025 17:28
@cducher cducher mentioned this pull request Feb 14, 2025
Collaborator

@dsarmany dsarmany left a comment

Review to be continued.

* @class BaseDataReader
* @brief Base class for handling different types of data sources for the emulator.
*/
class BaseDataReader {
Collaborator

I would drop the 'Base' bit. It should be clear from the structure.

class GRIBFileReader : public BaseDataReader {
private:
std::vector<eckit::PathName> srcFilenames_;
FILE* currentFile_ = nullptr;
Collaborator

Perhaps you could use a std::fstream instead of the C-style FILE*.

Author

As eccodes requires a FILE* and there is no easy translation between an fstream and a FILE*, I'll stick to this for now.
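
If the FILE* stays, its ownership can still be made automatic. The following is a minimal sketch, not the PR's implementation: `FileCloser`, `FilePtr` and `openFile` are hypothetical names, and the only assumption about the class is that it holds a `FILE*` that a C API needs to receive as a raw pointer.

```cpp
#include <cstdio>
#include <memory>
#include <stdexcept>
#include <string>

// Deleter invoked by std::unique_ptr only when the held pointer is non-null.
struct FileCloser {
    void operator()(std::FILE* f) const noexcept { std::fclose(f); }
};

// Hypothetical alias: owns a FILE* and closes it on destruction or reset().
using FilePtr = std::unique_ptr<std::FILE, FileCloser>;

// Hypothetical helper: opens a file for binary reading, or throws if it cannot.
inline FilePtr openFile(const std::string& path) {
    FilePtr file(std::fopen(path.c_str(), "rb"));
    if (!file) {
        throw std::runtime_error("Could not open " + path);
    }
    return file;
}
```

With this, `file.get()` yields the raw `FILE*` at the point where the C API is called, and no explicit `fclose` is needed on early returns or exceptions.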

Collaborator

@dsarmany dsarmany left a comment

I'm done with reviewing it in full. See comments. Overall, very nicely structured work.

}
// Broadcast emulator source params from main reader (root)
// 1. Grid identifier
std::vector<char> gridNameBuffer(10); // typical grid names are 4-6 chars
Collaborator

Please use std::string instead. std::vector<char> is really for generic buffers (i.e. even binary data). You may have to use a different broadcast overload, but it is supported in eckit::mpi.

Author

It doesn't look like there is a broadcast overload that accepts a string, so the other way I see to do this is to broadcast the size of the string first and then the string as iterators, like so:

size_t gridNameSize = gridName_.size();
eckit::mpi::comm().broadcast(gridNameSize, root_);
if (rank_ != root_) {
    gridName_.resize(gridNameSize);
}
eckit::mpi::comm().broadcast(gridName_.begin(), gridName_.end(), root_);

Is that what you have in mind?

gridName_.resize(strlen(gridName_.c_str()));
}
// 2. Field names & metadata (parameters)
std::vector<char> paramBuffer;
Collaborator

Same here, using std::string would make the subsequent code simpler.

eckit::mpi::comm().broadcast(paramBuffer, root_); // broadcast a single string to limit communication
if (rank_ != root_) {
std::string paramBufferStr(paramBuffer.begin(), paramBuffer.end());
paramBufferStr.resize(strlen(paramBufferStr.c_str()));
Collaborator

What is this line doing exactly? Isn't this the same thing as calling shrink_to_fit? See https://en.cppreference.com/w/cpp/string/basic_string/shrink_to_fit .
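
For reference, the two calls can be compared in isolation. The sketch below is independent of the PR's code and uses a hypothetical helper name, `trimAtFirstNull`. It illustrates that `resize(strlen(...))` changes the string's size by cutting it at the first NUL character, whereas `shrink_to_fit` is a non-binding request to reduce capacity and leaves the size untouched.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical helper: builds a string from a fixed-size, NUL-padded buffer
// and drops everything from the first '\0' onwards.
inline std::string trimAtFirstNull(const std::vector<char>& buffer) {
    // After construction, size() equals buffer.size(), padding included.
    std::string s(buffer.begin(), buffer.end());
    // strlen stops at the first '\0', so size() becomes the C-string length.
    s.resize(std::strlen(s.c_str()));
    return s;
}
```

A string built from a 10-byte buffer holding "O80" followed by padding keeps a size of 10 after `shrink_to_fit`, while `trimAtFirstNull` returns a string of size 3.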

// For later consistency checks & emulator setup :
openGribFile(srcPath, true);
eckit::Log::info() << "Number of messages : " << std::to_string(count_) << std::endl;
char buffer[64];
Collaborator

Why not std::string? Then you can still pass the C-style string to eccodes by calling c_str() on it.

Author

@cducher cducher Mar 18, 2025

c_str is const, while codes_get_string attempts to populate the buffer with the data it found in the codes handle (and the buffer parameter should be a char *). I later convert it back to a string, but I don't think it can work with a buffer of type string, unless I'm missing something?
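
One possible middle ground, sketched under assumptions: a pre-sized `std::string` has contiguous, writable storage (reachable through `&s[0]`, or through the non-const `data()` since C++17), so it can serve as the output buffer of a C function. `fakeGetString` below is a hypothetical stand-in that fills a caller-provided buffer and reports the written length; it is not the eccodes API, and `readString` is likewise an illustrative name.

```cpp
#include <cstddef>
#include <cstring>
#include <string>

// Hypothetical stand-in for a C API that fills a caller-provided buffer.
// On input *length is the buffer capacity; on output it is the number of
// bytes written, including the terminating '\0'. Returns 0 on success.
inline int fakeGetString(const char* value, char* buffer, std::size_t* length) {
    const std::size_t needed = std::strlen(value) + 1;
    if (needed > *length) {
        return 1;  // buffer too small
    }
    std::memcpy(buffer, value, needed);
    *length = needed;
    return 0;
}

// Reads into a std::string instead of a char array.
inline std::string readString(const char* value) {
    std::string buffer(64, '\0');
    std::size_t length = buffer.size();
    if (fakeGetString(value, &buffer[0], &length) != 0) {
        return {};
    }
    buffer.resize(length - 1);  // drop the terminator and the unused padding
    return buffer;
}
```

The translation to a raw `char*` then happens only in the call itself, and no separate conversion back to `std::string` is needed afterwards.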

readMsgMetadata(_, paramMd);
params_.push_back(paramMd);
eckit::Log::info() << params_[i] << "; ";
if (i == count_ - 1) {
Collaborator

I would do it slightly differently: loop to count_ - 1, then it is a clean body of the for loop. Add the last element after the for loop.
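
The suggested loop shape could look like the sketch below. It works on plain strings rather than on the PR's members, and `joinParams` is a hypothetical helper name. The condition `i + 1 < params.size()` is used instead of `i < params.size() - 1` so that an empty container does not underflow the unsigned size.

```cpp
#include <cstddef>
#include <sstream>
#include <string>
#include <vector>

// Hypothetical helper: joins items with a separator, handling the last
// element after the loop so that the loop body has no branch.
inline std::string joinParams(const std::vector<std::string>& params, const std::string& sep) {
    if (params.empty()) {
        return {};
    }
    std::ostringstream out;
    for (std::size_t i = 0; i + 1 < params.size(); ++i) {
        out << params[i] << sep;
    }
    out << params.back();  // last element, no trailing separator
    return out.str();
}
```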

gridName = std::string(buffer);
}
paramMd = "";
std::vector<const char*> keywords = {"shortName", "levtype", "level"};
Collaborator

Make this a vector of std::string. The overall principle is that we want to make the translation between C++ and C at the latest possible point, which is at the point of calling the C API.
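
A minimal sketch of that principle, with keys held as `std::string` and converted with `c_str()` only at the call site. `fakeGetValue` is a hypothetical stand-in for a C getter and does not reflect the eccodes API; the returned values and the comma-separated metadata format are assumptions made for illustration.

```cpp
#include <cstring>
#include <string>
#include <vector>

// Hypothetical stand-in for a C getter that takes its key as const char*.
inline const char* fakeGetValue(const char* key) {
    if (std::strcmp(key, "shortName") == 0) return "2t";
    if (std::strcmp(key, "levtype") == 0) return "sfc";
    if (std::strcmp(key, "level") == 0) return "0";
    return "";
}

// Keys stay std::string throughout; the C++ -> C translation is the c_str() call.
inline std::string buildParamMetadata(const std::vector<std::string>& keywords) {
    std::string metadata;
    for (const std::string& key : keywords) {
        if (!metadata.empty()) {
            metadata += ",";
        }
        metadata += fakeGetValue(key.c_str());
    }
    return metadata;
}
```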

* @enum DataSourceType
* @brief Enumeration of the supported types of data source for the emulator.
*/
enum DataSourceType
Collaborator

Use enum class.
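
A short sketch of what the scoped version buys: enumerators must be qualified and no longer convert implicitly to `int`. `GRIB` appears in the PR's code; the `CONFIG` enumerator and the `toString` helper are assumptions added for illustration.

```cpp
#include <stdexcept>
#include <string>
#include <type_traits>

// Scoped enum: enumerators live inside DataSourceType and need qualification.
enum class DataSourceType { GRIB, CONFIG };  // CONFIG is an assumed second value

// Unlike an unscoped enum, there is no implicit conversion to int.
static_assert(!std::is_convertible<DataSourceType, int>::value,
              "scoped enums do not convert implicitly to int");

// Hypothetical helper for logging.
inline std::string toString(DataSourceType type) {
    switch (type) {
        case DataSourceType::GRIB:
            return "GRIB";
        case DataSourceType::CONFIG:
            return "CONFIG";
    }
    throw std::invalid_argument("Unknown DataSourceType");
}
```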

*
* @return true if model data has been successfully generated for the step, false otherwise.
*/
bool getLocalStepData();
Collaborator

I would remove the get prefix from all of these member functions, so localStepData, globalStepData, etc. This looks a bit boilerplate-y.

class NWPDataProvider {
private:
const DataSourceType sourceType_;
BaseDataReader* dataReader = nullptr;
Collaborator

Can you use a std::unique_ptr instead of a raw pointer?

Author

I address this point in a different PR, so I'm leaving it as is here for now.

switch (sourceType_) {
case DataSourceType::GRIB:
eckit::Log::info() << "Emulator will use GRIB files as data source from " << inputPath << std::endl;
dataReader = new GRIBFileReader(inputPath, rank, root);
Collaborator

No 'naked' new in C++ code if at all possible. If the dataReader is a std::unique_ptr, then please instantiate it with std::make_unique.
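
A sketch of the pattern with heavily simplified stand-ins for the PR's classes. The class bodies, the `kind()` method, the `CONFIG` enumerator and the `makeReader` factory are all hypothetical and exist only to make the example self-contained.

```cpp
#include <memory>
#include <stdexcept>
#include <string>
#include <utility>

// Simplified stand-in for the reader base class (hypothetical interface).
class DataReader {
public:
    virtual ~DataReader() = default;
    virtual std::string kind() const = 0;
};

// Simplified stand-in for the GRIB reader (hypothetical constructor).
class GRIBFileReader : public DataReader {
public:
    explicit GRIBFileReader(std::string inputPath) : inputPath_(std::move(inputPath)) {}
    std::string kind() const override { return "grib:" + inputPath_; }

private:
    std::string inputPath_;
};

enum class DataSourceType { GRIB, CONFIG };  // CONFIG is an assumed second value

// The owning pointer is created with std::make_unique, so there is no naked
// new and no matching delete to remember in the destructor.
inline std::unique_ptr<DataReader> makeReader(DataSourceType type, const std::string& inputPath) {
    switch (type) {
        case DataSourceType::GRIB:
            return std::make_unique<GRIBFileReader>(inputPath);
        default:
            throw std::invalid_argument("Unsupported data source type");
    }
}
```

Returning `std::unique_ptr<DataReader>` from a function that builds a `std::unique_ptr<GRIBFileReader>` works through the implicit derived-to-base conversion, provided the base class has a virtual destructor.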

}
// Steps should be stored in alphabetical order
std::sort(srcFilenames_.begin(), srcFilenames_.end());
srcPath = eckit::PathName(srcFilenames_[0]);
Author

@cducher cducher Mar 13, 2025

  • I should test whether srcFilenames_ is empty here, in case no GRIB files were found in the source directory
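
A minimal sketch of such a guard, using plain `std::string` paths instead of `eckit::PathName` so that it stands alone. `firstSourceFile` is a hypothetical helper, and the exception type is an assumption; the project may prefer its own error handling.

```cpp
#include <algorithm>
#include <stdexcept>
#include <string>
#include <vector>

// Hypothetical helper: sorts the candidate files and returns the first one,
// failing loudly when the source directory contained no GRIB files.
inline std::string firstSourceFile(std::vector<std::string> filenames, const std::string& srcDir) {
    if (filenames.empty()) {
        throw std::runtime_error("No GRIB files found in " + srcDir);
    }
    // Steps are expected to be stored in alphabetical order.
    std::sort(filenames.begin(), filenames.end());
    return filenames.front();
}
```

Checking before indexing avoids the undefined behaviour of reading element 0 of an empty vector.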

@cducher cducher force-pushed the feature/nwp-emulator-grib-source branch 2 times, most recently from d592ba8 to ad30730 Compare March 19, 2025 16:48
@cducher cducher requested a review from dsarmany March 19, 2025 16:52
@cducher cducher force-pushed the feature/nwp-emulator-grib-source branch from dacb4aa to ffe3b93 Compare March 21, 2025 10:02
dsarmany previously approved these changes Mar 21, 2025
@cducher cducher force-pushed the feature/nwp-emulator-grib-source branch from ffe3b93 to 6ce826a Compare March 24, 2025 14:22
@cducher cducher force-pushed the feature/nwp-emulator-grib-source branch from 06a0f01 to 7e1343d Compare March 24, 2025 14:50
@cducher cducher force-pushed the feature/nwp-emulator-grib-source branch from 7e1343d to e90687f Compare March 24, 2025 15:50
5 participants