Skip to content

Feature/metk 133 obstype #98

New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Open
wants to merge 16 commits into
base: develop
Choose a base branch
from
Open

Conversation

danovaro
Copy link
Member

Description

Contributor Declaration

By opening this pull request, I affirm the following:

  • All authors agree to the Contributor License Agreement.
  • The code follows the project's coding standards.
  • I have performed self-review and added comments where needed.
  • I have added or updated tests to verify that my changes are effective and functional.
  • I have run all existing tests and confirmed they pass.

@codecov-commenter
Copy link

codecov-commenter commented Jul 15, 2025

Codecov Report

Attention: Patch coverage is 94.14414% with 13 lines in your changes missing coverage. Please review.

Project coverage is 67.21%. Comparing base (d08a04d) to head (aeac1b2).

Files with missing lines Patch % Lines
src/metkit/mars/MarsLanguage.cc 33.33% 4 Missing ⚠️
src/metkit/mars/TypeEnum.cc 95.94% 3 Missing ⚠️
src/metkit/mars/Type.cc 83.33% 2 Missing ⚠️
src/metkit/mars/MarsRequest.cc 0.00% 1 Missing ⚠️
src/metkit/mars/Parameter.cc 0.00% 1 Missing ⚠️
src/metkit/mars/Type.h 50.00% 1 Missing ⚠️
tests/test_filter.cc 80.00% 1 Missing ⚠️
Additional details and impacted files
@@             Coverage Diff             @@
##           develop      #98      +/-   ##
===========================================
+ Coverage    66.89%   67.21%   +0.31%     
===========================================
  Files          123      124       +1     
  Lines         7274     7165     -109     
  Branches       715      721       +6     
===========================================
- Hits          4866     4816      -50     
+ Misses        2408     2349      -59     

☔ View full report in Codecov by Sentry.
📢 Have feedback on the report? Share it here.

🚀 New features to boost your workflow:
  • ❄️ Test Analytics: Detect flaky tests, report on failures, and find test suite problems.

Copy link
Member

@ChrisspyB ChrisspyB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Thanks Emanuele.

The amount of return {a} and ASSERT(x.size() == 1) this introduces makes me wonder if it would have been better to treat obstype differently from the rest of the language (assuming it is the only place where this hierarchical structure will exist). It seems to have introduced quite a lot of complexity into the type expansion.

Comment on lines 1286 to 1292
- name: all
group:
- name: [nsd, conventional, non satellite data all]
group:
- name: [lsd, land surface data]
group:
- [1, s, synop land]
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I strongly think we should include some comments above obstype explaining why it as this complex hierarchical structure.

Comment on lines 88 to 89
ASSERT(r.contains("vals"));
eckit::Value vv = r["vals"];
ASSERT(r.contains("values"));
eckit::Value vv = r["values"];
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

I wish you renamed vals -> values in a separate PR to reduce noise! But that's okay.

@@ -111,7 +111,7 @@ std::unique_ptr<Context> Context::parseContext(eckit::Value c) {
//----------------------------------------------------------------------------------------------------------------------

Type::Type(const std::string& name, const eckit::Value& settings) :
name_(name), flatten_(true), multiple_(false), duplicates_(true) {
name_(name), flatten_(true), multiple_(false), duplicates_(true), deduplicate_(false) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Do we really need duplicates_ and deduplicate?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Assuming it is never valid to have duplicates_ = false and deduplicate = true, I wonder if we could replace the two bools with one enum, something like

enum class DuplicatePolicy { Error, Ignore, Deduplicate };

And then we check the value of this enum instead of the pair of bools. I personally struggle to parse things like deduplicate_ || !duplicates_

if (deduplicate_ || !duplicates_) {
for (const auto& v : vv) {
if ((deduplicate_ || !duplicates_) && seen.find(v) != seen.end()) {
if (!duplicates_) {
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Can we rename duplicates_ to allowDuplicates_? Assuming my understanding is correct and that name is accurate.

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

(or see my enum class comment above)

@@ -32,7 +32,8 @@ class TypeAny : public Type {
private: // methods

void print(std::ostream& out) const override;
bool expand(const MarsExpandContext& ctx, std::string& value, const MarsRequest& request = {}) const override;
[[nodiscard]] std::vector<std::string> expand(const MarsExpandContext& ctx, const std::string& value,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Any particular reason for using [[nodiscard]] here, but not in some of the other classes (e.g. TypeDate::expand) ?

Comment on lines 86 to 90
std::vector<std::string> tidyVal = t->tidy(stringVal);
if (tidyVal.size() == 1 && stringVal == tidyVal[0]) // if t->tidy had no effect, set the original value
gather_.setValue(keyword, val);
else
gather_.setValue(keyword, tidyVal);
gather_.setValue(keyword, tidyVal[0]);
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible for t->tidy(stringVal) to return an empty vector? (If so, line 90 is an error).
If we only ever use the 0th value in tidyVal, should we be asserting the length of tidyVal is 1? If not, why not (why is the 0th value the preferred value?)?


virtual bool expand(const MarsExpandContext& ctx, const MarsRequest& request, std::vector<std::string>& values,
bool fail) const;
// bool expand(const MarsExpandContext& ctx, const MarsRequest& request, std::vector<std::string>& values,
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

delete

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

curious that the protected function is no longer required

firstName = eckit::StringTools::upper(firstName);
}
for (size_t i = 1; i < names.size(); ++i) {
// handle aliases
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is this supposed to be a todo?

Comment on lines 161 to 174
case 1: {
if (v[0].empty()) {
return {};
}
StringManyMap::const_iterator k = mapping_.find(eckit::StringTools::lower(v[0]));
ASSERT(k != mapping_.end());
cache_[val] = (*k).second;
return {(*k).second};
}
default: {
cache_[val] = v;
return v;
}
}
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

So if there's a single value, we return a canonical version, and if there are multiple values we return them all?

Comment on lines 269 to 272
names.insert({*j});
}
else {
names.insert((*k).second);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
names.insert({*j});
}
else {
names.insert((*k).second);
for (const auto& value : best) {
if (const auto iter = aliases.find(value); iter != aliases.end()) {
names.insert(iter->second);
}
else {
names.insert({value});

[this, ctx, request](const std::string& s) { return this->tidy(ctx, s, request); });

for (auto& v : values) {
auto expanded = expand(ctx, v, request);
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
auto expanded = expand(ctx, v, request);
for (const auto& value : values) {
auto expanded = expand(ctx, value, request);

Comment on lines 159 to 171
}
}
else {
if (tokens.size() == 1 &&
(!std::isdigit(value[0]) || value.size() <= 2)) { // month (i.e. TypeClimateMonthly)
value = month(tokens[0]);
return {month(tokens[0])};
}
else {
eckit::Date date(value);
value = l2s(date.yyyymmdd());
return {l2s(date.yyyymmdd())};
}
}
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

std::isdigit is in #include <cctype>

remove else after return

Suggested change
}
}
else {
if (tokens.size() == 1 &&
(!std::isdigit(value[0]) || value.size() <= 2)) { // month (i.e. TypeClimateMonthly)
value = month(tokens[0]);
return {month(tokens[0])};
}
else {
eckit::Date date(value);
value = l2s(date.yyyymmdd());
return {l2s(date.yyyymmdd())};
}
}
}
// month-day (i.e. TypeClimateDaily)
std::string m = month(tokens[0]);
long d = s2l(tokens[1]);
return {m + "-" + l2s(d)};
}
if (tokens.size() == 1 &&
(!std::isdigit(value[0]) || value.size() <= 2)) { // month (i.e. TypeClimateMonthly)
return {month(tokens[0])};
}
eckit::Date date(value);
return {l2s(date.yyyymmdd())};
}

for (size_t j = 0; j < val.size(); ++j) {
std::string VV = val[j];
std::string v = eckit::StringTools::lower(VV);
// LOG_DEBUG_LIB(LibMetkit) << "v[" << j << "] : " << v << std::endl;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Remove or log?

std::string MarsLanguage::bestMatch(const MarsExpandContext& ctx, const std::string& name,
const std::vector<std::string>& values, bool fail, bool quiet, bool fullMatch,
const std::map<std::string, std::string>& aliases) {
std::vector<std::string> MarsLanguage::bestMatch(const MarsExpandContext& ctx, const std::string& name,
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is it possible to remove parts of the functionality of this function. If I remember correctly, the fuzzy matching for names shouldn't be used anymore.

We are calling this function with fullmatch=true in almost all cases (besides in TypeParam). The amount of boolean param make it really hard to decipher what is going on in each scenario.


bool TypeParam::expand(const MarsExpandContext& ctx, const MarsRequest& request, std::vector<std::string>& values,
bool fail) const {
void TypeParam::pass2(const MarsExpandContext& ctx, MarsRequest& request) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

The merge of expand with pass2 combines two different steps now.

Is this what we want to have here? Also a comment would be very nice, what we want to achieve in the second pass.

pass2 is just to generic to derive any functional meaning from the name alone.


auto it = find(value);
if (it != values_.end()) {
return groups_.at(it->second).second;
}
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

This function may misses a return value if it == values_.end()

@@ -38,10 +38,10 @@
static pthread_once_t once = PTHREAD_ONCE_INIT;
static eckit::Value languages_;
static std::set<std::string> verbs_;
static eckit::StringDict verbAliases_;
static std::map<std::string, std::string> verbAliases_;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the includes are not used.

@@ -18,171 +18,137 @@
#include "metkit/mars/TypesFactory.h"
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Some of the includes are not used directly.

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Same for src/metkit/mars/TypeParam.cc

auto tmp = (*it).second->expand(ctx, value, request);
if (!tmp.empty()) {
return tmp;
std::string tmp = value;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
std::string tmp = value;

if (!tmp.empty()) {
return tmp;
std::string tmp = value;
if ((*it).second->expand(ctx, tmp, request)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
if ((*it).second->expand(ctx, tmp, request)) {
if ((*it).second->expand(ctx, value, request)) {

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tbkr, the code modifies value only if expand() returns true


const Rule* rule = 0;
bool fail = true;
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Make this const. This isn't modified. I'd wish to remove it from the bestMatch function completely. Would be great to have the bestMatch function to always throw an error and have a wrapping function which makes it secure to remove the boolean parameters.

The other two boolean parameters can be deleted similarly: one is just for logging (set the logging level correctly and the other one is only modifying the score in the first line.

Copy link
Member

@ChrisspyB ChrisspyB left a comment

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Definitely better, imo

@@ -79,9 +77,9 @@ class MarsLanguage : private eckit::NonCopyable {
std::vector<std::pair<std::string, Type*>> typesByAxisOrder_;
std::vector<std::string> keywords_;

StringMap aliases_;
std::map<std::string, std::string> aliases_;
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Would be good to consistently use StringDict or std::map<std::string, std::string>

@@ -15,6 +15,8 @@

#pragma once

#include <mutex>
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Is mutex used?

Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

ah yes, for std::once_flag

@@ -1040,11 +1041,13 @@ CASE("test_metkit_expand_MARSC-219") {
const char* text =
"retrieve,class=od,date=20231205,expver=0001,obstype=gpsro,stream=lwda,time=18,type=ai,target=\"reference."
"E2RRc8.data\"";
/// @todo OBSTYPE, OBSGROUP
/// @todo OBSTYPE
Copy link
Member

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Does test_obstype cover this or no?

Comment on lines +17 to 21
#pragma once

#include "eckit/memory/NonCopyable.h"

#include <iosfwd>
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

as you touch, we can remove unused include

Suggested change
#pragma once
#include "eckit/memory/NonCopyable.h"
#include <iosfwd>
#pragma once
#include <iosfwd>

bool expand(const MarsExpandContext& ctx, std::string& value, const MarsRequest& /* request */) const override {
return true;
}
bool expand(const MarsExpandContext&, std::string& value, const MarsRequest&) const override { return true; }
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

while at it, remove value or better:

Suggested change
bool expand(const MarsExpandContext&, std::string& value, const MarsRequest&) const override { return true; }
bool expand(const MarsExpandContext& /*ctx*/, std::string& /*value*/,
const MarsRequest& /*request*/) const override {
return true;

@@ -221,6 +219,11 @@ class Type : public eckit::Counted {

virtual size_t count(const std::vector<std::string>& values) const;

protected: // members
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

members -> methods

Comment on lines +190 to 193
virtual bool expand(const MarsExpandContext& ctx, std::string& value, const MarsRequest& request = {}) const;
virtual void expand(const MarsExpandContext& ctx, std::vector<std::string>& values,
const MarsRequest& request = {}) const;
virtual bool expand(const MarsExpandContext& ctx, std::string& value, const MarsRequest& request = {}) const;

Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

not this PR but I'm not in favor of defaulted virtuals
one could do type.expand(ctx, value, {}); at call site

@@ -24,8 +24,7 @@ void TypeAny::print(std::ostream& out) const {
out << "TypeAny[name=" << name_ << "]";
}

bool TypeAny::expand(const MarsExpandContext& /* ctx */, std::string& /* value */,
const MarsRequest& /* request */) const {
bool TypeAny::expand(const MarsExpandContext&, std::string& value, const MarsRequest&) const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

obviously messed during merge or some git ninja

bool uppercase = false;
if (settings.contains("uppercase")) {
uppercase = settings["uppercase"];
void TypeEnum::addValue(const std::string& vv, uint16_t idx, bool allowDuplicates) const {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

signature is const std::string& value, uint16_t idx, bool allowDuplicates but here becomes const std::string& vv ...
I'd do std::string value, uint16_t idx, bool allowDuplicates and value = eckit::StringTools::lower(value)

values_.push_back(v);
return groups_.at(idx).second;
}
else {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

no need for else here

if (!tmp.empty()) {
return tmp;
std::string tmp = value;
if ((*it).second->expand(ctx, tmp, request)) {
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

@tbkr, the code modifies value only if expand() returns true

@@ -221,7 +221,7 @@ Rule::Rule(const eckit::Value& matchers, const eckit::Value& values, const eckit
precedence[v] = j;
}

mapping_[v] = first;
mapping_[v] = {first};
Copy link
Contributor

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Suggested change
mapping_[v] = {first};
mapping_[v] = first;

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

5 participants