-
Notifications
You must be signed in to change notification settings - Fork 9
Feature/metk 133 obstype #98
New issue
Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.
By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.
Already on GitHub? Sign in to your account
base: develop
Are you sure you want to change the base?
Conversation
Codecov ReportAttention: Patch coverage is
Additional details and impacted files@@ Coverage Diff @@
## develop #98 +/- ##
===========================================
+ Coverage 66.89% 67.21% +0.31%
===========================================
Files 123 124 +1
Lines 7274 7165 -109
Branches 715 721 +6
===========================================
- Hits 4866 4816 -50
+ Misses 2408 2349 -59 ☔ View full report in Codecov by Sentry. 🚀 New features to boost your workflow:
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Thanks Emanuele.
The amount of return {a}
and ASSERT(x.size() == 1)
this introduces makes me wonder if it would have been better to treat obstype differently from the rest of the language (assuming it is the only place where this hierarchical structure will exist). It seems to have introduced quite a lot of complexity into the type expansion.
share/metkit/language.yaml
Outdated
- name: all | ||
group: | ||
- name: [nsd, conventional, non satellite data all] | ||
group: | ||
- name: [lsd, land surface data] | ||
group: | ||
- [1, s, synop land] |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I strongly think we should include some comments above obstype
explaining why it as this complex hierarchical structure.
src/metkit/mars/Type.cc
Outdated
ASSERT(r.contains("vals")); | ||
eckit::Value vv = r["vals"]; | ||
ASSERT(r.contains("values")); | ||
eckit::Value vv = r["values"]; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
I wish you renamed vals -> values in a separate PR to reduce noise! But that's okay.
src/metkit/mars/Type.cc
Outdated
@@ -111,7 +111,7 @@ std::unique_ptr<Context> Context::parseContext(eckit::Value c) { | |||
//---------------------------------------------------------------------------------------------------------------------- | |||
|
|||
Type::Type(const std::string& name, const eckit::Value& settings) : | |||
name_(name), flatten_(true), multiple_(false), duplicates_(true) { | |||
name_(name), flatten_(true), multiple_(false), duplicates_(true), deduplicate_(false) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Do we really need duplicates_
and deduplicate
?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Assuming it is never valid to have duplicates_ = false and deduplicate = true, I wonder if we could replace the two bools with one enum, something like
enum class DuplicatePolicy { Error, Ignore, Deduplicate };
And then we check the value of this enum instead of the pair of bools. I personally struggle to parse things like deduplicate_ || !duplicates_
src/metkit/mars/Type.cc
Outdated
if (deduplicate_ || !duplicates_) { | ||
for (const auto& v : vv) { | ||
if ((deduplicate_ || !duplicates_) && seen.find(v) != seen.end()) { | ||
if (!duplicates_) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Can we rename duplicates_
to allowDuplicates_
? Assuming my understanding is correct and that name is accurate.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
(or see my enum class comment above)
src/metkit/mars/TypeAny.h
Outdated
@@ -32,7 +32,8 @@ class TypeAny : public Type { | |||
private: // methods | |||
|
|||
void print(std::ostream& out) const override; | |||
bool expand(const MarsExpandContext& ctx, std::string& value, const MarsRequest& request = {}) const override; | |||
[[nodiscard]] std::vector<std::string> expand(const MarsExpandContext& ctx, const std::string& value, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Any particular reason for using [[nodiscard]] here, but not in some of the other classes (e.g. TypeDate::expand) ?
src/metkit/odb/OdbMetadataDecoder.cc
Outdated
std::vector<std::string> tidyVal = t->tidy(stringVal); | ||
if (tidyVal.size() == 1 && stringVal == tidyVal[0]) // if t->tidy had no effect, set the original value | ||
gather_.setValue(keyword, val); | ||
else | ||
gather_.setValue(keyword, tidyVal); | ||
gather_.setValue(keyword, tidyVal[0]); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible for t->tidy(stringVal) to return an empty vector? (If so, line 90 is an error).
If we only ever use the 0th value in tidyVal, should we be asserting the length of tidyVal is 1? If not, why not (why is the 0th value the preferred value?)?
src/metkit/mars/TypeParam.h
Outdated
|
||
virtual bool expand(const MarsExpandContext& ctx, const MarsRequest& request, std::vector<std::string>& values, | ||
bool fail) const; | ||
// bool expand(const MarsExpandContext& ctx, const MarsRequest& request, std::vector<std::string>& values, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
delete
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
curious that the protected function is no longer required
src/metkit/mars/TypeEnum.cc
Outdated
firstName = eckit::StringTools::upper(firstName); | ||
} | ||
for (size_t i = 1; i < names.size(); ++i) { | ||
// handle aliases |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is this supposed to be a todo?
src/metkit/mars/TypeEnum.cc
Outdated
case 1: { | ||
if (v[0].empty()) { | ||
return {}; | ||
} | ||
StringManyMap::const_iterator k = mapping_.find(eckit::StringTools::lower(v[0])); | ||
ASSERT(k != mapping_.end()); | ||
cache_[val] = (*k).second; | ||
return {(*k).second}; | ||
} | ||
default: { | ||
cache_[val] = v; | ||
return v; | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
So if there's a single value, we return a canonical version, and if there are multiple values we return them all?
src/metkit/mars/MarsLanguage.cc
Outdated
names.insert({*j}); | ||
} | ||
else { | ||
names.insert((*k).second); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
names.insert({*j}); | |
} | |
else { | |
names.insert((*k).second); | |
for (const auto& value : best) { | |
if (const auto iter = aliases.find(value); iter != aliases.end()) { | |
names.insert(iter->second); | |
} | |
else { | |
names.insert({value}); |
src/metkit/mars/Type.cc
Outdated
[this, ctx, request](const std::string& s) { return this->tidy(ctx, s, request); }); | ||
|
||
for (auto& v : values) { | ||
auto expanded = expand(ctx, v, request); |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
auto expanded = expand(ctx, v, request); | |
for (const auto& value : values) { | |
auto expanded = expand(ctx, value, request); |
src/metkit/mars/TypeDate.cc
Outdated
} | ||
} | ||
else { | ||
if (tokens.size() == 1 && | ||
(!std::isdigit(value[0]) || value.size() <= 2)) { // month (i.e. TypeClimateMonthly) | ||
value = month(tokens[0]); | ||
return {month(tokens[0])}; | ||
} | ||
else { | ||
eckit::Date date(value); | ||
value = l2s(date.yyyymmdd()); | ||
return {l2s(date.yyyymmdd())}; | ||
} | ||
} | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::isdigit is in #include <cctype>
remove else
after return
} | |
} | |
else { | |
if (tokens.size() == 1 && | |
(!std::isdigit(value[0]) || value.size() <= 2)) { // month (i.e. TypeClimateMonthly) | |
value = month(tokens[0]); | |
return {month(tokens[0])}; | |
} | |
else { | |
eckit::Date date(value); | |
value = l2s(date.yyyymmdd()); | |
return {l2s(date.yyyymmdd())}; | |
} | |
} | |
} | |
// month-day (i.e. TypeClimateDaily) | |
std::string m = month(tokens[0]); | |
long d = s2l(tokens[1]); | |
return {m + "-" + l2s(d)}; | |
} | |
if (tokens.size() == 1 && | |
(!std::isdigit(value[0]) || value.size() <= 2)) { // month (i.e. TypeClimateMonthly) | |
return {month(tokens[0])}; | |
} | |
eckit::Date date(value); | |
return {l2s(date.yyyymmdd())}; | |
} |
src/metkit/mars/TypeEnum.cc
Outdated
for (size_t j = 0; j < val.size(); ++j) { | ||
std::string VV = val[j]; | ||
std::string v = eckit::StringTools::lower(VV); | ||
// LOG_DEBUG_LIB(LibMetkit) << "v[" << j << "] : " << v << std::endl; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Remove or log?
src/metkit/mars/MarsLanguage.cc
Outdated
std::string MarsLanguage::bestMatch(const MarsExpandContext& ctx, const std::string& name, | ||
const std::vector<std::string>& values, bool fail, bool quiet, bool fullMatch, | ||
const std::map<std::string, std::string>& aliases) { | ||
std::vector<std::string> MarsLanguage::bestMatch(const MarsExpandContext& ctx, const std::string& name, |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is it possible to remove parts of the functionality of this function. If I remember correctly, the fuzzy matching for names shouldn't be used anymore.
We are calling this function with fullmatch=true
in almost all cases (besides in TypeParam). The amount of boolean param make it really hard to decipher what is going on in each scenario.
|
||
bool TypeParam::expand(const MarsExpandContext& ctx, const MarsRequest& request, std::vector<std::string>& values, | ||
bool fail) const { | ||
void TypeParam::pass2(const MarsExpandContext& ctx, MarsRequest& request) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
The merge of expand
with pass2
combines two different steps now.
Is this what we want to have here? Also a comment would be very nice, what we want to achieve in the second pass.
pass2
is just to generic to derive any functional meaning from the name alone.
|
||
auto it = find(value); | ||
if (it != values_.end()) { | ||
return groups_.at(it->second).second; | ||
} |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
This function may misses a return value if it == values_.end()
@@ -38,10 +38,10 @@ | |||
static pthread_once_t once = PTHREAD_ONCE_INIT; | |||
static eckit::Value languages_; | |||
static std::set<std::string> verbs_; | |||
static eckit::StringDict verbAliases_; | |||
static std::map<std::string, std::string> verbAliases_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of the includes are not used.
@@ -18,171 +18,137 @@ | |||
#include "metkit/mars/TypesFactory.h" |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Some of the includes are not used directly.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Same for src/metkit/mars/TypeParam.cc
auto tmp = (*it).second->expand(ctx, value, request); | ||
if (!tmp.empty()) { | ||
return tmp; | ||
std::string tmp = value; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
std::string tmp = value; |
if (!tmp.empty()) { | ||
return tmp; | ||
std::string tmp = value; | ||
if ((*it).second->expand(ctx, tmp, request)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
if ((*it).second->expand(ctx, tmp, request)) { | |
if ((*it).second->expand(ctx, value, request)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tbkr, the code modifies value
only if expand() returns true
|
||
const Rule* rule = 0; | ||
bool fail = true; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Make this const. This isn't modified. I'd wish to remove it from the bestMatch function completely. Would be great to have the bestMatch function to always throw an error and have a wrapping function which makes it secure to remove the boolean parameters.
The other two boolean parameters can be deleted similarly: one is just for logging (set the logging level correctly and the other one is only modifying the score in the first line.
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Definitely better, imo
@@ -79,9 +77,9 @@ class MarsLanguage : private eckit::NonCopyable { | |||
std::vector<std::pair<std::string, Type*>> typesByAxisOrder_; | |||
std::vector<std::string> keywords_; | |||
|
|||
StringMap aliases_; | |||
std::map<std::string, std::string> aliases_; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Would be good to consistently use StringDict or std::map<std::string, std::string>
@@ -15,6 +15,8 @@ | |||
|
|||
#pragma once | |||
|
|||
#include <mutex> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Is mutex used?
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
ah yes, for std::once_flag
@@ -1040,11 +1041,13 @@ CASE("test_metkit_expand_MARSC-219") { | |||
const char* text = | |||
"retrieve,class=od,date=20231205,expver=0001,obstype=gpsro,stream=lwda,time=18,type=ai,target=\"reference." | |||
"E2RRc8.data\""; | |||
/// @todo OBSTYPE, OBSGROUP | |||
/// @todo OBSTYPE |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
Does test_obstype cover this or no?
#pragma once | ||
|
||
#include "eckit/memory/NonCopyable.h" | ||
|
||
#include <iosfwd> |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
as you touch, we can remove unused include
#pragma once | |
#include "eckit/memory/NonCopyable.h" | |
#include <iosfwd> | |
#pragma once | |
#include <iosfwd> |
bool expand(const MarsExpandContext& ctx, std::string& value, const MarsRequest& /* request */) const override { | ||
return true; | ||
} | ||
bool expand(const MarsExpandContext&, std::string& value, const MarsRequest&) const override { return true; } |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
while at it, remove value
or better:
bool expand(const MarsExpandContext&, std::string& value, const MarsRequest&) const override { return true; } | |
bool expand(const MarsExpandContext& /*ctx*/, std::string& /*value*/, | |
const MarsRequest& /*request*/) const override { | |
return true; |
@@ -221,6 +219,11 @@ class Type : public eckit::Counted { | |||
|
|||
virtual size_t count(const std::vector<std::string>& values) const; | |||
|
|||
protected: // members |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
members -> methods
virtual bool expand(const MarsExpandContext& ctx, std::string& value, const MarsRequest& request = {}) const; | ||
virtual void expand(const MarsExpandContext& ctx, std::vector<std::string>& values, | ||
const MarsRequest& request = {}) const; | ||
virtual bool expand(const MarsExpandContext& ctx, std::string& value, const MarsRequest& request = {}) const; | ||
|
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
not this PR but I'm not in favor of defaulted virtuals
one could do type.expand(ctx, value, {});
at call site
@@ -24,8 +24,7 @@ void TypeAny::print(std::ostream& out) const { | |||
out << "TypeAny[name=" << name_ << "]"; | |||
} | |||
|
|||
bool TypeAny::expand(const MarsExpandContext& /* ctx */, std::string& /* value */, | |||
const MarsRequest& /* request */) const { | |||
bool TypeAny::expand(const MarsExpandContext&, std::string& value, const MarsRequest&) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
obviously messed during merge or some git ninja
bool uppercase = false; | ||
if (settings.contains("uppercase")) { | ||
uppercase = settings["uppercase"]; | ||
void TypeEnum::addValue(const std::string& vv, uint16_t idx, bool allowDuplicates) const { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
signature is const std::string& value, uint16_t idx, bool allowDuplicates
but here becomes const std::string& vv ...
I'd do std::string value, uint16_t idx, bool allowDuplicates
and value = eckit::StringTools::lower(value)
values_.push_back(v); | ||
return groups_.at(idx).second; | ||
} | ||
else { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
no need for else here
if (!tmp.empty()) { | ||
return tmp; | ||
std::string tmp = value; | ||
if ((*it).second->expand(ctx, tmp, request)) { |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
@tbkr, the code modifies value
only if expand() returns true
@@ -221,7 +221,7 @@ Rule::Rule(const eckit::Value& matchers, const eckit::Value& values, const eckit | |||
precedence[v] = j; | |||
} | |||
|
|||
mapping_[v] = first; | |||
mapping_[v] = {first}; |
There was a problem hiding this comment.
Choose a reason for hiding this comment
The reason will be displayed to describe this comment to others. Learn more.
mapping_[v] = {first}; | |
mapping_[v] = first; |
Description
Contributor Declaration
By opening this pull request, I affirm the following: