Add vstrtonum util as replacement for atof/strtod, etc. #19309
base: develop
src/common/utility/StringHelpers.h
Outdated
template<typename T> inline T _vstrtonum(char const *numstr, char **eptr, int /* unused */) { return static_cast<T>(strtold(numstr, eptr)); }

// Specialize int/long cases to use int conversion strtol which with base of 0 can handle octal and hex also
#define _VSTRTONUMI(T,F) template<> inline T _vstrtonum<T>(char const *numstr, char **eptr, int base) { return static_cast<T>(F(numstr, eptr, base)); }
Instead of macros, we can use std::function along with the template to provide a generic way to forward and apply for each type.
I don't think the macro is being used quite the way your comment suggests. Consumers of this interface use only the vstrtonum templatized function...never a macro.
strtold() would work for everything if we didn't care about handling bases other than 10 in conversions (e.g. octal or hex), and we would not need the specializations for integer data, which include a base argument.
The macro here is simply being used to easily instantiate several specializations for both signed and unsigned integer data, so that conversions involving bases other than 10 are also supported. The specialization for unsigned cases adds error detection for passing negative values.
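To make the pattern under discussion concrete, here is a minimal sketch of a generic template backed by strtold plus a macro that stamps out integer specializations taking a base. It is not the PR's code: the names to_num_sketch and SKETCH_INT_SPECIALIZATION are hypothetical, and the negative-value check for unsigned types described above is left out for brevity.

```cpp
#include <cstdlib>

// Hypothetical names; this only illustrates the template-plus-macro pattern.
// Generic case: parse as long double and narrow; the base is ignored.
template <typename T>
inline T to_num_sketch(char const *s, char **end, int /* base unused */)
{
    return static_cast<T>(std::strtold(s, end));
}

// Stamp out one explicit specialization per integer type. With base 0,
// strtol and friends also accept octal ("017") and hex ("0x1f") input.
#define SKETCH_INT_SPECIALIZATION(T, F)                              \
    template <>                                                      \
    inline T to_num_sketch<T>(char const *s, char **end, int base)   \
    {                                                                \
        return static_cast<T>(F(s, end, base));                      \
    }

SKETCH_INT_SPECIALIZATION(int, std::strtol)
SKETCH_INT_SPECIALIZATION(long, std::strtol)
SKETCH_INT_SPECIALIZATION(unsigned long, std::strtoul)

#undef SKETCH_INT_SPECIALIZATION
```

Callers only ever see the template, e.g. to_num_sketch<int>("0x1f", &end, 0); the macro is an implementation detail that is undefined again right after use.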
Sure, but why don't we just do a normal instantiation? Templates should be the tool; I'm not sure why we need both templates and macros.
Not sure what the issue is with using macros.
That said, my reason for using them is that they a) make for a lot less typing and reading and, in particular, b) avoid having to read a lot of substantially similar text while trying to identify the key difference between the copies.
IMHO, macros make it very clear that the only difference in the three instantiations is the type and function called to perform the conversion.
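For comparison, the macro-free alternative raised in this thread could look roughly like the sketch below. It assumes C++17 is available (for `if constexpr` and the `_v` type traits), which may not hold for this code base, and the name to_num_no_macro is hypothetical.

```cpp
#include <cstdlib>
#include <type_traits>

// Hypothetical sketch (requires C++17); not the PR's code.
// A single template picks the conversion function at compile time.
template <typename T>
inline T to_num_no_macro(char const *s, char **end, int base = 10)
{
    if constexpr (std::is_integral_v<T> && std::is_signed_v<T>) {
        return static_cast<T>(std::strtoll(s, end, base));
    } else if constexpr (std::is_integral_v<T>) {
        return static_cast<T>(std::strtoull(s, end, base));
    } else {
        (void)base; // floating-point conversions have no base argument
        return static_cast<T>(std::strtold(s, end));
    }
}
```

The trade-off is the one argued above: this keeps everything in one template, while the macro version makes the per-type differences visible as one line each.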
It occurs to me...maybe we should be thinking about unicode string data as well?
From March 20, 2025 special topics meeting conversation...
…sit into feat-mcm86-12jan24-strtonum
Ok, @JustinPrivitera I looked at Given that new information, what do you think?
Hi @markcmiller86. That makes sense then to not use I was overruled, however, by the collective wisdom of the team, so you should go ahead.
@JustinPrivitera the concerns regarding "complexity" certainly resonate with me. More on that below.
Speaking to the ...adding new features that do the same thing as features built in to C++ seems like..., what is proposed here does more than what C++ does. If you've ever read large ASCII data files into VisIt and had them fail for some obscure reason way down in the middle of the file read, it can be next to impossible to figure out where (e.g. which line and text in the input file) VisIt is having trouble. Maybe the whole read fails and nothing is plottable, or maybe something small fails and something is plottable, but the plot is wrong due to a problem with some data that was read (and a reader that didn't bother to error check the input).
Getting this right means each and every ASCII reader needs to have been coded to account for all the possible ways ASCII reads can fail, repeating that error detection and messaging code EVERYWHERE. Hardly anyone is willing to do that, and so few readers do any of it. The work proposed here is meant to SIMPLIFY this for ANY cases where we read ASCII data into some internal integer/float/double data. And, if we're lucky, maybe even handle variations in conventions for things like decimal point, comma separator, octal or hex data, etc. That last part should come almost for free if we play our cards right here.
As an aside, it would be most useful to report input file line numbers where failures occurred, and this potentially requires many adjustments to the existing loops where readers parse ASCII data. Finally, all of this code needs to be fast because it's in the critical path of problem-sized data reads.
Ok, back to the "complexity" argument...I am confused about how using
@markcmiller86 Thanks for your detailed and thorough response! I understand much better what we are after now. Making reading large ASCII files easier is great. Why is using this new utility an advantage for simple string-to-number conversions that occur incidentally in source code, unrelated to reading data files?
In my view, building any tools on top of built-in language features is adding complexity. Sometimes that complexity is warranted, other times it is not. You've made an excellent case for having a utility for reading large ASCII data files, in which case the complexity is warranted. For simple conversions, it seems to me that built-in language features are more accessible to the average programmer than a special utility unique to our project that we have to educate all developers about. For example, I was confused by
I am now rethinking Great follow-up — ✅
Task | strto* | std::from_chars (C++17) |
---|---|---|
Integer parsing | 20–80 ns | 3–10 ns ✔️ (5–10× faster) |
Float parsing | 100–300 ns | 15–80 ns ✔️ (2–5× faster) |
Benchmarks from platforms like x86-64 with GCC ≥ 10, Clang ≥ 11, MSVC ≥ 2019. Actual numbers vary with string length and format.
🚀 Why Is std::from_chars Faster?
- No locale support (unlike strtol/strtod, which must obey locale rules).
- No memory allocation, no errno, no thread-local state.
- Designed for high performance: it's a low-level, parse-only tool (not format-aware like std::stringstream).
- Can be inlined and SIMD-optimized by the standard library.
🧪 Example: Integer Parsing
#include <charconv>
#include <string_view>
#include <iostream>
void parse_fast(std::string_view input) {
    int value;
    auto result = std::from_chars(input.data(), input.data() + input.size(), value);
    if (result.ec == std::errc()) {
        std::cout << "Parsed value = " << value << "\n";
    } else {
        std::cerr << "Parse failed\n";
    }
}
🧪 Example: Floating Point (C++17/20)
double d;
auto [ptr, ec] = std::from_chars(str.data(), str.data() + str.size(), d);
- Fast and non-allocating
- If parsing fails, ec is std::errc::invalid_argument (no number at the start of the input) or std::errc::result_out_of_range (the value does not fit in the target type)
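A small sketch of how those two error codes, plus leftover characters, might be told apart. It uses int rather than double only because the integer overloads exist in every C++17 library; the floating-point overloads report the same codes. The enum ParseStatus and the function parse_int_checked are hypothetical names for illustration.

```cpp
#include <charconv>
#include <string_view>
#include <system_error>

// Hypothetical result categories for this sketch.
enum class ParseStatus { Ok, NotANumber, OutOfRange, TrailingChars };

// Classify the outcome of std::from_chars for a whole-string conversion.
inline ParseStatus parse_int_checked(std::string_view text, int &out)
{
    const char *first = text.data();
    const char *last = first + text.size();
    auto [ptr, ec] = std::from_chars(first, last, out);
    if (ec == std::errc::invalid_argument) return ParseStatus::NotANumber;
    if (ec == std::errc::result_out_of_range) return ParseStatus::OutOfRange;
    if (ptr != last) return ParseStatus::TrailingChars; // e.g. "12x"
    return ParseStatus::Ok;
}
```

Note that std::from_chars does not skip leading whitespace and does not accept a leading '+', so input such as " 12" is reported as NotANumber here.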
⚠️ Caveats
- std::from_chars for floating-point (double, float) is part of C++17, but standard libraries implemented it later than the integer overloads:
  - GCC 11+ (libstdc++)
  - Clang with libc++ 20+
  - MSVC 2019+
- from_chars for floating-point may not be available or optimized in older compilers; check your STL version.
- The standard declares overloads for float, double, and long double, but not every library implements the long double overload, and there is no overload for half (binary16) in C++17/20.
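One way to cope with the availability caveat is a wrapper that falls back to strtod. The sketch below is an illustration only: parse_double is a hypothetical name, and it assumes the library defines the __cpp_lib_to_chars feature-test macro only when the floating-point overloads are actually present, which should be verified for each toolchain in use.

```cpp
#include <charconv>
#include <cstdlib>
#include <string>
#include <string_view>
#include <system_error>

// Hypothetical wrapper: prefer floating-point std::from_chars when the
// library advertises it, otherwise fall back to strtod.
// Returns true only if the whole input was consumed as one number.
inline bool parse_double(std::string_view text, double &out)
{
#if defined(__cpp_lib_to_chars) && __cpp_lib_to_chars >= 201611L
    const char *last = text.data() + text.size();
    auto [ptr, ec] = std::from_chars(text.data(), last, out);
    return ec == std::errc() && ptr == last;
#else
    // strtod needs a NUL-terminated buffer and honors the current C locale.
    const std::string copy(text);
    char *end = nullptr;
    const double value = std::strtod(copy.c_str(), &end);
    if (end == copy.c_str() || *end != '\0') return false;
    out = value;
    return true;
#endif
}
```

The two branches are not identical in what they accept (strtod skips leading whitespace and allows a leading '+', from_chars does not), so a production wrapper would need to pick one convention and enforce it.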
🧠 Bottom Line
- For integer parsing, std::from_chars is a clear winner: 5–10× faster than strtol.
- For floating-point, it's still 2–5× faster than strtod, but performance depends on STL version.
- If you're parsing large volumes of ASCII numeric data and care about speed, use from_chars wherever supported.
Would you like a ready-to-run benchmark comparing strtol vs from_chars on your platform?
However, in the context of ASCII file I/O... Fantastic and practical question — and you're thinking in exactly the right direction. ✅ Short Answer:
But… it depends on:
Let’s dig in with actual numbers and some rules of thumb.
⏱️ Ballpark Timing Estimates
Example: If you're reading 1 million lines, each with 5 numbers:
🔍 Line-by-line I/O is Not Free
Reading line-by-line with
✅ Optimization Tips
If performance starts to matter:
1. Use
Situation | Best Parser |
---|---|
Fast bulk parse, known format | std::from_chars |
Locale-specific input | strtod / std::istringstream + locale |
High-volume integer parsing | std::from_chars |
Reading ~10 values per line, moderate-size files | I/O dominates, don’t worry too much |
Massive input (e.g., GBs) | Optimize I/O & use from_chars |
✅ Final Answer:
If you're doing typical line-by-line parsing of scientific ASCII data, and not parsing thousands of numbers per line, then:
Yes: I/O performance will dominate. Use from_chars for its speed and precision, but don’t over-optimize until you have profiling data showing parsing is your bottleneck.
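As a rough sketch of the "few numbers per line" case, the helper below parses whitespace-separated integers from one already-read line and reports the offset of the first token it cannot parse. The name parse_int_line and its interface are hypothetical; reading the lines themselves (where the I/O cost sits) is left to the caller.

```cpp
#include <charconv>
#include <cstddef>
#include <string_view>
#include <system_error>
#include <vector>

// Hypothetical helper: parse whitespace-separated integers from one line.
// On failure, returns false and sets `bad_col` to the 0-based offset of the
// token that could not be parsed; values parsed before it stay in `out`.
inline bool parse_int_line(std::string_view line, std::vector<int> &out,
                           std::size_t &bad_col)
{
    const char *p = line.data();
    const char *last = p + line.size();
    while (true) {
        while (p != last && (*p == ' ' || *p == '\t')) ++p; // skip blanks
        if (p == last) return true;                          // end of line
        int value = 0;
        auto [next, ec] = std::from_chars(p, last, value);
        // A token must end at end-of-line or at the next blank.
        const bool ends_token = (next == last || *next == ' ' || *next == '\t');
        if (ec != std::errc() || !ends_token) {
            bad_col = static_cast<std::size_t>(p - line.data());
            return false;
        }
        out.push_back(value);
        p = next;
    }
}
```

Combined with a line counter in the reading loop, the returned offset is enough to produce a "line N, column M" message of the kind asked for earlier in this thread.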
Let me know if you'd like a fast file reader pattern using mmap, buffer-based scanning, or multi-threaded parsing!
Some late night ChatGPT adventures... sounds like
Having a wrapper method that standardizes our use cases is helpful. Say C++27 includes a better mousetrap: we update our wrapper implementation and don't have to update the entire code base. (
Description
This is a draft PR for developers to see where I am headed with this. It defines vstrtonum<sometype>() with some special sauce to handle things like default values, error checking, range checking, etc. Read the comment in StringHelpers.h for an overview...
visit/src/common/utility/StringHelpers.h
Lines 93 to 160 in dc5eefb
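To illustrate what "default values, error checking, range checking" can mean for a single conversion, here is a minimal sketch built on strtol. It is not the vstrtonum implementation or its interface; the names ConvError and to_int_or are hypothetical.

```cpp
#include <cerrno>
#include <climits>
#include <cstdlib>

// Hypothetical sketch, not the vstrtonum implementation.
enum class ConvError { None, NoDigits, TrailingChars, OutOfRange };

// Convert `s` to int, returning `fallback` on any failure and reporting the
// reason through `err`. Base 0 lets strtol accept decimal, octal, and hex.
inline int to_int_or(char const *s, int fallback, ConvError &err, int base = 0)
{
    char *end = nullptr;
    errno = 0;
    const long v = std::strtol(s, &end, base);
    if (end == s) {               // nothing was consumed
        err = ConvError::NoDigits;
        return fallback;
    }
    if (*end != '\0') {           // junk after the number
        err = ConvError::TrailingChars;
        return fallback;
    }
    if (errno == ERANGE || v < INT_MIN || v > INT_MAX) {
        err = ConvError::OutOfRange;
        return fallback;
    }
    err = ConvError::None;
    return static_cast<int>(v);
}
```

A reader would call something like to_int_or(token, -1, err) and only build an error message when err is not ConvError::None, which keeps the per-reader checking code down to one line.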