Metric.Timer has NaN for Mean #80
The issue arises in Metrics.NET's ExponentiallyDecayingReservoir.Update method: itemWeight goes to Infinity, which makes the aggregates NaN. The startTime field is initialized in the constructor and reset in the ResetReservoir method. If an ExponentiallyDecayingReservoir instance stays alive for 47319 seconds (with the default alpha of 0.015), Math.Exp (in the Update method) returns double.PositiveInfinity, causing Mean to be NaN.
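The 47319-second figure can be checked directly. Assuming the item weight has the form Math.Exp(alpha * secondsSinceStart) (an assumption consistent with the numbers above, not code copied from Metrics.NET), it overflows once alpha * secondsSinceStart exceeds ln(double.MaxValue) ≈ 709.78, i.e. after about 709.78 / 0.015 ≈ 47318.8 seconds. The sketch below uses a hypothetical ItemWeight helper to show the overflow, and how an infinite weight becomes NaN once it is divided by an (also infinite) total weight.

```csharp
using System;

// Default alpha quoted in the issue.
const double Alpha = 0.015;

// Hypothetical helper: weight of an item recorded secondsSinceStart after the
// reservoir's startTime. Assumes the Math.Exp(alpha * elapsed) form.
double ItemWeight(double secondsSinceStart) => Math.Exp(Alpha * secondsSinceStart);

// Elapsed time at which the weight first exceeds double.MaxValue.
double overflowAfter = Math.Log(double.MaxValue) / Alpha;
Console.WriteLine($"Weights overflow after about {overflowAfter / 3600:F2} hours");

double lastFinite = ItemWeight(47318);  // still finite
double overflowed = ItemWeight(47319);  // double.PositiveInfinity

// Normalizing an infinite weight by an infinite total gives NaN (inf / inf).
double normalized = overflowed / (overflowed + lastFinite);
Console.WriteLine($"{lastFinite}, {overflowed}, {normalized}");
```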
The patch resets the reservoir when the weight hits infinity. It is available at https://github.com/sunkor/Metrics.NET
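The patch itself isn't reproduced in this thread. A minimal sketch of the idea it describes (reset when the weight overflows), using a hypothetical startTime variable and a hypothetical WeightFor function as stand-ins for the reservoir's state, might look like this:

```csharp
using System;

const double Alpha = 0.015;

// Hypothetical stand-in for the reservoir's startTime field (seconds).
double startTime = 0;

// Hypothetical guard: if the weight of a new item overflows, restart the
// landmark at the current timestamp so the weight becomes exp(0) == 1.
double WeightFor(double timestampSeconds)
{
    double weight = Math.Exp(Alpha * (timestampSeconds - startTime));
    if (double.IsInfinity(weight))
    {
        // A real reservoir would also clear (or rescale) the samples it holds,
        // since their weights were computed against the old startTime.
        startTime = timestampSeconds;
        weight = Math.Exp(Alpha * (timestampSeconds - startTime));
    }
    return weight;
}

Console.WriteLine(WeightFor(100));     // finite, startTime unchanged
Console.WriteLine(WeightFor(50_000));  // overflow detected: reset, weight 1
```

Resetting discards the history accumulated before the overflow; that trade-off is what the later comments in this thread push back on.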
@PaulParau, could this be merged into trunk?
@sunkor, I understand what you're saying; however, the … Under which conditions have you observed …?
Hi Paul, I have left the organization where the dev team uses Metrics.NET, so unfortunately I have to rely on my memory of how the issue gets triggered. If you initialize a Timer (which internally uses ExponentiallyDecayingReservoir) and keep it alive for 47319 seconds (just over 13 hours), Math.Exp returns double.PositiveInfinity. So if we initialize ResponseTime and use it to record metrics in a long-running process (a service, perhaps), we will run into the issue.
I cannot reproduce the issue. The … I believe the issue @hhansell is experiencing is caused by something other than this.
@PaulParau, after logging some values, I see that the NaN ultimately results from a list in which every item has a weight of zero. Perhaps the Mean function could be changed to handle this case and return 0? That won't fix the bigger underlying issue, though.
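A minimal sketch of that suggestion, using a hypothetical WeightedMean over (value, weight) pairs rather than Metrics.NET's actual snapshot type: when every weight has decayed to zero, the total weight is 0, so the guard returns 0 instead of evaluating 0.0 / 0.0.

```csharp
using System;
using System.Linq;

// Hypothetical weighted mean over (value, weight) pairs.
double WeightedMean((double Value, double Weight)[] samples)
{
    double totalWeight = samples.Sum(s => s.Weight);
    if (totalWeight == 0.0)
    {
        // Every weight has decayed to zero: report 0 rather than NaN.
        return 0.0;
    }
    return samples.Sum(s => s.Value * s.Weight) / totalWeight;
}

var decayed = new[] { (Value: 4.0835409999999994, Weight: 0.0) };
var healthy = new[] { (Value: 2.0, Weight: 1.0), (Value: 4.0, Weight: 3.0) };

Console.WriteLine(WeightedMean(decayed)); // 0
Console.WriteLine(WeightedMean(healthy)); // 2 and 4 weighted 1:3
```

As the comment says, this only masks the symptom: the decayed samples still contribute nothing to the snapshot.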
Having logged the values, here is an example using real data (from an instrumented version of the Rescale method in ExponentiallyDecayingReservoir) that led to a weight of zero:
Ultimately, this comes down to 'sampleWeight * scalingFactor' being too small to represent as a double: 6.73314423785015E-24 * 6.30093530948943E-302 evaluates to 0.0. I presume the purpose of ResetReservoir is to prevent the weights from decaying to this point; however, I cannot see how it is ultimately called.
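The arithmetic can be checked with the logged values. The exact product is about 4.24E-325, which is less than half of double.Epsilon (the smallest positive subnormal double, about 4.94E-324), so IEEE 754 round-to-nearest produces 0:

```csharp
using System;

// Values logged from the Rescale method, as quoted above.
double sampleWeight = 6.73314423785015E-24;
double scalingFactor = 6.30093530948943E-302;

// The exact product (~4.24E-325) lies below double.Epsilon / 2,
// so the nearest representable double is 0.
double rescaled = sampleWeight * scalingFactor;

Console.WriteLine(rescaled == 0.0); // True
Console.WriteLine(double.Epsilon);  // smallest positive double
```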
We also found this issue in our application when using timers. We had a metric that was posted once on application startup, and we would get exactly the same error logged about 13 hours later. We found that the issue resided in the constructor of the WeightedSnapshot class, on the following line:

As time moves on and no updates to the metric are received (in our case, NewContext() was not being called), the weight of the sample tends towards 0, so after a period of time we get a division by zero that causes the mean to become NaN. No exception is thrown for this division by zero because Weight is a double rather than an int, so the result (with our sample weight also being 0) is NaN instead of an exception. This is the fix we recommend.
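The difference between double and integer division by zero is easy to demonstrate. The block also shows one hypothetical way to guard a normalization step; Normalize is an illustrative name, and this is not the fix referred to above, which is not reproduced in the thread.

```csharp
using System;
using System.Linq;

// With doubles, a zero weight divided by a zero total weight silently
// yields NaN (IEEE 754), so nothing is thrown.
double sampleWeight = 0.0, totalWeight = 0.0;
double normalized = sampleWeight / totalWeight;
Console.WriteLine(double.IsNaN(normalized)); // True

// Integer division by zero, by contrast, throws at runtime.
int zero = 0;
bool threw = false;
try
{
    Console.WriteLine(1 / zero);
}
catch (DivideByZeroException)
{
    threw = true;
}
Console.WriteLine(threw); // True

// Hypothetical guarded normalization: if the weights sum to zero, return
// all-zero normalized weights instead of an array of NaNs.
double[] Normalize(double[] weights)
{
    double sum = weights.Sum();
    return sum == 0.0
        ? new double[weights.Length]
        : weights.Select(w => w / sum).ToArray();
}
```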
I can confirm this issue exists, and the proposed patch has fixed it for our application. @PaulParau, can it be merged and a new version published to NuGet?
This fix does indeed prevent NaN values for the weights, and ultimately for the mean value of a histogram snapshot. But it only fixes the symptoms, not the cause: the reservoir will essentially 'lose' values after ~13 hours of inactivity. I believe this issue should be fixed at the RescaleReservoir level, to prevent weights from becoming zero, but I can't tell right now what the implications of such a change would be. So I'm going to make the change and publish a new package. Thanks @GitAtSmsPortal for the detailed analysis!
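One possible shape for a fix at the rescale level, sketched under stated assumptions: the rescale multiplies every stored weight by exp(-alpha * elapsedSeconds), and a weight that would underflow is clamped to double.Epsilon so the sample keeps a tiny non-zero weight. This is a hypothetical illustration, not the change that was published. The elapsed time of 46236 s is also an assumption, chosen so that exp(-0.015 * t) roughly matches the logged scalingFactor. Note that clamping makes all very old samples equally weighted, which is the kind of implication the comment above is wary of.

```csharp
using System;
using System.Linq;

const double Alpha = 0.015;

// Hypothetical rescale step: move the landmark forward by elapsedSeconds by
// multiplying each stored weight by exp(-alpha * elapsedSeconds). Weights that
// would underflow to 0 are clamped to double.Epsilon, so a sample keeps a tiny
// non-zero weight instead of dropping out of the snapshot.
double[] Rescale(double[] weights, double elapsedSeconds)
{
    double scalingFactor = Math.Exp(-Alpha * elapsedSeconds);
    return weights
        .Select(w => Math.Max(w * scalingFactor, double.Epsilon))
        .ToArray();
}

// The first weight is the logged sampleWeight; without the clamp it would underflow.
double[] rescaled = Rescale(new[] { 6.73314423785015E-24, 1.0 }, 46_236);
Console.WriteLine(rescaled.All(w => w > 0)); // True
```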
I've fixed the issue and pushed a new package (v0.5.6-pre) to NuGet. Thanks for all the help, everyone!
After leaving a process running for some time, I've come across the following issue.
Could someone confirm whether this is a Metrics.NET issue or an issue for the Metrics.NET.InfluxDB project?
Thanks
public static readonly Timer ResponseTime = Metric.Timer("Response Time", Unit.Requests);
2017-07-05 05:45:05, 226 [22 ] Metrics: Unhandled exception in Metrics.NET Library Error while uploading 51 measurements (5.72 KiB) to InfluxDB over HTTP [http://localhost:8070/write?db=metrics&u=user&p=password&precision=s] [ResponseStatus: ProtocolError] [Response: {
"error":"partial write: unable to parse '
Response Time Active Sessions=0i,
Total Time=4i,
Count=1i,
Mean Rate=1.9858004319199677E-05,
1 Min Rate=2.96439387504748E-314,
5 Min Rate=5.0414622397472423E-76,
15 Min Rate=5.9199171854273278E-28,
Last=4.0835409999999994,
Min=4.0835409999999994,
Mean=NaN,
Max=4.0835409999999994,
StdDev=0,
Median=4.0835409999999994,
Sample Size=1i,
Percentile 75%=4.0835409999999994,
Percentile 95%=4.0835409999999994,
Percentile 98%=4.0835409999999994,
Percentile 99%=4.0835409999999994,
Percentile 99.9%=4.0835409999999994 1499197505': invalid number"}
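InfluxDB's line protocol does not accept NaN (or infinity) as a float field value, which is why the write above fails with "invalid number" on Mean=NaN. A minimal, hypothetical formatter (not the Metrics.NET.InfluxDB implementation) that drops non-finite fields before writing could look like this. It assumes names contain no backslashes, quotes, or newlines, and it escapes only commas, spaces, and (in field keys) equals signs.

```csharp
using System;
using System.Collections.Generic;
using System.Globalization;

// Hypothetical escaping helpers (assumes no backslashes, quotes, or newlines).
string EscapeMeasurement(string s) => s.Replace(",", "\\,").Replace(" ", "\\ ");
string EscapeFieldKey(string s) =>
    s.Replace(",", "\\,").Replace("=", "\\=").Replace(" ", "\\ ");

// Hypothetical formatter: one line-protocol point with a second-precision timestamp.
// Returns null when no field survives, since a point needs at least one field.
string? FormatLine(string measurement, IEnumerable<(string Key, object Value)> fields, long unixSeconds)
{
    var parts = new List<string>();
    foreach (var (key, value) in fields)
    {
        string? formatted = value switch
        {
            long l => l.ToString(CultureInfo.InvariantCulture) + "i", // integer field
            double d when double.IsFinite(d) => d.ToString("R", CultureInfo.InvariantCulture),
            _ => null, // NaN, infinity, or unsupported type: skip the field
        };
        if (formatted != null)
        {
            parts.Add($"{EscapeFieldKey(key)}={formatted}");
        }
    }
    return parts.Count == 0
        ? null
        : $"{EscapeMeasurement(measurement)} {string.Join(",", parts)} {unixSeconds}";
}

var point = FormatLine("Response Time", new (string, object)[]
{
    ("Count", 1L),
    ("Mean", double.NaN),          // dropped instead of breaking the write
    ("Max", 4.0835409999999994),
}, 1499197505);
Console.WriteLine(point);
```

Dropping the field keeps the rest of the batch writable; the NaN itself still has to be fixed at its source in the reservoir.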