Experiment with some clustering algorithms on top of the similarity metric #8

marco-c · 2017-03-16T00:17:37Z

No description provided.

mansimarkaur · 2017-03-23T12:01:15Z

I want to work on this. Could you please provide a short description explaining this issue? Thanks!

marco-c · 2017-03-25T00:53:39Z

We have a way to evaluate the distance between two crash traces (WMD, though in the future we might experiment with other distance metrics, see #9). We can use this distance to cluster the stack traces into groups.

We should test with some clustering algorithms (http://scikit-learn.org/stable/modules/clustering.html) and see how they perform.
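As a rough illustration of the idea, here is a minimal sketch that clusters a few traces from a precomputed distance matrix with scikit-learn's DBSCAN. It assumes a simple Jaccard distance over frame names as a cheap stand-in for WMD; the frame names, `eps` and `min_samples` values are made up for the example and would need tuning on real data.

```python
import numpy as np
from sklearn.cluster import DBSCAN


def jaccard_distance(a, b):
    """Stand-in for WMD (an assumption): 1 - |A & B| / |A | B| over frame sets."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)


def distance_matrix(traces, distance):
    """Build the symmetric pairwise distance matrix for a list of traces."""
    n = len(traces)
    D = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            D[i, j] = D[j, i] = distance(traces[i], traces[j])
    return D


# Hypothetical stack traces, each a list of frame names.
traces = [
    ["js::RunScript", "js::Interpret", "JS_CallFunction"],
    ["js::RunScript", "js::Interpret", "JS_CallFunction", "main"],
    ["nsDocShell::LoadURI", "nsDocShell::InternalLoad"],
    ["nsDocShell::LoadURI", "nsDocShell::InternalLoad", "main"],
]

D = distance_matrix(traces, jaccard_distance)

# metric="precomputed" tells DBSCAN that D already holds pairwise distances.
labels = DBSCAN(eps=0.5, min_samples=1, metric="precomputed").fit_predict(D)
print(labels)
```

Swapping in WMD would only mean passing a different `distance` function to `distance_matrix`; any scikit-learn algorithm that accepts a precomputed matrix could then be tried the same way.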

If the implemented algorithm turns out to be too slow (it's possible, as WMD is really slow), we can try two things:

see if alternative distance metrics (Experiment with alternative distance metrics to WMD #9) are fast enough;
create a clustering algorithm customized for our task. E.g. instead of clustering on the complete list of stack traces, we could implement a two-level clustering, where the first level is generated by the algorithm currently used on Socorro [signature] and the second level is used to split the stack traces from a given signature into multiple groups. Or the first level could be generated by a simpler distance metric and the second level by WMD.
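The two-level idea could look roughly like the sketch below. It is only an illustration under stated assumptions: `top_frame_signature` is a hypothetical placeholder for the Socorro signature algorithm, `jaccard_distance` stands in for WMD, and the second level is a plain single-linkage grouping with a made-up threshold.

```python
from collections import defaultdict


def jaccard_distance(a, b):
    """Stand-in for the expensive metric (an assumption, not WMD itself)."""
    sa, sb = set(a), set(b)
    if not sa and not sb:
        return 0.0
    return 1.0 - len(sa & sb) / len(sa | sb)


def top_frame_signature(trace):
    """Hypothetical first-level key: the top frame of the trace."""
    return trace[0]


def threshold_clusters(items, distance, threshold):
    """Single-linkage grouping: items within `threshold` end up together."""
    parent = list(range(len(items)))

    def find(i):
        while parent[i] != i:
            parent[i] = parent[parent[i]]  # path compression
            i = parent[i]
        return i

    for i in range(len(items)):
        for j in range(i + 1, len(items)):
            if distance(items[i], items[j]) <= threshold:
                parent[find(i)] = find(j)

    groups = defaultdict(list)
    for i, item in enumerate(items):
        groups[find(i)].append(item)
    return list(groups.values())


def two_level_clusters(traces, signature, distance, threshold):
    """Level 1: bucket by signature. Level 2: split each bucket by distance."""
    by_signature = defaultdict(list)
    for trace in traces:
        by_signature[signature(trace)].append(trace)
    return {
        sig: threshold_clusters(bucket, distance, threshold)
        for sig, bucket in by_signature.items()
    }


traces = [
    ["A", "b", "c"],
    ["A", "b", "c", "d"],
    ["A", "x", "y", "z"],
    ["B", "b", "c"],
]
result = two_level_clusters(traces, top_frame_signature, jaccard_distance, 0.5)
```

The point of the design is that the expensive distance is only computed between traces sharing a signature, so the number of pairwise comparisons drops from all pairs to the pairs inside each bucket.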

But let's not worry about this slowness problem for now. Let's try with the WMD distance and a well-known algorithm first.

aditya-iitd · 2017-03-28T12:59:52Z

With WMD, we can only use those clustering algorithms in which metric used is distance between points or others (e.g. Euclidean distance) can also be used?

marco-c · 2017-03-30T23:51:14Z

> With WMD, we can only use those clustering algorithms in which metric used is distance between points or others (e.g. Euclidean distance) can also be used?

Can you rephrase this question? Euclidean distance is a distance between points too.

mansimarkaur · 2017-03-31T18:39:34Z

How exactly should we compare the various algorithms?
One metric would be speed, which can be covered by speed benchmarks.
Another one would be accuracy, how do we go about testing this?
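For the speed side, a benchmark could be as simple as the sketch below. The `benchmark` helper and the `one_big_cluster` stand-in are hypothetical names for illustration; any clustering callable that takes the traces and returns clusters could be passed in instead.

```python
import time


def benchmark(cluster_fn, traces, repeats=3):
    """Run cluster_fn several times; return the best wall-clock time and the result."""
    best = float("inf")
    result = None
    for _ in range(repeats):
        start = time.perf_counter()
        result = cluster_fn(traces)
        best = min(best, time.perf_counter() - start)
    return best, result


def one_big_cluster(traces):
    """Trivial stand-in for a real clustering algorithm."""
    return [list(traces)]


elapsed, clusters = benchmark(one_big_cluster, [["a", "b"], ["a", "c"]])
print(f"best of 3 runs: {elapsed:.6f}s, {len(clusters)} cluster(s)")
```

Taking the best of a few runs reduces noise from other processes; running every candidate algorithm on the same set of traces keeps the numbers comparable.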

marco-c · 2017-04-08T14:12:07Z

> Another one would be accuracy, how do we go about testing this?

A possible approach is #39.

ibrahimsharaf added the enhancement label Feb 14, 2018
