Open
Description
Would there be interest in adding a function for minimum covariance determinant estimator (MCD), for robust covariance estimation of multivariate data ? The reference is :
Rousseeuw, P. J. and Van Driessen, K. (1999). A fast algorithm for the minimum covariance determinant estimator. Technometrics, 41, 212-223.
I wrote this simple version some time ago, it could become a PR if there's interest. If so, where would it belong, StatsBase or MultivariateStats ?
function fastMCD(X,p = ceil(Int,(sum(size(X))+1)/2);nrepeats=500)
rng = Base.GLOBAL_RNG
hMin = nothing
sMin = Inf
for i in 1:nrepeats
idx = randperm(size(X,1))
h1 = X[idx[1:p],:]
s0 = 0
s1= 1
while ((det(s1)!=det(s0)) & (det(s1)!=0))
h0 = h1
s0 = cov(h0)
m = vec(mean(h0,1))
Dis = vec(mapslices(x -> mahalanobis(x,m,inv(s0)),X,2))
ord = sortperm(Dis)
h1 = X[ord[1:p],:]
s1=cov(h1)
end
if det(s1)<det(sMin)
hMin = h1
sMin = s1
end
end
## Reweighting
sfull = cov(hMin)
tmcd = vec(mean(hMin,1))
dfull = vec(mapslices(x -> mahalanobis(x,tmcd,inv(sfull)),hMin,2))
smcd = (median(dfull.^2)/pdf(Chisq(size(X,2)),0.5))*sfull
dmcd = vec(mapslices(x -> mahalanobis(x,tmcd,inv(smcd)),hMin,2));
w = FrequencyWeights(((dmcd.^2).<pdf(Chisq(size(X,2)),0.975))*1)
t1 = mean(hMin,w,1)
s1 = cov(hMin,w,corrected=true)
(t1,s1)
end
Metadata
Metadata
Assignees
Labels
No labels