## More features
- About mpTune:
    + track start time, running time, memory usage, and warning/error messages in the tuning loop and the summary procedure
    + trace
    + retain predictions so they can be reused for ensembling
- survival support and test (remember that some predictions give a risk score instead of a survival time!)
    + survival support and test (DONE!)
    + need more survival models (coxph, RandomForestSRC, cforest, earth, mboost, ...)
    + inconsistent prediction scales (some models return survival times, others risk scores): which is the common case? If both occur, how should that be resolved? (CURRENTLY just reverse the result if CI < 0.5 / corr < 0; see the first sketch after this list)
- test on regression
- a script for SGE support: run in the background and wait to collect results
- a fit function to automatically fit the best models
- (DONE) CV or bootstrap for the predictability estimate (supports 2 levels of parallelism; see the nested-foreach sketch after this list)
- print fitting progress? Add logging: start time of the whole fitting, running time/memory of each model
- resample: trace
- requireMetric: partial matching; make the built-in individual metrics visible
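
A minimal sketch of the current workaround for the risk-vs-survival-time issue above, assuming predictions are numeric scores where higher should mean higher risk; `align_risk()` is a hypothetical helper for illustration, not part of mpTune:

```{r}
library(survival)

# Hypothetical helper (not mpTune API): make sure higher prediction = higher risk.
# With reverse = TRUE the concordance index treats larger predictions as predicting
# shorter survival; if CI < 0.5 the prediction apparently sits on the survival-time
# scale, so its sign is flipped, as described in the to-do item above.
align_risk <- function(pred, time, status) {
    ci <- concordance(Surv(time, status) ~ pred, reverse = TRUE)$concordance
    if (!is.na(ci) && ci < 0.5) -pred else pred
}

# Example with a Cox model on the built-in lung data
fit  <- coxph(Surv(time, status) ~ age + sex, data = lung)
pred <- align_risk(predict(fit, type = 'lp'), lung$time, lung$status)
```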
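
And a sketch of the "2 levels of parallel" idea for the resampling estimate, using foreach's nesting operator with the doMC backend mentioned below; `folds`, `models`, and `fitAndScore()` are placeholders, not existing functions:

```{r, eval = FALSE}
library(foreach)
library(doMC)
registerDoMC(cores = 4)

# Outer level: resamples (CV folds or bootstrap draws); inner level: candidate
# models. %:% nests the two loops so the registered backend can schedule both.
# `folds`, `models`, and `fitAndScore()` are placeholders for illustration only.
results <- foreach(fold = folds, .combine = rbind) %:%
    foreach(model = models, .combine = rbind) %dopar% {
        fitAndScore(model, fold)    # fit on fold$train, score on fold$test
    }
```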
## Individual model
- gbm: disable verbose by default
- glmnet2: fix the problem; check whether a level has only two unique values
- rf: mtry range (see the sketch below)
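
A possible default for the rf mtry range, following the usual randomForest heuristics; `mtry_range()` is only an illustration, not the implemented interface:

```{r}
# Illustrative helper (not mpTune API): a small mtry grid around the usual
# randomForest defaults, sqrt(p) for classification and p/3 for regression,
# clipped to [1, p].
mtry_range <- function(x, y, length = 5) {
    p      <- ncol(x)
    center <- if (is.factor(y)) floor(sqrt(p)) else max(floor(p / 3), 1)
    grid   <- round(seq(max(1, center / 2), min(p, center * 2), length.out = length))
    sort(unique(pmin(pmax(grid, 1), p)))
}

# e.g. iris: 4 predictors, classification, so candidates cluster around sqrt(4) = 2
mtry_range(iris[, 1:4], iris$Species)
```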
## Feature interface
```{r}
mpTune.default(
    x, y, weights,
    modelList = list(RF = 'rf', balancedRF = 'rf', 'gbm'),
    modelControl = list(
        balancedRF = list(sampsize = quote(rep(min(table(y)), 2))),
        gbm        = list(verbose = TRUE)),
    preProcessControl = list(FUN = PCA, nPC = c(2, 3, 4), ...),
    tuningControl = list(
        tuneGrid = list(rf = ....),
        FUN = makeTuneGrid,
        gridLength = 5, randomizedLength = 20,  # other parameters for makeTuneGrid
        ...),
    samplingControl = list(
        FUN = createCVFolds, args = ...,
        trainIndex = NULL,
        testIndex = NULL,
        returnSample = TRUE),
    performanceControl = list(
        FUN = defaultSummary, args = ...,
        metric = NA,
        targetMetric = 'AUC'),
    parallelControl = list(backend = 'doMC', batchSize = 10),
    classProbs = TRUE,
    survivalPrediction = 'risk',
    returnData = TRUE,
    returnPrediction = TRUE,
    seeds = NA
);

mpTune.data(data, ...)

# save all these controls as mpTuneControl (see the sketch after this block)
# a flow-like interface:
lazyML(data) %>% preProcess() %>% sampling() %>% modelBlender(preProcess(), modelList, modelControl) %>% tuning(, performance()) %>% bestFit %>% run();
```
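
One way the "save all these controls as mpTuneControl" note above could look, sketched with assumed names rather than a final API:

```{r, eval = FALSE}
# Sketch only: bundle the control lists above into one reusable object.
mpTuneControl <- function(preProcessControl  = list(),
                          tuningControl      = list(),
                          samplingControl    = list(),
                          performanceControl = list(),
                          parallelControl    = list()) {
    structure(
        list(preProcess  = preProcessControl,
             tuning      = tuningControl,
             sampling    = samplingControl,
             performance = performanceControl,
             parallel    = parallelControl),
        class = 'mpTuneControl')
}

# Build the control once and reuse it across calls (`control` argument is assumed):
# ctrl <- mpTuneControl(parallelControl = list(backend = 'doMC', batchSize = 10))
# mpTune.default(x, y, modelList = list('rf', 'gbm'), control = ctrl)
```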