Should you make the Explicit-duration Hidden Markov model your asset when analysing intensive longitudinal data for behavioral research?
Methodology and Statistics for the Behavioural, Biomedical and Social Sciences (MSBBSS), Utrecht University
This repository contains the R code needed to reproduce the results of the master's thesis.
The research archive contains the following folders and files (see footnote 1 for the "*" symbol):
MHMM_vs_MEDHMM/
├── README.md
├── Session_info.txt - includes information regarding all packages loaded during the analysis as well as the R software specifications
├── data/
│ ├── data_simulation.R - script that generates the simulated data
│ └── read_me.txt - description of the data files that should be included
├── convergence_run/
│ ├── medhmm/
│ │ ├── convergence_run_medhmm*.R - main scripts running the convergence inspections
│ │ ├── doit* - doit files that must be run to submit the jobs in the jobs/ folder
│ │ ├── jobs/ - jobs that have been submitted
│ │ ├── outputs/ - all outputs, including PDFs of the runs that were added ad hoc for reader inspection
│ │ └── start_convergence_medhmm*.sh - .sh files used to submit jobs to Snellius
│ │
│ └── mhmm/
│   ├── convergence_runs.R - main script running the convergence inspections
│   ├── doit - doit file that must be run to submit the jobs in the jobs/ folder
│   ├── jobs/ - jobs that have been submitted
│   ├── outputs/ - all outputs, including PDFs of the runs that were added ad hoc for reader inspection
│   └── start_convergence_mhmm.sh - .sh file used to submit jobs to Snellius
├── medhmm_run/
│ ├── model_fitting2_and_post_processes.R - main script running the simulation study for the multilevel explicit-duration hidden Markov model
│ ├── model_fitting2_and_post_processes_extra_run.R - main script running the additional scenario of the simulation study for the multilevel explicit-duration hidden Markov model
│ ├── doit* - doit files that must be run to submit the jobs in the jobs/ folder
│ ├── jobs/ - jobs that have been submitted
│ ├── outputs/ - all outputs, including the data that is later used to obtain the statistics
│ ├── start_2dep_medhmm_*.sh - .sh files used to submit jobs to Snellius
│ └── parameters/ - all parameters of the simulation scenarios that are passed on to the Snellius engine
│
├── mhmm_run/
│ ├── model_fitting2_and_post_processes.R - main script running the simulation study for the multilevel hidden Markov model
│ ├── model_fitting2_and_post_processes_extra.R - main script running the additional scenario of the simulation study for the multilevel hidden Markov model
│ ├── doit* - doit files that must be run to submit the jobs in the jobs/ folder
│ ├── jobs/ - jobs that have been submitted
│ ├── outputs/ - all outputs, including the data that is later used to obtain the statistics
│ ├── start_2dep_medhmm_*.sh - .sh files used to submit jobs to Snellius
│ └── parameters/ - all parameters of the simulation scenarios that are passed on to the Snellius engine
│
├── simulation_main/
│ ├── parse outputs.R - script that parses the outputs in mhmm_run/outputs/ and medhmm_run/outputs/ and saves the final post-processed files in simulation_main/post_processed_data/
│ ├── plot_scripts/ - three files that generate the plots used in the manuscript
│ ├── outputs plots/ - all plots generated by the scripts in simulation_main/plot_scripts/
│ └── post_processed_data/ - final outputs used for generating the tables and building the plots. Main Results.xlsx in this folder was parsed manually and is included for an easy overview.
│
├── manuscript/ - all LaTeX files and graphics used to generate the manuscript of the thesis
│
└── example/
    ├── outputs/ - outputs of the empirical example model fitting
    ├── plots/ - all plots included in the final manuscript
    ├── train_models.R - code to train both the MEDHMM and the MHMM using the empirical data
    ├── models_postprocess.R - script generating the trace plots and other statistics reported in the main manuscript. It also includes the code to generate figures 9, 10, 12 and C4, and a script to simulate data for the posterior predictive checks
    ├── summarise_data.R - script summarizing the in-sample data to obtain the group-level and patient-level statistics. These statistics are then used for the posterior predictive checks
    ├── utility_functions.R - script with functions to summarise, simulate and plot data
    ├── ppc_mean_code.R - code to generate the plots of the posterior predictive checks, i.e. figures C1 and C2 from the manuscript
    └── decoding_figures_11_C5.R - script generating figures 11 and C5 from the manuscript
To reproduce the results of the study, follow these steps:
- Download the code in the convergence_run, example, medhmm_run, mhmm_run and data folders, and obtain the empirical data (available upon request from Groningen Medical Centre).
- Run the R scripts in the following order to reproduce the results of the main simulation study:
  - data/data_simulation.R: simulate the data.
  - convergence_run/medhmm/convergence_run_medhmm1.R, convergence_run/medhmm/convergence_run_medhmm2.R, convergence_run/mhmm/convergence_runs.R: analyse the convergence results from fitting the Bayesian multilevel EDHMM and HMM on the simulated data.
  - R (.R) scripts in the medhmm_run/ and mhmm_run/ directories: run the simulation.
  - simulation_main/parse_outputs.R: summarize and parse all results (results are written to ./post_processed_data/).
  - R (.R) scripts in the simulation_main/plot_scripts/ directory: produce the figures (results are written to ./plot_scripts/).
- Run the R scripts in the following order to reproduce the results of the empirical example:
  - example/train_models.R: fit the MEDHMM and the MHMM to the data.
  - example/summarise_data.R: get the summary statistics of the individuals and the study group.
  - example/utility_functions.R: load the functions needed.
  - example/models_postprocess.R: summarize the results and produce the plots and convergence trace plots.
  - example/decoding_figures_11_C5.R and example/ppc_mean_code.R: obtain the figures and the results of the posterior predictive check analysis.
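The run order above can be sketched as a small shell wrapper. This is a minimal sketch and not part of the repository: the helper names build_r_expr and run_stage and the DRY_RUN switch are hypothetical, and it assumes that the scripts of each stage can be sourced from the repository root in one shared R session, which the archive does not confirm. The scripts in medhmm_run/, mhmm_run/ and simulation_main/plot_scripts/ are left out because their file names are not all enumerated here.

```shell
#!/bin/sh
# Sketch only: replay the documented script order with Rscript.
# Assumption: the scripts can be sourced from the repository root.
set -eu

# Turn a list of R script paths into one R expression that sources them
# in order, so that all scripts of a stage share a single R session
# (e.g. the functions loaded by utility_functions.R stay available).
build_r_expr() {
  r_expr=""
  for script in "$@"; do
    r_expr="${r_expr}source('${script}'); "
  done
  printf '%s' "${r_expr% }"
}

# Print the command when DRY_RUN=1 (the default in this sketch);
# execute it with Rscript when DRY_RUN=0.
run_stage() {
  stage_expr="$(build_r_expr "$@")"
  if [ "${DRY_RUN:-1}" = "1" ]; then
    printf 'Rscript -e "%s"\n' "$stage_expr"
  else
    Rscript -e "$stage_expr"
  fi
}

# Main simulation study (cluster-only steps omitted, see the note above).
# The folder overview lists the last file as "parse outputs.R"; adjust the
# path to the actual file name.
run_stage \
  data/data_simulation.R \
  convergence_run/medhmm/convergence_run_medhmm1.R \
  convergence_run/medhmm/convergence_run_medhmm2.R \
  convergence_run/mhmm/convergence_runs.R \
  simulation_main/parse_outputs.R

# Empirical example (requires the empirical data, available upon request).
run_stage \
  example/train_models.R \
  example/summarise_data.R \
  example/utility_functions.R \
  example/models_postprocess.R \
  example/decoding_figures_11_C5.R \
  example/ppc_mean_code.R
```

By default the wrapper only prints the two Rscript commands, so the order can be checked before anything is run; setting DRY_RUN=0 executes them. The convergence scripts may be too heavy for a local machine, in which case they can be dropped from the first stage.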
Note that the simulation is computationally intensive; it was run on the Dutch national supercomputer Snellius, using a single node with a varying number of cores. The cluster scripts are available in convergence_run/medhmm/, convergence_run/mhmm/, medhmm_run/, and mhmm_run/ and consist of the job files, doit files and executable .sh files. These files can be reused, but they need to be adjusted to match the user's own file directories.
The study was approved by the Ethical Review Board of the Faculty of Social and Behavioural Sciences of Utrecht University (ref.no 22-1845 and 22-1844).
This work made use of the Dutch national e-infrastructure Snellius with the support of the SURF Cooperative using grant no. EINF-2570, which is (partly) financed by the Dutch Research Council (NWO).
The empirical data are not included in this repository for privacy reasons. However, the data can be made available to researchers upon reasonable request. Please contact [email protected] for more information.
For any further questions, please contact [email protected].
Footnotes
1. The "*" symbol indicates a wildcard in the file names, as many of the files were generated with the same naming structure.