GitHub - a-dacko/MEDHMM_vs_MHMM: This repository contains all progress on my master thesis

Should you make the Explicit-duration Hidden Markov model your asset when analysing intensive longitudinal data for behavioral research?

Methodology and Statistics for the Behavioural, Biomedical and Social Sciences (MSBBSS), Utrecht University

Aleksandra Dacko

Research archive

This repository contains R code to reproduce the results of the master thesis.

Repository structure

This research archive contains the following folders¹:

MHMM_vs_MEDHMM/
├── README.md
├── Session_info.txt - includes information regarding all packages loaded during the analysis as well as the R software specifications
├── data/
│   ├── data_simulation.R - script generating the simulated data
│   └── read_me.txt  - description of the data files that should be included
├── convergence_run/   
│    ├── medhmm/
│    │   ├── convergence_run_medhmm*.R - main scripts running the convergence inspections
│    │   ├── doit* - doit files that must be run to submit the jobs in the jobs/ folder
│    │   ├── jobs/ - jobs that have been submitted
│    │   ├── outputs/ - all outputs, including PDFs of the runs that were added ad hoc for reader inspection
│    │   └── start_convergence_medhmm*.sh - .sh files used to submit jobs to Snellius
│    │   
│    └── mhmm/
│        ├── convergence_runs.R - main script running the convergence inspections
│        ├── doit - doit file that must be run to submit the jobs in the jobs/ folder
│        ├── jobs/ - jobs that have been submitted
│        ├── outputs/ - all outputs, including PDFs of the runs that were added ad hoc for reader inspection
│        └── start_convergence_mhmm.sh - .sh file used to submit jobs to Snellius
├── medhmm_run/
│    ├── model_fitting2_and_post_processes.R - main script running the simulation study for the multilevel explicit-duration hidden Markov model
│    ├── model_fitting2_and_post_processes_extra_run.R - main script running the additional scenario of the simulation study for the multilevel explicit-duration hidden Markov model
│    ├── doit* - doit files that must be run to submit the jobs in the jobs/ folder
│    ├── jobs/ - jobs that have been submitted
│    ├── outputs/ - all outputs, including the data that is later used to obtain the statistics
│    ├── start_2dep_medhmm_*.sh - .sh files used to submit jobs to Snellius
│    └── parameters/ - all parameters of the simulation scenarios, which are passed on to Snellius
│
├── mhmm_run/
│    ├── model_fitting2_and_post_processes.R - main script running the simulation study for the multilevel hidden Markov model
│    ├── model_fitting2_and_post_processes_extra.R - main script running the additional scenario of the simulation study for the multilevel hidden Markov model
│    ├── doit* - doit files that must be run to submit the jobs in the jobs/ folder
│    ├── jobs/ - jobs that have been submitted
│    ├── outputs/ - all outputs, including the data that is later used to obtain the statistics
│    ├── start_2dep_medhmm_*.sh - .sh files used to submit jobs to Snellius
│    └── parameters/ - all parameters of the simulation scenarios, which are passed on to Snellius
│
├── simulation_main/
│    ├── parse outputs.R - script parsing outputs included in mhmm_run/outputs/ and medhmm_run/outputs/ that saves the final post-processed files in simulation_main/post_processed_data/
│    ├── plot_scripts/ - includes three files that generate the plots that are further used in the manuscript 
│    ├── outputs plots/ - includes all plots generated by the scripts in simulation_main/plot_scripts/
│    └── post_processed_data/ - includes the final outputs that are further used for generating tables and building plots. Main Results.xlsx, included in this folder, was manually parsed and is included for an easy overview.
│
├── manuscript/ - includes all LaTeX files and graphics that were used to generate the manuscript of the thesis
│
└── example/ 
    ├── outputs/ - includes the outputs of the empirical example model fitting 
    ├── plots/ - includes all plots included in the final manuscript that are 
    ├── train_models.R - code to train both the MEDHMM and the MHMM using the empirical data
    ├── models_postprocess.R - script generating trace plots and other statistics reported in the main manuscript. It also includes code to generate figures 9, 10, 12 and C4, and to simulate data for the posterior predictive checks
    ├── summarise_data.R - script summarising the in-sample data to obtain the group-level and patient-level statistics. These summaries are further used for the posterior predictive checks
    ├── utility_functions.R - script including functions to summarise, simulate and plot data
    ├── ppc_mean_code.R - code to generate the plots of the posterior predictive checks, i.e. figures C1 and C2 from the manuscript
    └── decoding_figures_11_C5.R - script generating figures 11 and C5 from the manuscript
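
Session_info.txt records the R version and the packages loaded during the analysis. As a minimal sketch (not necessarily how the file in this repository was produced), a comparable record of your own environment could be written as follows, so that it can be compared with Session_info.txt. The helper name `write_session_info` and the output path are hypothetical.

```r
# Write the current R version and attached/loaded packages to a text file.
# NOTE: write_session_info() is a hypothetical helper, not part of this repository.
write_session_info <- function(path = "my_session_info.txt") {
  info <- utils::capture.output(utils::sessionInfo())
  writeLines(info, path)
  invisible(path)
}

# Example: write to a temporary file instead of overwriting anything in the repo.
out_file <- write_session_info(tempfile(fileext = ".txt"))
```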

Reproducing Results

To reproduce the results of the study, follow these steps:

Download the code in the convergence_run, example, medhmm_run, mhmm_run and data folders, and obtain the empirical data (available upon request from Groningen Medical Centre).
Run the R scripts in the following order to reproduce the results from main simulation study:
1. data/data_simulation.R: simulate the data.
2. convergence_run/medhmm/convergence_run_medhmm1.R, convergence_run/medhmm/convergence_run_medhmm2.R, convergence_run/mhmm/convergence_runs.R: analyse the convergence results from fitting the Bayesian multilevel EDHMM and HMM on the simulated data.
3. R (.R) scripts included in medhmm_run/ and mhmm_run/ directories: run the simulation.
4. simulation_main/parse_outputs.R: summarize and parse all results (results produced to ./post_processed_data/).
5. R (.R) scripts included in the simulation_main/plot_scripts/ directory: produce figures (results produced to ./plot_scripts/).
Run the R scripts in the following order to reproduce the results from empirical example:
1. example/train_models.R: fit the MEDHMM and the MHMM to the data.
2. example/summarise_data.R: get the summary statistics of individuals and study group.
3. example/utility_functions.R: load functions needed.
4. example/models_postprocess.R: summarize data results, produce plots and convergence trace plots.
5. example/decoding_figures_11_C5.R and example/ppc_mean_code.R: obtain the figures and the results of the posterior predictive check analysis.
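
The two orderings above can be sketched as a small R driver. This is only an illustration under stated assumptions: the helpers `list_r_scripts()` and `run_scripts()` are hypothetical and not part of the repository, the working directory is assumed to be the repository root, and the scripts may rely on paths or cluster settings that need adjusting first. By default the driver performs a dry run that only reports the order.

```r
# Scripts for the empirical example, in the order listed above.
example_scripts <- file.path("example", c(
  "train_models.R", "summarise_data.R", "utility_functions.R",
  "models_postprocess.R", "decoding_figures_11_C5.R", "ppc_mean_code.R"
))

# Return the .R scripts found directly inside a directory, sorted by name.
# (Hypothetical helper, useful for steps that say "all R scripts in <dir>".)
list_r_scripts <- function(dir) {
  sort(list.files(dir, pattern = "\\.R$", full.names = TRUE))
}

# Run scripts one after another. With dry_run = TRUE (the default) the
# function only reports what would be run and executes nothing.
run_scripts <- function(scripts, dry_run = TRUE) {
  for (script in scripts) {
    if (!file.exists(script)) stop("Script not found: ", script)
    message(if (dry_run) "Would run: " else "Running: ", script)
    if (!dry_run) source(script, local = new.env())
  }
  invisible(scripts)
}

# Usage from the repository root (commented out; the runs are expensive):
# run_scripts(example_scripts)                                 # dry run
# run_scripts(list_r_scripts("medhmm_run"), dry_run = FALSE)   # actual run
```

Each script is sourced in its own environment so that objects from one script do not silently leak into the next. If a script depends on objects created by an earlier one, source it with `local = FALSE` instead.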

Note that the simulation is computationally intensive; it was run on the Dutch National Supercomputer Snellius using a single node with a varying number of cores. The cluster scripts are available in convergence_run/medhmm/, convergence_run/mhmm/, medhmm_run/, and mhmm_run/ and consist of the job files, doit files and executable .sh files. These files can be reused, but they need to be adjusted to match the user's own file directories.
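
When adapting the cluster scripts to another system, one common way to avoid hard-coded directories is to pass them to the R script as command-line arguments. The sketch below is an assumption about how this could be done, not a description of how the scripts in this repository work; the `--key=value` convention, the argument names `scenario` and `out_dir`, and the helper `parse_args()` are all hypothetical.

```r
# Parse "--key=value" style arguments into a named list.
# NOTE: hypothetical convention; the repository's job files may pass
# parameters differently.
parse_args <- function(args = commandArgs(trailingOnly = TRUE)) {
  kv   <- grep("^--[^=]+=", args, value = TRUE)   # keep well-formed pairs only
  keys <- sub("^--([^=]+)=.*$", "\\1", kv)        # text between "--" and "="
  vals <- sub("^--[^=]+=", "", kv)                # text after the first "="
  stats::setNames(as.list(vals), keys)
}

# Example with explicit arguments; in a job script, call parse_args() with no
# arguments after e.g. `Rscript my_script.R --scenario=3 --out_dir=outputs`.
opts     <- parse_args(c("--scenario=3", "--out_dir=outputs"))
scenario <- as.integer(opts$scenario)
out_dir  <- opts$out_dir
```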

Ethical Approval

The study was approved by the Ethical Review Board of the Faculty of Social and Behavioural Sciences of Utrecht University (ref.no 22-1845 and 22-1844).

Funding

This work made use of the Dutch national e-infrastructure Snellius with the support of the SURF Cooperative using grant no. EINF-2570, which is (partly) financed by the Dutch Research Council (NWO).

Data availability statement

The empirical data cannot be found in this repository due to privacy issues. However, the data can be made available upon reasonable request from researchers. Please contact [email protected] for more information.

Contact

For any further questions, please contact [email protected].

¹ The "*" symbol indicates a wildcard in the file names, as many of them were generated with the same structure.
