NeurIPS2024

Code for the NeurIPS 2024 paper: 'Towards Transparency: Exploring LLM Trainings Datasets through Visual Topic Modeling and Semantic Frames'

Code

The code that generates the figures is in the Colab notebooks (.ipynb files), and the figures themselves are in the results directory.

Download the model from the HF Hub

git-lfs clone https://huggingface.co/teknium/OpenHermes-2.5-Mistral-7B
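
If git-lfs is not available, the same model can probably be fetched from Python instead. The sketch below is not part of the original repository: it assumes the `huggingface_hub` package is installed, and the names `default_local_dir` and `download_model` are hypothetical helpers introduced only for illustration. It downloads the repository into a folder named after the model, which is where the clone command above would place it.

```python
from pathlib import Path

# Model repository named in this README.
REPO_ID = "teknium/OpenHermes-2.5-Mistral-7B"


def default_local_dir(repo_id: str, base: str = ".") -> Path:
    """Return base/<repo name>, the folder a plain git clone would create."""
    return Path(base) / repo_id.split("/")[-1]


def download_model(repo_id: str = REPO_ID, base: str = ".") -> Path:
    """Download all files of the model repo and return the local folder.

    Assumption: huggingface_hub is installed (pip install huggingface_hub).
    """
    # Imported lazily so default_local_dir works without the package.
    from huggingface_hub import snapshot_download

    target = default_local_dir(repo_id, base)
    snapshot_download(repo_id=repo_id, local_dir=str(target))
    return target


# Usage (downloads several GB of weights, so it is left commented out):
# model_dir = download_model()
```

Calling `download_model()` with no arguments should leave the weights in `./OpenHermes-2.5-Mistral-7B`. Whether the notebooks and eval.sh expect the model at that path is not stated here, so adjust `base` as needed.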
