This repository contains the code and datasets used in the following two papers:
- Sohee Yang, Elena Gribovskaya, Nora Kassner, Mor Geva*, Sebastian Riedel*. Do Large Language Models Latently Perform Multi-Hop Reasoning?. In ACL 2024.
- Sohee Yang, Nora Kassner, Elena Gribovskaya, Sebastian Riedel*, Mor Geva*. Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?. arXiv, 2024.
Setup

```bash
# Create and activate a conda environment
conda create -n reasoning python=3.10
conda activate reasoning

# Install dependencies
pip install -r requirements.txt

# Set the HuggingFace token environment variable (required to access certain models)
export HF_TOKEN="your_token_here"
```
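To confirm that the token is picked up, a quick check along these lines can help (a minimal sketch, assuming `huggingface_hub` is installed, e.g. as a dependency of `transformers`):

```python
import os
from huggingface_hub import login, whoami

# Authenticate against the HuggingFace Hub with the token set above.
login(token=os.environ["HF_TOKEN"])
print(whoami()["name"])  # prints your HuggingFace username on success
```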
Datasets

The datasets are under the `datasets` directory.

TwoHopFact
- Introduced in Do Large Language Models Latently Perform Multi-Hop Reasoning?
- Contains 45,595 pairs of one-hop and two-hop factual prompts covering 52 fact composition types with a balanced distribution, designed to probe the internal mechanism of latent multi-hop reasoning.
- `datasets/TwoHopFact.csv` (91MB)
- TwoHopFact is also available on HuggingFace Datasets as `soheeyang/TwoHopFact` (see the loading sketch below).
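For example, the dataset can be pulled directly from the Hub (a sketch assuming the `datasets` library is installed):

```python
from datasets import load_dataset

# Download TwoHopFact from the HuggingFace Hub.
twohopfact = load_dataset("soheeyang/TwoHopFact")
print(twohopfact)  # shows the available splits, columns, and number of rows
```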
SOCRATES
- Introduced in Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?
- Contains 7,232 pairs of one-hop and two-hop factual prompts covering 17 fact composition types, carefully constructed to evaluate the latent multi-hop reasoning ability of LLMs with accuracy-based metrics while minimizing the risk of shortcuts.
- `datasets/SOCRATES_v1.csv` (14MB): a cleaned-up version of the dataset that does not contain grammatical errors.
- `datasets/SOCRATES_v0.csv` (14MB): the version used for the experiments in the paper; it contains a few grammatical errors.
- SOCRATES v1 is also available on HuggingFace as `soheeyang/SOCRATES` (see the sketch below for loading the local CSVs).
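The local CSVs can be inspected directly; a minimal sketch assuming `pandas` is installed:

```python
import pandas as pd

# v1 is the cleaned release; v0 reproduces the paper's experiments.
socrates_v1 = pd.read_csv("datasets/SOCRATES_v1.csv")
print(socrates_v1.shape)                  # number of rows and columns
print(socrates_v1.columns.tolist()[:5])   # peek at the schema
```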
Inspection of Latent Multi-Hop Reasoning Pathway
```bash
python inspect_latent_reasoning.py \
--model_name_or_path $MODEL_NAME_OR_PATH \
--input_csv_path datasets/TwoHopFact.csv \
--rq1_batch_size 256 \
--rq2_batch_size 8 \
--completion_batch_size 64 \
--hf_token $HF_TOKEN \
--run_rq1 --run_rq2 --run_appositive --run_cot --run_completion
```
Shortcut-Free Evaluation
```bash
python evaluate_latent_reasoning.py \
--model_name_or_path $MODEL_NAME_OR_PATH \
--input_csv_path datasets/SOCRATES.csv \
--tensor_parallel_size 2 \
--batch_size 256 \
--hf_token $HF_TOKEN
```
Patchscopes Analysis
```bash
python run_patchscopes.py \
--model_name_or_path $MODEL_NAME_OR_PATH \
--input_csv_path datasets/SOCRATES.csv \
--batch_size 64 \
--source_layer_idxs 1,2 \
--target_layer_idxs 30,31 \
--hf_token $HF_TOKEN \
--run_evaluation --run_patchscopes_evaluation
```
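For intuition about what this script measures, below is a minimal, hypothetical sketch of the activation-patching idea that Patchscopes builds on, assuming GPT-2 and arbitrary layer indices for brevity; it is not the repository's implementation (see `src/patchscopes_utils.py` and `run_patchscopes.py` for that).

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

tok = AutoTokenizer.from_pretrained("gpt2")
model = AutoModelForCausalLM.from_pretrained("gpt2").eval()

SOURCE_LAYER, TARGET_LAYER = 2, 10  # illustrative, cf. --source/target_layer_idxs

# 1) Run a source prompt and keep the last-token hidden state at SOURCE_LAYER.
#    hidden_states[i] is the residual stream after block i (index 0 = embeddings).
source = tok("The singer of 'Superstition' is", return_tensors="pt")
with torch.no_grad():
    states = model(**source, output_hidden_states=True).hidden_states
patch = states[SOURCE_LAYER][:, -1, :].clone()

# 2) Re-run a target prompt, overwriting the last-token hidden state
#    entering TARGET_LAYER with the captured vector.
def overwrite_last_token(module, args):
    args[0][:, -1, :] = patch  # in-place edit of the incoming hidden states
    return args

handle = model.transformer.h[TARGET_LAYER].register_forward_pre_hook(overwrite_last_token)
target = tok("The mother of this person is", return_tensors="pt")
with torch.no_grad():
    logits = model(**target).logits
handle.remove()

print(tok.decode(logits[0, -1].argmax().item()))  # next-token prediction after patching
```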
Repository Structure
- `datasets`: contains the datasets introduced in the two works.
  - `TwoHopFact.csv`
  - `SOCRATES.csv`
- `src`: contains the core functions.
  - `data_utils.py`, `model_utils.py`, and `tokenization_utils.py` contain the common code used in both papers.
  - `inspection_utils.py` contains the code used in Do Large Language Models Latently Perform Multi-Hop Reasoning?
  - `evaluation_utils.py` contains the code used in Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?
  - `patchscopes_utils.py` contains the code used in the Patchscopes analysis of Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?
- `results`: the result files from the experiments are stored under this directory; the location can be changed with the `--output_dir` argument.
Citation

```bibtex
@inproceedings{yang2024latentreasoning,
  title={Do Large Language Models Latently Perform Multi-Hop Reasoning?},
  author={Sohee Yang and Elena Gribovskaya and Nora Kassner and Mor Geva and Sebastian Riedel},
  booktitle={Association for Computational Linguistics},
  year={2024},
  url={https://aclanthology.org/2024.acl-long.550}
}

@article{yang2024shortcutfree,
  title={Do Large Language Models Perform Latent Multi-Hop Reasoning without Exploiting Shortcuts?},
  author={Sohee Yang and Nora Kassner and Elena Gribovskaya and Sebastian Riedel and Mor Geva},
  journal={arXiv preprint arXiv:2411.16679},
  year={2024},
  url={https://arxiv.org/abs/2411.16679}
}
```
License and Disclaimer

Copyright 2024 DeepMind Technologies Limited
All software is licensed under the Apache License, Version 2.0 (Apache 2.0); you may not use this file except in compliance with the Apache 2.0 license. You may obtain a copy of the Apache 2.0 license at: https://www.apache.org/licenses/LICENSE-2.0
All other materials are licensed under the Creative Commons Attribution 4.0 International License (CC-BY). You may obtain a copy of the CC-BY license at: https://creativecommons.org/licenses/by/4.0/legalcode
Unless required by applicable law or agreed to in writing, all software and materials distributed here under the Apache 2.0 or CC-BY licenses are distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the licenses for the specific language governing permissions and limitations under those licenses.
This is not an official Google product.