PersianMCQ-Instruct

PersianMCQ-Instruct is a resource comprising a dataset and fine-tuned models for generating multiple-choice questions (MCQs) in standard Iranian Persian, a low-resource language spoken by over 80 million people. It provides tools for researchers and educators and aims to improve Persian-language educational technology.

Overview

We present three state-of-the-art models for Persian MCQ generation:

PMCQ-Gemma2-9b
PMCQ-Llama3.1-8b
PMCQ-Mistral-7B

Inspired by the Agent Instruct framework and GPT-4, we created a dataset by curating over 4,000 unique Persian Wikipedia pages and generating three MCQs per page, for a total of over 12,000 questions. We assessed dataset quality through human evaluation and by fine-tuning models on it; the fine-tuned models showed substantial improvements in Persian MCQ generation.

Features

Multiple Pre-trained Models: Choose from three advanced models fine-tuned for Persian MCQ generation.
Customizable Generation Parameters: Adjust settings like temperature to control the creativity of the generated content.
Command-Line Interface: Easily run the script with different parameters through a simple CLI.
Supports Persian Language: Tailored for standard Iranian Persian, catering to a significant linguistic community.

Requirements

Python: Python 3.7 or higher
Hardware:
- CUDA-compatible GPU (recommended for faster processing and to handle larger models)
- At least 16 GB of RAM (more may be required depending on the model)
Dependencies:
- See requirements.txt for a list of Python packages needed.
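
As a rough guide to the hardware requirements above, the memory needed just to hold a model's weights is approximately the parameter count times the bytes per parameter. The sketch below is a back-of-the-envelope estimate only: the parameter counts are nominal values read off the model names (an assumption, exact counts differ), and activations, the KV cache, and framework overhead are not included.

```python
def weight_memory_gib(num_params: float, bytes_per_param: int = 2) -> float:
    """Approximate memory needed to hold the weights alone, in GiB."""
    return num_params * bytes_per_param / 1024**3


# Nominal sizes taken from the model names (assumption, not measured values).
NOMINAL_PARAMS = {
    "PMCQ-Gemma2-9b": 9e9,
    "PMCQ-Llama3.1-8b": 8e9,
    "PMCQ-Mistral-7B": 7e9,
}

if __name__ == "__main__":
    for name, n_params in NOMINAL_PARAMS.items():
        half = weight_memory_gib(n_params, bytes_per_param=2)  # 16-bit weights
        full = weight_memory_gib(n_params, bytes_per_param=4)  # 32-bit weights
        print(f"{name}: ~{half:.1f} GiB (16-bit), ~{full:.1f} GiB (32-bit)")
```

Actual usage depends on the precision and loading options used by the script, so treat these figures as lower bounds.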

Installation

Clone the Repository:

```
git clone https://github.com/KamyarZeinalipour/Persian_MCQ.git
cd Persian_MCQ
```

Create a Virtual Environment (Recommended):

```
python -m venv venv
source venv/bin/activate  # On Windows use `venv\Scripts\activate`
```

Install the Required Packages:
```
pip install -r requirements.txt
```
Download the Models:

The models will be automatically downloaded the first time you run the script. Ensure you have a stable internet connection.

Usage

The script can be executed from the command line with various arguments to customize its behavior.

Basic Command Structure:

```
python main.py --model-name <MODEL_NAME> --input-file <INPUT_CSV> --output-file <OUTPUT_CSV> [--temperature <TEMPERATURE>]
```

Arguments

--model-name: (Required) Name of the pre-trained model to use.
- Options: "PMCQ-Gemma2-9b", "PMCQ-Llama3.1-8b", "PMCQ-Mistral-7B"
--input-file: (Required) Path to the input CSV file containing the prompts.
--output-file: (Required) Path where the output CSV file will be saved.
--temperature: (Optional) Controls the randomness of the generation.
- Default is 0.1. Higher values (for example 0.8) make the output more random; lower values make it more deterministic.
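
The arguments above could be declared with Python's standard `argparse` module roughly as follows. This is a minimal sketch of an equivalent interface, not the actual contents of main.py; the function name `build_parser` and the help strings are assumptions made for illustration.

```python
import argparse

MODEL_CHOICES = ["PMCQ-Gemma2-9b", "PMCQ-Llama3.1-8b", "PMCQ-Mistral-7B"]


def build_parser() -> argparse.ArgumentParser:
    """Build a parser mirroring the documented flags (hypothetical helper)."""
    parser = argparse.ArgumentParser(
        description="Generate Persian MCQs from a CSV file of prompts."
    )
    parser.add_argument("--model-name", required=True, choices=MODEL_CHOICES,
                        help="Name of the pre-trained model to use.")
    parser.add_argument("--input-file", required=True,
                        help="Path to the input CSV file containing the prompts.")
    parser.add_argument("--output-file", required=True,
                        help="Path where the output CSV file will be saved.")
    parser.add_argument("--temperature", type=float, default=0.1,
                        help="Sampling temperature (default: 0.1).")
    return parser


if __name__ == "__main__":
    # Parse an explicit argument list so the sketch runs without a shell.
    args = build_parser().parse_args([
        "--model-name", "PMCQ-Gemma2-9b",
        "--input-file", "data/input_prompts.csv",
        "--output-file", "data/generated_mcqs.csv",
    ])
    print(args.model_name, args.input_file, args.output_file, args.temperature)
```

Using `choices` makes the parser reject any model name outside the three documented options, and `type=float` converts the temperature before it reaches the generation code.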

Example Command

```
python main.py \
  --model-name PMCQ-Gemma2-9b \
  --input-file data/input_prompts.csv \
  --output-file data/generated_mcqs.csv \
  --temperature 0.7
```
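
To see what a temperature of 0.7 changes compared with the default of 0.1, the sketch below applies standard temperature scaling to a toy set of next-token scores. It illustrates the general technique only; the scores are invented for the example, and how main.py passes the value to the model is an assumption.

```python
import math


def softmax_with_temperature(logits, temperature):
    """Turn raw scores into probabilities after dividing by the temperature."""
    if temperature <= 0:
        raise ValueError("temperature must be positive")
    scaled = [x / temperature for x in logits]
    peak = max(scaled)  # subtract the maximum for numerical stability
    exps = [math.exp(s - peak) for s in scaled]
    total = sum(exps)
    return [e / total for e in exps]


if __name__ == "__main__":
    toy_logits = [2.0, 1.0, 0.5]  # made-up scores for three candidate tokens
    for t in (0.1, 0.7):
        probs = softmax_with_temperature(toy_logits, t)
        print(t, [round(p, 4) for p in probs])
```

At a low temperature almost all probability mass sits on the highest-scoring token, so sampling is nearly deterministic; at a higher temperature the distribution flattens and less likely tokens are drawn more often.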

Input File Format

The input CSV file should contain at least one column named text, which includes the prompts for MCQ generation.

Sample input_prompts.csv:

```
text
این یک نمونه سوال برای تولید گزینه‌ها است.
```
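
An input file in this format can be produced and checked with Python's standard `csv` module. The sketch below is one way to do it; the helper names `write_prompts` and `read_prompts` are hypothetical and are not part of the repository.

```python
import csv
import os
import tempfile


def write_prompts(path, prompts):
    """Write prompts to a CSV file with a single 'text' column."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.writer(f)
        writer.writerow(["text"])
        for prompt in prompts:
            writer.writerow([prompt])


def read_prompts(path):
    """Return the 'text' column, failing early if the column is missing."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        if reader.fieldnames is None or "text" not in reader.fieldnames:
            raise ValueError("input CSV must contain a 'text' column")
        return [row["text"] for row in reader]


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "input_prompts.csv")
        write_prompts(path, ["این یک نمونه سوال برای تولید گزینه‌ها است."])
        print(read_prompts(path))
```

Writing through `csv.writer` with UTF-8 encoding keeps Persian text intact and quotes any prompt that contains commas or line breaks.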

Output

The script generates a CSV file containing the original prompts and the generated MCQs.

Sample generated_mcqs.csv:

| text | Generated Persian MCQ |
| --- | --- |
| این یک نمونه سوال برای تولید گزینه‌ها است. | (Generated MCQ text here) |
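
The generated file can be loaded back for inspection with the standard `csv` module. The sketch below assumes a comma-delimited, UTF-8 file whose headers match the sample above (`text` and `Generated Persian MCQ`); `write_results` is a stand-in that fabricates a small file for the demonstration and does not reflect how main.py writes its output.

```python
import csv
import os
import tempfile

PROMPT_COL = "text"
MCQ_COL = "Generated Persian MCQ"


def read_results(path):
    """Return (prompt, generated MCQ) pairs from the output CSV."""
    with open(path, newline="", encoding="utf-8") as f:
        reader = csv.DictReader(f)
        missing = {PROMPT_COL, MCQ_COL} - set(reader.fieldnames or [])
        if missing:
            raise ValueError(f"missing columns: {sorted(missing)}")
        return [(row[PROMPT_COL], row[MCQ_COL]) for row in reader]


def write_results(path, pairs):
    """Stand-in writer used only to create a sample file for this sketch."""
    with open(path, "w", newline="", encoding="utf-8") as f:
        writer = csv.DictWriter(f, fieldnames=[PROMPT_COL, MCQ_COL])
        writer.writeheader()
        for prompt, mcq in pairs:
            writer.writerow({PROMPT_COL: prompt, MCQ_COL: mcq})


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        path = os.path.join(tmp, "generated_mcqs.csv")
        write_results(path, [("prompt", "Question?\nA) ...\nB) ...")])
        for prompt, mcq in read_results(path):
            print(prompt, "->", mcq.splitlines()[0])
```

Opening the file with `newline=""` matters here because a generated MCQ usually spans several lines, and the `csv` module needs to handle those embedded line breaks itself.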

Notes

GPU Memory: Ensure your GPU has enough memory to load and run the models. Models like PMCQ-Gemma2-9b are resource-intensive.
Model Loading Time: Loading large models may take several minutes. Please be patient.
Internet Connection: A stable internet connection is required to download models from Hugging Face's model hub if they are not present locally.
Customization: Feel free to adjust parameters like max_new_tokens in the script for finer control over the output length.
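
If the script generates text through the Hugging Face Transformers `generate` method (an assumption based on the notes above, not confirmed from main.py), the output-length and randomness settings map onto its `max_new_tokens`, `do_sample`, and `temperature` keyword arguments. The helper below is a hypothetical sketch of how such settings could be assembled; the default of 512 tokens is an illustrative value, not the script's.

```python
def build_generation_kwargs(temperature: float = 0.1, max_new_tokens: int = 512) -> dict:
    """Assemble keyword arguments for a generate-style call (hypothetical helper)."""
    if max_new_tokens <= 0:
        raise ValueError("max_new_tokens must be a positive integer")
    if temperature < 0:
        raise ValueError("temperature must not be negative")
    kwargs = {"max_new_tokens": max_new_tokens}
    if temperature > 0:
        # Sampling is enabled so that the temperature actually takes effect.
        kwargs["do_sample"] = True
        kwargs["temperature"] = temperature
    else:
        # A temperature of zero is treated here as greedy decoding.
        kwargs["do_sample"] = False
    return kwargs


if __name__ == "__main__":
    print(build_generation_kwargs(temperature=0.7, max_new_tokens=256))
```

The resulting dictionary would be unpacked into the call, for example `model.generate(**inputs, **build_generation_kwargs(0.7, 256))`, where `model` and `inputs` stand for a loaded model and tokenized prompt.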

License

This project is licensed under the MIT License.

We hope that PersianMCQ-Instruct serves as a valuable resource in advancing educational technology and research for the Persian language community.

For any questions or contributions, please open an issue or submit a pull request.

Reference

This project is based on the research presented in the paper:

Title: PersianMCQ-Instruct: A Resource for Persian Multiple Choice Question Generation

Abstract:

We present PersianMCQ-Instruct, a comprehensive resource comprising a dataset and advanced models for generating multiple-choice questions (MCQs) in standard Iranian Persian, a low-resource language spoken by over 80 million people. This resource includes three state-of-the-art models for Persian MCQ generation: PMCQ-Gemma2-9b, PMCQ-Llama3.1-8b, and PMCQ-Mistral-7B. Inspired by the Agent Instruct framework and GPT-4, we created the dataset by curating over 4,000 unique Persian Wikipedia pages, generating three MCQs per page for a total of over 12,000 questions.

To ensure the quality of the dataset, we conducted both human evaluations and model fine-tuning, which showed substantial performance improvements in Persian MCQ generation. The dataset and models are publicly available, providing valuable tools for researchers and educators, with a particular impact on enhancing Persian-language educational technology.

Note: The models and datasets are intended for research and educational purposes. Always ensure compliance with local regulations and ethical guidelines when using AI models and data.
