From e6a4a22e4ba77fd0af1c05dec1e7f822b83cc26d Mon Sep 17 00:00:00 2001
From: locch
Date: Wed, 1 Jan 2025 17:53:59 +0700
Subject: [PATCH] update

---
 docs/terms.md | 219 ++++++++++++++++++++++++++++++++++++++++++++++++++
 1 file changed, 219 insertions(+)
 create mode 100644 docs/terms.md

diff --git a/docs/terms.md b/docs/terms.md
new file mode 100644
index 0000000..f14bcbc
--- /dev/null
+++ b/docs/terms.md
@@ -0,0 +1,219 @@
+# Glossary of Terms
+
+## General Machine Learning Concepts
+
+### Overfitting
+- Occurs when a model learns the training data too well, including its noise, and fails to generalize to unseen data.
+
+### Underfitting
+- Occurs when a model is too simple to capture the underlying patterns in the data, leading to poor performance on both the training and test sets.
+
+### Catastrophic Forgetting
+- A phenomenon in which a neural network loses performance on previously learned tasks when it is fine-tuned on new ones.
+
+### Data Leakage
+- Occurs when information from outside the training dataset (for example, from the test set) is used to build the model, leading to overly optimistic performance estimates.
+
+---
+
+## Evaluation Metrics
+
+### BLEU Score
+- Evaluates generated text (most commonly machine translation) against human references using modified n-gram precision with a brevity penalty.
+
+### BERTScore
+- Uses contextual embeddings from BERT to measure the semantic similarity between machine-generated text and reference text.
+
+### ROUGE Score
+- Measures the overlap of n-grams, word sequences, and word pairs between machine-generated and reference texts; widely used for summarization.
+
+### Perplexity
+- Measures how well a probabilistic language model predicts a sample, computed as the exponentiated average negative log-likelihood per token. Lower perplexity indicates better performance.
+
+### METEOR
+- Evaluates generated text by aligning it to the reference using exact, stem, and synonym matches, combining precision and recall with a higher weight on recall.
+
+### Pass@k
+- The probability that at least one of k sampled attempts solves the task; commonly used to evaluate code generation.
+
+### Accuracy
+- The proportion of correctly classified instances out of all instances.
+
+### Recall
+- The proportion of relevant instances that are correctly identified out of all relevant instances.
+
+### Precision
+- The proportion of instances identified as relevant that are actually relevant.
+
+### F1-Score
+- The harmonic mean of precision and recall, balancing the two metrics.
+
+---
+
+## Pretraining and Fine-Tuning Techniques
+
+### Self-Supervised Fine-Tuning
+- Continues training on unlabeled data by constructing pseudo-labels or auxiliary tasks from the data itself.
+
+### Masked Language Modeling (MLM)
+- Predicts masked tokens in a sentence; the pretraining objective used by encoder models such as BERT.
+
+### Next Sentence Prediction (NSP)
+- Predicts whether two sentences appear consecutively in the source text; used in BERT pretraining to model sentence relationships.
+
+### Supervised Fine-Tuning
+- Fine-tunes a pretrained model on labeled data to adapt it to a specific task.
+
+### Full Fine-Tuning
+- Updates all model parameters on task-specific data.
+
+### Parameter-Efficient Fine-Tuning (PEFT)
+- Fine-tunes only a small subset of parameters (or a small number of added parameters), reducing memory and computational cost.
+
+### Selective Fine-Tuning
+- Fine-tunes only specific layers or modules of the model while freezing the rest.
+
+### Additive Fine-Tuning
+- Adds new task-specific parameters to the model while keeping the original parameters frozen.
+
+### Adapters
+- Lightweight modules inserted into the model's layers; only the adapter weights are trained for a new task.
+
+### Reparameterization Fine-Tuning
+- Re-expresses the weight update in a more compact form (for example, as low-rank factors) so that only the reparameterized parameters are trained (a minimal sketch follows below).
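+
+The snippet below is a minimal sketch of the reparameterization idea behind LoRA-style methods, assuming PyTorch is available; the `LoRALinear` class, its hyperparameters, and the layer sizes are illustrative rather than any particular library's API.
+
+```python
+import torch
+import torch.nn as nn
+
+class LoRALinear(nn.Module):
+    """A frozen linear layer plus a trainable low-rank update: W x + (B A) x."""
+
+    def __init__(self, base: nn.Linear, rank: int = 8, alpha: float = 16.0):
+        super().__init__()
+        self.base = base
+        for p in self.base.parameters():
+            p.requires_grad = False  # the pretrained weights stay frozen
+        # Low-rank factors: A projects down to `rank`, B projects back up.
+        self.A = nn.Parameter(torch.randn(rank, base.in_features) * 0.01)
+        self.B = nn.Parameter(torch.zeros(base.out_features, rank))
+        self.scale = alpha / rank
+
+    def forward(self, x: torch.Tensor) -> torch.Tensor:
+        # Original output plus the scaled low-rank correction; only A and B train.
+        return self.base(x) + self.scale * (x @ self.A.T @ self.B.T)
+
+layer = LoRALinear(nn.Linear(768, 768), rank=8)
+trainable = sum(p.numel() for p in layer.parameters() if p.requires_grad)
+print(f"trainable parameters: {trainable}")  # 12,288 vs. ~590k in the base layer
+```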
+
+### LoRA
+- Low-Rank Adaptation: freezes the pretrained weights and trains small low-rank update matrices, saving memory and computation.
+
+### DoRA
+- Weight-Decomposed Low-Rank Adaptation: extends LoRA by decomposing each weight into a magnitude and a direction component and applying the low-rank update to the direction.
+
+### QLoRA
+- Trains LoRA adapters on top of a quantized (typically 4-bit) base model for memory-efficient fine-tuning.
+
+### 4-bit, 8-bit Quantization
+- Reduces model size by representing parameters with lower-bit precision, saving memory and improving efficiency at some cost in accuracy.
+
+### Double Quantization
+- Quantizes the quantization constants themselves (as in QLoRA) to save additional memory.
+
+### Soft Prompts
+- Learnable continuous vectors prepended to the input in place of hand-written prompt tokens, enabling task-specific adaptation.
+
+### Prompt Tuning
+- Fine-tunes soft prompts at the input layer instead of model parameters to adapt a frozen model to new tasks.
+
+### Prefix Tuning
+- Optimizes continuous prefix vectors prepended to the activations at each transformer layer while the base model stays frozen.
+
+### P-Tuning
+- A prompt-tuning variant that learns continuous prompt embeddings, typically produced by a small prompt encoder.
+
+### Multi-Task Tuning
+- Fine-tunes a model on multiple tasks simultaneously to improve generalization.
+
+---
+
+## Advanced Fine-Tuning Techniques
+
+### Instruction Tuning
+- Fine-tunes a model on instruction-response pairs so that it follows natural language instructions across tasks.
+
+### Reinforcement Learning from Human Feedback (RLHF)
+- Combines human preference data with reinforcement learning to fine-tune models toward desired behavior.
+
+### Constitutional AI
+- Fine-tunes models using a written set of principles (a "constitution") and AI-generated critiques, reducing the need for human feedback labels.
+
+### Meta-Instruction Tuning
+- Fine-tunes models on diverse tasks with hierarchical or meta-contextual instructions.
+
+### Task-Specific Tuning (Few-Shot or Zero-Shot)
+- Adapts models to specific tasks with limited (few-shot) or no (zero-shot) labeled data.
+
+### Multimodal Instruction Tuning
+- Extends instruction tuning to multiple modalities, such as text and images.
+
+### Chain-of-Thought Instruction Tuning
+- Fine-tunes models on examples that include intermediate reasoning steps to improve problem-solving.
+
+### Expert-Tuning (Mixture of Experts)
+- Trains specialized modules (experts) together with a routing network that activates only the relevant experts for each input.
+
+### Self-Reflective Training
+- Trains models to evaluate and refine their own outputs iteratively to improve performance.
+
+### Neural Symbolic Integration
+- Combines neural networks with symbolic reasoning for better generalization and interpretability.
+
+### Alignment Tuning
+- Aligns model behavior with specified values, societal norms, or operational goals.
+
+### Hyper-Instruction Tuning
+- Fine-tunes models to understand abstract or multi-task instructions.
+
+---
+
+## Reinforcement Learning Techniques
+
+### Reward Modeling
+- Trains a model to score outputs according to human preferences, providing the reward signal for reinforcement learning fine-tuning.
+
+### RLHF
+- Uses a reward model and reinforcement learning to align a language model with human feedback.
+
+### PPO (Proximal Policy Optimization)
+- A policy-gradient algorithm that clips each policy update to stay close to the previous policy; the most common RL algorithm in RLHF.
+
+### DPO (Direct Preference Optimization)
+- Optimizes the policy directly on preference pairs with a classification-style loss, avoiding an explicit reward model and RL loop.
+
+### RLEF (Reinforcement Learning from Expert Feedback)
+- Extends RLHF by incorporating feedback from domain experts.
+
+---
+
+## In-Context Learning Techniques
+
+### In-Context Learning
+- Lets a model perform a task by placing instructions and examples in the input context, without updating any model weights (a prompt-building sketch follows below).
+
+### Dynamic In-Context Learning
+- Adapts the context or the chosen examples dynamically during inference for better performance.
+
+### Self-Adaptive In-Context Learning
+- Has the model generate or refine its own in-context examples during inference.
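+
+As a concrete illustration of the in-context learning entry above, the sketch below builds a few-shot prompt from labeled examples; the sentiment task, the formatting, and the example texts are assumptions, and no model call is shown.
+
+```python
+# Few-shot in-context learning: the "supervision" lives entirely in the prompt;
+# no model weights are updated.
+EXAMPLES = [
+    ("The movie was a waste of time.", "negative"),
+    ("An absolute masterpiece of storytelling.", "positive"),
+    ("The plot dragged, but the acting was superb.", "positive"),
+]
+
+def build_few_shot_prompt(query: str) -> str:
+    lines = ["Classify the sentiment of each review as positive or negative.", ""]
+    for text, label in EXAMPLES:
+        lines += [f"Review: {text}", f"Sentiment: {label}", ""]
+    lines += [f"Review: {query}", "Sentiment:"]  # the model completes this line
+    return "\n".join(lines)
+
+print(build_few_shot_prompt("I could not stop smiling the whole way through."))
+```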
+
+### Active In-Context Learning
+- Selects task-relevant examples dynamically based on a scoring mechanism.
+
+### Meta-In-Context Learning
+- Applies meta-learning principles to improve example selection and adaptation.
+
+### Contrastive In-Context Learning
+- Includes contrasting examples in the context to help the model distinguish between patterns.
+
+### Reinforced In-Context Learning
+- Uses reinforcement learning to optimize example selection.
+
+### Augmented In-Context Learning
+- Combines in-context learning with external retrieval or data augmentation.
+
+### Dynamic Prompt Optimization
+- Iteratively refines prompts based on model outputs or feedback.
+
+### Gradient-Guided Prompting
+- Optimizes prompts with gradient-based methods for specific tasks.
+
+---
+
+## Reasoning Techniques
+
+### Chain-of-Thought
+- Encourages models to generate intermediate reasoning steps before the final answer, improving performance on multi-step tasks (a prompt sketch follows below).
+
+### Tree-of-Thought
+- Explores multiple reasoning branches as a search tree, allowing the model to look ahead and backtrack on complex tasks.
+
+### Graph-of-Thought
+- Represents reasoning as a graph, capturing more complex relationships between intermediate steps.
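+
+To make the chain-of-thought entry concrete, here is a minimal prompt sketch with one worked example followed by a new question; the wording and the arithmetic example are illustrative only.
+
+```python
+# Chain-of-thought prompting: show the model a worked, step-by-step solution,
+# then ask it to continue the same pattern for a new question.
+worked_example = (
+    "Q: A library has 42 shelves with 28 books each. "
+    "If 350 books are checked out, how many remain?\n"
+    "A: Let's think step by step.\n"
+    "1. Total books: 42 * 28 = 1176.\n"
+    "2. Remaining: 1176 - 350 = 826.\n"
+    "So the answer is 826.\n"
+)
+
+new_question = (
+    "Q: A farmer has 17 crates with 24 apples each and sells 96 apples. "
+    "How many apples are left?\n"
+    "A: Let's think step by step.\n"  # the model continues the reasoning here
+)
+
+print(worked_example + "\n" + new_question)
+```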