The Cyclic Attention Transformer (CAT) is a transformer architecture that introduces a cyclic attention mechanism for richer contextual modeling. The model is trained from scratch, without any pretraining, and achieves strong results on multiple text classification benchmarks.
- 🔄 Cyclic Attention Mechanism: Advanced contextual modeling through cyclic shifts and gating
- 🚀 No Pretraining Required: Efficient training from scratch
- 📊 Strong Benchmark Results:
- AG News Dataset: 91.00% accuracy
- DBpedia Dataset: 98.05% accuracy
- 🎯 Efficient Training: Optimized for both speed and performance
Cyclic Attention Block
- Innovative cyclic shift mechanism
- Adaptive gating for attention filtering
- Enhanced global dependency capture
Multi-Head Attention System
- Hierarchical attention layers
- Intermediate normalization
- Advanced feedforward networks
Processing Pipeline
- Custom n-gram tokenization
- Global pooling for sequence aggregation
- Specialized classification head
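The sketch below shows one way these components could fit together in PyTorch. It is a minimal illustration, not the implementation from the notebooks: the class names (`CyclicAttentionBlock`, `CATClassifier`), the particular shift-and-gate formulation, and the use of mean pooling are assumptions made for clarity.

```python
import torch
import torch.nn as nn

class CyclicAttentionBlock(nn.Module):
    """Illustrative block: attention over the original sequence and a cyclically
    shifted copy, blended by a learned sigmoid gate, then a feedforward sub-layer."""

    def __init__(self, embed_dim, num_heads, ff_dim, shift=1, dropout=0.1):
        super().__init__()
        self.shift = shift
        self.attn = nn.MultiheadAttention(embed_dim, num_heads, dropout=dropout, batch_first=True)
        self.gate = nn.Linear(embed_dim, embed_dim)
        self.norm1 = nn.LayerNorm(embed_dim)
        self.norm2 = nn.LayerNorm(embed_dim)
        self.ff = nn.Sequential(
            nn.Linear(embed_dim, ff_dim),
            nn.GELU(),
            nn.Linear(ff_dim, embed_dim),
            nn.Dropout(dropout),
        )

    def forward(self, x):
        # Attention over the original sequence.
        attn_out, _ = self.attn(x, x, x)
        # Attention where keys/values come from a cyclically shifted copy.
        shifted = torch.roll(x, shifts=self.shift, dims=1)
        shift_out, _ = self.attn(x, shifted, shifted)
        # Adaptive gate filters how much of each attention view to keep.
        g = torch.sigmoid(self.gate(x))
        x = self.norm1(x + g * attn_out + (1.0 - g) * shift_out)
        x = self.norm2(x + self.ff(x))
        return x

class CATClassifier(nn.Module):
    """Embedding -> stacked cyclic attention blocks -> global pooling -> classification head."""

    def __init__(self, vocab_size, num_classes, embed_dim=1024, num_heads=8,
                 ff_dim=2048, num_layers=3):
        super().__init__()
        self.embedding = nn.Embedding(vocab_size, embed_dim, padding_idx=0)
        self.blocks = nn.ModuleList(
            [CyclicAttentionBlock(embed_dim, num_heads, ff_dim) for _ in range(num_layers)]
        )
        self.classifier = nn.Linear(embed_dim, num_classes)

    def forward(self, token_ids):
        x = self.embedding(token_ids)   # (batch, seq_len, embed_dim)
        for block in self.blocks:
            x = block(x)
        x = x.mean(dim=1)               # global average pooling over the sequence
        return self.classifier(x)       # class logits
```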
| Parameter | Default Value | Description |
|---|---|---|
| embed_dim | 1024 | Embedding dimension |
| num_heads | 8 | Number of attention heads |
| ff_dim | 2048 | Feedforward layer dimension |
| num_layers | 3 | Number of transformer layers |
| batch_size | 128 | Training batch size |
| epochs | 5 | Training epochs |
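For example, with the defaults above and the hypothetical `CATClassifier` sketch from the architecture section, the model could be instantiated roughly as follows:

```python
# Illustrative instantiation using the CATClassifier sketch above.
model = CATClassifier(
    vocab_size=50_002,   # vocabulary size reported in the benchmarks below
    num_classes=4,       # AG News has 4 classes; DBpedia has 14
    embed_dim=1024,
    num_heads=8,
    ff_dim=2048,
    num_layers=3,
)
```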
AG News Results:

| Metric | Score |
|---|---|
| Accuracy | 91.00% |
| F1 Score | 90.99% |
| Precision | 91.02% |
| Recall | 91.00% |
Dataset Details:
- Vocabulary Size: 50,002
- Training Samples: 120,000
DBpedia Results:

| Metric | Score |
|---|---|
| Accuracy | 98.05% |
| F1 Score | 98.05% |
| Precision | 98.06% |
| Recall | 98.05% |
Dataset Details:
- Training Set: 560,000 samples
- Test Set: 70,000 samples
- Vocabulary Size: 50,002
| Epoch | Loss |
|---|---|
| 1 | 0.1299 |
| 2 | 0.0681 |
| 3 | 0.0520 |
| 4 | 0.0416 |
| 5 | 0.0344 |
Tokenization
- Custom tokenizer supporting unigram and bigram tokenization
- Vocabulary size: 50,002 (including special tokens)
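A minimal sketch of how unigram + bigram tokenization with a capped vocabulary could look. The actual tokenizer is defined in the notebooks; the function names, the `<pad>`/`<unk>` special tokens, and the 256-token sequence cap here are illustrative assumptions.

```python
from collections import Counter

def build_vocab(texts, max_size=50_000):
    """Count unigrams and bigrams, keep the most frequent, and reserve two special
    tokens (50,000 n-grams + 2 specials = 50,002 entries)."""
    counts = Counter()
    for text in texts:
        words = text.lower().split()
        counts.update(words)                      # unigrams
        counts.update(zip(words, words[1:]))      # bigrams (as word pairs)
    vocab = {"<pad>": 0, "<unk>": 1}
    for token, _ in counts.most_common(max_size):
        vocab[token] = len(vocab)
    return vocab

def encode(text, vocab, max_len=256):
    """Map a text to a fixed-length list of token ids, truncating or padding as needed."""
    words = text.lower().split()
    tokens = words + list(zip(words, words[1:]))
    ids = [vocab.get(tok, vocab["<unk>"]) for tok in tokens][:max_len]
    return ids + [vocab["<pad>"]] * (max_len - len(ids))
```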
Attention Implementation
- Cyclic shift attention mechanism
- Gated attention filtering
- Multi-head attention processing
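The shift operation itself is easy to picture with `torch.roll`; the toy example below shows how a shifted copy wraps the sequence around, giving the view that the gated attention can then filter. This is a simplified illustration, not the repository's exact code.

```python
import torch

x = torch.arange(6, dtype=torch.float32).reshape(1, 6, 1)  # (batch=1, seq_len=6, dim=1)
shifted = torch.roll(x, shifts=1, dims=1)                   # last position wraps to the front

print(x.flatten().tolist())        # [0.0, 1.0, 2.0, 3.0, 4.0, 5.0]
print(shifted.flatten().tolist())  # [5.0, 0.0, 1.0, 2.0, 3.0, 4.0]

# A sigmoid gate in [0, 1] can then blend outputs computed from the original and
# shifted views; 0.5 everywhere gives an even mix of the two.
gate = torch.sigmoid(torch.zeros_like(x))
blended = gate * x + (1 - gate) * shifted
```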
Training Configuration
- Optimizer: AdamW
- Loss Function: Cross-entropy
- Learning Rate: 5e-5
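Putting this configuration together, a bare-bones training loop might look like the sketch below. The `model` and `train_loader` are assumed to come from the earlier sketches and a standard PyTorch `DataLoader`.

```python
import torch
import torch.nn as nn

def train(model, train_loader, epochs=5, lr=5e-5):
    device = "cuda" if torch.cuda.is_available() else "cpu"
    model.to(device)
    optimizer = torch.optim.AdamW(model.parameters(), lr=lr)
    criterion = nn.CrossEntropyLoss()
    for epoch in range(epochs):
        model.train()
        running_loss = 0.0
        for token_ids, labels in train_loader:   # batches of (padded token ids, class labels)
            token_ids, labels = token_ids.to(device), labels.to(device)
            optimizer.zero_grad()
            loss = criterion(model(token_ids), labels)
            loss.backward()
            optimizer.step()
            running_loss += loss.item()
        print(f"epoch {epoch + 1}: mean loss {running_loss / len(train_loader):.4f}")
```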
- Python 3.x
- PyTorch
- Transformers library
- Jupyter Notebook
Clone the repository:
git clone https://github.com/VijayendraDwari/CAT.git
cd CAT
Install dependencies:
pip install -r requirements.txt
Run the Jupyter notebook:
jupyter notebook
📚 Documentation
For detailed information about the model architecture, training procedures, dataset preparation, and evaluation metrics, please refer to the notebooks in the repository.
If you use this implementation in your research, please cite:
@misc{dwari2025cat,
  title={Cyclic Attention Transformer (CAT)},
  author={Vijayendra Dwari},
  year={2025},
  publisher={GitHub},
  journal={GitHub repository},
  howpublished={\url{https://github.com/VijayendraDwari/CAT}}
}