license: mit
datasets:
- Rogendo/English-Swahili-Sentence-Pairs
language:
- en
- sw
metrics:
- accuracy
library_name: transformers
Model Card for Rogendo/en-sw model
This is a translation model for English and Swahili. It is a fine-tuned version of Helsinki-NLP/opus-mt-en-swc, trained on the Rogendo/English-Swahili-Sentence-Pairs dataset.
Model Details
- Uses the Transformer architecture
- Trained on a corpus of 210,000 sentence pairs
- Fine-tuned from the pre-trained Helsinki-NLP/opus-mt-en-swc checkpoint
- Two models, Rogendo/en-sw and Rogendo/sw-en, provide bidirectional translation
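Because each direction lives in its own checkpoint, a caller has to pick the right one per request. A minimal sketch of that selection follows. The helper name `checkpoint_for` and the lookup table are illustrative assumptions, not part of the released code; the checkpoint names are the ones used elsewhere in this card.

```python
# Hypothetical lookup table: (source, target) language codes -> checkpoint name.
CHECKPOINTS = {
    ("en", "sw"): "Rogendo/en-sw",
    ("sw", "en"): "Rogendo/sw-en",
}


def checkpoint_for(source: str, target: str) -> str:
    """Return the checkpoint name for a translation direction."""
    key = (source.lower(), target.lower())
    if key not in CHECKPOINTS:
        raise ValueError(f"Unsupported direction: {source} -> {target}")
    return CHECKPOINTS[key]


print(checkpoint_for("en", "sw"))  # prints Rogendo/en-sw
```

The returned name can be passed straight to `pipeline(...)` or `from_pretrained(...)`.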
Model Description
- Developed by: Peter Rogendo, Frederick Kioko
- Model type: Transformer
- Language(s) (NLP): English (en), Swahili (sw)
- License: Distributed under the MIT License
- Finetuned from model: Helsinki-NLP/opus-mt-en-swc. This pre-trained model was fine-tuned on Swahili-English sentence pairs collected across Kenya. Swahili is Kenya's national language and is among the three most widely spoken languages in Africa. The training set contained 210,000 sentence pairs in total.
Model Sources [optional]
- Repository: https://github.com/Rogendo/Eng-Swa-Translator
Uses
This translation model is intended for a range of applications, from language translators and screen assistants to official uses such as translating legal documents.
Direct Use
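Install the SentencePiece tokenizer dependency (shell):
pip install sentencepiece
Then run the translation pipeline (Python):
from transformers import pipeline
# Load the English-to-Swahili checkpoint from the Hugging Face Hub
model_checkpoint = "Rogendo/en-sw"
fine_tuned_model = pipeline("translation", model=model_checkpoint)
# The pipeline returns a list of dicts with a "translation_text" key
print(fine_tuned_model("Earlier today, I saw her going through the stalls in the market"))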
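The translation pipeline returns one dictionary per input, with the translated string under the `translation_text` key. The sketch below wraps that in a small batch helper. `load_translator` and `translate_batch` are hypothetical names introduced here; the translator is passed in as a plain callable so the helper can be tried without downloading the model.

```python
from typing import Callable, Dict, List


def load_translator(checkpoint: str = "Rogendo/en-sw"):
    """Build the translation pipeline (downloads the model on first use)."""
    # Imported lazily; requires the transformers and sentencepiece packages.
    from transformers import pipeline
    return pipeline("translation", model=checkpoint)


def translate_batch(
    translator: Callable[[List[str]], List[Dict[str, str]]],
    sentences: List[str],
) -> List[str]:
    """Translate a list of sentences and return only the translated strings."""
    if not sentences:
        return []
    results = translator(sentences)
    return [item["translation_text"] for item in results]


# Usage (not executed here because it downloads the checkpoint):
# translator = load_translator()
# print(translate_batch(translator, ["Good morning.", "Where is the market?"]))
```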
Downstream Use [optional]
[More Information Needed]
Out-of-Scope Use
[More Information Needed]
Bias, Risks, and Limitations
[More Information Needed]
Recommendations
Users (both direct and downstream) should be made aware of the risks, biases and limitations of the model. More information needed for further recommendations.
How to Get Started with the Model
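The snippets below load the companion Swahili-to-English checkpoint, Rogendo/sw-en; substitute "Rogendo/en-sw" for the English-to-Swahili direction.
Use a pipeline as a high-level helper:
from transformers import pipeline
# "text2text-generation" wraps tokenization, generation, and decoding in one call
pipe = pipeline("text2text-generation", model="Rogendo/sw-en")
Load model directly:
from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
# The tokenizer and model must come from the same checkpoint
tokenizer = AutoTokenizer.from_pretrained("Rogendo/sw-en")
model = AutoModelForSeq2SeqLM.from_pretrained("Rogendo/sw-en")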
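Loading the tokenizer and model directly leaves the generation step to the caller. A minimal sketch of that step follows, assuming the standard `transformers` sequence-to-sequence interface (tokenize, `generate`, `batch_decode`) with PyTorch tensors. The function name `translate` and the `max_new_tokens` value are illustrative choices, not settings documented for this model.

```python
from typing import List


def translate(texts: List[str], tokenizer, model, max_new_tokens: int = 128) -> List[str]:
    """Tokenize, generate, and decode a batch of sentences."""
    # Pad to the longest sentence in the batch and truncate overly long inputs.
    inputs = tokenizer(texts, return_tensors="pt", padding=True, truncation=True)
    output_ids = model.generate(**inputs, max_new_tokens=max_new_tokens)
    # Drop padding and end-of-sequence tokens from the decoded text.
    return tokenizer.batch_decode(output_ids, skip_special_tokens=True)


# Usage (not executed here; downloads the checkpoint and requires torch and sentencepiece):
# from transformers import AutoTokenizer, AutoModelForSeq2SeqLM
# tokenizer = AutoTokenizer.from_pretrained("Rogendo/sw-en")
# model = AutoModelForSeq2SeqLM.from_pretrained("Rogendo/sw-en")
# print(translate(["Habari ya asubuhi"], tokenizer, model))
```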
Training Details
Training Data
The model was trained on the Rogendo/English-Swahili-Sentence-Pairs dataset: https://huggingface.co/datasets/Rogendo/English-Swahili-Sentence-Pairs
Training Procedure
Preprocessing [optional]
[More Information Needed]
Training Hyperparameters
- Training regime: [More Information Needed]
Speeds, Sizes, Times [optional]
[More Information Needed]
Evaluation
Testing Data, Factors & Metrics
Testing Data
[More Information Needed]
Factors
[More Information Needed]
Metrics
[More Information Needed]
Results
[More Information Needed]
Summary
Model Examination [optional]
[More Information Needed]
Environmental Impact
Carbon emissions can be estimated using the Machine Learning Impact calculator presented in Lacoste et al. (2019).
- Hardware Type: [More Information Needed]
- Hours used: [More Information Needed]
- Cloud Provider: [More Information Needed]
- Compute Region: [More Information Needed]
- Carbon Emitted: [More Information Needed]
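Estimates of this kind come down to, roughly, energy consumed multiplied by the carbon intensity of the local grid. A minimal sketch of that arithmetic follows. The hardware, runtime, and region for this model are not reported, so every number in the example is a placeholder chosen purely for illustration and says nothing about the actual training run.

```python
def estimate_emissions_kg(power_watts: float, hours: float, kg_co2_per_kwh: float) -> float:
    """Estimate kg CO2eq as energy used (kWh) times grid carbon intensity."""
    energy_kwh = power_watts / 1000.0 * hours
    return energy_kwh * kg_co2_per_kwh


# Placeholder inputs only: a 250 W accelerator, 10 hours, 0.4 kg CO2eq per kWh.
print(round(estimate_emissions_kg(250, 10, 0.4), 3))  # prints 1.0
```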
Technical Specifications [optional]
Model Card Authors [optional]
Peter Rogendo
Model Card Contact