- Brainstorm possible research/engineering candidates that could let us train better chatbots than ChatGPT/GPT4 in selected aspects.
- Solid foundation pretrained models
- Get close to ChatGPT via “behavioral cloning” or self-alignment?
- More [sub-domain] knowledge than ChatGPT?
- Longer context than GPT4?
- Lower cost of training and inference?
  - Multi-query attention
  - FlashAttention
  - FasterTransformer
  - PEFT by Hugging Face
  - DeepSpeed-Chat
- Better reward model and RL?
- More modalities than GPT4?
- Very, very important: evaluation!
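
To make the inference-cost item a bit more concrete, here is a rough back-of-the-envelope sketch of why multi-query attention helps: it shares a single key/value head across all query heads, so the KV cache kept around during decoding shrinks by roughly the number of heads. The model shape below (32 layers, 32 heads, head dimension 128, 8192-token context, fp16) is a made-up example and not any particular model's config, and `kv_cache_bytes` is a hypothetical helper written just for this note.

```python
def kv_cache_bytes(n_layers, n_kv_heads, head_dim, seq_len,
                   batch_size=1, bytes_per_elem=2):
    """Approximate size of the key/value cache for autoregressive decoding.

    The leading factor of 2 accounts for storing both keys and values.
    bytes_per_elem=2 assumes fp16/bf16 storage.
    """
    return (2 * n_layers * n_kv_heads * head_dim
            * seq_len * batch_size * bytes_per_elem)


# Hypothetical model shape, chosen only for illustration.
N_LAYERS, N_HEADS, HEAD_DIM, SEQ_LEN = 32, 32, 128, 8192

# Standard multi-head attention: one K/V head per query head.
mha = kv_cache_bytes(N_LAYERS, N_HEADS, HEAD_DIM, SEQ_LEN)
# Multi-query attention: a single K/V head shared by all query heads.
mqa = kv_cache_bytes(N_LAYERS, 1, HEAD_DIM, SEQ_LEN)

print(f"multi-head KV cache:  {mha / 2**30:.3f} GiB")
print(f"multi-query KV cache: {mqa / 2**30:.3f} GiB")
print(f"reduction factor:     {mha // mqa}x")
```

Under these assumed numbers the cache drops from 4 GiB to 0.125 GiB per sequence, which is also relevant to the longer-context item, since the cache grows linearly with sequence length. This only counts cache memory; it says nothing about quality impact, which would need to be evaluated separately.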