Parrot
https://github.com/AIDC-AI/Parrot
📊 Stats
⭐ Stars: 77
📝 Language: Python
📝 Description: 🎉 The code repository for "Parrot: Multilingual Visual Instruction Tuning" in PyTorch.
⭐ Star Growth (12 months)
🔬 Research Notes
Topics
mixture-of-experts, multilingual, multimodal-large-language-models, vision-language-model
README Excerpt
```
# 🦜 Parrot: Multilingual Visual Instruction Tuning
---
🎉 Introduction
Welcome to Parrot [[paper](https://arxiv.org/abs/2406.02539)], a novel method that uses textual guidance to drive visual token alignment at the language level. Parrot conditions the visual tokens on diverse language inputs and uses a Mixture-of-Experts (MoE) module to promote the alignment of multilingual tokens. Moreover, given the current lack of benchmarks for evaluating multilingual capabilities in the field, we collect and release a Massive Multilingual Multimodal Benchmark, named MMMB, which covers 6 languages, 15 categories, and 12,000 questions.
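The language-conditioned MoE routing described above can be sketched in plain NumPy. This is a minimal illustration, not the repository's implementation: the toy dimensions, the linear experts, and the softmax gate driven by a text embedding are all assumptions.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

rng = np.random.default_rng(0)

# Toy dimensions for illustration only (not Parrot's actual sizes).
n_tokens, d_vis, d_txt, n_experts = 5, 8, 8, 4

visual_tokens = rng.standard_normal((n_tokens, d_vis))
text_embedding = rng.standard_normal(d_txt)   # language-specific guidance

# Gate: the text embedding selects a mixture over experts, so the same
# visual tokens are routed differently depending on the input language.
W_gate = rng.standard_normal((d_txt, n_experts))
gate = softmax(text_embedding @ W_gate)       # shape (n_experts,), sums to 1

# Each expert here is a simple linear projection of the visual tokens.
experts = rng.standard_normal((n_experts, d_vis, d_vis))
expert_outs = np.einsum('td,edf->etf', visual_tokens, experts)

# Language-conditioned visual tokens: gate-weighted sum of expert outputs.
aligned = np.einsum('e,etf->tf', gate, expert_outs)
print(aligned.shape)  # (5, 8)
```

The key point the sketch captures is that the gate depends only on the textual guidance, so identical visual tokens yield different expert mixtures for different languages.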
If you find Parrot useful for your research and applications, please cite using this BibTeX:
```bibtex
@inproceedings{sun2025parrot,
title={Parrot: Multilingual Visual Instruction Tuning},
author={Sun, Hai-Long and Zhou, Da-Wei and Li, Yang and Lu, Shiyin and Yi, Chao and Chen, Qing-Guo and Xu, Zhao and Luo, Weihua and Zhang, Kaifu and Zhan, De-Chuan and others},
booktitle={ICML},
year={2025}
}
```
📰 What's New
Evaluation on MMMB and MTL_MMBench_DEV is supported, yielding results for all 6 languages at a time. Welcome to have a try!
☄️ Install
Please follow the instructions below to install the required packages.
1. Clone this repository and navigate to Parrot folder
```bash
git clone https://github.com/AIDC-AI/Parrot.git
cd Parrot
```
2. Install the package
```shell
conda create -n parrot python=3.10 -y
conda activate parrot
pip install --upgrade pip
pip install -e .
```
Upgrade to the latest code base
```shell
git pull
pip install -e . --no-deps
```
🦜 Model
Parrot is a multilingual multimodal large language model. We provide our fully finetuned models below:
| Model | Base LLM | Vision Encoder | Stage | Download |
| --- | --- | :---: | :---: | :---: |
| Parrot-7B | Qwen-1.5-7B-Chat | CLIP-ViT-Large-patch14-336 | SFT | [ckpt](https://huggingface.co/AIDC-AI/Parrot-7B) |
| Parrot-14B | Qwen-1.5-14B-Chat | CLIP-ViT-Large-patch14-336 | SFT | [ckpt](https://huggingface.co/AIDC-AI/Parrot-14B) |
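The table above pairs a CLIP-ViT-L/14-336 vision encoder with a Qwen1.5 LLM, which implies visual features being projected into the LLM's embedding space. Below is a minimal sketch of that projection step; the patch count (24 × 24 = 576 for a 336-px image with 14-px patches), the 1024-dim CLIP features, and the 4096-dim Qwen1.5-7B hidden size are the published sizes of those components, but the linear projector itself is a generic assumption, not Parrot's exact connector module.

```python
import numpy as np

rng = np.random.default_rng(0)

# CLIP-ViT-L/14-336 yields 576 patch tokens of dim 1024;
# Qwen1.5-7B uses 4096-dim hidden states.
n_patches, d_clip, d_llm = 576, 1024, 4096

patch_features = rng.standard_normal((n_patches, d_clip)).astype(np.float32)

# Generic linear projector (scaled init); the real connector may differ.
W_proj = (rng.standard_normal((d_clip, d_llm)) / np.sqrt(d_clip)).astype(np.float32)

# Visual tokens the LLM consumes alongside the text tokens.
visual_tokens = patch_features @ W_proj
print(visual_tokens.shape)  # (576, 4096)
```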

🔥 Train
Parrot is trained in two stages: modality alignment and instruction tuning for multilingual alignment. Each stage's training script is provided in the scripts folder. Before starting training, make sure the ROOT variable in each training script is set correctly. Below are the commands to train Parrot for each stage:
```shell
bash scripts/train/pretrain.sh
bash scripts/train/finetune.sh
```
Hyperparameters
We use a set of hyperparameters similar to Vicuna's in finetuning. The hyperparameters used in both pretraining and finetuning are provided below.
1. Pretraining
```
---
*Researched: 2026-03-28*