Parrot

https://github.com/AIDC-AI/Parrot


🔬 Research Notes

Stats

  • ⭐ Stars: 77
  • 🍴 Forks: 3
  • 📝 Language: Python
  • 📅 Created: 2024-08-02
  • 🔄 Updated: 2026-01-15
  • 🏷️ Latest Release: No releases
  • Description

    🎉 The code repository for "Parrot: Multilingual Visual Instruction Tuning" in PyTorch.

    Topics

    mixture-of-experts, multilingual, multimodal-large-language-models, vision-language-model

    Research Summary

    Parrot is a multilingual multimodal large language model (MLLM) that uses textual guidance and a Mixture-of-Experts module to align visual tokens across languages, accompanied by the MMMB benchmark (6 languages, 15 categories, 12,000 questions). The paper was accepted at ICML 2025.

    Key Features

  • Text-guided visual token alignment at the language level
  • Mixture-of-Experts (MoE) routing to promote alignment of multilingual tokens
  • MMMB benchmark: 6 languages, 15 categories, 12,000 questions
  • Parrot-7B and Parrot-14B checkpoints released on Hugging Face

  • Architecture
  • Qwen-1.5-7B/14B-Chat base LLM paired with a CLIP-ViT-Large-patch14-336 vision encoder
  • Two-stage training: modality alignment (pretraining) followed by multilingual instruction tuning (SFT)

  • Use Cases
  • Multilingual visual question answering and instruction following
  • Benchmarking multilingual MLLMs via MMMB and Multilingual MMBench in VLMEvalKit

  • Assessment
  • Maturity: research code backed by an ICML 2025 paper; no tagged releases
  • Documentation: README covers install, models, training, datasets, evaluation, and quick start
  • Community: small (77 stars, 3 forks)
  • Recommendation: worth evaluating for multilingual MLLM research; expect research-grade tooling rather than a production library
  • README Excerpt

    # 🦜 Parrot: Multilingual Visual Instruction Tuning

    🎉 Introduction

    📰 What's New

    ☄️ Install

    🦜 Model

    🔥 Train

    🌟 Datasets

    🎄 MMMB

    🔑 Evaluation

    📍 Quick Start

    👨‍🏫 Acknowledgement

    🤗 Contact

    ---

    🎉 Introduction

    Welcome to Parrot [[paper](https://arxiv.org/abs/2406.02539)], a novel method that uses textual guidance to drive visual token alignment at the language level. Parrot conditions the visual tokens on diverse language inputs and uses a Mixture-of-Experts (MoE) module to promote the alignment of multilingual tokens. Moreover, given the current lack of benchmarks for evaluating multilingual capabilities in the field, we collect and release the Massive Multilingual Multimodal Benchmark (MMMB), which covers 6 languages, 15 categories, and 12,000 questions.
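The routing idea described above, visual tokens conditioned on the input language via a text-driven MoE gate, can be illustrated with a toy NumPy sketch. The shapes, the softmax gate, and the per-language linear experts here are illustrative assumptions, not the paper's exact formulation:

```python
import numpy as np

rng = np.random.default_rng(0)

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def moe_align(visual_tokens, text_embedding, experts, gate_w):
    """Toy MoE: a text-conditioned gate mixes per-language expert projections.

    visual_tokens: (n_tokens, d) CLIP-style visual features
    text_embedding: (d,) pooled embedding of the (language-bearing) prompt
    experts: (n_experts, d, d) one linear projection per language expert
    gate_w: (d, n_experts) gating weights
    """
    gate = softmax(text_embedding @ gate_w)                        # (E,)
    expert_out = np.einsum("td,edk->etk", visual_tokens, experts)  # (E, T, d)
    # Gate-weighted mixture of expert outputs -> language-aligned tokens.
    return np.einsum("e,etk->tk", gate, expert_out), gate

d, n_tokens, n_experts = 16, 8, 6   # 6 experts, mirroring MMMB's 6 languages
vis = rng.normal(size=(n_tokens, d))
txt = rng.normal(size=(d,))
experts = rng.normal(size=(n_experts, d, d))
gate_w = rng.normal(size=(d, n_experts))

aligned, gate = moe_align(vis, txt, experts, gate_w)
```

Changing the text embedding changes the gate, so the same visual tokens are projected differently per input language, which is the intuition behind the multilingual alignment.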

    If you find Parrot useful for your research and applications, please cite using this BibTeX:

    ```bibtex

    @inproceedings{sun2025parrot,

    title={Parrot: Multilingual Visual Instruction Tuning},

    author={Sun, Hai-Long and Zhou, Da-Wei and Li, Yang and Lu, Shiyin and Yi, Chao and Chen, Qing-Guo and Xu, Zhao and Luo, Weihua and Zhang, Kaifu and Zhan, De-Chuan and others},

    booktitle={ICML},

    year={2025}

    }

    ```

    📰 What's New

  • [05/01] 🔥 Parrot was accepted at ICML 2025.
  • [08/21] 🔥 Our multilingual MLLM Parrot is now supported in [VLMEvalKit](https://github.com/open-compass/VLMEvalKit), so you can evaluate Parrot easily. Give it a try!
  • [08/20] 🔥 MMMB and Multilingual MMBench are now supported in [VLMEvalKit](https://github.com/open-compass/VLMEvalKit); use the dataset names MMMB and MTL_MMBench_DEV to get results for all 6 languages in a single run. Give them a try!
  • [08/02] 🔥 We released the [code](https://github.com/AIDC-AI/Parrot), our in-house multilingual [dataset](https://huggingface.co/datasets/AIDC-AI/Parrot-dataset/tree/main/sharegpt_4v), the benchmark [MMMB](https://huggingface.co/datasets/AIDC-AI/Parrot-dataset/tree/main/mmmb), and the [model](https://huggingface.co/AIDC-AI/Parrot-7B). Welcome to try them out!
  • [06/05] 🔥 Parrot is coming! We release the [paper](https://arxiv.org/abs/2406.02539)!
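The VLMEvalKit integration above is driven from that toolkit's run.py launcher. A minimal sketch of composing such an invocation follows; the dataset names MMMB and MTL_MMBENCH_DEV come from the announcement, but the registered model identifier "Parrot" is an assumption, so check VLMEvalKit's supported-model list:

```python
import shlex

def vlmeval_cmd(model: str, datasets: list[str]) -> str:
    """Compose a VLMEvalKit run.py invocation for a model and dataset list.

    Dataset names MMMB / MTL_MMBench_DEV come from the What's New entry above;
    the model name is hypothetical and must match VLMEvalKit's registry.
    """
    args = ["python", "run.py", "--model", model, "--data", *datasets]
    return shlex.join(args)

# Evaluate Parrot on both multilingual benchmarks in one run.
cmd = vlmeval_cmd("Parrot", ["MMMB", "MTL_MMBench_DEV"])
print(cmd)
```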
    ☄️ Install

    Please follow the instructions below to install the required packages.

    1. Clone this repository and navigate to the Parrot folder

    ```bash

    git clone https://github.com/AIDC-AI/Parrot.git

    cd Parrot

    ```

    2. Install Package

    ```shell

    conda create -n parrot python=3.10 -y

    conda activate parrot

    pip install --upgrade pip

    pip install -e .

    ```

    Upgrade to the latest code base

    ```shell

    git pull

    pip install -e . --no-deps

    ```

    🦜 Model

    Parrot is a multilingual multimodal large language model. We provide our fully fine-tuned models below:

    | Model | Base LLM | Vision Encoder | Stage | Download |
    | --- | --- | :---: | :---: | :---: |
    | Parrot-7B | Qwen-1.5-7B-Chat | CLIP-ViT-Large-patch14-336 | SFT | [ckpt](https://huggingface.co/AIDC-AI/Parrot-7B) |
    | Parrot-14B | Qwen-1.5-14B-Chat | CLIP-ViT-Large-patch14-336 | SFT | [ckpt](https://huggingface.co/AIDC-AI/Parrot-14B) |
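The checkpoints in the table can be pulled from Hugging Face. A small helper mapping checkpoint names to the repo ids listed above; the suggestion to fetch them with `huggingface_hub.snapshot_download` is an assumption about your workflow, since Parrot may also ship its own loading code:

```python
# Repo ids taken from the model table above.
PARROT_CHECKPOINTS = {
    "Parrot-7B": "AIDC-AI/Parrot-7B",
    "Parrot-14B": "AIDC-AI/Parrot-14B",
}

def checkpoint_repo(name: str) -> str:
    """Return the Hugging Face repo id for a Parrot checkpoint name."""
    try:
        return PARROT_CHECKPOINTS[name]
    except KeyError:
        raise ValueError(
            f"unknown checkpoint {name!r}; choose from {sorted(PARROT_CHECKPOINTS)}"
        ) from None

# To download, e.g.: huggingface_hub.snapshot_download(checkpoint_repo("Parrot-7B"))
```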

    🔥 Train

    Parrot is trained in two stages: modality alignment, then instruction tuning for multilingual alignment. The training script for each stage is provided in the scripts folder. Before starting training, make sure the ROOT variable in the training script is set correctly. The commands to train Parrot for each stage are:

    ```shell

    bash scripts/train/pretrain.sh

    bash scripts/train/finetune.sh

    ```
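The two-stage launch above can be wrapped in a tiny driver. This sketch passes ROOT through the environment; the text says ROOT is set inside the training scripts, so exporting it this way is an assumption about how the scripts consume it:

```python
import os
import subprocess

# Stage order from the text: modality alignment, then multilingual tuning.
STAGES = ["scripts/train/pretrain.sh", "scripts/train/finetune.sh"]

def stage_env(root: str) -> dict:
    """Environment for a training stage, with ROOT pointing at the data/repo root."""
    env = dict(os.environ)
    env["ROOT"] = root
    return env

def run_stages(root: str, runner=subprocess.run):
    """Run pretraining, then finetuning, failing fast if a stage errors."""
    for script in STAGES:
        runner(["bash", script], env=stage_env(root), check=True)
```

The injectable `runner` keeps the driver testable without actually launching training jobs.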

    Hyperparameters

    We use a set of hyperparameters similar to Vicuna's for finetuning. The hyperparameters used in both pretraining and finetuning are provided below.

    1. Pretraining


    ---

*Researched: 2026-03-28 · Generated: 2026-03-28*