Code for "Lion: Adversarial Distillation of Proprietary Large Language Models (EMNLP 2023)"
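High-level overview of our adversarial distillation framework: we craft a compact Student LLM based on a superior closed-source LLM that serves three roles, namely the Teacher, the Referee, and the Generator. From left to right, each iteration has three stages: (1) an imitation stage that aligns the student's responses with the teacher's; (2) a discrimination stage that identifies hard instructions; and (3) a generation stage that produces new instructions to escalate the challenges presented to the student model.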
We release Lion weights as delta weights to comply with the LLaMA model license.
To obtain the Lion weights, add our delta to the original LLaMA-7B weights by running:
python src/weight_diff.py recover --path_raw huggyllama/llama-7b --path_diff YuxinJiang/lion-7b --path_tuned <path_to_store_recovered_weights>
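Conceptually, recovering the tuned weights means adding each delta tensor to the matching raw LLaMA tensor. The sketch below illustrates that idea on plain Python lists standing in for tensors. The function name `recover_weights` is hypothetical, and this is not the code in `src/weight_diff.py`, which operates on full model checkpoints.

```python
from typing import Dict, List

# Stand-in for a real tensor (hypothetical simplification for illustration).
Tensor = List[float]


def recover_weights(raw: Dict[str, Tensor], diff: Dict[str, Tensor]) -> Dict[str, Tensor]:
    """Return tuned = raw + diff, element-wise, for every parameter name."""
    if raw.keys() != diff.keys():
        raise ValueError("raw and diff checkpoints must have the same parameter names")
    recovered: Dict[str, Tensor] = {}
    for name, raw_values in raw.items():
        diff_values = diff[name]
        if len(raw_values) != len(diff_values):
            raise ValueError(f"shape mismatch for parameter {name!r}")
        # The released delta is (tuned - raw), so adding it back yields the tuned weights.
        recovered[name] = [r + d for r, d in zip(raw_values, diff_values)]
    return recovered


if __name__ == "__main__":
    raw = {"layer.weight": [1.0, 2.0], "layer.bias": [0.0]}
    diff = {"layer.weight": [0.5, -1.0], "layer.bias": [0.25]}
    print(recover_weights(raw, diff))  # {'layer.weight': [1.5, 1.0], 'layer.bias': [0.25]}
```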
For inference and training of Lion, please first install the requirements:
pip install -r requirements.txt
We provide a decoding script for Lion that reads an input file, generates a response for each sample, and consolidates the responses into an output file. It can be run on a single machine with one 16GB GPU.
python src/lion_inference.py \
--model_dir <path_to_hf_converted_lion_ckpt_and_tokenizer> \
--data_dir <path_to_input_json_file> \
--output_dir <path_to_output_json_file> \
--num_gpus 1
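The following steps make up one iteration of our adversarial distillation framework.
Stage 1: Imitation
First, query ChatGPT (the Teacher) for responses to the instructions in the Train Pool: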
python src/chatgpt_inference.py \
-q <path_to_json_file_for_the_Train_Pool> \
-o <path_to_chatgpt_inference_for_the_Train_Pool> \
--api_key <your_openai_api_key>
Next, fine-tune the student on the teacher's responses. We ran fine-tuning on a machine with 8 A100 80GB GPUs.
torchrun --nproc_per_node=8 --master_port=<your_random_port> src/train.py \
--model_name_or_path <path_to_hf_converted_ckpt_and_tokenizer> \
--data_path <path_to_chatgpt_inference_for_the_Train_Pool> \
--bf16 True \
--output_dir result \
--num_train_epochs 3 \
--model_max_length 1024 \
--per_device_train_batch_size 2 \
--per_device_eval_batch_size 2 \
--gradient_accumulation_steps 8 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 600 \
--save_total_limit 1 \
--learning_rate 2e-5 \
--weight_decay 0. \
--warmup_ratio 0.03 \
--lr_scheduler_type "cosine" \
--logging_steps 1 \
--fsdp "full_shard auto_wrap" \
--fsdp_transformer_layer_cls_to_wrap 'LlamaDecoderLayer' \
--tf32 True
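With these settings, each optimizer step sees 8 GPUs × 2 sequences × 8 accumulation steps = 128 sequences. The sketch below computes that effective batch size and the learning rate at a given step under linear warmup followed by cosine decay, following the formula of Hugging Face's `get_cosine_schedule_with_warmup` with its default half cycle. We assume warmup steps are `warmup_ratio` times the total, rounded up. The helper names and the example dataset size are hypothetical, and the step count is approximate because the Trainer's handling of partial batches can shift it slightly.

```python
import math


def effective_batch_size(num_gpus: int, per_device_batch: int, grad_accum_steps: int) -> int:
    """Number of sequences contributing to one optimizer step."""
    return num_gpus * per_device_batch * grad_accum_steps


def approx_total_steps(num_examples: int, batch: int, epochs: int) -> int:
    """Approximate optimizer steps; the real Trainer count may differ slightly per epoch."""
    return math.ceil(num_examples / batch) * epochs


def cosine_lr_with_warmup(step: int, base_lr: float, total_steps: int, warmup_ratio: float) -> float:
    """Linear warmup from 0 to base_lr, then cosine decay to 0 at total_steps."""
    warmup_steps = math.ceil(total_steps * warmup_ratio)
    if step < warmup_steps:
        return base_lr * step / max(1, warmup_steps)
    progress = (step - warmup_steps) / max(1, total_steps - warmup_steps)
    return base_lr * max(0.0, 0.5 * (1.0 + math.cos(math.pi * progress)))


if __name__ == "__main__":
    batch = effective_batch_size(num_gpus=8, per_device_batch=2, grad_accum_steps=8)
    # Hypothetical Train Pool size, for illustration only.
    total = approx_total_steps(num_examples=50_000, batch=batch, epochs=3)
    print("effective batch:", batch, "approx. optimizer steps:", total)
    for step in (0, total // 2, total):
        print(step, cosine_lr_with_warmup(step, base_lr=2e-5, total_steps=total, warmup_ratio=0.03))
```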
Addressing OOM
Naively, fine-tuning a 7B model requires about 7 x 8 x 2 = 112 GB of VRAM. The command above enables parameter sharding, so no redundant copy of the model is stored on any GPU. If you'd like to further reduce the memory footprint, here are some options:
Turn on CPU offload for FSDP with --fsdp "full_shard auto_wrap offload". This saves VRAM at the cost of longer runtime.
In our experience, DeepSpeed stage-3 (with offload) can at times be more memory-efficient than FSDP with offload. Here's an example of using DeepSpeed stage-3 on 8 GPUs with both parameter and optimizer offload:
deepspeed src/train_deepspeed.py \
--model_name_or_path <path_to_hf_converted_ckpt_and_tokenizer> \
--data_path <path_to_chatgpt_inference_for_the_Train_Pool> \
--output_dir result \
--num_train_epochs 3 \
--model_max_length 1024 \
--per_device_train_batch_size 16 \
--per_device_eval_batch_size 1 \
--gradient_accumulation_steps 1 \
--evaluation_strategy "no" \
--save_strategy "steps" \
--save_steps 600 \
--save_total_limit 1 \
--learning_rate 2e-5 \
--warmup_ratio 0.03 \
--logging_steps 1 \
--lr_scheduler_type "cosine" \
--report_to "tensorboard" \
--gradient_checkpointing True \
--deepspeed src/configs/deepspeed_config.json \
--fp16 True
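The command above reads its DeepSpeed settings from a JSON file whose contents are not shown here. As a hedged illustration only, the sketch below prints a ZeRO stage-3 config with CPU offload of both parameters and optimizer states. It uses standard DeepSpeed config keys and the "auto" values that the Hugging Face Trainer integration fills in from its own command-line arguments. It is an assumption, not the repository's actual deepspeed_config.json; it omits a scheduler section so that the Trainer's own --lr_scheduler_type is used.

```python
import json

# Hypothetical ZeRO-3 + CPU offload config (not the repo's file).
# "auto" lets the Hugging Face Trainer fill values from its own arguments.
DS_CONFIG = {
    "fp16": {"enabled": "auto"},
    "bf16": {"enabled": "auto"},
    "optimizer": {
        "type": "AdamW",
        "params": {"lr": "auto", "betas": "auto", "eps": "auto", "weight_decay": "auto"},
    },
    "zero_optimization": {
        "stage": 3,
        # Move optimizer states and parameters to CPU memory to save VRAM.
        "offload_optimizer": {"device": "cpu", "pin_memory": True},
        "offload_param": {"device": "cpu", "pin_memory": True},
        "overlap_comm": True,
        "contiguous_gradients": True,
        # Gather full 16-bit weights when saving so checkpoints are directly loadable.
        "stage3_gather_16bit_weights_on_model_save": True,
    },
    "gradient_accumulation_steps": "auto",
    "gradient_clipping": "auto",
    "train_batch_size": "auto",
    "train_micro_batch_size_per_gpu": "auto",
}

if __name__ == "__main__":
    # Redirect to a file of your choice and pass that path to --deepspeed.
    print(json.dumps(DS_CONFIG, indent=2))
```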
LoRA fine-tunes low-rank updates to the query, key, and value projection matrices. This can reduce the total memory footprint from 112GB to about 7 x 4 = 28GB. We may release our re-implementation of this in the future, but for now Hugging Face's peft codebase can be a useful resource.
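The 112GB and 28GB figures are back-of-the-envelope estimates that ignore activations. One common breakdown, which we assume here rather than take from the text, is 16 bytes per parameter for full fine-tuning with Adam in fp32 (4 for weights, 4 for gradients, 8 for the two Adam moments) versus roughly 4 bytes per parameter for LoRA, where the frozen fp32 weights dominate and the adapters are tiny. The sketch below reproduces both numbers, counts LoRA's trainable parameters on the query, key, and value projections of LLaMA-7B (32 layers, hidden size 4096), and shows the LoRA forward pass y = Wx + (alpha/r)·B(Ax). The rank r = 8 and all helper names are hypothetical choices.

```python
def full_finetune_memory_gb(num_params: float, bytes_per_param: int = 16) -> float:
    """fp32 weights (4) + gradients (4) + Adam first/second moments (8); activations ignored."""
    return num_params * bytes_per_param / 1e9


def lora_memory_gb(num_params: float, bytes_per_param: int = 4) -> float:
    """Frozen fp32 weights only; adapter weights, grads and optimizer state treated as negligible."""
    return num_params * bytes_per_param / 1e9


def lora_trainable_params(hidden: int, layers: int, rank: int, projections: int = 3) -> int:
    """Each adapted d x d projection gets A (rank x d) and B (d x rank): rank * 2d parameters."""
    return layers * projections * rank * (hidden + hidden)


def lora_forward(W, A, B, x, alpha: float, rank: int):
    """y = W x + (alpha / rank) * B (A x); W stays frozen, only A and B are trained."""
    def matvec(M, v):
        return [sum(m * vi for m, vi in zip(row, v)) for row in M]

    base = matvec(W, x)
    update = matvec(B, matvec(A, x))
    scale = alpha / rank
    return [b + scale * u for b, u in zip(base, update)]


if __name__ == "__main__":
    n = 7e9
    print(full_finetune_memory_gb(n))  # 112.0
    print(lora_memory_gb(n))  # 28.0
    print(lora_trainable_params(hidden=4096, layers=32, rank=8))  # 6291456
```

Under these assumptions, rank-8 adapters on q/k/v train about 6.3M parameters, a tiny fraction of the 7B frozen ones, which is why their gradient and optimizer memory is treated as negligible.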
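Stage 2: Discrimination
Query ChatGPT (the Teacher) for responses to the instructions in the Cache Pool:
python src/chatgpt_inference.py \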
-q <path_to_json_file_for_the_Cache_Pool> \
-o <path_to_chatgpt_inference_for_the_Cache_Pool> \
--api_key <your_openai_api_key>
Obtain the student's (Lion's) responses to the same Cache Pool instructions:
python src/lion_inference.py \
--model_dir <path_to_hf_converted_lion_ckpt_and_tokenizer> \
--data_dir <path_to_json_file_for_the_Cache_Pool> \
--output_dir <path_to_lion_inference_for_the_Cache_Pool> \
--num_gpus 8
Next, ChatGPT acts as the Referee and reviews the teacher's and the student's responses. To mitigate the referee's position bias, we conduct two runs, swapping the positions of the teacher's and the student's responses:
python src/chatgpt_referee.py \
-a <path_to_chatgpt_inference_for_the_Cache_Pool> <path_to_lion_inference_for_the_Cache_Pool> \
-o <path_to_output_review_chatgpt_lion_file> \
--api_key <your_openai_api_key>
python src/chatgpt_referee.py \
-a <path_to_lion_inference_for_the_Cache_Pool> <path_to_chatgpt_inference_for_the_Cache_Pool> \
-o <path_to_output_review_lion_chatgpt_file> \
--api_key <your_openai_api_key>
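One way to combine the two referee runs is to realign the swapped run and average the two scores each response received. The sketch below assumes each review yields a pair of scores in the order the responses were presented; the record layout and the function name are hypothetical and not necessarily what src/chatgpt_referee.py or src/discrimination.py use.

```python
from typing import Tuple

# (score of first-shown response, score of second-shown response)
Scores = Tuple[float, float]


def debiased_scores(review12: Scores, review21: Scores) -> Scores:
    """Average teacher/student scores over both presentation orders.

    review12 scores (teacher, student) with the teacher shown first;
    review21 scores (student, teacher) with the student shown first.
    """
    teacher = (review12[0] + review21[1]) / 2
    student = (review12[1] + review21[0]) / 2
    return teacher, student


if __name__ == "__main__":
    # Hypothetical 1-10 scores where the referee favours whichever answer comes first.
    print(debiased_scores(review12=(9.0, 6.0), review21=(8.0, 7.0)))  # (8.0, 7.0)
```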
Then identify hard and easy instructions from the two reviews:
python src/discrimination.py \
--review12_path <path_to_output_review_chatgpt_lion_file> \
--review21_path <path_to_output_review_lion_chatgpt_file> \
--chatgpt_inference_path <path_to_chatgpt_inference_for_the_Cache_Pool> \
--lion_inference_path <path_to_lion_inference_for_the_Cache_Pool> \
--hard_save_path <path_to_identified_hard_instructions> \
--easy_save_path <path_to_identified_easy_instructions>
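The discrimination step separates instructions the student still struggles with from those it already handles well. A minimal sketch, assuming the rule is a threshold on the gap between the teacher's and the student's averaged referee scores; the threshold of 1.0, the field names, and the helper are hypothetical choices for illustration, so check src/discrimination.py for the actual criterion.

```python
from typing import Dict, List, Tuple


def split_hard_easy(
    records: List[Dict], threshold: float = 1.0
) -> Tuple[List[Dict], List[Dict]]:
    """Label an instruction hard if teacher_score - student_score > threshold, else easy."""
    hard: List[Dict] = []
    easy: List[Dict] = []
    for rec in records:
        gap = rec["teacher_score"] - rec["student_score"]
        (hard if gap > threshold else easy).append(rec)
    return hard, easy


if __name__ == "__main__":
    # Hypothetical records with position-debiased scores.
    records = [
        {"instruction": "Prove that sqrt(2) is irrational.", "teacher_score": 9.0, "student_score": 5.5},
        {"instruction": "Say hello in French.", "teacher_score": 9.0, "student_score": 8.5},
    ]
    hard, easy = split_hard_easy(records)
    print([r["instruction"] for r in hard])
    print([r["instruction"] for r in easy])
```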
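Stage 3: Generation
Generate new instructions using the identified hard and easy instructions as seeds:
python -m src.generate_hard_instruction generate_instruction_following_data \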
--seed_tasks_path <path_to_identified_hard_instructions> \
--all_tasks_path <path_to_json_file_for_the_Cache_Pool> \
--output_dir <path_to_generated_hard_instructions> \
--num_instructions_to_generate 3000 \
--api_key <your_openai_api_key>
python -m src.generate_easy_instruction generate_instruction_following_data \
--seed_tasks_path <path_to_identified_easy_instructions> \
--all_tasks_path <path_to_json_file_for_the_Cache_Pool> \
--output_dir <path_to_generated_easy_instructions> \
--num_instructions_to_generate 3000 \
--api_key <your_openai_api_key>
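Instruction-generation pipelines in the Self-Instruct/Alpaca lineage, which the generate_instruction_following_data entry point suggests this code builds on, commonly discard a generated instruction whose ROUGE-L similarity to any existing instruction exceeds 0.7. Whether and how the Lion scripts apply such a filter (for example against the Cache Pool passed as --all_tasks_path) is an assumption here. The sketch below implements ROUGE-L F-measure from the longest common subsequence of lowercased whitespace tokens in plain Python, a simplification of the rouge_score package's tokenization, and keeps only sufficiently novel candidates.

```python
from typing import List


def lcs_length(a: List[str], b: List[str]) -> int:
    """Length of the longest common subsequence, via row-by-row dynamic programming."""
    prev = [0] * (len(b) + 1)
    for tok_a in a:
        curr = [0]
        for j, tok_b in enumerate(b, start=1):
            if tok_a == tok_b:
                curr.append(prev[j - 1] + 1)
            else:
                curr.append(max(prev[j], curr[j - 1]))
        prev = curr
    return prev[-1]


def rouge_l_f(candidate: str, reference: str) -> float:
    """ROUGE-L F1 on lowercased whitespace tokens."""
    c, r = candidate.lower().split(), reference.lower().split()
    if not c or not r:
        return 0.0
    lcs = lcs_length(c, r)
    if lcs == 0:
        return 0.0
    precision, recall = lcs / len(c), lcs / len(r)
    return 2 * precision * recall / (precision + recall)


def filter_novel(candidates: List[str], existing: List[str], max_sim: float = 0.7) -> List[str]:
    """Keep candidates whose ROUGE-L to every pooled instruction is <= max_sim.

    Accepted candidates join the pool, so near-duplicates within one batch are also dropped.
    """
    pool = list(existing)
    kept: List[str] = []
    for cand in candidates:
        if all(rouge_l_f(cand, ref) <= max_sim for ref in pool):
            kept.append(cand)
            pool.append(cand)
    return kept


if __name__ == "__main__":
    existing = ["Write a poem about the sea."]
    new = ["Write a poem about the sea.", "Explain how binary search works."]
    print(filter_novel(new, existing))
```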
Evaluation
We use GPT-4 to automatically rate (on a scale of 1 to 10) the responses of a reference model (ChatGPT) and a candidate model to the same instructions. We then report the candidate model's performance as its total score expressed as a percentage of the reference model's total score.
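As a worked example of this metric: if GPT-4 gives the reference model scores of 9 and 9 and the candidate scores of 8 and 6 on two instructions, the candidate achieves (8 + 6) / (9 + 9) ≈ 77.8% of the reference's total. The hypothetical helper below computes the same ratio; the scores are illustrative, not results from the paper.

```python
from typing import Sequence


def relative_score(candidate: Sequence[float], reference: Sequence[float]) -> float:
    """Candidate's total GPT-4 score as a percentage of the reference model's total."""
    if len(candidate) != len(reference):
        raise ValueError("both models must be scored on the same instructions")
    total_ref = sum(reference)
    if total_ref <= 0:
        raise ValueError("reference total must be positive")
    return 100.0 * sum(candidate) / total_ref


if __name__ == "__main__":
    print(round(relative_score([8, 6], [9, 9]), 1))  # 77.8
```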
Please cite our paper if you use the code in this repo.
@inproceedings{jiang-etal-2023-lion,
title = "Lion: Adversarial Distillation of Proprietary Large Language Models",
author = "Jiang, Yuxin and
Chan, Chunkit and
Chen, Mingyang and
Wang, Wei",
editor = "Bouamor, Houda and
Pino, Juan and
Bali, Kalika",
booktitle = "Proceedings of the 2023 Conference on Empirical Methods in Natural Language Processing",
month = dec,
year = "2023",
address = "Singapore",
publisher = "Association for Computational Linguistics",
url = "https://aclanthology.org/2023.emnlp-main.189",
doi = "10.18653/v1/2023.emnlp-main.189",
pages = "3134--3154",
}
⚠️ Lion is intended and licensed for research use ONLY. Commercial use is strictly prohibited. The content produced by any version of Lion is influenced by uncontrollable variables such as randomness, and therefore, the accuracy of the output cannot be guaranteed by this project. This project does not accept any legal liability for the content of the model output, nor does it assume responsibility for any losses incurred due to the use of associated resources and output results.