AvaTaR: Optimizing LLM Agents for Tool-Assisted Knowledge Retrieval (https://arxiv.org/abs/2406.11200)
Paper: arXiv preprint
AvaTaR is a novel, automatic framework that optimizes an LLM agent to use the provided tools effectively and thereby improve its performance on a given task or domain. During optimization, a comparator module iteratively provides insightful, holistic prompts to the LLM agent by contrastively reasoning over positive and negative examples sampled from the training data.
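Conceptually, the comparator loop above can be sketched as follows. This is a minimal illustration, not the paper's actual implementation: `llm` stands in for any text-in/text-out model call, and `score` for any task metric that marks a query as solved or not.

```python
def comparator_update(prompt, pos_examples, neg_examples, llm):
    """Contrast successes and failures to produce an improved actor prompt.

    Hypothetical sketch of the comparator step: `llm` is any
    text-in/text-out callable, not AvaTaR's exact interface.
    """
    instruction = (
        "The actor prompt was:\n" + prompt + "\n"
        "It succeeded on: " + "; ".join(pos_examples) + "\n"
        "It failed on: " + "; ".join(neg_examples) + "\n"
        "Explain the difference and rewrite the prompt to fix the failures."
    )
    return llm(instruction)

def optimize(prompt, train_queries, score, llm, n_iters=3, k=2):
    """Iteratively sample positive/negative examples and refine the prompt."""
    for _ in range(n_iters):
        scored = [(q, score(prompt, q)) for q in train_queries]
        pos = [q for q, ok in scored if ok][:k]       # queries solved correctly
        neg = [q for q, ok in scored if not ok][:k]   # queries still failing
        if not neg:  # nothing left to contrast against
            break
        prompt = comparator_update(prompt, pos, neg, llm)
    return prompt
```

In the full framework the refined prompt is what drives the actor's tool use on the next round; this sketch only shows the contrastive sampling and prompt-update structure.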
conda create -n avatar python=3.11
pip install stark-qa typeguard
export ANTHROPIC_API_KEY=YOUR_API_KEY
export OPENAI_API_KEY=YOUR_API_KEY
export OPENAI_ORG=YOUR_ORGANIZATION
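Before running any scripts, it can help to confirm the variables above are actually exported. A small check, assuming only the variable names listed in this setup:

```python
import os

def check_env(required=("ANTHROPIC_API_KEY", "OPENAI_API_KEY", "OPENAI_ORG")):
    """Return the subset of required environment variables that are unset."""
    return [name for name in required if not os.environ.get(name)]

missing = check_env()
if missing:
    print("Set these before running AvaTaR:", ", ".join(missing))
```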
sh scripts/emb_download_all.sh
data
├── flickr30k_entities
│   ├── raw
│   │   ├── Annotations
│   │   │   ├── 36979.xml
│   │   │   ├── ...
│   │   └── flickr30k-images
│   │       ├── 36979.jpg
│   │       ├── ...
│   ├── split
│   │   ├── test.index
│   │   ├── train.index
│   │   └── val.index
│   └── qa.csv
├── ...
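Given the layout above, a loader for the query file and split lists might look like the following. This is a hypothetical sketch: it assumes each `.index` file holds one integer id per line and that `qa.csv` has a header row, neither of which is specified here.

```python
import csv
import os

def load_split(root, split):
    """Read split ids and QA rows for one dataset directory.

    Assumes `<root>/split/<split>.index` contains one integer id per line
    and `<root>/qa.csv` is a standard CSV with a header (both assumptions).
    """
    with open(os.path.join(root, "split", f"{split}.index")) as f:
        ids = [int(line) for line in f if line.strip()]
    with open(os.path.join(root, "qa.csv")) as f:
        rows = list(csv.DictReader(f))
    return ids, rows
```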
We already include the VSS results locally under output/eval and the grouping (for STaRK only) under output/agent. With these files, you can optimize actor actions directly following the AvaTaR pipeline.
With the default arguments in config/default_args.json, run the following command to optimize the actor actions for a group of queries:
sh scripts/run_avatar_stark.sh
You can specify the dataset name and group in scripts/run_avatar_stark.sh.
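For example, the dataset and group would typically be exposed as variables near the top of the script; the names below are illustrative only, so check scripts/run_avatar_stark.sh itself for the actual ones:

```shell
# Illustrative: edit these variables in scripts/run_avatar_stark.sh
dataset=amazon   # which STaRK dataset to optimize on
group=0          # index of the query group to optimize
```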
For the Flickr30k Entities dataset, run:
sh run_avatar_flickr30k_entities.sh
To evaluate the optimized actions, run:
sh scripts/run_eval_avatar_stark.sh
or
sh scripts/run_eval_avatar_flickr30k_entities.sh
@article{wu24avatar,
title = {AvaTaR: Optimizing LLM Agents for Tool-Assisted Knowledge Retrieval},
author = {
Shirley Wu and Shiyu Zhao and
Qian Huang and Kexin Huang and
Michihiro Yasunaga and Kaidi Cao and
Vassilis N. Ioannidis and Karthik Subbian and
Jure Leskovec and James Zou
},
eprinttype = {arXiv},
eprint = {2406.11200},
year = {2024}
}