LAVIS - A One-stop Library for Language-Vision Intelligence
A one-stop repository for generative AI research updates, interview reso...
Code for ALBEF: a new vision-language pre-training method
Multimodal-GPT
Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Tra...
The implementation of "Prismer: A Vision-Language Model with Multi-Task ...
Recent Advances in Vision and Language Pre-Trained Models (VL-PTMs)
Oscar and VinVL
X-modaler is a versatile and high-performance codebase for cross-modal a...
My Reading Lists of Deep Learning and Natural Language Processing
日本語LLMまとめ - Overview of Japanese LLMs
Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Represen...
Code for ICLR 2020 paper "VL-BERT: Pre-training of Generic Visual-Lingui...
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-it...
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want