LAVIS - A One-stop Library for Language-Vision Intelligence
A one-stop repository for generative AI research updates, interview reso...
Code for ALBEF: a new vision-language pre-training method
Multimodal-GPT
Code for the ICML 2021 (long talk) paper: "ViLT: Vision-and-Language Tra...
The implementation of "Prismer: A Vision-Language Model with Multi-Task ...
Recent Advances in Vision and Language Pre-Trained Models (VL-PTMs)
Oscar and VinVL
X-modaler is a versatile and high-performance codebase for cross-modal a...
My Reading Lists of Deep Learning and Natural Language Processing
日本語LLMまとめ - Overview of Japanese LLMs
Research code for ECCV 2020 paper "UNITER: UNiversal Image-TExt Represen...
Code for ICLR 2020 paper "VL-BERT: Pre-training of Generic Visual-Lingui...
[CVPR 2024 🔥] Grounding Large Multimodal Model (GLaMM), the first-of-it...
[CVPR 2024] Alpha-CLIP: A CLIP Model Focusing on Wherever You Want