:sparkles::sparkles: Latest Advances on Multimodal Large Language Models
Video-LLaVA: Learning United Visual Representation by Alignment Before P...
Extract markdown and images from URLs, PDFs, docs, slides, and more, rea...
This repo contains evaluation code for the paper "Are We on the Right Wa...
Latest Papers and Datasets on Visual Instruction Tuning
[arXiv'24] CARES: A Comprehensive Benchmark of Trustworthiness in Medica...