☁️ Build multimodal AI applications with a cloud-native stack
[NeurIPS'23 Oral] Visual Instruction Tuning (LLaVA) built towards GPT-4V...
🏄 Scalable embedding, reasoning, ranking for images and sentences with ...
:sparkles::sparkles: Latest Advances on Multimodal Large Language Models
Simple command line tool for text to image generation using OpenAI's CLI...
Extract markdown and images from URLs, PDFs, docs, slides, and more, rea...
Algorithms and Publications on 3D Object Tracking
Collaborative Diffusion (CVPR 2023)
Effortless plug-and-play optimizer to cut model training costs by 50%...
[CVPR'23] MM-Diffusion: Learning Multi-Modal Diffusion Models for Joint ...
[ICCV2019] Robust Multi-Modality Multi-Object Tracking
Unifying Voxel-based Representation with Transformer for 3D Object Detec...
This repo contains the official code of our work SAM-SLR which won the C...
[CVPR'24] RLHF-V: Towards Trustworthy MLLMs via Behavior Alignment from ...
Official code for NeurIPS2023 paper: CoDA: Collaborative Novel Box Disco...