Discord: / discord
https://github.com/hupo/docs
What matters when building vision-language models?
https://arxiv.org/pdf/2405.02246
Mirasol3B: A Multimodal Autoregressive Model for Time-Aligned and Contextual Modalities
https://arxiv.org/pdf/2311.05698
Gemini 1.5: Unlocking multimodal understanding across millions of tokens of context
https://storage.googleapis.com/deepmi...
Scaling Autoregressive Multi-Modal Models: Pretraining and Instruction Tuning
https://arxiv.org/pdf/2309.02591