The Allen Institute for AI (Ai2) has introduced Molmo 2, a new set of open-source AI vision models that excel at analyzing and answering questions about videos. At a recent showcase in Seattle, the models demonstrated their capabilities by tracking objects and providing detailed analyses of various clips.
Molmo 2 both surpassed open-source benchmarks and outperformed closed systems such as Google's Gemini 3, particularly in video tracking tasks. In practical demonstrations, the model answered questions about soccer and baseball clips, identifying teams, players, and crucial plays. It also extracted structured recipes from cooking videos.
In a notable tracking demonstration, Molmo 2 followed four penguins, accurately maintaining their identities even when they overlapped. The model also tracked a car in a racing video, showcasing its ability to understand complex queries about moving objects.
The recently announced Molmo 2 marks a significant achievement for Ai2, which has gained recognition for its commitment to creating fully open AI systems, distinguishing itself from major industry players that take more restrictive approaches.