2024-10-04 21:15 Weblink Research & Reports
Meta Movie Gen is our latest research breakthrough that allows you to use simple text inputs to create videos and sounds, edit existing videos or transform your personal image into a unique video.
2024-07-30 09:02 Weblink News
Today, we’re publicly releasing SAM 2, the first-ever unified model for segmenting anything in videos and images.
2024-04-18 19:25 ML Model News
Today, we’re introducing Meta Llama 3, the next generation of our state-of-the-art open source large language model. In the coming months, we expect to share new capabilities, additional model sizes, and more.
2024-02-16 16:18 Weblink
We’re releasing the Video Joint Embedding Predictive Architecture (V-JEPA) model, a crucial step in advancing machine intelligence with a more grounded understanding of the world.
2023-12-07 15:35 Weblink Research & Reports

“An agent that can play at the level of humans in a game as strategically complex as Diplomacy is a true breakthrough for cooperative AI.”

Yann LeCun

VP & Chief AI Scientist, AI at Meta

By building CICERO, AI at Meta has created the first AI agent to achieve human-level performance in the complex natural language strategy game Diplomacy. CICERO demonstrated this by playing with humans on an online version of the game, where it achieved more than double the average score of the human players and ranked in the top 10% of participants who played more than one game.

This breakthrough rests on combining two different areas of AI: strategic reasoning and natural language processing. Integrating these techniques lets CICERO reason and strategize about players’ motivations, then use natural language to communicate, reach agreements on shared objectives, form alliances and coordinate plans.

2023-12-04 10:48 Weblink Tools & Methods
A significant step towards removing language barriers through expressive, fast and high-quality AI translation.

Seamless merges the quality and multilinguality of SeamlessM4T v2, the low latency of SeamlessStreaming and the expression preservation of SeamlessExpressive into one unified system. It’s the first streaming translation model to maintain both vocal style and prosody, which can be particularly challenging in streaming, where the system only has access to partial input.

2023-09-29 14:19 Weblink Research & Reports
Training text-to-image models with web scale image-text pairs enables the generation of a wide range of visual concepts from text. However, these pre-trained models often face challenges when it comes to generating highly aesthetic images. This creates the need for aesthetic alignment post pre-training. In this paper, we propose quality-tuning to effectively guide a pre-trained model to exclusively generate highly visually appealing images, while maintaining generality across visual concepts. Our key insight is that supervised fine-tuning with a set of surprisingly small but extremely visually appealing images can significantly improve the generation quality. We pre-train a latent diffusion model on 1.1 billion image-text pairs and fine-tune it with only a few thousand carefully selected high-quality images. The resulting model, Emu, achieves a win rate of 82.9% compared with its pre-trained only counterpart. Compared to the state-of-the-art SDXLv1.0, Emu is preferred 68.4% and 71.3% of the time on visual appeal on the standard PartiPrompts and our Open User Input benchmark based on the real-world usage of text-to-image models. In addition, we show that quality-tuning is a generic approach that is also effective for other architectures, including pixel diffusion and masked generative transformer models.
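
The win rates quoted above come from pairwise preference comparisons. As a rough illustration of the arithmetic only, the sketch below computes a win rate from a list of per-comparison votes. The function name `win_rate`, the vote labels, the optional tie handling and the vote counts are all hypothetical; they are chosen to reproduce an 82.9% figure and are not the paper's evaluation protocol or data.

```python
from collections import Counter


def win_rate(votes, model="emu", ties_count_half=False):
    """Return the fraction of pairwise comparisons in which `model` was preferred.

    `votes` is a list of labels, one per comparison, e.g. "emu", "baseline", "tie".
    Tie handling is an assumption of this sketch, not taken from the paper.
    """
    counts = Counter(votes)
    total = sum(counts.values())
    if total == 0:
        raise ValueError("no votes")
    wins = counts[model]
    if ties_count_half:
        # Optionally credit each tie as half a win.
        wins += 0.5 * counts["tie"]
    return wins / total


# Hypothetical votes: 829 of 1000 comparisons prefer the quality-tuned model.
votes = ["emu"] * 829 + ["baseline"] * 171
print(f"{win_rate(votes):.1%}")
```

Whether ties are dropped, split, or counted as losses changes the reported number, so any comparison between win rates should use the same convention on both sides.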
2023-08-22 17:35 ML Model Research & Reports
SeamlessM4T is a foundational speech/text translation and transcription model that overcomes the limitations of previous systems with state-of-the-art results.

