All about Meta's Llama announcements

My views after attending Meta Connect

We have over 50,000 subscribers! Thank you all for being part of my newsletter. Please share it with your friends and colleagues, and let's keep growing the community.

My views from inside Meta's Developer Conference

This Wednesday, I had the privilege of attending Meta Connect. The event initially focused on developments in mixed reality and wearable technology, but it has since become a broader conference where Meta showcases its work in other areas, especially AI.

Meta Connect is an invite-only event, and I had the chance to attend, including the Llama Track, which focused on AI development. In this newsletter issue, I want to walk you through all of Meta's announcements and give you my point of view.

Today, I'll cover the following:

  • Feedback from the community about Llama

  • Llama Stack - for me, the most significant announcement of all

  • First Multimodal Llama models

  • Lightweight models

  • Llama Guard to implement Safe AI

  • Why Open Source is king

  • My POV on Meta Developments

Let's dive in 🤿

Feedback from the community about Llama

The community loves Llama, and remember, Llama is not even two years old. Since the release of Llama 1 in February 2023, Llama has passed 400M+ downloads. Meta followed with Llama 2, then Llama 3 this year, and we are now getting new releases every other month: 3.1 over the summer and the fantastic 3.2 release this week, each bringing new capabilities and sizes. While the developer community has embraced Llama, there are a few things everybody keeps asking for:

What developers want from Llama:

  • Low latency: they need models that run fast to deliver a better user experience.

  • Low cost: GPUs are expensive, and developers need to run AI at the lowest possible price.

  • Local: developers need to run models anywhere, including on edge devices, on-prem, and close to the data.

  • Multimodality: models need to go beyond text and interpret other media, starting with images.

  • Safety: models must be trained on safe principles, with tools to apply guardrails across use cases without losing performance.

  • Easy to use: to build fast on top of Llama, developers need clean, standardized APIs and plenty of documentation and tutorials.

and… Meta delivered on all fronts!

Llama Stack - for me, the most significant announcement of all

๐—ช๐—ต๐—ฎ๐˜ ๐—ถ๐˜€ ๐—Ÿ๐—น๐—ฎ๐—บ๐—ฎ ๐—ฆ๐˜๐—ฎ๐—ฐ๐—ธ?
One of the biggest pain points for developers using Llama is how to use it, get started, and make it more developer-friendly overall. With that, Meta created a stack that standardizes the building blocks needed to bring generative AI applications to market.
 
These blocks span the entire development lifecycle:
- model training
- fine-tuning
- AI product evaluation
- building and running AI agents in production, and more
 
๐—ช๐—ต๐—ฎ๐˜ ๐—ถ๐˜€ ๐—ถ๐—ป๐—ฐ๐—น๐˜‚๐—ฑ๐—ฒ๐—ฑ ๐—ถ๐—ป ๐˜๐—ต๐—ถ๐˜€ ๐—ฟ๐—ฒ๐—น๐—ฒ๐—ฎ๐˜€๐—ฒ?
- Llama CLI (command line interface) to build, configure, and run Llama Stack distributions
- Client code in multiple languages, including python, node, kotlin, and swift
- Docker containers for Llama Stack - Distribution Server and Agents API Provider
 
๐—Ÿ๐—น๐—ฎ๐—บ๐—ฎ ๐—ฆ๐˜๐—ฎ๐—ฐ๐—ธ ๐—ฐ๐—ผ๐—บ๐—ฒ๐˜€ ๐˜„๐—ถ๐˜๐—ต ๐˜๐—ต๐—ฒ ๐—ณ๐—ผ๐—น๐—น๐—ผ๐˜„๐—ถ๐—ป๐—ด ๐—ฐ๐—ผ๐—ฟ๐—ฒ ๐—ฐ๐—ผ๐—บ๐—ฝ๐—ผ๐—ป๐—ฒ๐—ป๐˜๐˜€:

  1. Agentic System API: Powers end-to-end agentic applications, enabling integration with memory systems, orchestrators, and assistants.

  2. Model Toolchain API: Offers core tools for model development and production, including pre-training, batch and real-time inference, reward scoring, and fine-tuning capabilities.

  3. Data and Models: Provides access to high-quality data and customizable models, ensuring that companies have control over their AI systems.

  4. Hardware Compatibility: Fully optimized for modern hardware architectures, including GPUs and accelerators.

๐—Ÿ๐—น๐—ฎ๐—บ๐—ฎ ๐—ฆ๐˜๐—ฎ๐—ฐ๐—ธ ๐—ถ๐˜€ ๐—ฎ๐˜ƒ๐—ฎ๐—ถ๐—น๐—ฎ๐—ฏ๐—น๐—ฒ ๐—ถ๐—ป ๐— ๐˜‚๐—น๐˜๐—ถ๐—ฝ๐—น๐—ฒ ๐—ฑ๐—ถ๐˜€๐˜๐—ฟ๐—ถ๐—ฏ๐˜‚๐˜๐—ถ๐—ผ๐—ป๐˜€:
- Single-node Llama Stack Distribution via Meta internal implementation and Ollama
- Cloud Llama Stack distributions via Cloud providers
- On-device Llama Stack Distribution on iOS implemented via PyTorch ExecuTorch
- On-prem Llama Stack Distribution supported by Dell
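
To make this concrete, here is a minimal sketch of what calling a locally running Llama Stack distribution could look like from the Python client mentioned above. The package name, method names, and response fields are my reading of the client at the time of writing and may differ between releases; treat this as an illustration, not the official API reference.

```python
# Sketch: chat completion against a local Llama Stack distribution.
# Assumes a distribution server is already running on localhost:5000
# (for example, one built and started with the Llama CLI) and that the
# Python client exposes an inference API roughly as shown below.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

response = client.inference.chat_completion(
    model="Llama3.2-3B-Instruct",  # any model served by your distribution
    messages=[
        {"role": "user", "content": "Summarize the Llama Stack in one sentence."}
    ],
)

print(response.completion_message.content)
```

The idea is that the same standardized API surface sits behind the Node, Kotlin, and Swift clients as well, so the code you write against one distribution should carry over to another.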

Llama Stack is positioned to become the standard for enterprise AI due to its open-source foundation, rapid adoption, and strong leadership. Meta, under Mark Zuckerberg, has a proven track record of building successful open-source ecosystems, as seen with PyTorch.

I think Llama Stack will become the operating system for AI.

The building blocks of Llama Stack today (Source: Meta AI Blog)

First Multimodal Llama models

The Llama 3.2 collection's largest models, 11B and 90B, support image reasoning tasks such as:

  • document-level understanding (including charts and graphs)

  • image captioning

  • and visual grounding (e.g., locating objects based on descriptions).

They can analyze graphs to answer questions, interpret maps for trail information, and generate captions that connect vision and language, enhancing contextual storytelling.

Example of Llama 3.2 90B Vision understanding a stock chart

Here's a summary of the capabilities:

๐—Ÿ๐—น๐—ฎ๐—บ๐—ฎ ๐Ÿฏ.๐Ÿฎ ๐—ฉ๐—ถ๐˜€๐—ถ๐—ผ๐—ป ๐Ÿญ๐Ÿญ๐—• & ๐Ÿต๐Ÿฌ๐—•:
- Now supports image-in, text-out use cases.
- Excels in document-level understanding, chart interpretation, and image captioning.
- Designed for powerful visual reasoning tasks, bringing open models closer to closed ones in performance.
 
๐—›๐—ถ๐—ด๐—ต-๐—ฅ๐—ฒ๐˜€๐—ผ๐—น๐˜‚๐˜๐—ถ๐—ผ๐—ป ๐—œ๐—บ๐—ฎ๐—ด๐—ฒ ๐—ฆ๐˜‚๐—ฝ๐—ฝ๐—ผ๐—ฟ๐˜:
- Can reason on images up to 1120x1120 pixels.
- Supports tasks like classification, object detection, OCR, contextual Q&A, and data extraction.
 
๐—˜๐—ณ๐—ณ๐—ถ๐—ฐ๐—ถ๐—ฒ๐—ป๐˜ ๐— ๐˜‚๐—น๐˜๐—ถ๐—บ๐—ผ๐—ฑ๐—ฎ๐—น ๐—”๐—ฝ๐—ฝ๐—ฟ๐—ผ๐—ฎ๐—ฐ๐—ต:
Uses image reasoning adaptor weights, keeping the base LLM parameters unchanged. Three key benefits:
1. Retains language performance while adding image reasoning.
2. Efficiently updates just 0.04% of parameters.
3. Activates additional compute resources only when necessary.
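
To illustrate the adapter idea in code, here is a rough PyTorch sketch of freezing the base LLM and training only the image-reasoning adapter weights. The module-name matching is purely hypothetical; this is not Meta's training code, just the general recipe behind the numbers above.

```python
import torch.nn as nn

def freeze_base_train_adapters(model: nn.Module, adapter_keyword: str = "cross_attn") -> nn.Module:
    """Freeze every parameter except the adapter layers.

    `adapter_keyword` is a made-up naming convention for illustration;
    in a real model you would match the actual adapter module names.
    """
    trainable, total = 0, 0
    for name, param in model.named_parameters():
        total += param.numel()
        if adapter_keyword in name:
            param.requires_grad = True   # train only the image-reasoning adapters
            trainable += param.numel()
        else:
            param.requires_grad = False  # keep base language model weights fixed
    print(f"Trainable fraction: {trainable / total:.4%}")  # ideally a tiny share, on the order of 0.04%
    return model
```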

The benchmarks are impressive. The vision models are competitive with top AI models like Claude 3 Haiku and GPT-4o mini in image recognition and visual understanding.

Vision Model benchmarks are very competitive (Source: Meta AI Blog)

Here's the link to the full collection of Llama 3.2 models: https://huggingface.co/collections/meta-llama/llama-32-66f448ffc8c32f949b04c8cf
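
If you want to try the image-in, text-out flow yourself, here is a minimal sketch using Hugging Face Transformers with the 11B Vision Instruct checkpoint from the collection above. It follows the published example pattern (Mllama classes plus the processor chat template), but double-check the model card, since exact class names and arguments can change between releases.

```python
# Sketch: image-in, text-out with Llama 3.2 Vision via Hugging Face Transformers.
# Requires access to the gated meta-llama checkpoints and a recent transformers version.
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("stock_chart.png")  # any local image, e.g. a chart screenshot
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "What trend does this chart show?"},
    ]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, add_special_tokens=False, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=100)
print(processor.decode(output[0], skip_special_tokens=True))
```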

Lightweight models

Meta released small, efficient models designed for local use on most hardware, including smartphones. Two key advantages:

  1. Low latency on modest hardware.

  2. Enhanced privacy by eliminating the need for off-device data transmission.

With that in mind, Meta released two new lightweight Llama 3.2 models (1B and 3B), developed using pruning and distillation techniques. Pruning reduced model size by systematically removing parts of the network while retaining performance. Knowledge distillation used larger models (Llama 3.1 8B and 70B) as teachers to boost the smaller models' capabilities, resulting in efficient, high-performing versions that fit comfortably on devices.
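
For a feel of the distillation side, the student model is trained to match the teacher's output distribution in addition to the usual next-token loss. Below is a generic knowledge-distillation loss in PyTorch; it is the textbook formulation, not Meta's actual recipe.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T: float = 2.0, alpha: float = 0.5):
    """Blend soft targets from a larger teacher (e.g. Llama 3.1 8B/70B)
    with the standard cross-entropy on the ground-truth tokens.
    T is the softmax temperature, alpha weights the two terms."""
    soft_targets = F.softmax(teacher_logits / T, dim=-1)
    log_student = F.log_softmax(student_logits / T, dim=-1)
    # KL term scaled by T^2, as in the standard distillation formulation
    kd = F.kl_div(log_student, soft_targets, reduction="batchmean") * (T * T)
    ce = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)), labels.view(-1)
    )
    return alpha * kd + (1 - alpha) * ce
```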

Pruning & Distillation schema used by Meta (Source: Meta AI Blog)

If you are interested in these techniques for reducing model size, check out this paper from Nvidia: link

Paper from Nvidia about Pruning and Knowledge Distillation

Llama Guard to implement Safe AI

Llama Guard is a set of advanced moderation models designed to ensure the safe and responsible use of Llama language models. It helps developers detect various types of content violations by filtering both input prompts and output responses against content guidelines.

๐—ช๐—ต๐—ฎ๐˜ ๐—ฎ๐—ฟ๐—ฒ ๐—”๐—œ ๐—š๐˜‚๐—ฎ๐—ฟ๐—ฑ๐—ฟ๐—ฎ๐—ถ๐—น๐˜€?
 
AI guardrails remove potentially harmful content, such as hate speech, abuse, and profanity, from foundation model input and output.
 
๐—›๐—ผ๐˜„ ๐—ฑ๐—ผ ๐—š๐˜‚๐—ฎ๐—ฟ๐—ฑ๐—ฟ๐—ฎ๐—ถ๐—น๐˜€ ๐˜„๐—ผ๐—ฟ๐—ธ?

  1. Guardrail systems use AI to apply a classification task to foundation model input and output text, specifically identifying harmful content.

  2. The sentence classifier, known as a HAP (hate, abuse, and profanity) detector or filter, is created by fine-tuning an LLM.

  3. The classifier analyzes each sentence, assessing the words, their relationships, and the context, and assigns a score indicating the likelihood of harmful content.

The new Meta Guard models introduced are:

  • Llama Guard 3 11B Vision: Designed to support Llama 3.2's image understanding capabilities by filtering text+image prompts and responses.

  • Llama Guard 3 1B: An optimized, pruned, and quantized version based on Llama 3.2 1B, reduced from 2,858 MB to 438 MB for efficient deployment in constrained environments.
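
In practice, you use Llama Guard exactly like the classifier described above: wrap the conversation in the Guard prompt template, run the model, and check whether it answers safe or unsafe (plus a violated category). Here is a rough sketch with Transformers using the new 1B checkpoint; the message format and chat-template behavior are my assumption based on the model card, so verify against the official examples.

```python
# Sketch: screening a user prompt with Llama Guard 3 1B before it reaches the main model.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

guard_id = "meta-llama/Llama-Guard-3-1B"
tokenizer = AutoTokenizer.from_pretrained(guard_id)
guard = AutoModelForCausalLM.from_pretrained(
    guard_id, torch_dtype=torch.bfloat16, device_map="auto"
)

def is_safe(user_prompt: str) -> bool:
    # The Guard chat template formats the turns into the moderation prompt
    conversation = [
        {"role": "user", "content": [{"type": "text", "text": user_prompt}]}
    ]
    input_ids = tokenizer.apply_chat_template(conversation, return_tensors="pt").to(guard.device)
    output = guard.generate(input_ids, max_new_tokens=20, pad_token_id=tokenizer.eos_token_id)
    verdict = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
    # The model replies with "safe" or "unsafe" followed by the violated category code
    return verdict.strip().lower().startswith("safe")

print(is_safe("How do I bake bread?"))  # expected: True
```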

Llama Guard integrates "Prompt Guard" to protect models from prompt-based attacks, like prompt injections and jailbreaking, which aim to exploit or bypass safety measures. Another new addition is "Code Shield," designed to filter and prevent the use of insecure or malicious code generated by LLMs, thereby ensuring safer use of code interpreters within Llama systems.

These safeguards enhance Llama models' reliability and ensure safer integration across multiple use cases.

Meta also shared the Safety Dev Cycle they use to train Llama models, with all the steps to ensure proper evaluation and red teaming.

Why Open Source is king

Last July, Mark Zuckerberg wrote the following blog post: Open Source AI Is the Path Forward. These are the main reasons why open source is set to shape the AI industry:

  • Accessibility and Flexibility: Open source AI allows customization, reduces vendor lock-in, and gives developers more control over their use and data privacy.

  • Security and Transparency: Open development ensures community scrutiny, making open source AI safer and more trustworthy than closed models.

  • Broad Industry Impact: Open source prevents monopolization, accelerates progress, and fosters collaboration, aiming to make AI accessible to all, much like Linux did for computing.

Open source AI is the key to unlocking innovation for everyone, ensuring that the power of technology remains in the hands of many, not just a privileged few.

My POV on Meta Developments

Meta Connect was about far more than just AI and Llama models. The event featured a new VR headset, an upgraded version of their popular Ray-Ban glasses now powered by Llama models, and even a futuristic prototype of Orion glasses with holographic capabilities. On the AI front, Meta is poised to disrupt the market, committing significant resources to bring cutting-edge technology to their products while generously making these innovations available to the open source community.

Zuckerberg's ambition and leadership are evident. In the Bay Area, the competitive atmosphere between Meta, Apple, Google, and others is palpable. Zuckerberg knows that to win, Meta must shape the next computing platform and deliver the best services and experiences around it, including AI. Meta, built on the foundations of open source, understands the power of open collaboration for scaling talent and fostering rapid innovation.

The progress Meta is making with Llama is truly remarkable.

and that's it for today.

Sorry, I didn't write for a while… but now, I am BACK!

Cheers,

Armand 💪

Whenever you're ready, learn AI with me:

  • The 15-day Generative AI course: Join my 15-day Generative AI email course, and learn with just 5 minutes a day. You'll receive concise daily lessons focused on practical business applications. It is perfect for quickly learning and applying core AI concepts. 25,000+ Business Professionals are already learning with it.
