All about Meta's Llama announcements

My views after attending Meta Connect

We have over 50,000 subscribers! Thank you all for being part of my newsletter. Please share it with your friends and colleagues, and let’s keep growing the community.

My views from inside Meta's Developer Conference

This Wednesday, I had the privilege of attending Meta Connect. The event initially focused on developments in mixed reality and wearable technology, but it has since become a general conference where Meta showcases its work in other areas, especially AI.

Meta Connect is an invite-only event, and I had the chance to attend, including the Llama Track, which focused on AI development. In this newsletter issue, I want to bring you up to speed on all the Meta announcements and give you my point of view.

Today, I’ll cover the following:

  • Feedback from the community about Llama

  • Llama Stack - for me, the most significant announcement of all

  • First Multimodal Llama models

  • Lightweight models

  • Llama Guard to implement Safe AI

  • Why Open Source is king

  • My POV on Meta Developments

Let’s dive in 🤿

Feedback from the community about Llama

The community loves Llama. Remember, though, Llama is not even two years old. Since the release of Llama 1 in February 2023 (not that long ago), Llama has passed 400M+ downloads! Meta followed with Llama 2, then Llama 3 this year, and we are now getting new releases every other month: 3.1 over the summer and, this week, the fantastic 3.2 release, each bringing new capabilities and sizes. While the developer community has embraced Llama, there are a few things everybody keeps asking for:

What developers want from Llama:

  • Low latency: models need to respond fast to provide a better user experience.

  • Low cost: GPUs are expensive, and developers need to run AI at the lowest possible price.

  • Local: developers need to run models anywhere, even on edge devices, on-prem, and close to the data.

  • Multimodality: models need to go beyond text and interpret other media, starting with images.

  • Safety: models should be trained on safe principles, and developers should get tools to apply guardrails to all sorts of use cases without losing performance.

  • Easy to use: to build fast on top of Llama, developers need clean, standardized APIs and plenty of documentation and tutorials.

and… Meta delivered on all fronts!

Llama Stack - for me, the most significant announcement of all

𝗪𝗵𝗮𝘁 𝗶𝘀 𝗟𝗹𝗮𝗺𝗮 𝗦𝘁𝗮𝗰𝗸?
One of the biggest pain points for developers is getting started with Llama and building on top of it in a developer-friendly way. To address this, Meta created a stack that standardizes the building blocks needed to bring generative AI applications to market.
 
These blocks span the entire development lifecycle:
- Model training
- Fine-tuning
- AI product evaluation
- Building and running AI agents in production, and more
 
𝗪𝗵𝗮𝘁 𝗶𝘀 𝗶𝗻𝗰𝗹𝘂𝗱𝗲𝗱 𝗶𝗻 𝘁𝗵𝗶𝘀 𝗿𝗲𝗹𝗲𝗮𝘀𝗲?
- Llama CLI (command line interface) to build, configure, and run Llama Stack distributions
- Client code in multiple languages, including Python, Node, Kotlin, and Swift (see the sketch below)
- Docker containers for the Llama Stack Distribution Server and the Agents API Provider
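To make this concrete, here is a minimal sketch of what calling a locally running Llama Stack distribution could look like from Python. It assumes the llama-stack-client package and a Distribution Server already started with the Llama CLI on localhost:5000; the exact method and field names are illustrative and may differ in the shipped SDK.

```python
# Minimal sketch: chat completion against a local Llama Stack distribution.
# Assumptions: the `llama-stack-client` package is installed and a
# Distribution Server is already running on localhost:5000 (started via the
# Llama CLI). Method and field names may differ in the shipped SDK.
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

response = client.inference.chat_completion(
    model="Llama3.2-3B-Instruct",  # any model served by the distribution
    messages=[
        {"role": "system", "content": "You are a concise assistant."},
        {"role": "user", "content": "What does Llama Stack standardize?"},
    ],
)
print(response.completion_message.content)
```

The point is the standardization: the same client call should work whether the distribution runs on a laptop via Ollama, in the cloud, or on-prem.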
 
𝗟𝗹𝗮𝗺𝗮 𝗦𝘁𝗮𝗰𝗸 𝗰𝗼𝗺𝗲𝘀 𝘄𝗶𝘁𝗵 𝘁𝗵𝗲 𝗳𝗼𝗹𝗹𝗼𝘄𝗶𝗻𝗴 𝗰𝗼𝗿𝗲 𝗰𝗼𝗺𝗽𝗼𝗻𝗲𝗻𝘁𝘀:

  1. Agentic System API: Powers end-to-end agentic applications, enabling integration with memory systems, orchestrators, and assistants.

  2. Model Toolchain API: Offers core tools for model development and production, including pre-training, batch and real-time inference, reward scoring, and fine-tuning capabilities.

  3. Data and Models: Provides access to high-quality data and customizable models, ensuring that companies have control over their AI systems.

  4. Hardware Compatibility: Fully optimized for modern hardware architectures, including GPUs and accelerators.

𝗟𝗹𝗮𝗺𝗮 𝗦𝘁𝗮𝗰𝗸 𝗶𝘀 𝗮𝘃𝗮𝗶𝗹𝗮𝗯𝗹𝗲 𝗶𝗻 𝗠𝘂𝗹𝘁𝗶𝗽𝗹𝗲 𝗱𝗶𝘀𝘁𝗿𝗶𝗯𝘂𝘁𝗶𝗼𝗻𝘀:
- Single-node Llama Stack Distribution via Meta internal implementation and Ollama
- Cloud Llama Stack distributions via Cloud providers
- On-device Llama Stack Distribution on iOS implemented via PyTorch ExecuTorch
- On-prem Llama Stack Distribution supported by Dell

Llama Stack is positioned to become the standard for enterprise AI due to its open-source foundation, rapid adoption, and strong leadership. Meta, under Mark Zuckerberg, has a proven track record of building successful open-source ecosystems, as seen with PyTorch.

I think Llama Stack will become the Operating System for AI.

The building blocks of Llama Stack today (Source: Meta AI Blog)

First Multimodal Llama models

The Llama 3.2 collection's largest models, 11B and 90B, support image reasoning tasks such as:

  • document-level understanding (including charts and graphs)

  • image captioning

  • and visual grounding (e.g., locating objects based on descriptions).

They can analyze graphs to answer questions, interpret maps for trail information, and generate captions that connect vision and language, enhancing contextual storytelling.

Example of Llama 3.2 90B Vision understanding a stock chart

Here’s a summary of the capabilities:

𝗟𝗹𝗮𝗺𝗮 𝟯.𝟮 𝗩𝗶𝘀𝗶𝗼𝗻 𝟭𝟭𝗕 & 𝟵𝟬𝗕:
- Now supports image-in, text-out use cases.
- Excels in document-level understanding, chart interpretation, and image captioning.
- Designed for powerful visual reasoning tasks, bringing open models closer to closed ones in performance.
 
𝗛𝗶𝗴𝗵-𝗥𝗲𝘀𝗼𝗹𝘂𝘁𝗶𝗼𝗻 𝗜𝗺𝗮𝗴𝗲 𝗦𝘂𝗽𝗽𝗼𝗿𝘁:
- Can reason on images up to 1120x1120 pixels.
- Supports tasks like classification, object detection, OCR, contextual Q&A, and data extraction.
 
𝗘𝗳𝗳𝗶𝗰𝗶𝗲𝗻𝘁 𝗠𝘂𝗹𝘁𝗶𝗺𝗼𝗱𝗮𝗹 𝗔𝗽𝗽𝗿𝗼𝗮𝗰𝗵:
Uses image reasoning adaptor weights, keeping the base LLM parameters unchanged. Three key benefits:
1. Retains language performance while adding image reasoning.
2. Efficiently updates just 0.04% of parameters.
3. Activates additional compute resources only when necessary.
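If you want to try the image-in, text-out flow yourself, here is a minimal sketch using the Hugging Face transformers integration. It assumes you have been granted access to the gated meta-llama checkpoint; class and argument names follow the transformers Mllama integration at the time of writing and may change across versions.

```python
# Minimal sketch: image-in, text-out with Llama 3.2 11B Vision.
# Assumptions: access to the gated meta-llama checkpoint on Hugging Face and
# a recent transformers release with the Mllama integration.
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("stock_chart.png")  # e.g. a chart like the one in the demo
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "What trend does this chart show?"},
    ]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```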

The benchmarks are impressive. The vision models are competitive with top AI models like Claude 3 Haiku and GPT-4o mini in image recognition and visual understanding.

Vision Model benchmarks are very competitive (Source: Meta AI Blog)

Here’s the link to the full collection of Llama 3.2 models: https://huggingface.co/collections/meta-llama/llama-32-66f448ffc8c32f949b04c8cf

Lightweight models

Meta released small, efficient models designed for local use on most hardware, including smartphones. Two key advantages:

  1. Low latency on modest hardware.

  2. Enhanced privacy by eliminating the need for off-device data transmission.

With that, Meta released two new Llama 3.2 models (1B and 3B), developed using pruning and distillation techniques. Pruning reduces model size by systematically removing parts of the network while retaining performance. Knowledge distillation uses larger models (Llama 3.1 8B and 70B) to teach the smaller models, resulting in efficient, high-performing versions that fit on devices easily.

Pruning & Distillation schema used by Meta (Source: Meta AI Blog)
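To give a feel for the distillation half of that recipe, here is a toy sketch of a distillation loss, not Meta's actual training code: the small student is trained to match the teacher's softened token distribution in addition to the usual next-token cross-entropy.

```python
# Toy sketch of knowledge distillation (not Meta's training code): the
# student is pushed toward the teacher's softened output distribution
# while still learning from the ground-truth tokens.
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    # Soft targets: KL divergence between temperature-softened distributions.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: standard cross-entropy against the ground-truth tokens.
    hard = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)), labels.view(-1)
    )
    return alpha * soft + (1 - alpha) * hard
```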

If you are interested in this technique to reduce the size of models, check out this paper from Nvidia: link

Paper from Nvidia about Pruning and Knowledge Distillation

Llama Guard to implement Safe AI

Llama Guard is a set of advanced moderation models designed to ensure the safe and responsible use of Llama language models. It helps developers detect various types of content violations by filtering both input prompts and output responses against content guidelines.

𝗪𝗵𝗮𝘁 𝗮𝗿𝗲 𝗔𝗜 𝗚𝘂𝗮𝗿𝗱𝗿𝗮𝗶𝗹𝘀?
 
AI guardrails remove potentially harmful content, such as hate speech, abuse, and profanity, from foundation model input and output.
 
𝗛𝗼𝘄 𝗱𝗼 𝗚𝘂𝗮𝗿𝗱𝗿𝗮𝗶𝗹𝘀 𝘄𝗼𝗿𝗸?

  1. Guardrail systems use AI to apply a classification task to foundation model input and output text, specifically identifying harmful content.

  2. The sentence classifier, known as a HAP (hate, abuse, and profanity) detector or filter, is created by fine-tuning an LLM.

  3. The classifier analyzes sentences by assessing each word, their relationships, and the context to flag harmful content and assigns a score indicating the likelihood of inappropriate content.

The new Meta Guard models introduced are:

  • Llama Guard 3 11B Vision: Designed to support Llama 3.2's image understanding capabilities by filtering text+image prompts and responses.

  • Llama Guard 3 1B: An optimized, pruned, and quantized version based on Llama 3.2 1B, reduced from 2,858 MB to 438 MB for efficient deployment in constrained environments.
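Here is a minimal sketch of running that 1B guard model as an input filter with transformers. It assumes access to the gated meta-llama checkpoint; the chat-template format and the "safe"/"unsafe" plus hazard-code output convention follow the published model card, but check the card for the exact format.

```python
# Minimal sketch: classify a user prompt with Llama Guard 3 1B.
# Assumptions: access to the gated meta-llama checkpoint; the guard replies
# with "safe" or "unsafe" plus a hazard category code (see the model card).
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-Guard-3-1B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

conversation = [
    {"role": "user", "content": [{"type": "text", "text": "How do I hotwire a car?"}]}
]
input_ids = tokenizer.apply_chat_template(conversation, return_tensors="pt").to(model.device)

output = model.generate(input_ids, max_new_tokens=20, do_sample=False)
verdict = tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True)
print(verdict)  # e.g. "safe" or "unsafe" followed by a category code
```

The same pattern runs on the model's response before it reaches the user, which is how both sides of the conversation get filtered.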

Llama Guard integrates "Prompt Guard" to protect models from prompt-based attacks, like prompt injections and jailbreaking, which aim to exploit or bypass safety measures. Another new addition is "Code Shield," designed to filter and prevent the use of insecure or malicious code generated by LLMs, thereby ensuring safer use of code interpreters within Llama systems.

These safeguards enhance Llama models’ reliability and ensure safer integration across multiple use cases.

Meta also shared the Safety Dev Cycle they follow when training Llama models, with all the steps taken to ensure proper evaluation and red teaming.

Why Open Source is king

Last July, Mark Zuckerberg wrote the blog post Open Source AI Is the Path Forward. These are the main reasons why open source is set to shape the AI industry:

  • Accessibility and Flexibility: Open source AI allows customization, reduces vendor lock-in, and gives developers more control over their use and data privacy.

  • Security and Transparency: Open development ensures community scrutiny, making open source AI safer and more trustworthy than closed models.

  • Broad Industry Impact: Open source prevents monopolization, accelerates progress, and fosters collaboration, aiming to make AI accessible to all, much like Linux did for computing.

Open source AI is the key to unlocking innovation for everyone, ensuring that the power of technology remains in the hands of many, not just a privileged few.

My POV on Meta Developments

Meta Connect was about far more than just AI and Llama models. The event featured a new VR headset, an upgraded version of their popular Ray-Ban glasses now powered by Llama models, and even a futuristic prototype of Orion glasses with holographic capabilities. On the AI front, Meta is poised to disrupt the market, committing significant resources to bring cutting-edge technology to their products while generously making these innovations available to the open source community.

Zuckerberg’s ambition and leadership are evident. In the Bay Area, the competitive atmosphere between Meta, Apple, Google, and others is palpable. Zuckerberg knows that to win, Meta must shape the next computing platform and deliver the best services and experiences around it, including AI. Meta, built on the foundations of open source, understands the power of open collaboration for scaling talent and fostering rapid innovation.

The progress Meta is making with Llama is truly remarkable.

and that’s it for today.

Sorry I haven't written in a while… but now, I am BACK!

Cheers,

Armand 💪

Whenever you're ready, learn AI with me:

  • The 15-day Generative AI course: Join my 15-day Generative AI email course, and learn with just 5 minutes a day. You'll receive concise daily lessons focused on practical business applications. It is perfect for quickly learning and applying core AI concepts. 25,000+ Business Professionals are already learning with it.
