All about Meta's Llama announcements
My views after attending Meta Connect
We have over 50,000 subscribers! Thank you all for being part of my newsletter. Please share it with your friends and colleagues, and let's keep growing the community.
My views from inside Meta's Developer Conference
This Wednesday, I had the privilege of attending Meta Connect. The event initially focused on developments in mixed reality and wearable technology, but it has since become a general conference where Meta showcases its work in other areas, especially AI.
Meta Connect is an invite-only event, and I had the chance to attend, including the Llama Track, which focused on AI development. In this newsletter issue, I want to walk you through all of Meta's announcements and give you my point of view.
Today, I'll cover the following:
Feedback from the community about Llama
Llama Stack - for me, the most significant announcement of all
First Multimodal Llama models
Lightweight models
Llama Guard to implement Safe AI
Why Open Source is king
My POV on Meta Developments
Let's dive in 🤿
Feedback from the community about Llama
The community loves Llama, and remember, Llama is not even two years old. Since the release of Llama 1 in February 2023, Llama has passed 400M+ downloads! Meta followed with Llama 2, then Llama 3 this year, and now we are getting new releases every other month: 3.1 over the summer and this week's fantastic 3.2 release. Each comes with a series of new capabilities and sizes. While the developer community has embraced Llama, there are a few things everybody keeps asking for:
What developers want from Llama:
Low latency: models need to run fast to deliver a good user experience.
Low cost of execution: GPUs are expensive, and developers need to run AI at the lowest possible price.
Local: developers need to run models anywhere, even on edge devices, on-prem, and close to the data.
Multimodality: models need to go beyond text and interpret other media, starting with images.
Safety: models must be trained on safe principles, and developers need tools to apply guardrails to all sorts of use cases without losing performance.
Ease of use: to build fast on top of Llama, developers need clean, standardized APIs and plenty of documentation and tutorials.
and… Meta delivered on all fronts!
Llama Stack - for me, the most significant announcement of all
What is Llama Stack?
One of the biggest pain points for developers has been simply getting started with Llama and building on it in a developer-friendly way. To address this, Meta created a stack that standardizes the building blocks needed to bring generative AI applications to market.
These blocks span the entire development lifecycle:
- model training
- fine-tuning
- AI product evaluation
- building and running AI agents in production and more
What is included in this release?
- Llama CLI (command line interface) to build, configure, and run Llama Stack distributions
- Client code in multiple languages, including Python, Node, Kotlin, and Swift (see the sketch after this list)
- Docker containers for the Llama Stack Distribution Server and Agents API Provider
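To make this concrete, here's a minimal sketch of calling a running Llama Stack distribution from Python. It's just a sketch: it assumes you've installed the llama-stack-client package and have a local distribution server listening on port 5000, and the method and field names follow the client's early releases, so they may shift as the stack evolves.

```python
# Minimal sketch: querying a local Llama Stack distribution from Python.
# Assumes `pip install llama-stack-client` and a distribution server
# already running on localhost:5000 (e.g., started via the Llama CLI).
from llama_stack_client import LlamaStackClient

client = LlamaStackClient(base_url="http://localhost:5000")

response = client.inference.chat_completion(
    model="Llama3.2-3B-Instruct",  # any model your distribution serves
    messages=[
        {"role": "system", "content": "You are a helpful assistant."},
        {"role": "user", "content": "Explain Llama Stack in one sentence."},
    ],
)
# Response shape per the client's early releases.
print(response.completion_message.content)
```

The point of the standardization is that this same client code should work whether the distribution behind it runs on a laptop, in the cloud, or on-prem.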
Llama Stack comes with the following core components:
Agentic System API: Powers end-to-end agentic applications, enabling integration with memory systems, orchestrators, and assistants.
Model Toolchain API: Offers core tools for model development and production, including pre-training, batch and real-time inference, reward scoring, and fine-tuning capabilities.
Data and Models: Provides access to high-quality data and customizable models, ensuring that companies have control over their AI systems.
Hardware Compatibility: Fully optimized for modern hardware architectures, including GPUs and accelerators.
Llama Stack is available in multiple distributions:
- Single-node Llama Stack Distribution via Meta's internal implementation and Ollama
- Cloud Llama Stack distributions via Cloud providers
- On-device Llama Stack Distribution on iOS implemented via PyTorch ExecuTorch
- On-prem Llama Stack Distribution supported by Dell
Llama Stack is positioned to become the standard for enterprise AI due to its open-source foundation, rapid adoption, and strong leadership. Meta, under Mark Zuckerberg, has a proven track record of building successful open-source ecosystems, as seen with PyTorch.
I think Llama Stack will become the Operating System for AI.
The building blocks of Llama Stack today (Source: Meta AI Blog)
Link to the GitHub repo: https://github.com/meta-llama/llama-stack
First Multimodal Llama models
The Llama 3.2 collection's largest models, 11B and 90B, support image reasoning tasks such as:
document-level understanding (including charts and graphs)
image captioning
and visual grounding (e.g., locating objects based on descriptions).
They can analyze graphs to answer questions, interpret maps for trail information, and generate captions that connect vision and language, enhancing contextual storytelling.
Example of Llama 3.2 90B Vision understanding a stock chart
Here's a summary of the capabilities:
Llama 3.2 Vision 11B & 90B:
- Now supports image-in, text-out use cases (see the sketch after this list).
- Excels in document-level understanding, chart interpretation, and image captioning.
- Designed for powerful visual reasoning tasks, bringing open models closer to closed ones in performance.
High-Resolution Image Support:
- Can reason on images up to 1120x1120 pixels.
- Supports tasks like classification, object detection, OCR, contextual Q&A, and data extraction.
Efficient Multimodal Approach:
Uses image-reasoning adapter weights, keeping the base LLM parameters unchanged. Three key benefits:
1. Retains language performance while adding image reasoning.
2. Efficiently updates just 0.04% of parameters.
3. Activates additional compute resources only when necessary.
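To give a feel for the image-in, text-out flow, here's a minimal sketch using the Hugging Face transformers integration. It assumes access to the gated meta-llama/Llama-3.2-11B-Vision-Instruct checkpoint, a recent transformers release, and a local chart image; the pattern follows the model card, not any internal Meta tooling.

```python
# Minimal sketch: image-in, text-out with Llama 3.2 11B Vision.
# Assumes access to the gated checkpoint and a recent transformers release.
import torch
from PIL import Image
from transformers import AutoProcessor, MllamaForConditionalGeneration

model_id = "meta-llama/Llama-3.2-11B-Vision-Instruct"
model = MllamaForConditionalGeneration.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)
processor = AutoProcessor.from_pretrained(model_id)

image = Image.open("stock_chart.png")  # any local image you want to reason over

# The image placeholder in the chat template tells the model where the image goes.
messages = [
    {"role": "user", "content": [
        {"type": "image"},
        {"type": "text", "text": "What trend does this chart show? Answer briefly."},
    ]}
]
prompt = processor.apply_chat_template(messages, add_generation_prompt=True)
inputs = processor(image, prompt, add_special_tokens=False, return_tensors="pt").to(model.device)

output = model.generate(**inputs, max_new_tokens=128)
print(processor.decode(output[0], skip_special_tokens=True))
```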
The benchmarks are impressive. The vision models are competitive with top AI models like Claude 3 Haiku and GPT-4o mini in image recognition and visual understanding.
Vision Model benchmarks are very competitive (Source: Meta AI Blog)
Here's the link to the full collection of Llama 3.2 models: https://huggingface.co/collections/meta-llama/llama-32-66f448ffc8c32f949b04c8cf
Lightweight models
Meta released small, efficient models designed for local use on most hardware, including smartphones. Two key advantages:
Low latency on modest hardware.
Enhanced privacy by eliminating the need for off-device data transmission (see the sketch below).
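Running these locally is straightforward. Here's a minimal sketch with the transformers pipeline API, assuming access to the gated meta-llama/Llama-3.2-1B-Instruct checkpoint; for phones you'd instead reach for an on-device runtime like ExecuTorch, mentioned in the distributions list above.

```python
# Minimal sketch: running Llama 3.2 1B locally for low-latency, private inference.
# Assumes `pip install transformers torch` and access to the gated checkpoint.
import torch
from transformers import pipeline

pipe = pipeline(
    "text-generation",
    model="meta-llama/Llama-3.2-1B-Instruct",
    torch_dtype=torch.bfloat16,
    device_map="auto",
)

messages = [{"role": "user", "content": "Give me three ideas for on-device AI features."}]
out = pipe(messages, max_new_tokens=128)
# Chat pipelines return the whole conversation; the last message is the reply.
print(out[0]["generated_text"][-1]["content"])
```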
With that, Meta released two new versions of Llama 3.2 (1B and 3B), developed using pruning and distillation techniques. Pruning reduced model size by systematically removing parts of the network while retaining performance. Knowledge distillation used larger models (Llama 3.1 8B and 70B) as teachers to enhance the smaller models' capabilities, resulting in efficient, high-performing versions that fit easily on devices (sketched below).
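Meta hasn't published the exact recipe, but the distillation idea is easy to sketch: the student is trained to match the teacher's temperature-smoothed output distribution alongside the usual next-token loss. A generic PyTorch illustration (my sketch, not Meta's code):

```python
# Generic knowledge-distillation loss sketch (illustrative, not Meta's recipe).
import torch
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, labels, T=2.0, alpha=0.5):
    """Blend soft teacher targets with the usual next-token cross-entropy."""
    # Soft targets: student matches the teacher's temperature-smoothed distribution.
    soft = F.kl_div(
        F.log_softmax(student_logits / T, dim=-1),
        F.softmax(teacher_logits / T, dim=-1),
        reduction="batchmean",
    ) * (T * T)
    # Hard targets: standard language-modeling loss on the ground-truth tokens.
    hard = F.cross_entropy(
        student_logits.view(-1, student_logits.size(-1)), labels.view(-1)
    )
    return alpha * soft + (1 - alpha) * hard
```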
Pruning & Distillation schema used by Meta (Source: Meta AI Blog)
If you are interested in this technique to reduce the size of models, check out this paper from Nvidia: link
Paper from Nvidia about Pruning and Knowledge Distillation
Llama Guard to implement Safe AI
Llama Guard is a set of advanced moderation models designed to ensure the safe and responsible use of Llama language models. It helps developers detect content violations by screening both input prompts and output responses against content guidelines.
What are AI Guardrails?
AI guardrails detect and remove potentially harmful content, such as hate speech, abuse, and profanity, from foundation model inputs and outputs.
How do Guardrails work?
Guardrail systems use AI to apply a classification task to foundation model input and output text, specifically identifying harmful content.
The sentence classifier, known as a HAP (hate, abuse, and profanity) detector or filter, is created by fine-tuning an LLM.
The classifier analyzes each sentence by assessing the words, their relationships, and the context, then assigns a score indicating the likelihood of inappropriate content (a toy sketch follows below).
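As a toy illustration of that classification step, here's what scoring text with a fine-tuned sequence classifier looks like. The model id below is hypothetical, so substitute any HAP or toxicity classifier you trust:

```python
# Toy guardrail sketch: score text for harmful content with a classifier.
# "example-org/hap-detector" is a hypothetical model id -- substitute a real
# fine-tuned hate/abuse/profanity classifier from your model hub.
from transformers import pipeline

hap = pipeline("text-classification", model="example-org/hap-detector")

for text in ["Have a great day!", "some abusive sentence here"]:
    result = hap(text)[0]  # e.g., {"label": "HAP", "score": 0.97}
    if result["label"] == "HAP" and result["score"] > 0.5:
        print(f"BLOCKED ({result['score']:.2f}): {text}")
    else:
        print(f"ok: {text}")
```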
The new Meta Guard models introduced are:
Llama Guard 3 11B Vision: Designed to support Llama 3.2's image understanding capabilities by filtering text+image prompts and responses.
Llama Guard 3 1B: An optimized, pruned, and quantized version based on Llama 3.2 1B, reduced from 2,858 MB to 438 MB for efficient deployment in constrained environments (see the sketch below).
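Here's a minimal sketch of using Llama Guard 3 1B as a prompt filter via transformers, assuming access to the gated checkpoint. The model's chat template wraps the conversation in Meta's safety taxonomy, and the model then generates a verdict like "safe" or "unsafe" plus a category code:

```python
# Minimal sketch: moderating a user prompt with Llama Guard 3 1B.
# Assumes access to the gated meta-llama/Llama-Guard-3-1B checkpoint.
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer

model_id = "meta-llama/Llama-Guard-3-1B"
tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id, torch_dtype=torch.bfloat16, device_map="auto"
)

conversation = [
    {"role": "user", "content": [{"type": "text", "text": "How do I hotwire a car?"}]}
]
input_ids = tokenizer.apply_chat_template(conversation, return_tensors="pt").to(model.device)

output = model.generate(input_ids, max_new_tokens=20, pad_token_id=0)
# The verdict is generated after the prompt, e.g. "unsafe\nS2" (a taxonomy category).
print(tokenizer.decode(output[0][input_ids.shape[-1]:], skip_special_tokens=True))
```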
Llama Guard integrates "Prompt Guard" to protect models from prompt-based attacks, like prompt injections and jailbreaking, which aim to exploit or bypass safety measures. Another new addition is "Code Shield," designed to filter and prevent the use of insecure or malicious code generated by LLMs, thereby ensuring safer use of code interpreters within Llama systems.
These safeguards enhance Llama modelsโ reliability and ensure safer integration across multiple use cases.
Meta also shared the Safety Dev Cycle they use when training Llama models, covering all the steps needed to ensure proper evaluation and red teaming.
Why Open Source is king
Last July, Mark Zuckerberg wrote the blog post Open Source AI Is the Path Forward. These are the main reasons why open source is set to shape the AI industry:
Accessibility and Flexibility: Open source AI allows customization, reduces vendor lock-in, and gives developers more control over their use and data privacy.
Security and Transparency: Open development ensures community scrutiny, making open source AI safer and more trustworthy than closed models.
Broad Industry Impact: Open source prevents monopolization, accelerates progress, and fosters collaboration, aiming to make AI accessible to all, much like Linux did for computing.
Open source AI is the key to unlocking innovation for everyone, ensuring that the power of technology remains in the hands of many, not just a privileged few.
My POV on Meta Developments
Meta Connect was about far more than just AI and Llama models. The event featured a new VR headset, an upgraded version of their popular Ray-Ban glasses now powered by Llama models, and even a futuristic prototype of Orion glasses with holographic capabilities. On the AI front, Meta is poised to disrupt the market, committing significant resources to bring cutting-edge technology to their products while generously making these innovations available to the open source community.
Zuckerbergโs ambition and leadership are evident. In the Bay Area, the competitive atmosphere between Meta, Apple, Google, and others is palpable. Zuckerberg knows that to win, Meta must shape the next computing platform and deliver the best services and experiences around it, including AI. Meta, built on the foundations of open source, understands the power of open collaboration for scaling talent and fostering rapid innovation.
The progress Meta is making with Llama is truly remarkable.
and that's it for today.
Sorry, I didn't write for a while… but now, I am BACK!
Cheers,
Armand 💪
Whenever you're ready, learn AI with me:
The 15-day Generative AI course: Join my 15-day Generative AI email course, and learn with just 5 minutes a day. You'll receive concise daily lessons focused on practical business applications. It is perfect for quickly learning and applying core AI concepts. 25,000+ Business Professionals are already learning with it.