How to Run Llama 3 on Your Local Computer: A Step-by-Step Guide
Meta developed and released the Meta Llama 3 family of large language models (LLMs), a collection of pretrained and instruction-tuned generative text models in 8B and 70B sizes. The Llama 3 instruction-tuned models are optimized for dialogue use cases and outperform many of the available open-source chat models on common industry benchmarks. Further, in developing these models, Meta took great care to optimize helpfulness and safety.
Model developers: Meta
Input: models take text only.
Output: models generate text and code only.
Model Architecture Llama 3 is an auto-regressive language model that uses an optimized transformer architecture. The tuned versions use supervised fine-tuning (SFT) and reinforcement learning with human feedback (RLHF) to align with human preferences for helpfulness and safety.
What’s new with Llama-3?
Llama 3 brings significant enhancements over Llama 2, including a new tokenizer that increases the vocabulary size to 128,256 tokens (up from 32K tokens). This expanded vocabulary enhances text encoding efficiency, promoting stronger multilingual capabilities.
Moreover, Llama 3 models underwent extensive training on a diverse dataset comprising over 15 trillion tokens, approximately eight times more data than its predecessor. Specifically, Llama 3 Instruct, tailored for dialogue applications, was fine-tuned on a dataset of over 10 million human-annotated samples using a combination of techniques such as supervised fine-tuning (SFT), rejection sampling, proximal policy optimization (PPO), and direct preference optimization (DPO).
The Manual Method: Setting Up Llama 3 with Code
To set up Llama 3 locally by hand, follow these steps:
Step 1: Sign-up and Access Requests
- Create an account on Hugging Face.
- Request access to the Llama models at the link below (approval can take a day):
Link: https://ai.meta.com/resources/models-and-libraries/llama-downloads/
- Because the Llama 3 repository is gated, log in to Hugging Face and generate an access token at:
Link: https://huggingface.co/settings/tokens
Step 2: Configuring Tokens Locally
pip install huggingface_hub
huggingface-cli login (paste your access token when prompted)
Step 3: Install the necessary libraries
pip install transformers
pip install huggingface_hub
pip install torch
pip install accelerate
Step 4: Create a run.py file (e.g. with touch run.py) containing:
import transformers
import torch

model_id = "meta-llama/Meta-Llama-3-8B"

# Build a text-generation pipeline; bfloat16 halves memory versus float32,
# and device_map="auto" places the model on a GPU if one is available.
pipeline = transformers.pipeline(
    "text-generation",
    model=model_id,
    model_kwargs={"torch_dtype": torch.bfloat16},
    device_map="auto",
)

print(pipeline("Hey how are you doing today?"))
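Before running, it helps to check that your machine has enough memory for the weights. A back-of-the-envelope sketch (the ~8 billion parameter figure is an approximation, and real usage adds activations and cache on top):

```python
# Rough memory estimate for the Meta-Llama-3-8B weights alone (approximation).
params = 8e9            # ~8 billion parameters (approximate)
bytes_per_param = 2     # bfloat16 stores each parameter in 2 bytes
gib = params * bytes_per_param / 2**30
print(f"~{gib:.1f} GiB for weights alone")  # roughly 15 GiB
```

This is why the script requests torch.bfloat16: loading in float32 would roughly double the footprint.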
Step 5: Run the Python file
python run.py
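Note that meta-llama/Meta-Llama-3-8B is the base model, which simply continues whatever text you give it. For dialogue, Meta also publishes meta-llama/Meta-Llama-3-8B-Instruct, whose prompts follow a special chat format. The sketch below hand-builds that format for a single user turn (the header and special-token names follow Meta's published chat template; in practice, prefer tokenizer.apply_chat_template, which constructs this for you):

```python
# Minimal sketch of the Llama 3 Instruct prompt format for one user turn.
# (Illustration only -- normally tokenizer.apply_chat_template does this.)
def build_prompt(user_message: str) -> str:
    return (
        "<|begin_of_text|>"
        "<|start_header_id|>user<|end_header_id|>\n\n"
        f"{user_message}<|eot_id|>"
        "<|start_header_id|>assistant<|end_header_id|>\n\n"
    )

print(build_prompt("Hey how are you doing today?"))
```

Passing a prompt in this shape to the Instruct model makes it answer as an assistant rather than merely continue the text.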