Running large language models (LLMs) presents significant challenges due to their hardware demands, but numerous options exist to make these powerful tools accessible. Today’s landscape offers several approaches – from consuming models through APIs provided by major players like OpenAI and Anthropic, to deploying open-source alternatives via platforms such as Hugging Face and Ollama. Whether you’re interfacing with models remotely or running them locally, understanding key techniques like prompt engineering and output structuring can substantially improve performance for your specific applications. This article explores the practical aspects of implementing LLMs, providing developers with the knowledge to navigate hardware constraints, select appropriate deployment methods, and optimize model outputs through proven techniques.
1. Using LLM APIs: A Quick Introduction
LLM APIs offer a straightforward way to access powerful language models without managing infrastructure. These services handle the complex computational requirements, allowing developers to focus on implementation. In this tutorial, we work through concrete examples that show how to put these models to use in a direct, product-oriented way. To keep the tutorial concise, the hands-on implementation focuses on closed-source models; a high-level overview of open-source models follows at the end.
2. Implementing Closed Source LLMs: API-Based Solutions
Closed source LLMs offer powerful capabilities through straightforward API interfaces, requiring minimal infrastructure while delivering state-of-the-art performance. These models, maintained by companies like OpenAI, Anthropic, and Google, provide developers with production-ready intelligence accessible through simple API calls.
2.1 Let’s explore how to use one of the most accessible closed-source offerings: Anthropic’s API.
# First, install the Anthropic Python library
!pip install anthropic

import anthropic
import os

# Store your API key as an environment variable rather than hard-coding it
client = anthropic.Anthropic(
    api_key=os.environ.get("ANTHROPIC_API_KEY"),
)
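Before building a full application, it helps to confirm the client works with a single request. The snippet below is a minimal sketch: it reuses the same Claude model string used later in this tutorial, and the prompt text is just a placeholder you can swap for your own.

# A minimal request: send one user message and print the model's reply
response = client.messages.create(
    model="claude-3-7-sonnet-20250219",  # any Claude model you have access to
    max_tokens=256,
    messages=[
        {"role": "user", "content": "Summarize what an LLM API does in one sentence."}
    ],
)
print(response.content[0].text)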
2.1.1 Application: In-Context Question Answering Bot for User Guides
import anthropic
import os
from typing import Dict, List, Optional


class ClaudeDocumentQA:
    """
    An agent that uses Claude to answer questions based strictly on the content
    of a provided document.
    """

    def __init__(self, api_key: Optional[str] = None):
        """Initialize the Claude client, falling back to the environment variable if no key is passed."""
        self.client = anthropic.Anthropic(
            api_key=api_key or os.environ.get("ANTHROPIC_API_KEY"),
        )
        # Model string for Claude 3.7 Sonnet
        self.model = "claude-3-7-sonnet-20250219"

    def process_question(self, document: str, question: str) -> str:
        """
        Process a user question based on document context.

        Args:
            document: The text document to use as context
            question: The user's question about the document

        Returns:
            Claude's response answering the question based on the document
        """
        # System prompt that instructs Claude to only use the provided document
        system_prompt = """
        You are a helpful assistant that answers questions based ONLY on the information
        provided in the DOCUMENT below. If the answer cannot be found in the document,
        say "I cannot find information about this in the provided document."
        Do not use any prior knowledge outside of what's explicitly stated in the document.
        """

        # Construct the user message with document and question
        user_message = f"""
        DOCUMENT:
        {document}

        QUESTION:
        {question}

        Answer the question using only information from the DOCUMENT above. If the information
        isn't in the document, say so clearly.
        """

        try:
            # Send the request to Claude
            response = self.client.messages.create(
                model=self.model,
                max_tokens=1000,
                temperature=0.0,  # Low temperature for factual responses
                system=system_prompt,
                messages=[
                    {"role": "user", "content": user_message}
                ],
            )
            return response.content[0].text
        except Exception as e:
            # Surface the error details instead of failing silently
            return f"Error processing request: {str(e)}"

    def batch_process(self, document: str, questions: List[str]) -> Dict[str, str]:
        """
        Process multiple questions about the same document.

        Args:
            document: The text document to use as context
            questions: List of questions to answer

        Returns:
            Dictionary mapping questions to answers
        """
        results = {}
        for question in questions:
            # Store each answer under its question rather than overwriting the dict
            results[question] = self.process_question(document, question)
        return results
# Test code
if __name__ == "__main__":
    # Sample document (an instruction manual excerpt)
    sample_document = """
    QUICKSTART GUIDE: MODEL X3000 COFFEE MAKER
    SETUP INSTRUCTIONS:
    1. Unpack the coffee maker and remove all packaging materials.
    2. Rinse the water reservoir and fill with fresh, cold water up to the MAX line.
    3. Insert the gold-tone filter into the filter basket.
    4. Add ground coffee (1 tbsp per cup recommended).
    5. Close the lid and ensure the carafe is properly positioned on the warming plate.
    6. Plug in the coffee maker and press the POWER button.
    7. Press the BREW button to start brewing.
    FEATURES:
    - Programmable timer: Set up to 24 hours in advance
    - Strength control: Choose between Regular, Strong, and Bold
    - Auto-shutoff: Machine turns off automatically after 2 hours
    - Pause and serve: Remove carafe during brewing for up to 30 seconds
    CLEANING:
    - Daily: Rinse removable parts with warm water
    - Weekly: Clean carafe and filter basket with mild detergent
    - Monthly: Run a descaling cycle using white vinegar solution (1:2 vinegar to water)
    TROUBLESHOOTING:
    - Coffee not brewing: Check water reservoir and power connection
    - Weak coffee: Use STRONG setting or add more coffee grounds
    - Overflow: Ensure filter is properly seated and use correct amount of coffee
    - Error E01: Contact customer service for heating element replacement
    """

    # Sample questions
    sample_questions = [
        "How much coffee should I use per cup?",
        "How do I clean the coffee maker?",
        "What does error code E02 mean?",
        "What is the auto-shutoff time?",
        "How long can I remove the carafe during brewing?"
    ]

    # Create and use the agent
    agent = ClaudeDocumentQA()

    # Process a single question
    print("=== Single Question ===")
    answer = agent.process_question(sample_document, sample_questions[0])
    print(f"Q: {sample_questions[0]}")
    print(f"A: {answer}\n")

    # Process multiple questions
    print("=== Batch Processing ===")
    results = agent.batch_process(sample_document, sample_questions)
    for question, answer in results.items():
        print(f"Q: {question}")
        print(f"A: {answer}\n")
Output from the model
Claude Document Q&A: A Specialized LLM Application
This Claude Document Q&A agent demonstrates a practical implementation of LLM APIs for context-aware question answering. This application uses Anthropic’s Claude API to create a system that strictly grounds its responses in provided document content – an essential capability for many enterprise use cases.
The agent works by wrapping Claude’s powerful language capabilities in a specialized framework that:
- Takes a reference document and user question as inputs
- Structures the prompt to delineate between document context and query
- Uses system instructions to constrain Claude to only use information present in the document
- Provides explicit handling for information not found in the document
- Supports both individual and batch question processing
This approach is particularly valuable for scenarios requiring high-fidelity responses tied to specific content, such as customer support automation, legal document analysis, technical documentation retrieval, or educational applications. The implementation demonstrates how careful prompt engineering and system design can transform a general-purpose LLM into a specialized tool for domain-specific applications.
By combining straightforward API integration with thoughtful constraints on the model’s behavior, this example showcases how developers can build reliable, context-aware AI applications without requiring expensive fine-tuning or complex infrastructure.
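Building on the same client, the sketch below illustrates one simple output-structuring technique mentioned in the introduction: prompting the model to reply with JSON and parsing the result. The field names, prompt wording, and sample document text here are illustrative assumptions rather than part of the agent above.

import json

# Ask Claude to return a machine-readable answer instead of free-form prose
structured_prompt = """
DOCUMENT:
{document}

QUESTION:
{question}

Respond ONLY with a JSON object of the form:
{{"answer": "<answer text>", "found_in_document": true or false}}
"""

response = client.messages.create(
    model="claude-3-7-sonnet-20250219",
    max_tokens=500,
    temperature=0.0,
    messages=[{"role": "user", "content": structured_prompt.format(
        document="The auto-shutoff time is 2 hours.",   # illustrative snippet
        question="What is the auto-shutoff time?")}],
)

try:
    # Parse the model's reply into a Python dict, e.g. {"answer": "...", "found_in_document": true}
    parsed = json.loads(response.content[0].text)
    print(parsed["answer"], parsed["found_in_document"])
except json.JSONDecodeError:
    print("Model did not return valid JSON:", response.content[0].text)

Parsing the response in code like this makes downstream handling (routing, storage, evaluation) far easier than scraping answers out of prose.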
Note: This is only a basic implementation of document question answering; it does not address the additional complexities that arise in real domain-specific deployments.
3. Implementing Open Source LLMs: Local Deployment and Adaptability
Open source LLMs offer flexible and customizable alternatives to closed-source options, allowing developers to deploy models on their own infrastructure with complete control over implementation details. These models, from organizations like Meta (LLaMA), Mistral AI, and various research institutions, provide a balance of performance and accessibility for diverse deployment scenarios.
Open source LLM implementations are characterized by:
- Local Deployment: Models can run on personal hardware or self-managed cloud infrastructure
- Customization Options: Ability to fine-tune, quantize, or modify models for specific needs
- Resource Scaling: Performance can be adjusted based on available computational resources
- Privacy Preservation: Data remains within controlled environments without external API calls
- Cost Structure: One-time computational cost rather than per-token pricing
Major open source model families include:
- LLaMA/Llama-2: Meta’s powerful foundation models with commercial-friendly licensing
- Mistral: Efficient models with strong performance despite smaller parameter counts
- Falcon: Training-efficient models with competitive performance from TII
- Pythia: Research-oriented models with extensive documentation of training methodology
These models can be deployed through frameworks like Hugging Face Transformers, llama.cpp, or Ollama, which provide abstractions to simplify implementation while retaining the benefits of local control. While typically requiring more technical setup than API-based alternatives, open source LLMs offer advantages in cost management for high-volume applications, data privacy, and customization potential for domain-specific needs.
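As a contrast to the API-based example above, here is a minimal sketch of local deployment with Hugging Face Transformers. The model checkpoint, generation settings, and hardware assumptions (a GPU with enough memory for a 7B model, or a smaller model on CPU) are illustrative; swap in whatever fits your environment.

# Install the libraries first (in a notebook): !pip install transformers torch
from transformers import pipeline

# Load an open-source instruction-tuned model locally; weights are downloaded on first run
generator = pipeline(
    "text-generation",
    model="mistralai/Mistral-7B-Instruct-v0.2",  # assumed checkpoint; any locally runnable model works
    device_map="auto",   # place the model on a GPU if one is available
    torch_dtype="auto",  # use reduced precision on supported hardware
)

prompt = "Explain the difference between open-source and closed-source LLMs in two sentences."
outputs = generator(prompt, max_new_tokens=128, do_sample=False)
print(outputs[0]["generated_text"])

For constrained hardware, the same model can typically be loaded in quantized form, and tools like Ollama or llama.cpp offer an even simpler route: pull a model with a single command and query it through a local endpoint, trading some fine-grained control for convenience.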