AI, Machine Learning, Technology

A Coding Implementation of Extracting Structured Data Using LangSmith, Pydantic, LangChain, and Claude 3.7 Sonnet

By capernaum
Last updated: 2025-03-25 06:30

Unlock the power of structured data extraction with LangChain and Claude 3.7 Sonnet, transforming raw text into actionable insights. This tutorial focuses on tracing LLM tool calling using LangSmith, enabling real-time debugging and performance monitoring of your extraction system. We utilize Pydantic schemas for precise data formatting and LangChain’s flexible prompting to guide Claude. Experience example-driven refinement, eliminating the need for complex training. This is a glimpse into LangSmith’s capabilities, showcasing how to build robust extraction pipelines for diverse applications, from document processing to automated data entry.

First, we need to install the necessary packages. We’ll use langchain-core and langchain_anthropic to interface with the Claude model.

!pip install --upgrade langchain-core
!pip install langchain_anthropic

If you’re using LangSmith for tracing and debugging, you can set up environment variables:

LANGSMITH_TRACING=True
LANGSMITH_ENDPOINT="https://api.smith.langchain.com"
LANGSMITH_API_KEY="Your API KEY"
LANGSMITH_PROJECT="extraction_api"
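
In a notebook, these variables can also be set programmatically before the LangChain components are used. The snippet below is a minimal sketch of that approach; the API key is a placeholder and the project name simply mirrors the value above.

import os

# Set the LangSmith variables for the current session (e.g., in a Colab cell).
os.environ["LANGSMITH_TRACING"] = "true"
os.environ["LANGSMITH_ENDPOINT"] = "https://api.smith.langchain.com"
os.environ["LANGSMITH_API_KEY"] = "Your API KEY"  # placeholder -- use your own key
os.environ["LANGSMITH_PROJECT"] = "extraction_api"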

Next, we must define the schema for the information we want to extract. We’ll use Pydantic models to create a structured representation of a person.

from typing import Optional
from pydantic import BaseModel, Field


class Person(BaseModel):
    """Information about a person."""

    name: Optional[str] = Field(default=None, description="The name of the person")
    hair_color: Optional[str] = Field(
        default=None, description="The color of the person's hair if known"
    )
    height_in_meters: Optional[str] = Field(
        default=None, description="Height measured in meters"
    )
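
As a quick sanity check, the schema behaves like any other Pydantic model: attributes that are not supplied default to None, which is exactly how the extractor will report unknown values. The values below are illustrative and are not model output.

# Illustrative instance; omitted attributes stay None.
example = Person(name="Alan Smith", hair_color="blond")
print(example.height_in_meters)  # None -- height was not supplied
print(example.model_dump())      # {'name': 'Alan Smith', 'hair_color': 'blond', 'height_in_meters': None}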

Now, we’ll define a prompt template that instructs Claude on how to perform the extraction task:

from langchain_core.prompts import ChatPromptTemplate, MessagesPlaceholder


prompt_template = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are an expert extraction algorithm. "
            "Only extract relevant information from the text. "
            "If you do not know the value of an attribute asked to extract, "
            "return null for the attribute's value.",
        ),
        ("human", "{text}"),
    ]
)

This template provides clear instructions to the model about its task and how to handle missing information.
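
The introduction mentions example-driven refinement, which is what the MessagesPlaceholder import is typically used for. The sketch below shows one way to reserve an optional slot for reference messages; the "examples" variable name and the sample exchange are illustrative and not part of the original tutorial. With tool-calling models, reference examples are often most effective when encoded as explicit tool-call messages, but this shows where such messages would plug into the prompt.

from langchain_core.messages import AIMessage, HumanMessage

# Variant of the template with an optional slot for few-shot reference messages.
prompt_with_examples = ChatPromptTemplate.from_messages(
    [
        (
            "system",
            "You are an expert extraction algorithm. "
            "Only extract relevant information from the text.",
        ),
        MessagesPlaceholder("examples", optional=True),
        ("human", "{text}"),
    ]
)

# Illustrative reference exchange nudging the model toward the desired output shape.
few_shot = [
    HumanMessage(content="Maria is 1.65 meters tall and has brown hair."),
    AIMessage(content='{"name": "Maria", "hair_color": "brown", "height_in_meters": "1.65"}'),
]

prompt = prompt_with_examples.invoke(
    {"text": "Alan Smith is 6 feet tall and has blond hair.", "examples": few_shot}
)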

Next, we’ll initialize the Claude model that will perform our information extraction:

import getpass
import os

if not os.environ.get("ANTHROPIC_API_KEY"):
    os.environ["ANTHROPIC_API_KEY"] = getpass.getpass("Enter API key for Anthropic: ")

from langchain.chat_models import init_chat_model

llm = init_chat_model("claude-3-7-sonnet-20250219", model_provider="anthropic")

Now, we’ll configure our LLM to return structured output according to our schema:

structured_llm = llm.with_structured_output(schema=Person)

This key step tells the model to format its responses according to our Person schema.

Let’s test our extraction system with a simple example:

text = "Alan Smith is 6 feet tall and has blond hair."
prompt = prompt_template.invoke({"text": text})
result = structured_llm.invoke(prompt)
print(result)
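
Because the structured LLM is bound to the Person schema, result is a Person instance rather than raw text, so its fields can be read directly once the call succeeds. The inline comments below describe expected behavior rather than captured output.

# result is a Person instance; extracted values are ordinary attributes.
print(result.name)          # the extracted name
print(result.hair_color)    # the extracted hair color, or None if unknown
print(result.model_dump())  # plain-dict view of the extracted record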

Now, let’s try a more complex example:

from typing import List


class Data(BaseModel):
    """Container for extracted information about people."""

    people: List[Person] = Field(default_factory=list, description="List of people mentioned in the text")


structured_llm = llm.with_structured_output(schema=Data)

text = "My name is Jeff, my hair is black and I am 6 feet tall. Anna has the same color hair as me."
prompt = prompt_template.invoke({"text": text})
result = structured_llm.invoke(prompt)
print(result)

# Next example
text = "The solar system is large, (it was discovered by Nicolaus Copernicus), but earth has only 1 moon."
prompt = prompt_template.invoke({"text": text})
result = structured_llm.invoke(prompt)
print(result)
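
Since the tutorial’s focus is tracing with LangSmith, note that once the LANGSMITH_* variables above are set, LangChain runs are logged to the configured project automatically. As a hedged sketch (the function name and run name are illustrative, and it assumes the langsmith package is installed), the traceable decorator can also group the prompt formatting and the structured call into a single named trace:

from langsmith import traceable


@traceable(run_type="chain", name="extract_people")
def extract_people(text: str) -> Data:
    """Format the prompt, run the structured LLM, and record the step in LangSmith."""
    prompt = prompt_template.invoke({"text": text})
    return structured_llm.invoke(prompt)


extract_people("My name is Jeff, my hair is black and I am 6 feet tall.")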

In conclusion, this tutorial demonstrates building a structured information extraction system with LangChain and Claude that transforms unstructured text into organized data about people. The approach uses Pydantic schemas, custom prompts, and example-driven improvement without requiring specialized training pipelines. The system’s power comes from its flexibility, domain adaptability, and utilization of advanced LLM reasoning capabilities.


Here is the Colab Notebook.

The post A Coding Implementation of Extracting Structured Data Using LangSmith, Pydantic, LangChain, and Claude 3.7 Sonnet appeared first on MarkTechPost.
