8-K summaries with Pydantic AI

Pydantic is a fantastic framework for object modeling and validation in Python, and it is rapidly becoming a core library for most Python developers. Give it a try if you haven't already.

What is new and exciting is that the Pydantic team has released Pydantic AI, their framework for building AI applications in Python. Let's see how well it does when we use it with edgartools to get company news.

The business idea

Imagine a business partner asks us to build a tool where they can enter a ticker and get the latest news for that company. Their key concerns break down as follows:

Getting the latest company filing

We need to find the company, then get its latest 8-K filing from the SEC. For this we will use edgartools, a powerful open source library for navigating SEC filings. More recently, we've added features that simplify getting the text from 8-K exhibits, which fits nicely into the program we will design and show later.

Writing a summary of the event

We need to take the exhibits attached to the latest 8-K filing and create a sharp summary.

Those two concerns are what we want from a business perspective, but we need a technical implementation to support them. PydanticAI gives us agents, data models, functions and tools that we can combine to implement our application. Let's first look at the agent.

The agent

The agent is the core actor in the framework: it represents the LLM doing the work. Here we are using OpenAI's GPT-4o model, and we prompt it to be a financial analyst that specializes in company news. We also give it a dependencies type, which lets us pass in data at runtime that it can use to get news, and a result type, so that it produces a CompanyEvent object.

from pydantic_ai import Agent

company_news_agent = Agent(
    'openai:gpt-4o',
    deps_type=CompanyNewsDependencies,
    result_type=CompanyEvent,
    # Adjacent string literals are joined into one prompt string,
    # so each piece ends with a space.
    system_prompt=(
        'You are a financial analyst agent. '
        'You specialize in company news analysis. '
        'You use SEC filings, e.g. 8-K filings, '
        'to extract company events.'
    ),
)

Dependencies

Dependencies are just components and/or data that you provide at runtime, and they can be anything specific to your use case. In our case, we need to provide a ticker and a way to get data from Edgar.

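from dataclasses import dataclass

class EdgarConn:
    """Thin wrapper over edgartools."""

    @classmethod
    async def latest_8k(cls, *, ticker: str) -> str | None:
        c = Company(ticker)
        if not c:
            return None
        f = c.latest("8-K")
        # Concatenate the text of every exhibit, separated by a dashed rule.
        texts = ""
        for exhibit in f.exhibits:
            d = Document.parse(exhibit.download())
            texts += repr(d)
            texts += "-" * 80
        return texts

# Declared after EdgarConn so the annotation below resolves at class creation.
@dataclass
class CompanyNewsDependencies:
    ticker: str
    edgar: EdgarConn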

EdgarConn is a simple wrapper over edgartools. It defines a method that returns the exhibit text of the latest 8-K as a single markdown string, or None if the company cannot be found.

The CompanyEvent object

The CompanyEvent object defines the schema for the data you want the LLM to return. The agent will read the 8-K text and create a company event by extracting that information into the defined schema.

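from pydantic import BaseModel, Field

class CompanyEvent(BaseModel):
    """
    A company event
    """
    # Pass the text as description=...; the first positional argument
    # of Field is the default value, not the description.
    ticker: str = Field(description="The stock ticker of the company e.g. AAPL")
    name: str = Field(description="The name of the company e.g. Apple Inc.")
    date: str = Field(description="The date of the company event e.g. January 14, 2024")
    event_description: str = Field(description="The description of the company event released in the 8-K filing")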
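To see what this schema gives us, here is a minimal sketch, assuming Pydantic v2, that inspects the generated JSON schema and shows validation rejecting an incomplete event. It declares the field text with the `description=` keyword so that it lands in the schema and every field stays required.

```python
from pydantic import BaseModel, Field, ValidationError


class CompanyEvent(BaseModel):
    """A company event"""

    ticker: str = Field(description="The stock ticker of the company e.g. AAPL")
    name: str = Field(description="The name of the company e.g. Apple Inc.")
    date: str = Field(description="The date of the company event e.g. January 14, 2024")
    event_description: str = Field(
        description="The description of the company event released in the 8-K filing"
    )


# The descriptions become part of the JSON schema for the model.
schema = CompanyEvent.model_json_schema()
print(schema["properties"]["ticker"]["description"])
print(schema["required"])

# An incomplete payload is rejected, with one error per missing field.
try:
    CompanyEvent.model_validate({"ticker": "ORCL"})
    missing = []
except ValidationError as exc:
    missing = [err["loc"][0] for err in exc.errors()]
print(missing)
```

This is the validation you get on the agent's result as well: a response that does not fit the schema is not silently accepted.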

Using a tool

To allow the agent to access Edgar, we provide a tool that uses the EdgarConn class we created earlier. This is simply a function decorated with @company_news_agent.tool, which registers it as a tool available to the company news agent.

@company_news_agent.tool
async def latest_8k_filing(ctx: RunContext[CompanyNewsDependencies]) -> str:
    """Returns the latest 8-K for a company"""
    latest_8k = await ctx.deps.edgar.latest_8k(
        ticker=ctx.deps.ticker
    )
    return latest_8k

Running the script

To run the program, we call run_sync with a basic prompt: "What is the latest news on the company?" Because the dependencies include a ticker and the connection to Edgar, the agent will use the tool to fetch the latest 8-K filing from Edgar, then prompt the LLM to create the CompanyEvent object.

deps = CompanyNewsDependencies(ticker="ORCL", edgar=EdgarConn())
result = company_news_agent.run_sync(
    'What is the latest news on the company?',
    deps=deps,
)
print(result.data)

Results

The results are actually quite promising.

"Oracle Corporation held its 2024 Annual Meeting of Stockholders on November 14, 2024. The stockholders voted on several proposals including the election of directors, approval of executive compensation, ratification of Ernst & Young LLP as the independent registered public accounting firm for the fiscal year ending May 31, 2025, and a stockholder proposal regarding a report on climate risks to retirement plan beneficiaries, which was not approved."

It generated a short summary of the Oracle 8-K filing. Of course, the quality of the output depends on the model selected, but GPT-4o is a very good model. A sensible next step is to evaluate the model output across different queries and tickers, and work out how to improve the generation.
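As a starting point for that kind of evaluation, here is a minimal sketch of a sanity check on one generated event. The `check_event` helper and the sample dictionaries are hypothetical, and the checks are only assumptions about what "good" means here: every field is filled in, the ticker matches the one requested, and the date follows the "January 14, 2024" style given in the schema.

```python
from datetime import datetime

REQUIRED_FIELDS = ("ticker", "name", "date", "event_description")


def check_event(event: dict[str, str], expected_ticker: str) -> list[str]:
    """Return a list of problems found in one generated event (empty if none)."""
    problems = []
    for field in REQUIRED_FIELDS:
        if not event.get(field, "").strip():
            problems.append(f"missing {field}")
    if event.get("ticker") and event["ticker"] != expected_ticker:
        problems.append("ticker mismatch")
    if event.get("date"):
        try:
            # Format taken from the schema's example: "January 14, 2024".
            datetime.strptime(event["date"], "%B %d, %Y")
        except ValueError:
            problems.append("unparseable date")
    return problems


# Illustrative inputs only; real ones would come from the agent's result.
good = {
    "ticker": "ORCL",
    "name": "Oracle Corporation",
    "date": "November 14, 2024",
    "event_description": "Annual meeting of stockholders.",
}
bad = {"ticker": "AAPL", "name": "", "date": "14/11/2024", "event_description": "..."}

good_problems = check_event(good, "ORCL")
bad_problems = check_event(bad, "ORCL")
print(good_problems)
print(bad_problems)
```

Running a check like this over a list of tickers gives a quick count of malformed results before you move on to judging the quality of the summaries themselves.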

Conclusion

Pydantic AI is the latest entry in a crowded field of AI agent frameworks, but it comes from a very good source (the Pydantic team). Its strength is that it is built around what Pydantic is really good at: the structured outputs and validation that you are going to need in your AI applications anyway. Combining it with edgartools gives you a simple way to get 8-K summaries in your application.