Good Decisions: A Monthly Webinar for Enterprise AI Governance Insights

The Next Wave:
SLMs, Agentic AI, and the Future of Model Governance

Learn how forward-thinking AI leaders are aligning governance with innovation to unlock the full potential of SLMs and Agentic AI. Featuring insights from Jim Olsen, CTO of ModelOp, this session explores how enterprises are adopting modern governance strategies to manage expert models, ensure visibility across autonomous AI systems, and scale responsibly—without compromising speed or flexibility.

Register for the series.

SLMs and Agentic AI are redefining enterprise AI strategies—offering more efficient, specialized models and autonomous capabilities that can drive real business value. But with these advancements come new governance challenges. Is your organization prepared to manage the complexity, visibility, and oversight required to scale responsibly?

Join us for an insightful session led by Jim Olsen, CTO of ModelOp, as he breaks down the technical and operational implications of SLMs, model distillation, and Agentic AI architectures. Discover how modern governance strategies can empower your organization to embrace innovation while maintaining control, compliance, and long-term scalability.

What You’ll Learn:

  • SLMs vs. LLMs: How Small Language Models provide cost and privacy advantages over traditional LLMs—and why they’re better suited for agentic use cases.
  • The Role of Model Distillation: What distinguishes distillation from reinforcement learning, and how models like DeepSeek and Gemma 3 enable lightweight expert agents.
  • Managing Agentic AI Architectures: How to govern large-scale model ensembles and expert models using tools like ModelOp’s Model Inventory and Approval Workflows.
  • Operationalizing Governance: How enterprises are applying continuous monitoring, traceability, and structured approvals to stay ahead of compliance and performance risk.

Download the slide deck.

Transcript

1. Introduction to AI Governance and Agentic AI

Jay Combs: Welcome to the Good Decisions webinar for AI Governance insights. This month, we have Jim Olsen, the CTO here at ModelOp, and he's going to be talking about small language models, Agentic AI, and the future of model governance.

Without any further ado, I'll pass it over to Jim so he can dive right in.

Jim Olsen: Okay, everyone. Welcome to today’s session. As I said, everyone’s hearing about Agentic AI — it’s obviously a hot topic right now. But what does it actually mean for governance? And what’s changing within Agentic AI that's making it a more interesting solution and bringing it more and more to the forefront?

First, we're going to talk about a lot of the Agentic AI solutions you've seen so far, which are built on LLMs — and we’ll take a look at what's being done there, and some of the risks and challenges that come with that approach. But recently, we’ve seen an explosion in SLMs, or small language models. So we’ll talk a little bit about them and why they’re going to help change the landscape for Agentic AI.

2. The Rise of Small Language Models

Small language models really are a game changer. They're going to make it far more attainable and realistic for enterprises to actually deploy and use Agentic AI. So we're going to get into that a bit.

Then we’ll look at a very simple AI architecture to understand some of the components and players within that space.

Next, once we understand the models, what's going into them, and how this space is changing, we'll move on to how you actually represent them in a model inventory for governance. We'll look at that within a solution like ModelOp Center, and at what it means to actually track those solutions as well.

So, we're going to go through all that today in a bit of detail.

Let’s get started.

3. Defining Agentic AI

Agentic AI is a term a lot of people are throwing around. Sometimes it's just buzz, sometimes it's reality — and sometimes it's something in between. A lot of people don’t yet have a clear understanding of exactly what Agentic AI can mean to them.

At its core, Agentic AI is the idea of a system that acts autonomously using various agents to perform tasks, make decisions, achieve specific goals, use tools, etc. In general, it’s goal-oriented. The idea is that it’s hands-off — a human doesn’t have to get involved to make those decisions. Although many people, as a safeguard, do want a human review process involved.

These systems also need to be able to analyze, plan, and take action as situations change — or as they acquire real-time data that dictates different behavior. So it's about adapting to dynamically changing conditions, maintaining situational awareness, and taking meaningful actions to achieve a desired outcome.

4. Human-Like Reasoning in AI Systems

The idea is that these systems have human-like reasoning to carry out tasks. Now, we all know there's a big difference between true human reasoning and AI, but we understand what a reasoning model is — how it can iterate over a process to make decisions, solve complex or novel problems, and do so by pulling together the right tools, information, agents, and so on.

In reality, Agentic AI needs to use expert models inside of its agents to accomplish tasks. By "expert models," I mean ones that have knowledge about specific tools, products, interfaces, data structures — things like that — in order to achieve a goal.

A simple example would be an agent that talks to a REST service, pulls out information, and selects the pieces it needs to complete a task. That’s an example of an expert model. So when we get into Agentic AI, we can start to talk about the different models and the roles they play within a solution.

A lot of today’s solutions use — for lack of a better term — a brute force approach, relying on a large language model as the expert model.

5. Challenges with Large Language Models

These models need to be experts in their domain. That means the LLM must contain all the knowledge required for every task it's expected to perform. You're not going to deploy twenty agents with twenty different LLMs. Technically, you could, but LLMs require so many resources that running that many of them is cost-prohibitive; in practice you won't have a large number of them deployed.

So your best approach would be to put all your knowledge into a single LLM — or maybe a couple of different ones — that can service multiple agents.

Typically, these models are not even self-hosted. They can be, if you have the hardware, but again, it’s cost-prohibitive. If you can’t self-host, then you’re potentially exporting critical data outside your enterprise walls — and that’s a big concern.

Foundational models are often capable of handling many tasks out of the box. They know how to process JSON, talk to REST services, and understand basic concepts — like what a flight is, for example. So they may be up to the task for general use.

But if you have a specialized area or very specific task in mind, they may not have that knowledge. They may not be able to solve certain math problems, or perform certain operations if the relevant information wasn’t included in the training data — or if the task is novel enough. In those cases, they might fail completely, or worse, just make something up.

That’s obviously a serious issue when we’re talking about autonomous systems.

So what can you do if you’re using LLMs and want to create a specialized, all-encompassing model for your business?

6. Reinforcement Learning for Specialized Models

If you're heading down the path of using LLMs and need a specialized model, you'd need to use reinforcement learning to turn the foundational model into an expert model.

To do this, you'd start with the foundational model and then apply reinforcement learning. That process involves creating a reward model that encourages correct answers within specific domains, and a loss model that penalizes incorrect answers by determining how far off they were and adjusting the reward accordingly.

These reward and loss models work together to enrich the foundational model with domain-specific knowledge.
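
To make that loop concrete, here is a minimal toy sketch, assuming PyTorch. The "policy" is just a learnable distribution over a handful of canned answers, and the reward function stands in for a real reward/loss model pair; actual RLHF-style tuning applies the same mechanic to an LLM's token distributions.

```python
import torch
import torch.nn as nn

class TinyPolicy(nn.Module):
    """Toy stand-in for a foundational model: learnable scores over 4 canned answers."""
    def __init__(self, num_answers: int = 4):
        super().__init__()
        self.scores = nn.Parameter(torch.zeros(num_answers))

    def dist(self) -> torch.distributions.Categorical:
        return torch.distributions.Categorical(logits=self.scores)

def reward(answer: int, correct: int = 2) -> float:
    # Stand-in for the reward and loss models: full credit for the
    # domain-correct answer, a graded penalty the further off we are.
    return 1.0 if answer == correct else -0.25 * abs(answer - correct)

policy = TinyPolicy()
opt = torch.optim.Adam(policy.parameters(), lr=0.1)

for _ in range(300):
    d = policy.dist()
    a = d.sample()
    # REINFORCE update: raise the log-probability of rewarded answers, lower the rest.
    loss = -d.log_prob(a) * reward(a.item())
    opt.zero_grad()
    loss.backward()
    opt.step()

print(policy.dist().probs)  # probability mass should concentrate on answer 2
```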

But this process isn't cheap. It requires a lot of training, significant resources, and plenty of GPUs. Once the augmented foundational model is trained, you still have to deploy it to your infrastructure — which means ensuring you have the hardware and capacity to serve all your running agents.

That could represent a massive workload, especially as the number of agents collaborating within a solution increases. It can really slow things down.

7. The Cost of AI Model Training

A lot of people entering the Agentic AI space are just starting with foundational models without any further tuning. But that often means you're sending your data outside of your enterprise — into a vendor's ecosystem. Is that safe? Is that what you want to do?

We’ve already talked about the challenges with self-hosting. I don’t know if you've checked GPU prices lately, but they’re not cheap. It’s expensive to get the right hardware — even if you can find it, because there are shortages.

And it’s going to take time to train these models.

What’s been really interesting lately — and we’re seeing more and more of them — is the emergence of small language models (SLMs). These are designed with fewer parameters and lighter computational requirements. Some can even run on a simple desktop GPU.

They’re great for edge deployments or areas where you don’t need a deep knowledge base. For example, I’ve personally been experimenting with DeepSeek-R1 and its distilled versions of Qwen. It’s pretty impressive: I can run a quantized 4-bit version of the 7-billion-parameter model in just 4GB of GPU memory on a standard off-the-shelf GPU. It performs well locally.

I’ve even pushed it down to the 1.5-billion-parameter model. It still gives reasonable responses, answering questions about model governance, and it responds in about five seconds on a low-powered GPU.

That’s pretty impressive. It starts to show that we can deploy these things locally and realistically.
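
If you want to reproduce that kind of local run, a minimal sketch using the Hugging Face transformers and bitsandbytes libraries looks roughly like this (the model ID below is the distilled Qwen checkpoint; swap in the 1.5B variant for smaller GPUs):

```python
import torch
from transformers import AutoModelForCausalLM, AutoTokenizer, BitsAndBytesConfig

model_id = "deepseek-ai/DeepSeek-R1-Distill-Qwen-7B"  # or the 1.5B variant

# 4-bit quantization keeps the 7B model's footprint at roughly 4GB of VRAM.
bnb_config = BitsAndBytesConfig(
    load_in_4bit=True,
    bnb_4bit_compute_dtype=torch.float16,
)

tokenizer = AutoTokenizer.from_pretrained(model_id)
model = AutoModelForCausalLM.from_pretrained(
    model_id,
    quantization_config=bnb_config,
    device_map="auto",
)

prompt = "What is model governance?"
inputs = tokenizer(prompt, return_tensors="pt").to(model.device)
output = model.generate(**inputs, max_new_tokens=128)
print(tokenizer.decode(output[0], skip_special_tokens=True))
```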

8. Advantages of Distillation Techniques

Once you have these smaller models running, you can further train them on specific tasks so they can operate on very reasonable, affordable hardware. That also means you can avoid the entire problem of shipping data off-site.

So what really changed here?

Distillation has actually been around for a while. It’s a traditional technique in machine learning. One of the most downloaded models out there is a distilled version of SBERT — and it works better than the original. It's faster and requires less computation.

Here’s how it works: you use a foundational model to “teach” a smaller model only the areas of expertise you want. The foundational model acts as the teacher, and the small model is the student. Through a series of questions and answers, you distill the relevant knowledge into the student model.
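
At its core, that teacher/student transfer is just a loss term. A minimal sketch, assuming PyTorch: soften both models' output distributions with a temperature, then push the student's distribution toward the teacher's.

```python
import torch.nn.functional as F

def distillation_loss(student_logits, teacher_logits, temperature: float = 2.0):
    """KL divergence between temperature-softened teacher and student outputs."""
    t = temperature
    teacher_probs = F.softmax(teacher_logits / t, dim=-1)
    student_log_probs = F.log_softmax(student_logits / t, dim=-1)
    # The t*t factor keeps gradient magnitudes comparable across temperatures.
    return F.kl_div(student_log_probs, teacher_probs, reduction="batchmean") * (t * t)
```

In a training loop you'd run the same prompts through both models, compute this loss on the student's outputs, and backpropagate only into the student's weights.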

9. Creating Specialized Models through Distillation

Through distillation, we can condense knowledge from a larger model into a smaller one — but only the parts we actually need. Instead of pulling in everything under the sun, we extract just the relevant areas of expertise for our use case.

The challenge, of course, is that the larger foundational model must already contain the knowledge you want to distill. You can’t distill from nothing. But as a technique, it’s far more cost-effective than training a massive model from scratch.

Some of the distillation models I’ve seen, like DeepSeek-R1’s, reportedly cost only a few thousand dollars to train, compared to the massive GPU resources required to train foundational models.

There is a tradeoff. Compression is lossy, so you might lose some information or oversimplify certain tasks. But it opens up a realistic, cost-effective path to building expert models.

10. Deploying Small Language Models in Enterprises

Now that we can run these SLMs and create specialized agents, we can realistically take advantage of Agentic AI in the enterprise. We can deploy and run these models locally, without sending PII data out to a vendor and hoping it won’t be used elsewhere.

So what does a typical Agentic AI architecture look like?

Here’s a very, very simple example — just to illustrate the concept. Let's say we’re building an automated customer support system. It brings together account information about the customer, tickets they’ve opened (past and present), and product-specific knowledge.

We’d want agents that can pass information back and forth and respond based on what the customer has done, what’s been tried before, and product-specific details — as well as who the customer is, where they are, and so on.

To do that, we’d have a series of agents:

  • An account management agent that knows how to talk to the account system and pull relevant client information.

  • An ITSM agent that interfaces with the ticketing system, pulling ticket history — open, closed, resolved issues — specific to that customer.

  • A specialized, distilled product model trained on everything about the product: what it does, how it works, etc.

One way to build that third model is by using a foundational model augmented with a RAG (retrieval-augmented generation) architecture. That augmented model then acts as the teacher for distilling the specialized product model — without needing full reinforcement learning on an LLM.

Then there’s a supervisor agent, which follows a LangGraph-style orchestration pattern. It pulls together inputs from all the agents and coordinates the final response.
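
A skeletal version of that orchestration, assuming the langgraph library, might look like the following. Each agent is stubbed out here (in reality every node would call its expert model or backend system), and for brevity this is a linear pipeline rather than a true supervisor router.

```python
from typing import TypedDict
from langgraph.graph import StateGraph, START, END

class SupportState(TypedDict, total=False):
    customer_id: str
    account_info: str
    ticket_history: str
    draft_reply: str

def account_agent(state: SupportState) -> dict:
    # Would call the account-management expert model / account system.
    return {"account_info": f"profile for {state['customer_id']}"}

def itsm_agent(state: SupportState) -> dict:
    # Would query the ticketing system for open/closed/resolved tickets.
    return {"ticket_history": "2 open, 5 resolved"}

def product_agent(state: SupportState) -> dict:
    # Distilled product expert drafts a reply from the gathered context.
    return {"draft_reply": "Try upgrading to v2.3; see KB-1142."}

builder = StateGraph(SupportState)
builder.add_node("account", account_agent)
builder.add_node("itsm", itsm_agent)
builder.add_node("product", product_agent)
builder.add_edge(START, "account")
builder.add_edge("account", "itsm")
builder.add_edge("itsm", "product")
builder.add_edge("product", END)

graph = builder.compile()
print(graph.invoke({"customer_id": "C-1001"}))
```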

Finally, we’d include RAILS output validation to guard against hallucinations, PII disclosure, profanity, toxicity — all the things we definitely don’t want going out to customers.
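
The rails layer can be as sophisticated as hallucination and toxicity classifiers, or as simple as pattern checks. As an illustrative, regex-only sketch of the gating idea (not any particular rails product's API):

```python
import re

PII_PATTERNS = {
    "ssn": re.compile(r"\b\d{3}-\d{2}-\d{4}\b"),          # US SSN-like pattern
    "email": re.compile(r"\b[\w.+-]+@[\w-]+\.[\w.]+\b"),  # email address
}
BLOCKLIST = {"darn", "heck"}  # stand-in profanity list

def validate_output(text: str) -> tuple[bool, list[str]]:
    """Return (safe, reasons). A production rails layer would layer
    toxicity and hallucination checks on top of simple patterns."""
    reasons = [f"PII: {name}" for name, pat in PII_PATTERNS.items() if pat.search(text)]
    if BLOCKLIST & set(text.lower().split()):
        reasons.append("profanity")
    return (not reasons, reasons)

print(validate_output("Your account email is jane@example.com"))
# (False, ['PII: email'])
```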

11. Real-World Implementation Challenges

So, we’ve now got all these components making up a single solution. But in the real world, this starts to get complicated.

That example was simple — imagine a real deployment with twenty or thirty agents. How do you know what’s going on in your system? How do you know what was deployed when, and how it was approved?

That’s why we need to track Agentic AI — because now we have all these different expert models in use. And those expert models might be shared across multiple solutions. For example, your ticketing agent might be used in five, six, seven, or even more solutions.

There’s no reason not to reuse expert models, but that means we need to track their performance independently — not just in the context of one use case, but across all use cases.

Are they leaking PII? Are they performing as expected?

At the same time, we also want to track how each solution is performing as a whole. Because even if the agent is solid in general, a particular use case could expose a flaw — or maybe it just doesn’t interact well with the other agents in that implementation.

You need visibility into all of that.

So you need governance: tracking which agents are in use, their versions, how they were approved, what distillation techniques were used, and so on.

And if you’re in a regulated industry, like healthcare, you’ve got additional responsibilities. Certain states require you to disclose when AI is helping make decisions — especially decisions about a patient’s care.

So you need a solution that can pull all of this together.

12. Expectations for AI Agents in Healthcare

Jay Combs: Hey Jim — just on that healthcare note, sorry to interrupt you. We actually got a question from a healthcare provider on this topic. As you're setting up these examples of AI agents and walking through the architecture, is the expectation that these agents behave and perform like trained subject matter experts? Are they meant to replace an SME, or are they meant to work alongside one? Just curious what that expected level of functionality is.

13. Autonomy vs. Supervision in AI Agents

Jim Olsen: That really depends on whether we’re talking about truly supervised agent solutions or unsupervised ones.

The so-called "holy grail" of Agentic AI is autonomous agents — ones that act without supervision and solve problems end-to-end. But I don't think that’s realistic right now for many use cases, especially in something like healthcare.

In a case like that, you'd expect the agents to gather a bunch of information and make recommendations. Then, a person — a human in the loop — would review that information before taking action.

It’s a bit like using code suggestions from tools like Cursor or GitHub Copilot. You don't just accept the answer as-is. You review it, you make edits, and then you decide how to respond.

So yes, in a healthcare deployment, I would absolutely expect to have a human subject matter expert (SME) reviewing those responses. But the agents can still help make that SME a “10x-er” by giving them all the background and context they need quickly.

Jay Combs: Awesome, thank you — that's perfect.

14. Implementation of AI Agents

Jim Olsen: No problem. So when we talk about representing these agents in a system, we need to describe how they’re being used in the context of an actual use case. That’s important because we’re often reusing individual agents across multiple implementations.

The use case — for example, automated customer support — describes the overall solution. We’ll track information about how we collect data samples, meet regulatory requirements, and handle compliance.

Each use case is then tied to one or more implementations. And each implementation is made up of a series of agents, LangGraph orchestration code, and RAILS configurations.

15. Model Ensemble in AI Solutions

To describe the implementation itself, we use something called a model ensemble — basically, a “model of models.” It tracks one or more models that work together in a solution.

This concept isn’t unique to Agentic AI. Many AI solutions involve multiple models — transformation models, embedding models, all sorts of components that play together to form the final output.

By tying an ensemble to a specific use case, we can say, “This use case uses this ensemble,” and the ensemble pulls together all the pieces of the implementation.

Any one of those pieces — like an agent — can also be shared across multiple ensembles or implementations. So we want to track them individually. For example, our RAILS validation files are another component we need to monitor: are they working? Are they stopping toxicity or profanity like they should?

We also want to track the supervisor code — the LangGraph orchestration that’s specific to the use case — because that’s what pulls everything together.

By breaking things down in an ensemble, we can track both the shared components and the unique elements of each use case.
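
As a rough sketch of that inventory structure (these record types are illustrative, not ModelOp Center's actual schema), an ensemble ties shared agents, orchestration code, and rails configuration to a use case:

```python
from dataclasses import dataclass, field

@dataclass
class Agent:
    name: str
    version: str
    approved_for: list[str] = field(default_factory=list)  # approved use cases

@dataclass
class Ensemble:
    """A 'model of models': everything one implementation is built from."""
    use_case: str
    agents: list[Agent]
    orchestration: str   # reference to the supervisor / LangGraph code
    rails_config: str    # reference to the output-validation configuration

ticket_agent = Agent("itsm-agent", "1.4.2",
                     approved_for=["customer-support", "field-service"])
support_bot = Ensemble(
    use_case="automated-customer-support",
    agents=[ticket_agent, Agent("account-agent", "2.0.1", ["customer-support"])],
    orchestration="supervisor-graph@0.9",
    rails_config="rails/customer-support.yaml",
)
```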

16. Tracking Performance of AI Solutions

With this setup, we’re able to track the performance of the entire solution — not just the individual models.

We can monitor things like gibberish outputs, user feedback (thumbs up or down), sentiment, toxicity, and PII exposure — all specific to the use case.

But we can also track the individual agents themselves. For instance, maybe we’ve taken DeepSeek-R1’s distilled Qwen and fine-tuned it specifically for account management or ticket handling. We can assess how those outputs are performing across all the solutions using that agent.

That gives us visibility into both sides:

  • How the agent is doing globally, across all solutions

  • How it’s performing within a specific use case

You want a product that gives you both views — because that’s what’s needed to govern these systems effectively.
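
Mechanically, those two views are the same monitoring events aggregated along different keys. A small sketch with made-up records:

```python
from collections import defaultdict

# Hypothetical monitoring records: one row per agent output, tagged with use case.
events = [
    {"agent": "ticket-agent", "use_case": "support-bot", "pii_leak": 0, "thumbs_up": 1},
    {"agent": "ticket-agent", "use_case": "field-service", "pii_leak": 1, "thumbs_up": 0},
]

def rollup(events, key):
    # Aggregate the same events two ways: globally per agent, or per use case.
    stats = defaultdict(lambda: {"n": 0, "pii_leaks": 0, "thumbs_up": 0})
    for e in events:
        s = stats[e[key]]
        s["n"] += 1
        s["pii_leaks"] += e["pii_leak"]
        s["thumbs_up"] += e["thumbs_up"]
    return dict(stats)

print(rollup(events, "agent"))     # agent view, across all solutions
print(rollup(events, "use_case"))  # solution view, per use case
```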

17. Understanding Model Lineage

Jay Combs: Jim, quick terminology check — when you use the term model lineage, are you referring just to the process of tracking agents, or is there more to it?

Jim Olsen: It’s more comprehensive than just tracking agents. Model lineage includes everything that went into the model — the data, the number of snapshots, version history, any code associated with the model, and more.

It’s the full picture of what makes up that solution.

As we start deploying more and more models — maybe twenty different distilled models across multiple agent-based solutions — we absolutely need to automate wherever we can. One of the key areas for automation is risk determination.

For example, our product can ask specific questions about a model. Based on the data it uses or other attributes, it can automatically detect certain risks. Say it’s going to be deployed in the EU — we can check whether it has the required EU compliance evidence.

Or we might flag if it hasn’t yet gone through complexity review.

As soon as the conditions are met — say, the required file gets uploaded — the system marks the risk as completed. It doesn’t disappear, but it’s now tracked and closed. We log that we resolved it, and how.

You don’t have to jump through hoops or check ten different systems. Meet the condition, and the risk is automatically closed — whether that's triggered by a Jira approval, an uploaded file, or a status change in our own platform.
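
Conceptually, each risk is an open item paired with a machine-checkable closing condition. This sketch is illustrative, not ModelOp's actual rule engine:

```python
from dataclasses import dataclass
from typing import Callable

@dataclass
class Risk:
    name: str
    condition: Callable[[dict], bool]  # machine-checkable closing condition
    resolved: bool = False
    note: str = ""

def evaluate_risks(model_record: dict, risks: list[Risk]) -> None:
    # Re-check every open risk; close it with an audit note once its condition holds.
    for r in risks:
        if not r.resolved and r.condition(model_record):
            r.resolved = True
            r.note = f"auto-closed: evidence present for '{r.name}'"

record = {"region": "EU", "evidence": ["eu_compliance_assessment.pdf"]}
risks = [Risk("EU compliance evidence",
              lambda rec: any("eu_compliance" in f for f in rec["evidence"]))]
evaluate_risks(record, risks)
print(risks[0].resolved, risks[0].note)
```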

It’s all about scale.

18. Scaling Agentic AI Solutions

As we move into the world of Agentic AI, scalability becomes critical.

Automation is the key to successfully managing multiple Agentic AI solutions. These systems introduce an explosion of business-critical models within your enterprise — models that interact dynamically in unpredictable ways.

You need to know:

  • Which solutions are using which agents

  • Who’s using what — and whether they should be using it

  • Whether a particular agent is approved for a specific use case

Someone could grab an agent and start using it without proper approval — and if it’s not compliant with the use case, that creates risk. Can you detect that? Can you prevent it?

19. Risk Management

You must understand not only how each solution is performing, but also how the individual agents are performing across all the solutions they’re part of.

Have you conducted a thorough analysis of what these agents are doing? Have you assessed the risks?

Risk management needs to be tracked — not just for regulatory compliance, but also for your own understanding of business risk. That includes:

  • Knowing which models are in use

  • Understanding what they’re doing

  • Ensuring they’ve been reviewed and approved appropriately

That’s where ModelOp Center stands out. We’ve designed it specifically for model governance — from the ground up. It’s not a repurposed data science workbench. It’s model-agnostic and captures all the relevant information — regardless of the type of AI being used.

It can handle everything from complex neural networks to something as simple as an Excel spreadsheet model.

20. Conclusion and Future Engagement

Jay Combs: Alright — apologies for the technical glitch there at the end. Jim was wrapping up by emphasizing how ModelOp treats all types of AI the same way: by abstracting the models from the data science tools, so they can all be governed and compared on equal footing.

If you want to talk more about how your organization is using Agentic AI — or if you’re just curious about how to properly scale and govern it — we’d love to hear from you. Please reach out.

We’ll be sending out the presentation and the recording by email, and we’ll also post it on our website so you can access all the materials. We always make that available after the webinar.

Thanks for joining us this month. We’ll announce our next session soon.

And as for Skip McCormick’s comment — “Was that really Jim, or was it an Agentic AI model?”

Well, Skip… we’ll let you decide. I guess it’s just a black box experiment.

Thanks again, everyone. Have a great rest of your week. Talk soon!

