Breaking the Glass: From No-Code Prototype to Production Python
By Ben White | Part 3 of the "Agentic Architect" Series
In Part 2, we built "TechScout," a research agent that uses Google Search Grounding to find the latest AI trends. We built it in 15 minutes using the Vertex AI Agent Builder console. It was fast, magical, and entirely visual.
But as every senior developer knows, visual tools have a "Glass Ceiling."
Eventually, you want to:
Integrate this agent into your existing Django or Node.js backend.
Version control your prompts (GitOps).
Add complex logic, like "If the news is critical, send a Slack message."
Today, we break the glass. We are going to take the logic we prototyped in the console and port it to Python using Google's Gen AI SDK. We are trading the drag-and-drop interface for raw, production-ready code.
The Stack: What We Are Using
We aren't using LangChain today. We are going "Metal-to-Model" using Google's native library. This keeps the layer between us and the model as thin as possible and gives us day-one compatibility with new Gemini features.
SDK: google-genai (the unified Google Gen AI SDK)
Model: gemini-2.5-flash (or gemini-3.0 depending on availability)
Key Feature: Grounding with Google Search
Step 1: The Setup
First, we need to replicate the environment. In the console, this was just clicking "Create." In Python, we need to install the SDK (pip install google-genai), make sure Application Default Credentials are configured (gcloud auth application-default login), and initialize a client pointed at Vertex AI.
Pro-Tip: The Gemini 2.5 API Shift
If you look at older tutorials from 2024 or 2025, you will see developers using vertexai.generative_models and Tool.from_google_search_retrieval(). If you try to run that against the new gemini-2.5-flash model, you will get a 400 error. Google has unified its ecosystem under the new google-genai SDK. Below is the modern, 2026-compliant way to establish a grounded connection.
from google import genai
from google.genai import types
import datetime
# 1. Initialize the unified GenAI Client for Vertex AI
# This replaces the old vertexai.init()
client = genai.Client(
    vertexai=True,
    project="your-gcp-project-id",
    location="us-central1",
)
Step 2: Porting the Brain (System Instructions)
Remember the "Goal" we wrote in the console? In the SDK, this becomes the system_instruction.
The beauty of code is that we can make this dynamic. In the console, the prompt is static. In Python, I can inject variables, like the current date, so the model always knows what "today" means when it researches.
# 2. Define the "Truth" (Grounding Tool)
# The old 'from_google_search_retrieval' is deprecated.
# We now simply define a Tool with GoogleSearch.
tool = types.Tool(google_search=types.GoogleSearch())
# 3. Dynamic Prompting
current_date = datetime.date.today()
system_prompt = f"""
You are TechScout, an expert researcher.
Today's date is {current_date}.
Your goal is to find emerging AI trends.
You MUST cite your sources using the Grounding tool provided.
"""
# 4. Configuration
# In the new SDK, tools and system prompts are bound in a Config object
config = types.GenerateContentConfig(
    system_instruction=system_prompt,
    tools=[tool],
    temperature=0.0,
)
Step 3: The Execution Loop
Now, we interact. In the console, this was the "Preview" pane. In Python, we start a chat session.
Crucially, because the tools=[tool] binding lives in the config we pass at creation time, we don't need to add any special flags to the send_message call. The model is already "grounded."
# 5. Start the chat session
chat = client.chats.create(
    model="gemini-2.5-flash",
    config=config,
)
response = chat.send_message(
    "What are the top 3 Generative AI releases from Google this week?"
)
# Print the response text
print(response.text)
# Accessing the Citations
# The SDK returns grounding metadata separately from the text.
grounding_meta = response.candidates[0].grounding_metadata
if grounding_meta and grounding_meta.search_entry_point:
    print("\n--- Sources ---")
    print(grounding_meta.search_entry_point.rendered_content)
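The rendered_content above is an HTML snippet for Google's search-suggestion widget. If you just want the raw source links, the same metadata object carries a list of grounding chunks. Here is a minimal sketch, assuming the field names from the google-genai types (grounding_chunks, with web.title and web.uri on each chunk); note the list can be empty if the model answered without searching.
# Optional: print the individual source URLs behind the answer
if grounding_meta and grounding_meta.grounding_chunks:
    for chunk in grounding_meta.grounding_chunks:
        if chunk.web:
            print(f"- {chunk.web.title}: {chunk.web.uri}")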
The "Pro" Upgrade: Why We Did This
You might ask, "This does the same thing as the console. Why write the code?"
Because now that it is a Python object, I can integrate it into the rest of my engineering ecosystem.
Structured Output: I can force the model to return JSON instead of text, making it ready for a frontend UI (see the first sketch after this list).
Logic Loops: I can add a Python if statement:
if "security vulnerability" in response.text.lower():
send_slack_alert(response.text)
Observability: I can wrap this call in OpenTelemetry spans to track cost and latency in my own dashboards (see the second sketch after this list).
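Here is roughly what the structured-output upgrade looks like. This is a minimal sketch using a standalone generate_content call (search grounding and JSON response schemas don't always combine cleanly, so I'm keeping them separate here), and the Trend schema is just an illustrative example, not part of the TechScout spec:
from pydantic import BaseModel

# A hypothetical schema for what the frontend expects
class Trend(BaseModel):
    title: str
    summary: str

# Ask for JSON conforming to the Trend schema instead of free text
json_config = types.GenerateContentConfig(
    response_mime_type="application/json",
    response_schema=list[Trend],
)

structured = client.models.generate_content(
    model="gemini-2.5-flash",
    contents="List the top 3 emerging Generative AI trends.",
    config=json_config,
)
print(structured.text)  # A JSON array matching the Trend schema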
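And the observability upgrade, sketched under two assumptions: the opentelemetry-api package is installed with an exporter configured elsewhere, and the span and attribute names below (borrowed from the GenAI semantic conventions) are placeholders you are free to rename:
from opentelemetry import trace

tracer = trace.get_tracer("techscout")

def traced_query(prompt: str) -> str:
    # Wrap the grounded call in a span so latency is timed automatically
    # and token counts (a cost proxy) land in our tracing backend.
    with tracer.start_as_current_span("techscout.send_message") as span:
        response = chat.send_message(prompt)
        usage = response.usage_metadata
        if usage:
            span.set_attribute("gen_ai.usage.input_tokens", usage.prompt_token_count or 0)
            span.set_attribute("gen_ai.usage.output_tokens", usage.candidates_token_count or 0)
        return response.text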
Conclusion: The Best of Both Worlds
We haven't abandoned the ease of Vertex AI; we've just taken off the training wheels. We kept the powerful Grounding feature (which is usually hard to build) but gained the flexibility of Python.
Now we have a script running locally, but it is stuck on our laptop.
In the next phase of this series, we tackle the "Last Mile": how do we get this Python script off the laptop and into a mobile app?
Next Up: The Deployment Gap: Shipping AI Features with Firebase Genkit.
