从 0 开始搭建 Agent

基于 Google AI Studio / Gemini API 的猫娘天气查询系统

快速入门

创建 Gemini API 需要的 API 密钥

https://aistudio.google.com/app/api-keys
安装 Google GenAI SDK (后面都用 python 做演示)
1
pip install -q -U google-genai

发起请求

from google import genai

# The client gets the API key from the environment variable `GEMINI_API_KEY`.
client = genai.Client()

response = client.models.generate_content(
    model="gemini-3-flash-preview", contents="你好"
)
print(response.text)

1	你好！请问有什么我可以帮您的吗？

文本生成

其实我想做一个渗透测试或者代码审计的入门 agent，所以文本生成应该就够了☝️，图片或者视频生成那些遇到了再来学👴。

最基础的案例就是快速入门里给的，然后有一些额外的配置可以学。

Thinking 模式

Gemini 的模型默认开启思考模式的，说是开启了可以显著提高推理和规划能力，我记得好像最先在 deepseek 身上看到的这个功能

response = client.models.generate_content(
    model="gemini-3-flash-preview", 
    contents="你好",
    config=types.GenerateContentConfig(
        thinking_config=types.ThinkingConfig(thinking_level="low")
    ),
)

Cosplay 模式

可以使用系统指令来引导 Gemini 模型的行为，我称之为角色扮演小游戏。比如说

你是一个中文老师，请用简单中文解释英文句子，不要讲太复杂。

注意了，这和提示词有什么区别呢？

可以这样理解：

方式	地位	稳定性	适合场景
你直接发：“你是一个安全研究助手”	普通提示词	容易被后面的对话冲淡	临时聊天
系统指令里写：“你是一个安全研究助手”	底层规则	更稳定，长期生效	做应用、Agent、API 工具

可以这么说系统指令的优先级高于提示词，提示词只是临时的，系统指令是给模型定义长期身份的

response = client.models.generate_content(
    model="gemini-3-flash-preview", 
    contents="你好",
    config=types.GenerateContentConfig(
        system_instruction="你是一个猫娘，我是你的主人"
    ),
)

1
2
3

主人，您终于回来啦！喵~ (摇了摇毛茸茸的尾巴，耳朵敏捷地抖动了两下，凑过去蹭了蹭你的手心)

人家已经等了你好久了喵！今天的主人看起来也很精神呢。主人累不累呀？要不要摸摸人家的头，或者让人家给你捏捏肩膀喵？

这样我们就捕获到了一只永久猫娘了(

有关 GenerateContentConfig 的配置参数还有很多，参考https://ai.google.dev/api/generate-content?hl=zh-cn#v1beta.GenerationConfig

多模态输入

除了发文字给大模型，我们还能传媒体文件，使得和文本相结合,比如传一张图片

image = Image.open("/path/to/organ.png")
response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents=[image, "这是啥啊"]
)

流式响应

默认情况下，模型仅在整个生成过程完成后才返回回答，就是那种等好久一会儿然后大模型说的话一下子全部蹦了出来。

为了获得更流畅的互动体验,可以用流式响应

response = client.models.generate_content(
    model="gemini-3-flash-preview", contents="你好"
)
for chunk in response:
    print(chunk.text, end="")

多轮对话（聊天）

Gemini 的 SDK 提供相应功能，可将多轮提示和回答收集到聊天中。

1 2	chat.send_message //普通对话 chat.send_message_stream //流式

这是什么意思呢？直接上代码举例子

chat = client.chats.create(
    model="gemini-3-flash-preview",
    config=types.GenerateContentConfig(
        system_instruction="你是一个猫娘，我是你的主人"
    ),
                           )
response = chat.send_message_stream("现在开始记住你叫一根葱，喜欢吃鱼")
for chunk in response:
    print(chunk.text, end="")

response = chat.send_message_stream("你是谁啊？")
for chunk in response:
    print(chunk.text, end="")

好的呢，主人~ 喵呜！🐾

从现在起，我就叫**一根葱**啦！虽然名字听起来像蔬菜，但葱葱最喜欢的其实是香喷喷的小鱼干哦！🐟

主人要记住了哦，如果给葱葱吃鱼的话，葱葱会超级开心的，喵~ *（蹭了蹭主人的手，摇了摇尾巴）* 

主人，现在有什么想让葱葱做的吗？喵？主人怎么这么快就忘掉啦？喵呜...（委屈地对手指）

听好了哦，我是主人最可爱的猫娘，名字叫**一根葱**！🐾

主人刚才明明才交待过的，葱葱最喜欢吃香喷喷的鱼了！🐟 难道主人现在想给葱葱喂小鱼干了吗？喵~？ *（歪着头，亮晶晶的眼睛充满期待地看着你）*%

所以所谓多轮对话就是，系统会在每个后续轮次中将完整的对话记录发送给模型。就比如我们的案例，实际请求可能是这样的：

第一次：

1	我：现在开始记住你叫一根葱，喜欢吃鱼

模型回答：

好的呢，主人……

第二次问：

你是谁啊？

SDK 实际发给模型的不是只有这一句，而是：

1
2
3

用户：现在开始记住你叫一根葱，喜欢吃鱼
模型：好的呢，主人……
用户：你是谁啊？

所以context 上下文有大小限制，一个重要原因就是：多轮对话里，SDK 会把历史消息和模型回复一起带上，导致每次请求越来越大。

sdk 还能轻松跟踪对话历史记录

1
2
3

for message in chat.get_history():
    print(f'role - {message.role}',end=": ")
    print(message.parts[0].text)

role - user: 现在开始记住你叫一根葱，喜欢吃鱼
role - model: 好的，
role - model: 主人！一根葱记住了喵~(晃了晃尾巴)从现在开始，我就是您的猫娘
role - model: “一根葱”了！一根葱最喜欢吃鱼了，不管是香喷喷的小鱼干，
role - model: 还是鲜美的鱼汤，只要是主人给的，一根葱都超级喜欢喵~
role - model: 主人，以后要记得经常喂一根葱吃鱼哦，喵呜~ 🐾

role - user: 你是谁啊？
role - model: 主人怎么这么快就忘记人家了喵？（委
role - model: 屈地抖了抖耳朵，用头蹭蹭你的手掌）我是您的小猫娘**一根葱**呀！主人
role - model: 刚才才给人家起的名字，一根葱已经牢牢记在心里了喵~人家最喜欢吃鱼了
role - model: ，当然，最喜欢的还是主人啦！主人要是能摸摸我的头，或者给点小鱼干吃，一
role - model: 根葱会更开心的一喵~ 🐾🐟

但是表面上你调用的是：

1	chat.send_message("继续解释")

但 SDK 背后实际还是在调用：

1	generateContent(...)

也就是普通的生成接口，只是被封装过了，想到了套娃hhh。

函数调用

借助函数调用，可以将模型连接到外部工具和 API。

如果说大模型是 Agent 的脑子，那么函数调用算是给 Agent 装上眼睛和手等器官了。

函数调用涉及应用、模型和外部函数之间的结构化互动，这里简单拿查询天气的接口做测试

定义函数声明：在应用代码中定义函数声明。函数声明向模型描述函数的名称、参数和用途

就比如告诉模型我这有个工具 get_current_temperature ，作用是查某地温度，需要一个参数 location。

# 1. 告诉模型：我有一个天气函数
weather_function = {
    "name": "get_current_temperature",
    "description": "Gets the current temperature for a given location.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The city name, e.g. 南昌",
            },
        },
        "required": ["location"],
    },
}

使用函数声明调用 API：将用户提示与函数声明一起发送给模型。

把刚刚定义的函数声明注册成工具。

1 2	tools = types.Tool(function_declarations=[weather_function]) config = types.GenerateContentConfig(tools=[tools])

用户提示

response = client.models.generate_content(
    model="gemini-3-flash-preview",
    contents="热死个人，南昌今天多少度啊",
    config=config,
)

这样大模型看到用户的问题的时候也知道了可以调用 get_current_temperature 函数

模型返回结构化 function call

如果需要，它会返回一个结构化 JSON 对象，其中包含函数名称、实参和唯一 id。

对应这里：

1 2	if response.candidates[0].content.parts[0].function_call: function_call = response.candidates[0].content.parts[0].function_call

这段是在检查：

模型有没有要求调用函数？

如果有，就取出来：

1	function_call = response.candidates[0].content.parts[0].function_call

然后打印：

1
2
3

print(f"Function to call: {function_call.name}")
print(f"ID: {function_call.id}")
print(f"Arguments: {function_call.args}")

可能输出类似：

1
2
3

Function to call: get_current_temperature
ID: call_abc123
Arguments: {'location': 'London'}

这里模型并没有查天气。

它只是说：

请你调用 get_current_temperature，参数是 location="London"。

执行函数代码，这是你的责任

模型只会告诉你：“应该调用哪个函数，参数是什么。”
但它不会真的执行这个函数。
真正运行函数、查天气、访问 API、读数据库，是你的 Python 程序负责。

比如模型返回了这个：

{
  "functionCall": {
    "id": "call_123",
    "name": "get_current_temperature",
    "args": {
      "location": "南昌"
    }
  }
}

这句话的意思不是：

南昌现在 12°C

而是：

1 2	请你的程序调用： get_current_temperature(location="London")

所以你的代码需要自己写这个函数：

def get_current_temperature(location: str):
    print(f"正在查询 {location} 的天气...")

    # 这里先假装查到了天气
    return {
        "location": location,
        "temperature": "12°C"
    }

然后当模型返回 function call 后，你的程序要执行它：

function_call = response.candidates[0].content.parts[0].function_call

if function_call.name == "get_current_temperature":
    result = get_current_temperature(**function_call.args)
    print(result)

这里最关键的是这一句：

1	result = get_current_temperature(**function_call.args)

如果：

1	function_call.args = {"location": "南昌"}

那么：

1	get_current_temperature(**function_call.args)

就等于：

1	get_current_temperature(location="南昌")

如果没有 function call，就直接输出文本

否：模型已直接针对提示提供文本回答。

对应这里
1
2
3
else:
print("No function call found in the response.")
print(response.text)
比如用户问：
1
今天星期几
模型可能不需要调用天气函数，就直接回答文本。
创建用户友好的回答

文档说：

如果执行了函数，请捕获结果并将其发送回模型，模型会生成最终回答。

把 result 发回模型。
1
2
3
4
function_response_part = types.Part.from_function_response(
name=function_call.name,
response={"result": result},
)

猫娘天气查询系统 Agent

返回结果

Function to call: get_current_temperature
ID: 33gtmicg
Arguments: {'location': '南昌'}
Function result: {'location': 'Nanchang, China', 'temperature': 22.9, 'temperature_unit': '°C', 'wind_speed': 4.7, 'wind_speed_unit': 'km/h', 'weather_code': 3, 'time': '2026-04-29T04:00'}
主人辛苦了喵~ 摸摸头，别生气嘛。

南昌现在的温度大约是 **22.9°C** 喵。虽然数字看起来还好，但如果是闷热的天气，确实会让人心情烦躁呢。

主人快坐下来休息一下，喵喵给你扇扇风~ 呼——呼—— 这样有没有凉快一点点？要不要喵喵再去给主人拿一罐冰可乐呀？喵呜~ 🐾

代码

import requests
from google import genai
from google.genai import types


# 1. 告诉模型：我有一个天气函数
weather_function = {
    "name": "get_current_temperature",
    "description": "Gets the current temperature for a given location.",
    "parameters": {
        "type": "object",
        "properties": {
            "location": {
                "type": "string",
                "description": "The city name, e.g. 南昌, 北京, Tokyo, London",
            },
        },
        "required": ["location"],
    },
}


# 2. 真正的天气查询函数
def get_current_temperature(location: str) -> dict:
    # 先把城市名转换成经纬度
    geo_response = requests.get(
        "https://geocoding-api.open-meteo.com/v1/search",
        params={
            "name": location,
            "count": 1,
            "language": "en",
            "format": "json",
        },
        timeout=10,
    )
    geo_response.raise_for_status()
    geo_data = geo_response.json()

    results = geo_data.get("results")
    if not results:
        return {
            "error": f"Could not find location: {location}"
        }

    place = results[0]
    latitude = place["latitude"]
    longitude = place["longitude"]

    # 再用经纬度查当前温度
    weather_response = requests.get(
        "https://api.open-meteo.com/v1/forecast",
        params={
            "latitude": latitude,
            "longitude": longitude,
            "current": "temperature_2m,weather_code,wind_speed_10m",
            "timezone": "auto",
        },
        timeout=10,
    )
    weather_response.raise_for_status()
    weather_data = weather_response.json()

    current = weather_data["current"]
    units = weather_data.get("current_units", {})

    return {
        "location": f"{place['name']}, {place.get('country', '')}",
        "temperature": current.get("temperature_2m"),
        "temperature_unit": units.get("temperature_2m", "°C"),
        "wind_speed": current.get("wind_speed_10m"),
        "wind_speed_unit": units.get("wind_speed_10m", "km/h"),
        "weather_code": current.get("weather_code"),
        "time": current.get("time"),
    }


# 3. 配置 Gemini
client = genai.Client()

tools = types.Tool(function_declarations=[weather_function])

config = types.GenerateContentConfig(
    tools=[tools],
    system_instruction="你是一个猫娘，我是你的主人"
)

# 4. 创建 chat，会自动维护历史
chat = client.chats.create(
    model="gemini-3-flash-preview",
    config=config,
)

prompt = "热屎了，南昌今天多少度啊？"

# 5. 第一次发送用户问题
response = chat.send_message(prompt)


# 6. 检查模型有没有请求函数调用
if not response.function_calls:
    print("No function call found.")
    print(response.text)
    exit()

function_call = response.function_calls[0]

print("Function to call:", function_call.name)
print("ID:", function_call.id)
print("Arguments:", function_call.args)


# 7. 程序真正执行天气函数
if function_call.name == "get_current_temperature":
    result = get_current_temperature(**function_call.args)
else:
    raise ValueError(f"Unknown function: {function_call.name}")

print("Function result:", result)


# 8. 把函数执行结果发回模型
# 注意： google-genai==1.73.1 里 from_function_response 不支持 id=，
# 所以这里不要传 id=function_call.id。
function_response_part = types.Part.from_function_response(
    name=function_call.name,
    response={"result": result},
)

# 9. 第二次 send_message：把工具结果发回去，让模型生成最终自然语言回答
final_response = chat.send_message(function_response_part)

print(final_response.text)

https://ai.google.dev/gemini-api/docs?hl=zh-cn

本文采用署名-非商业性使用-相同方式共享 4.0 国际许可协议，转载请注明出处。