大模型应用开发初探 : 通用函数调用Planner

news/2024/9/20 8:32:19

大家好，我是Edison。

上一篇，我们了解了什么是AI Agent以及如何用Semantic Kernel手搓一个AI Agent。有朋友留言说，自动函数调用对大模型有较高的要求，比如Azure OpenAI、智谱AI等这些收费的大模型产品就能很好地规划和处理函数调用，而像是一些开源的小参数量的模型例如qwen2-7b-instruct这种可能效果就不太好。刚好，之前在网上看到一位大佬的开源通用函数调用的方案，于是重构了一下上一篇的Agent应用。

UniversalFunctionCaller

这个项目是一个封装了大模型对话的入口，有点类似我们在ASP.NET中写的Filter，在处理某个真正的请求时，给其设置一些横切面，例如PreHandle,PostHandle之类的方法供用户做自定义处理，最终完成所谓的AoP（面向横切面编程）的效果。这个项目做的事儿其实也就是封装了横切面，在真正将prompt发给LLM前，它会读取一些自定义的优点类似于预训练的prompt来对用户的prompt进行“增强“。例如，下面这个方法 GetAskFromHistory 就会来设定一个函数调用的背景以及给出一些预置的训练提示词供大模型理解，妥妥的一个手动增强版提示词工程：

public class UniversalFunctionCaller
{......public async Task<string> RunAsync(ChatHistory askHistory){var ask = await GetAskFromHistory(askHistory);return await RunAsync(ask);}private async Task<string> GetAskFromHistory(ChatHistory askHistory){var sb = new StringBuilder();var userAndAssistantMessages = askHistory.Where(h => h.Role == AuthorRole.Assistant || h.Role == AuthorRole.User);foreach (var message in userAndAssistantMessages)sb.AppendLine($"{message.Role.ToString()}: {message.Content}");var extractAskFromHistoryPrompt = $@"阅读这段用户与助手之间的对话。 总结用户在最后一句话中希望助手做什么##对话开始##{sb.ToString()}##对话结束##";var extractAskResult = await _chatCompletion.GetChatMessageContentAsync(extractAskFromHistoryPrompt);var ask = extractAskResult.Content;return ask;}

　　......

然后，它会初始化一个ChatHistory，提供一些示范性的对话，让大模型知道是否该进行函数调用以及如何调用：

private ChatHistory InitializeChatHistory(string ask)
{var history = new ChatHistory();history.Add(new ChatMessageContent(AuthorRole.User, "New task: 启动飞船"));history.Add(new ChatMessageContent(AuthorRole.Assistant, "GetMySpaceshipName()"));history.Add(new ChatMessageContent(AuthorRole.User, "长征七号"));history.Add(new ChatMessageContent(AuthorRole.Assistant, "StartSpaceship(ship_name: \"长征七号\")"));history.Add(new ChatMessageContent(AuthorRole.User, "飞船启动"));history.Add(new ChatMessageContent(AuthorRole.Assistant, "Finished(finalmessage: \"'长征七号'飞船启动 \")"));history.Add(new ChatMessageContent(AuthorRole.User, $"New task: {ask}"));return history;
}

而示范用的函数则将其封装到了一个预置的Plugin，我们暂且叫它 PreTrainingPlugin，它是一个internal访问的class，只用于对prompt进行增强即给出示例：

internal class PreTrainingPlugin
{[KernelFunction, Description("当工作流程完成，没有更多的函数需要调用时，调用这个函数")]public string Finished([Description("总结已完成的工作和结果，尽量简洁明了。")] string finalmessage){return string.Empty;//no actual implementation, for internal routing only
    }[KernelFunction, Description("获取用户飞船的名称")]public string GetMySpaceshipName(){return "长征七号";}[KernelFunction, Description("启动飞船")]public void StartSpaceship([Description("启动的飞船的名字")] string ship_name){//no actual implementation, for internal routing only
    }
}

同时，它会将你定义的Functions总结为一个string列表，然后作为可用的Function list 放到prompt中告诉大模型：

然后，就开始根据用户的prompt进行函数调用了，直到它认为不会再需要函数调用时就结束，这个方法的全部代码如下所示：

public async Task<string> RunAsync(string task)
{// Initialize pluginsvar plugins = _kernel.Plugins;var internalPlugin = _kernel.Plugins.AddFromType<PreTrainingPlugin>();// Convert plugins to textvar pluginsAsText = GetTemplatesAsTextPrompt3000(plugins);// Initialize function call and chat historyvar nextFunctionCall = new FunctionCall { Name = ConfigConstants.FunctionCallStatus.Start };var chatHistory = InitializeChatHistory(task);// Add new task to chat historychatHistory.Add(new ChatMessageContent(AuthorRole.User, $"New task: {task}"));// Process function callsfor (int iteration = 0; iteration < 10 && nextFunctionCall.Name != ConfigConstants.FunctionCallStatus.Finished; iteration++){nextFunctionCall = await GetNextFunctionCallAsync(chatHistory, pluginsAsText);if (nextFunctionCall == null) throw new Exception("The LLM is not compatible with this approach!");// Add function call to chat historyvar nextFunctionCallText = GetCallAsTextPrompt3000(nextFunctionCall);chatHistory.AddAssistantMessage(nextFunctionCallText);// Invoke plugin and add response to chat historyvar pluginResponse = await InvokePluginAsync(nextFunctionCall);chatHistory.AddUserMessage(pluginResponse);}// Remove internal plugin
    _kernel.Plugins.Remove(internalPlugin);// Check if task was completed successfullyif (nextFunctionCall.Name == ConfigConstants.FunctionCallStatus.Finished){var finalMessage = nextFunctionCall.Parameters[0].Value.ToString();return finalMessage;}throw new Exception("LLM could not finish workflow within 10 steps. Please consider increasing the number of steps!");
}

需要特别注意的是，不建议在一个prompt中涉及超过10次函数调用，这样效果不太好，处理速度也慢，验证也不太方便。

此外，在方法内部进行函数调用的分析时，自动加了一个如下所示的SystemMessage，用于设定一些通用的规则给到大模型进行理解：

private string GetLoopSystemMessage(string pluginsAsTextPrompt3000)
{var systemPrompt = $@"你是一个计算机系统。
你只能使用TextPrompt3000指令，让用户调用对应的函数，而用户将作为另一个回答这些函数的计算机系统。
以下是您所需实现的目标，以及用户可以使用的函数列表。
您需要找出用户到达目标的下一步，并推荐一个TextPrompt3000函数调用。 
您还会得到一个TextPrompt3000 Schema格式的函数列表。
TextPrompt3000格式的定义如下所示:
{GetTextPrompt300Explanation()}
##可用函数列表开始##
{pluginsAsTextPrompt3000}
##可用函数列表结束##以下规则非常重要：
1) 你只能推荐一个函数及其参数，而不是多个函数
2) 你可以推荐的函数只存在于可用函数列表中
3) 你需要为该函数提供所有参数。不要在函数名或参数名中转义特殊字符，直接使用（如只写aaa_bbb，不要写成aaa\_bbb）
4) 你推荐的历史记录与函数需要对更接近目标有重要作用
5) 不要将函数相互嵌套。 遵循列表中的函数，这不是一个数学问题。 不要使用占位符。
我们只需要一个函数，下一个所需的函数。举个例子， 如果 function A() 需要在 function B()中当参数使用, 不要使用 B(A())。 而是,
如果A还没有被调用, 先调用 A()。返回的结果将在下一次迭代中在B中使用。
6) 不要推荐一个最近已经调用过的函数。 使用输出代替。 不要将占位符或函数作为其他函数的参数使用。
7) 只写出一个函数调用，不解释原因，不提供理由。您只能写出一个函数调用！
8) 当所有必需的函数都被调用，且计算机系统呈现了结果，调用Finished函数并展示结果。
9) 请使用中文回答。如果你违反了任何这些规定，那么会有一只小猫死去。
";return systemPrompt;
}

综上所示，这就是提示词工程的魔力所在！

更新后的AI Agent效果

这里我们快速对原来的WorkOrder Agent重构了一下，增加了 Use Function Planner 的 checkbox选项，如果你勾选了它，就会使用上面介绍的 UniversalFunctionCaller 进行prompt的包裹和预处理，然后再发给大模型以及进行函数调用。

这里我修改了使用的模型和平台信息，这里我们基于SiliconCloud来使用一个通义千问的小参数文本生成模型Qwen2-7B-Instruct来试试：

{"LLM_API_PROVIDER": "QwenAI","LLM_API_MODEL": "Qwen/Qwen2-7B-Instruct","LLM_API_BASE_URL": "https://api.siliconflow.cn","LLM_API_KEY": "sk-**************" // Update this value to yours
}

具体效果如下图所示：

（1）没有使用Function Planner的效果

（2）使用了Function Planner的效果

可以看到，我的需求其实包含3个步骤：第一步是更新工单的Quantity，第二步是更新工单的状态，第三步是查询更新后的工单信息。而这几个步骤我们假设其实都是需要去调用MES WorkOrderService API才能获得的，这里我们的Agent理解到了要点，并分别调用了两个function实现了任务。

这个示例代码的结构如下所示：