推出 LiteRT Next：一组全新的 API，可改进和简化设备端硬件加速。

此页面由 Cloud Translation API 翻译。

适用于 Android 的 AI Edge 函数调用指南

AI Edge Function Calling SDK (FC SDK) 是一个库，可让开发者将函数调用与设备端 LLM 搭配使用。借助函数调用，您可以将模型连接到外部工具和 API，让模型能够使用必要的参数调用特定函数，以执行实际操作。

使用 FC SDK 的 LLM 不仅可以生成文本，还可以生成对执行操作的函数的结构化调用，例如搜索最新信息、设置闹钟或进行预订。

本指南将引导您完成一个基本快速入门，以便将 LLM Inference API 与 FC SDK 一起添加到 Android 应用。本指南重点介绍如何向设备端 LLM 添加函数调用功能。如需详细了解如何使用 LLM Inference API，请参阅 Android 版 LLM 推理指南。

快速入门

请按照以下步骤在 Android 应用中使用 FC SDK。本快速入门将 LLM Inference API 与 Hammer 2.1 (1.5B) 搭配使用。LLM Inference API 针对高端 Android 设备（例如 Pixel 8 和 Samsung S23 或更新型号）进行了优化，并且无法可靠地支持设备模拟器。

添加依赖项

FC SDK 使用 com.google.ai.edge.localagents:localagents-fc 库，LLM 推理 API 使用 com.google.mediapipe:tasks-genai 库。将这两个依赖项添加到 Android 应用的 build.gradle 文件中：

dependencies {
    implementation 'com.google.mediapipe:tasks-genai:0.10.24'
    implementation 'com.google.ai.edge.localagents:localagents-fc:0.1.0'
}

对于搭载 Android 12（API 31）或更高版本的设备，请添加原生 OpenCL 库依赖项。如需了解详情，请参阅有关 uses-native-library 标记的文档。

将以下 uses-native-library 标记添加到 AndroidManifest.xml 文件中：

<uses-native-library android:name="libOpenCL.so" android:required="false"/>
<uses-native-library android:name="libOpenCL-car.so" android:required="false"/>
<uses-native-library android:name="libOpenCL-pixel.so" android:required="false"/>

下载模型

从 HuggingFace 下载 8 位量化格式的 Hammer 1B。如需详细了解可用模型，请参阅“模型”文档。

将 hammer2.1_1.5b_q8_ekv4096.task 文件夹的内容推送到 Android 设备。

$ adb shell rm -r /data/local/tmp/llm/ # Remove any previously loaded models
$ adb shell mkdir -p /data/local/tmp/llm/
$ adb push hammer2.1_1.5b_q8_ekv4096.task /data/local/tmp/llm/hammer2.1_1.5b_q8_ekv4096.task

声明函数定义

定义将提供给模型的函数。为了说明此过程，本快速入门包含两个函数，它们作为静态方法返回硬编码的响应。更实用的实现是定义用于调用 REST API 或从数据库检索信息的函数。

以下代码定义了 getWeather 和 getTime 函数：

class ToolsForLlm {
    public static String getWeather(String location) {
        return "Cloudy, 56°F";
    }

    public static String getTime(String timezone) {
        return "7:00 PM " + timezone;
    }

    private ToolsForLlm() {}
}

使用 FunctionDeclaration 描述每个函数，为每个函数指定名称和说明，并指定类型。这会告知模型这些函数的用途以及何时进行函数调用。

var getWeather = FunctionDeclaration.newBuilder()
    .setName("getWeather")
    .setDescription("Returns the weather conditions at a location.")
    .setParameters(
        Schema.newBuilder()
            .setType(Type.OBJECT)
            .putProperties(
                "location",
                Schema.newBuilder()
                    .setType(Type.STRING)
                    .setDescription("The location for the weather report.")
                    .build())
            .build())
    .build();
var getTime = FunctionDeclaration.newBuilder()
    .setName("getTime")
    .setDescription("Returns the current time in the given timezone.")

    .setParameters(
        Schema.newBuilder()
            .setType(Type.OBJECT)
            .putProperties(
                "timezone",
                Schema.newBuilder()
                    .setType(Type.STRING)
                    .setDescription("The timezone to get the time from.")
                    .build())
            .build())
    .build();

将函数声明添加到 Tool 对象：

var tool = Tool.newBuilder()
    .addFunctionDeclarations(getWeather)
    .addFunctionDeclarations(getTime)
    .build();

创建推理后端

使用 LLM 推理 API 创建推理后端，并将模型的格式设置对象传递给该后端。FC SDK 格式化程序 (ModelFormatter) 既可用作格式化程序，也可用作解析器。由于本快速入门使用的是 Gemma-3 1B，因此我们将使用 GemmaFormatter：

var llmInferenceOptions = LlmInferenceOptions.builder()
    .setModelPath(modelFile.getAbsolutePath())
    .build();
var llmInference = LlmInference.createFromOptions(context, llmInferenceOptions);
var llmInferenceBackend = new llmInferenceBackend(llmInference, new GemmaFormatter());

如需了解详情，请参阅 LLM 推理配置选项。

实例化模型

使用 GenerativeModel 对象连接推理后端、系统提示和工具。我们已经有了推理后端和工具，因此只需创建系统提示即可：

var systemInstruction = Content.newBuilder()
      .setRole("system")
      .addParts(Part.newBuilder().setText("You are a helpful assistant."))
      .build();

使用 GenerativeModel 实例化模型：

var generativeModel = new GenerativeModel(
    llmInferenceBackend,
    systemInstruction,
    List.of(tool),
)

发起聊天会话

为简单起见，本快速入门将启动单个聊天会话。您还可以创建多个独立的会话。

使用 GenerativeModel 的新实例启动聊天会话：

var chat = generativeModel.startChat();

使用 sendMessage 方法通过聊天会话向模型发送提示：

var response = chat.sendMessage("How's the weather in San Francisco?");

解析模型回答

向模型传递提示后，应用必须检查响应，以确定是进行函数调用还是输出自然语言文本。

// Extract the model's message from the response.
var message = response.getCandidates(0).getContent().getParts(0);

// If the message contains a function call, execute the function.
if (message.hasFunctionCall()) {
  var functionCall = message.getFunctionCall();
  var args = functionCall.getArgs().getFieldsMap();
  var result = null;

  // Call the appropriate function.
  switch (functionCall.getName()) {
    case "getWeather":
      result = ToolsForLlm.getWeather(args.get("location").getStringValue());
      break;
    case "getTime":
      result = ToolsForLlm.getWeather(args.get("timezone").getStringValue());
      break;
    default:
      throw new Exception("Function does not exist:" + functionCall.getName());
  }
  // Return the result of the function call to the model.
  var functionResponse =
      FunctionResponse.newBuilder()
          .setName(functionCall.getName())
          .setResponse(
              Struct.newBuilder()
                  .putFields("result", Value.newBuilder().setStringValue(result).build()))
          .build();
  var response = chat.sendMessage(functionResponse);
} else if (message.hasText()) {
  Log.i(message.getText());
}

示例代码过于简化了实现。如需详细了解应用如何检查模型响应，请参阅格式设置和解析。

运作方式

本部分将更深入地介绍 Function Calling SDK for Android 的核心概念和组件。

模型

函数调用 SDK 需要具有格式化程序和解析器的模型。FC SDK 包含适用于以下模型的内置格式设置程序和解析器：

Gemma：使用 GemmaFormatter。
骆驼：使用 LlamaFormatter。
锤子：使用 HammerFormatter。

如需将其他模型与 FC SDK 搭配使用，您必须开发与 LLM 推理 API 兼容的自定义格式化程序和解析器。

格式设置和解析

函数调用支持的一个关键部分是提示的格式设置和模型输出的解析。虽然这两个过程是分开的，但 FC SDK 会使用 ModelFormatter 接口同时处理格式设置和解析。

格式设置程序负责将结构化函数声明转换为文本、设置函数响应的格式，以及插入用于指示对话轮次的开始和结束以及这些轮次的角色（例如“用户”“模型”）的令牌。

解析器负责检测模型响应是否包含函数调用。如果解析器检测到函数调用，则会将其解析为结构化数据类型。否则，它会将文本视为自然语言响应。

受限解码

受限解码是一种指导 LLM 生成输出的方法，可确保输出遵循预定义的结构化格式，例如 JSON 对象或 Python 函数调用。通过强制执行这些约束条件，模型会以与预定义函数及其对应参数类型一致的方式设置输出格式。

如需启用受限解码，请在 ConstraintOptions 对象中定义约束条件，并调用 ChatSession 实例的 enableConstraint 方法。启用此约束条件后，响应将仅包含与 GenerativeModel 关联的工具。

以下示例演示了如何配置受限解码，以限制对工具调用的响应。它会将工具调用限制为以 ```tool_code\n 前缀开头，以 \n``` 后缀结尾。

ConstraintOptions constraintOptions = ConstraintOptions.newBuilder()
  .setToolCallOnly( ConstraintOptions.ToolCallOnly.newBuilder()
  .setConstraintPrefix("```tool_code\n")
  .setConstraintSuffix("\n```"))
  .build();
chatSession.enableConstraint(constraintOptions);

如需在同一会话中停用有效约束条件，请使用 disableConstraint 方法：

chatSession.disableConstraint();