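LLM Inference guide for Android

The LLM Inference API lets you run large language models (LLMs) completely on-device in Android applications. You can use it for a wide range of tasks, such as generating text, retrieving information in natural language form, and summarizing documents. The task has built-in support for multiple text-to-text large language models, so you can apply the latest on-device generative AI models to your Android apps.

To quickly add the LLM Inference API to your Android application, follow the Quickstart. For a basic example of an Android application running the LLM Inference API, see the sample application. To learn more about how the LLM Inference API works, see the configuration options, model conversion, and LoRA tuning sections.

You can see this task in action in the MediaPipe Studio demo. For more information about the capabilities, models, and configuration options of this task, see the Overview.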

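Quickstart

Use the following steps to add the LLM Inference API to your Android application. The LLM Inference API is optimized for high-end Android devices, such as Pixel 8 and Samsung S23 or later, and does not support device emulators.

Add dependencies

The LLM Inference API uses the com.google.mediapipe:tasks-genai library. Add this dependency to the build.gradle file of your Android app: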

dependencies {
    implementation 'com.google.mediapipe:tasks-genai:0.10.24'
}

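Download a model

Download Gemma-3 1B in a 4-bit quantized format from Hugging Face. For more information on the available models, see the Models documentation.

Push the content of the output_path folder to the Android device.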

$ adb shell rm -r /data/local/tmp/llm/ # Remove any previously loaded models
$ adb shell mkdir -p /data/local/tmp/llm/
$ adb push output_path /data/local/tmp/llm/model_version.task
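Before initializing the task, it can help to confirm that the pushed model file is actually present and readable, since a missing file is an easy mistake to make at this step. The following is a minimal sketch using only the Kotlin standard library; describeModelFileProblem is a hypothetical helper introduced for illustration and is not part of MediaPipe.

```kotlin
import java.io.File

// Hypothetical helper (not part of MediaPipe): returns a human-readable
// problem description, or null when the model file looks usable.
fun describeModelFileProblem(path: String): String? {
    val file = File(path)
    return when {
        !file.exists() -> "Model file not found: $path"
        !file.isFile -> "Model path is not a regular file: $path"
        file.length() == 0L -> "Model file is empty: $path"
        !file.canRead() -> "Model file is not readable: $path"
        else -> null
    }
}

fun main() {
    // Path used as the adb push destination above.
    val problem = describeModelFileProblem("/data/local/tmp/llm/model_version.task")
    println(problem ?: "Model file looks usable")
}
```

Running a check like this before creating the task lets the app report a clear message instead of failing during initialization.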

Initialize the Task

Initialize the task with basic configuration options:

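import com.google.mediapipe.tasks.genai.llminference.LlmInference
import com.google.mediapipe.tasks.genai.llminference.LlmInference.LlmInferenceOptions

// Set the configuration options for the LLM Inference task
val taskOptions = LlmInferenceOptions.builder()
        .setModelPath("/data/local/tmp/llm/model_version.task")
        .setMaxTopK(64)
        .build()

// Create an instance of the LLM Inference task
val llmInference = LlmInference.createFromOptions(context, taskOptions)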

Run the Task

Use the generateResponse() method to generate a text response. This produces a single generated response.

val result = llmInference.generateResponse(inputPrompt)
logger.atInfo().log("result: $result")

To stream the response, use the generateResponseAsync() method.

val options = LlmInference.LlmInferenceOptions.builder()
  ...
  .setResultListener { partialResult, done ->
    logger.atInfo().log("partial result: $partialResult")
  }
  .build()

llmInference.generateResponseAsync(inputPrompt)
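When streaming, an app usually needs to assemble the partial results into the full response. The following self-contained sketch shows one way to do that with a listener of the same shape as the one above (a partial result plus a done flag). It assumes each partial result carries only the newly generated text. simulateStream and ResponseAccumulator are hypothetical names introduced for illustration, and simulateStream stands in for the real API so the example runs without a device.

```kotlin
// Hypothetical stand-in for the streaming API: delivers each chunk to the
// listener and sets done = true on the last one. Not a MediaPipe function.
fun simulateStream(chunks: List<String>, listener: (String, Boolean) -> Unit) {
    chunks.forEachIndexed { index, chunk ->
        listener(chunk, index == chunks.lastIndex)
    }
}

// Collects partial results into one string and reports it once done is true.
class ResponseAccumulator(private val onComplete: (String) -> Unit) {
    private val buffer = StringBuilder()

    fun onPartialResult(partialResult: String, done: Boolean) {
        buffer.append(partialResult)
        if (done) {
            onComplete(buffer.toString())
            buffer.setLength(0) // reset so the accumulator can be reused
        }
    }
}

fun main() {
    val accumulator = ResponseAccumulator { full -> println("full response: $full") }
    simulateStream(listOf("On-device ", "models ", "run locally."), accumulator::onPartialResult)
    // prints "full response: On-device models run locally."
}
```

In a real app, the body of onPartialResult could also update the UI on each call so that text appears as it is generated.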

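Sample application

To see the LLM Inference API in action and explore a range of on-device generative AI capabilities, try the Google AI Edge Gallery app.

The Google AI Edge Gallery is an open-source Android application that serves as an interactive playground for developers. It showcases:

- Practical examples of using the LLM Inference API for various tasks, including:
  - Ask Image: Upload an image and ask questions about it. Get descriptions, solve problems, or identify objects.
  - Prompt Lab: Summarize, rewrite, generate code, or use freeform prompts to explore single-turn LLM use cases.
  - AI Chat: Engage in multi-turn conversations.
- The ability to discover, download, and experiment with a variety of LiteRT-optimized models from the Hugging Face LiteRT Community and official Google releases (for example, Gemma 3N).
- Real-time on-device performance benchmarks for different models (time to first token, decode speed, and so on).
- How to import and test your own custom .task models.

This app is a resource for understanding the practical implementation of the LLM Inference API and the potential of on-device generative AI. Explore the source code and download the app from the Google AI Edge Gallery GitHub repository.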

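Configuration options

Use the following configuration options to set up an Android app:

Option Name	Description	Value Range	Default Value
`modelPath`	The path to where the model is stored within the project directory.	PATH	N/A
`maxTokens`	The maximum number of tokens (input tokens + output tokens) the model handles.	Integer	512
`topK`	The number of tokens the model considers at each step of generation. Limits predictions to the K most probable tokens.	Integer	40
`temperature`	The amount of randomness introduced during generation. A higher temperature results in more creative generated text, while a lower temperature produces more predictable output.	Float	0.8
`randomSeed`	The random seed used during text generation.	Integer	0
`loraPath`	The absolute path to the LoRA model locally on the device. Note: this is only compatible with GPU models.	PATH	N/A
`resultListener`	Sets the result listener to receive the results asynchronously. Only applicable when using the async generation method.	N/A	N/A
`errorListener`	Sets an optional error listener.	N/A	N/A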
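To make the interaction between topK, temperature, and randomSeed more concrete, the following sketch samples one token index from a small array of raw scores. It is an illustration of the general technique under simple assumptions (top-K filtering followed by a temperature-scaled softmax), not the implementation used by the LLM Inference API. The function name sampleTopK and the example scores are hypothetical.

```kotlin
import kotlin.math.exp
import kotlin.random.Random

// Illustrative only: samples a token index from raw scores (logits) using
// top-K filtering and temperature scaling. Not the MediaPipe implementation.
fun sampleTopK(logits: DoubleArray, topK: Int, temperature: Double, random: Random): Int {
    require(logits.isNotEmpty()) { "logits must not be empty" }
    require(topK >= 1) { "topK must be at least 1" }
    require(temperature > 0.0) { "temperature must be positive" }

    // Keep only the K highest-scoring token indices.
    val candidates = logits.indices.sortedByDescending { logits[it] }.take(topK)

    // Softmax weights over the kept scores after dividing by the temperature.
    // Subtracting the largest scaled score keeps exp() numerically stable.
    val maxScaled = logits[candidates[0]] / temperature
    val weights = candidates.map { exp(logits[it] / temperature - maxScaled) }
    val total = weights.sum()

    // Draw one candidate in proportion to its weight.
    var threshold = random.nextDouble() * total
    for (i in candidates.indices) {
        threshold -= weights[i]
        if (threshold < 0.0) return candidates[i]
    }
    return candidates.last()
}

fun main() {
    val logits = doubleArrayOf(2.0, 1.0, 0.5, -1.0)
    // The same seed reproduces the same choice, which is the role of randomSeed.
    val first = sampleTopK(logits, topK = 3, temperature = 0.8, random = Random(0))
    val second = sampleTopK(logits, topK = 3, temperature = 0.8, random = Random(0))
    println("same seed, same token: ${first == second}")
    // With topK = 1 only the highest-scoring token can be chosen.
    println("topK = 1 picks index ${sampleTopK(logits, 1, 0.8, Random(42))}")
}
```

Lower temperatures concentrate the weights on the highest-scoring candidates, and higher temperatures flatten them, which matches the predictable versus creative behavior described in the table.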

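Multimodal prompting

The Android APIs of the LLM Inference API support multimodal prompting with models that accept text and image inputs. With multimodality enabled, users can include a combination of images and text in their prompts, and the LLM provides a text response.

To get started, use a MediaPipe-compatible variant of Gemma 3n:

- Gemma-3n E2B: a 2B model of the Gemma-3n family.
- Gemma-3n E4B: a 4B model of the Gemma-3n family.

For more information, see the Gemma-3n documentation.

To provide images within a prompt, convert the input images or frames to a com.google.mediapipe.framework.image.MPImage object before passing them to the LLM Inference API: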

import com.google.mediapipe.framework.image.BitmapImageBuilder
import com.google.mediapipe.framework.image.MPImage

// Convert the input Bitmap object to an MPImage object to run inference
val mpImage = BitmapImageBuilder(image).build()

To enable vision support for the LLM Inference API, set the EnableVisionModality configuration option to true within the graph options:

LlmInferenceSession.LlmInferenceSessionOptions sessionOptions =
  LlmInferenceSession.LlmInferenceSessionOptions.builder()
    ...
    .setGraphOptions(GraphOptions.builder().setEnableVisionModality(true).build())
    .build();

Gemma-3n accepts a maximum of one image per session, so set MaxNumImages to 1.

LlmInferenceOptions options = LlmInferenceOptions.builder()
  ...
  .setMaxNumImages(1)
  .build();

The following is an example implementation of the LLM Inference API set up to handle vision and text inputs:

MPImage image = getImageFromAsset(BURGER_IMAGE);

LlmInferenceSession.LlmInferenceSessionOptions sessionOptions =
  LlmInferenceSession.LlmInferenceSessionOptions.builder()
    .setTopK(10)
    .setTemperature(0.4f)
    .setGraphOptions(GraphOptions.builder().setEnableVisionModality(true).build())
    .build();

try (LlmInference llmInference =
    LlmInference.createFromOptions(ApplicationProvider.getApplicationContext(), options);
  LlmInferenceSession session =
    LlmInferenceSession.createFromOptions(llmInference, sessionOptions)) {
  session.addQueryChunk("Describe the objects in the image.");
  session.addImage(image);
  String result = session.generateResponse();
}

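LoRA customization

The LLM Inference API supports LoRA (Low-Rank Adaptation) tuning using the PEFT (Parameter-Efficient Fine-Tuning) library. LoRA tuning customizes the behavior of LLMs through a cost-effective training process, creating a small set of trainable weights based on new training data rather than retraining the entire model.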
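To give a sense of why the set of trainable weights is small, the sketch below compares the parameter count of one full weight matrix with that of a rank-r LoRA update, which in the standard LoRA formulation is the product of two narrow matrices. The dimensions and rank are hypothetical values chosen for illustration and are not taken from any of the supported models.

```kotlin
// Trainable parameters when fine-tuning a full weight matrix of shape
// (outputDim x inputDim).
fun fullFineTuneParams(inputDim: Long, outputDim: Long): Long = inputDim * outputDim

// Trainable parameters for a rank-r LoRA update B * A, where A has shape
// (rank x inputDim) and B has shape (outputDim x rank).
fun loraParams(inputDim: Long, outputDim: Long, rank: Long): Long =
    rank * (inputDim + outputDim)

fun main() {
    // Hypothetical dimensions chosen for illustration, not taken from any model.
    val inputDim = 2048L
    val outputDim = 2048L
    val rank = 8L

    val full = fullFineTuneParams(inputDim, outputDim)
    val lora = loraParams(inputDim, outputDim, rank)
    println("full: $full, LoRA: $lora, ratio: ${full / lora}x")
    // prints "full: 4194304, LoRA: 32768, ratio: 128x"
}
```

Under these assumed sizes, the LoRA update trains 128 times fewer parameters for that one matrix than full fine-tuning would.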

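The LLM Inference API supports adding LoRA weights to the attention layers of the Gemma-2 2B, Gemma 2B, and Phi-2 models. Download the model in the safetensors format.

The base model must be in the safetensors format in order to create LoRA weights. After LoRA training, you can convert the models into the FlatBuffers format to run on MediaPipe.

Prepare LoRA weights

Use the LoRA Methods guide from PEFT to train a fine-tuned LoRA model on your own dataset.

The LLM Inference API only supports LoRA on attention layers, so specify only the attention layers in LoraConfig: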

# For Gemma
from peft import LoraConfig
config = LoraConfig(
    r=LORA_RANK,
    target_modules=["q_proj", "v_proj", "k_proj", "o_proj"],
)

# For Phi-2
config = LoraConfig(
    r=LORA_RANK,
    target_modules=["q_proj", "v_proj", "k_proj", "dense"],
)

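After training on the prepared dataset and saving the model, the fine-tuned LoRA model weights are available in adapter_model.safetensors. The safetensors file is the LoRA checkpoint used during model conversion.

Model conversion

Use the MediaPipe Python package to convert the model weights into the FlatBuffers format. The ConversionConfig specifies the base model options along with the additional LoRA options.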

import mediapipe as mp
from mediapipe.tasks.python.genai import converter

config = converter.ConversionConfig(
  # Other params related to base model
  ...
  # Must use gpu backend for LoRA conversion
  backend='gpu',
  # LoRA related params
  lora_ckpt=LORA_CKPT,
  lora_rank=LORA_RANK,
  lora_output_tflite_file=LORA_OUTPUT_FILE,
)

converter.convert_checkpoint(config)

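The converter produces two FlatBuffers files, one for the base model and another for the LoRA model.

LoRA model inference

Android supports static LoRA during initialization. To load a LoRA model, specify the LoRA model path as well as the base LLM.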

// Set the configuration options for the LLM Inference task
val options = LlmInferenceOptions.builder()
        .setModelPath(BASE_MODEL_PATH)
        .setMaxTokens(1000)
        .setTopK(40)
        .setTemperature(0.8f)
        .setRandomSeed(101)
        .setLoraPath(LORA_MODEL_PATH)
        .build()

// Create an instance of the LLM Inference task
llmInference = LlmInference.createFromOptions(context, options)

To run LLM inference with LoRA, use the same generateResponse() or generateResponseAsync() methods as the base model.