flutter_gemma 0.6.0
The plugin allows running the Gemma AI model locally on a device from a Flutter application.
Flutter Gemma #
Gemma is a family of lightweight, state-of-the-art open models built from the same research and technology used to create the Gemini models.
Bring the power of Google's lightweight Gemma language models directly to your Flutter applications. With Flutter Gemma, you can seamlessly incorporate advanced AI capabilities into your iOS and Android apps, all without relying on external servers.
There is an example of usage in the example folder.
Features #
- Local Execution: Run Gemma models directly on user devices for enhanced privacy and offline functionality.
- Platform Support: Compatible with both iOS and Android platforms.
- LoRA Support: Efficient fine-tuning and integration of LoRA (Low-Rank Adaptation) weights for tailored AI behavior.
- Ease of Use: Simple interface for integrating Gemma models into your Flutter projects.
Installation #
- Add flutter_gemma to your pubspec.yaml:

dependencies:
  flutter_gemma: latest_version

- Run flutter pub get to install.
Setup #
- Download Model and optionally LoRA Weights: Obtain a pre-trained Gemma or Gemma-2 model (recommended: 2b-it) from Kaggle
- Optionally, fine-tune a model for your specific use case
- If you have LoRA weights, you can use them to customize the model's behavior without retraining the entire model.
- There is an article that describes all approaches
- Platform-specific setup:
iOS
- Enable file sharing in info.plist:

<key>UIFileSharingEnabled</key>
<true/>

- Change the linking type of pods to static: replace use_frameworks! in the Podfile with use_frameworks! :linkage => :static
Android
- If you want to use a GPU to work with the model, you need to add OpenCL support in the AndroidManifest.xml. If you plan to use only the CPU, you can skip this step.
Add the following to AndroidManifest.xml, above the closing </application> tag:
<uses-native-library android:name="libOpenCL.so" android:required="false"/>
<uses-native-library android:name="libOpenCL-car.so" android:required="false"/>
<uses-native-library android:name="libOpenCL-pixel.so" android:required="false"/>
Web
- Web currently works only with GPU backend models; CPU backend models are not supported by MediaPipe yet.
- Add the dependencies to the index.html file in the web folder:
<script type="module">
import { FilesetResolver, LlmInference } from 'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai';
window.FilesetResolver = FilesetResolver;
window.LlmInference = LlmInference;
</script>
Usage #
The API splits functionality into three parts:
- ModelFileManager: Manages model and LoRA weights file handling.
- InferenceModel: Handles model initialization and response generation.
- InferenceModelSession: Handles an inference session that keeps context.
- Access the plugin via:
final gemma = FlutterGemmaPlugin.instance;
- Managing Model Files with ModelFileManager
final modelManager = gemma.modelManager;
Place the model in the assets or upload it to a network drive, such as Firebase.
ATTENTION! You do not need to load the model every time the application starts; it is stored in the system files and only needs to be loaded once. Please review the example application carefully. Use the loadAssetModel and loadNetworkModel methods only when you need to transfer the model to the device, as shown in the sketch below.
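For example, here is a minimal sketch of a one-time download guard. The ensureModelLoaded helper and the isFirstRun flag are hypothetical, not part of the plugin; persist the flag yourself (for instance with a local key-value store), and replace the placeholder URL with your own.

```dart
import 'package:flutter_gemma/flutter_gemma.dart';

/// Hypothetical helper: download the model only on the first run.
/// [isFirstRun] is a flag you persist yourself; it is not a plugin API.
Future<void> ensureModelLoaded({required bool isFirstRun}) async {
  final modelManager = FlutterGemmaPlugin.instance.modelManager;
  if (isFirstRun) {
    // Downloads the model and stores it in the system files once.
    await modelManager.loadNetworkModel(url: 'https://example.com/model.bin');
  }
  // On later launches the model is already on the device, so nothing to do.
}
```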
1. Loading Models from assets (available only in debug mode):
Don't forget to add your model to pubspec.yaml.
- Loading from assets (loraPath is optional)
await modelManager.loadAssetModel(fullPath: 'model.bin', loraPath: 'lora_weights.bin');
- Loading from assets with Progress Status (loraPath is optional)
modelManager.loadAssetModelWithProgress(fullPath: 'model.bin', loraPath: 'lora_weights.bin').listen(
(progress) {
print('Loading progress: $progress%');
},
onDone: () {
print('Model loading complete.');
},
onError: (error) {
print('Error loading model: $error');
},
);
2. Loading Models from the network:
- For web usage, you will also need to enable CORS (Cross-Origin Resource Sharing) for your network resource. To enable CORS in Firebase, you can follow the guide in the Firebase documentation: Setting up CORS.
- Loading from the network (loraUrl is optional).
await modelManager.loadNetworkModel(url: 'https://example.com/model.bin', loraUrl: 'https://example.com/lora_weights.bin');
- Loading from the network with Progress Status (loraUrl is optional)
modelManager.loadNetworkModelWithProgress(url: 'https://example.com/model.bin', loraUrl: 'https://example.com/lora_weights.bin').listen(
(progress) {
print('Loading progress: $progress%');
},
onDone: () {
print('Model loading complete.');
},
onError: (error) {
print('Error loading model: $error');
},
);
- Using manually placed Models
You can manually place the model file on the device's filesystem and specify the paths to use it. This method is useful when you want to pre-load models externally or handle model storage yourself.
Example usage:
await modelManager.setModelPath('model.bin');
await modelManager.setLoraPath('lora_weights.bin');
- Loading LoRA Weights only
- Loading LoRA weights from the network.
await modelManager.loadLoraWeightsFromNetwork('https://example.com/lora_weights.bin');
- Loading LoRA weights from assets.
await modelManager.loadLoraWeightsFromAsset('lora_weights.bin');
- Deleting Models and Weights
You can delete the model and weights from the device. Deleting the model or LoRA weights will automatically close and clean up the inference. This ensures that there are no lingering resources or memory leaks when switching models or updating files.
await modelManager.deleteModel();
await modelManager.deleteLoraWeights();
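As an illustration, here is a rough sketch of switching to a different model. The switchModel helper and the URL are hypothetical; deleting the old file closes the running inference for you, so afterwards you only need to load the new file and recreate the model and session (see step 3).

```dart
import 'package:flutter_gemma/flutter_gemma.dart';

/// Hypothetical helper that swaps the installed model for a new one.
Future<void> switchModel(String newModelUrl) async {
  final modelManager = FlutterGemmaPlugin.instance.modelManager;
  // Deleting the model automatically closes and cleans up the current inference.
  await modelManager.deleteModel();
  // Download the replacement; recreate the model and session afterwards.
  await modelManager.loadNetworkModel(url: newModelUrl);
}
```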
3. Initialize the Model and create a Session:
final inferenceModel = await gemma.createModel(
isInstructionTuned: true, // Set true if using an instruction-tuned model
maxTokens: 512,
);
final session = await inferenceModel.createSession(
temperature: 1.0,
randomSeed: 1,
topK: 1,
);
4. Generate a single response
String response = await session.getResponse(
prompt: 'Tell me something interesting',
isChat: false, // isChat indicates whether you're using chat-style context
);
5. Generate a Streamed Response (Async)
session.getResponseAsync(prompt: 'Tell me something interesting', isChat: false)
.listen((String token) {
print(token);
},
onDone: () {
print('Stream closed');
},
onError: (error) {
print('Error: $error');
},
);
6. Chat Scenario. This method works properly only for instruction-tuned models (like gemma-2b-it).
String conversation = 'User: Hello, who are you?';
String response = await session.getResponse(prompt: conversation);
print(response);
// Next question
conversation = 'User: Are you sure?';
response = await session.getResponse(prompt: conversation);
print(response);
7. Chat Scenario as a Stream. This method works properly only for instruction-tuned models (like gemma-2b-it).
String conversation = 'User: Hello, who are you?';
session.getResponseAsync(prompt: conversation).listen((String? token) => print(token));
8. Close the Session if you would like to reset the context
await session.close();
If you need to use the inference again later, remember to create a new session before generating responses.
9. Close the Model
When you no longer need to perform any further inferences, call the close method to release resources:
await inferenceModel.close();
If you need to use the inference again later, remember to create the model and a session again before generating responses.
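To put the pieces together, here is a rough end-to-end sketch of the inference lifecycle, assuming the model file has already been loaded onto the device as described above. The prompt text and parameter values are arbitrary examples, not recommended settings.

```dart
import 'package:flutter_gemma/flutter_gemma.dart';

Future<void> runSingleInference() async {
  final gemma = FlutterGemmaPlugin.instance;

  // Create the inference model and a session (step 3).
  final inferenceModel = await gemma.createModel(
    isInstructionTuned: true,
    maxTokens: 512,
  );
  final session = await inferenceModel.createSession(
    temperature: 1.0,
    randomSeed: 1,
    topK: 1,
  );

  // Generate a response (step 4).
  final response = await session.getResponse(
    prompt: 'Tell me something interesting',
    isChat: false,
  );
  print(response);

  // Release resources; create a new session/model before inferring again (steps 8-9).
  await session.close();
  await inferenceModel.close();
}
```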
You can find the full and complete example in the example folder.
Important Considerations #
- Model Size: Larger models (such as 7b and 7b-it) might be too resource-intensive for on-device inference.
- LoRA Weights: They provide efficient customization without the need for full model retraining.
- Development vs. Production: For production apps, do not embed the model or LoRA weights within your assets. Instead, load them once and store them securely on the device or via a network drive.
- Web Models: Currently, Web support is available only for GPU backend models.