flutter_gemma 0.4.6
flutter_gemma: ^0.4.6 copied to clipboard
The plugin allows running the Gemma AI model locally on a device from a Flutter application.
Flutter Gemma #
Gemma is a family of lightweight, state-of-the art open models built from the same research and technology used to create the Gemini models
Bring the power of Google's lightweight Gemma language models directly to your Flutter applications. With Flutter Gemma, you can seamlessly incorporate advanced AI capabilities into your iOS and Android apps, all without relying on external servers.
There is an example of using:
Features #
- Local Execution: Run Gemma models directly on user devices for enhanced privacy and offline functionality.
- Platform Support: Compatible with both iOS and Android platforms.
- LoRA Support: Efficient fine-tuning and integration of LoRA (Low-Rank Adaptation) weights for tailored AI behavior.
- Ease of Use: Simple interface for integrating Gemma models into your Flutter projects.
Installation #
-
Add
flutter_gemma
to yourpubspec.yaml
:dependencies: flutter_gemma: latest_version
-
Run
flutter pub get
to install.
Setup #
- Download Model and optionally LoRA Weights: Obtain a pre-trained Gemma model (recommended: 2b or 2b-it) from Kaggle
- Optionally, fine-tune a model for your specific use case
- Platfrom specific setup:
iOS
- Enable file sharing in
info.plist
:
<key>UIFileSharingEnabled</key>
<true/>
- Change the linking type of pods to static, replace
use_frameworks!
in Podfile withuse_frameworks! :linkage => :static
Android
- If you want to use a GPU to work with the model, you need to add OpenGL support in the manifest.xml. If you plan to use only the CPU, you can skip this step.
Add to 'AndroidManifest.xml' above tag </application>
<uses-native-library
android:name="libOpenCL.so"
android:required="false"/>
<uses-native-library android:name="libOpenCL-car.so" android:required="false"/>
<uses-native-library android:name="libOpenCL-pixel.so" android:required="false"/>
Web
-
Web currently works only GPU backend models, CPU backend models are not suported by Mediapipe yet
-
Add dependencies to
index.html
file in web folder
<script type="module">
import { FilesetResolver, LlmInference } from 'https://cdn.jsdelivr.net/npm/@mediapipe/tasks-genai';
window.FilesetResolver = FilesetResolver;
window.LlmInference = LlmInference;
</script>
- Prepare Model:
Place the model in the assets or upload it to a network drive, such as Firebase.
ATTENTION!! You do not need to load the model every time the application starts; it is stored in the system files and only needs to be done once. Please carefully review the example application. You should use loadAssetModel and loadNetworkModel methods only when you need to upload the model to device
Usage #
1.Loading Models from assets (available only in debug mode):
Dont forget to add your model to pubspec.yaml
- Loading from assets (loraUrl is optional)
await FlutterGemmaPlugin.instance.loadAssetModel(fullPath: 'model.bin', loraPath: 'lora_weights.bin');
- Loading froms assets with Progress Status (loraUrl is optional)
FlutterGemmaPlugin.instance.loadAssetModelWithProgress(fullPath: 'model.bin', loraPath: 'lora_weights.bin').listen(
(progress) {
print('Loading progress: $progress%');
},
onDone: () {
print('Model loading complete.');
},
onError: (error) {
print('Error loading model: $error');
},
);
1.Loading Models from network:
-
For web usage, you will also need to enable CORS (Cross-Origin Resource Sharing) for your network resource. To enable CORS in Firebase, you can follow the guide in the Firebase documentation: Setting up CORS
- Loading from the network (loraUrl is optional).
await FlutterGemmaPlugin.instance.loadNetworkModel(url: 'https://example.com/model.bin', loraUrl: 'https://example.com/lora_weights.bin');
- Loading froms the network with Progress Status (loraUrl is optional)
FlutterGemmaPlugin.instance.loadNetworkModelWithProgress(url: 'https://example.com/model.bin', loraUrl: 'https://example.com/lora_weights.bin').listen(
(progress) {
print('Loading progress: $progress%');
},
onDone: () {
print('Model loading complete.');
},
onError: (error) {
print('Error loading model: $error');
},
);
3.Initialize:
void main() async {
WidgetsFlutterBinding.ensureInitialized();
await FlutterGemmaPlugin.instance.init(
maxTokens: 512, /// maxTokens is optional, by default the value is 1024
temperature: 1.0, /// temperature is optional, by default the value is 1.0
topK: 1, /// topK is optional, by default the value is 1
randomSeed: 1, /// randomSeed is optional, by default the value is 1
);
runApp(const MyApp());
}
4.Generate response
final flutterGemma = FlutterGemmaPlugin.instance;
String response = await flutterGemma.getResponse(prompt: 'Tell me something interesting');
print(response);
5.Generate response as a stream
final flutterGemma = FlutterGemmaPlugin.instance;
flutterGemma.getAsyncResponse(prompt: 'Tell me something interesting').listen((String? token) => print(token));
6.Generate chat response This method works properly only for instruction tuned (like gemma2b-it) models
final flutterGemma = FlutterGemmaPlugin.instance;
final messages = <Message>[];
messages.add(Message(text: 'Who are you?', isUser: true);
String response = await flutterGemma.getChatResponse(messages: messages);
print(response);
messages.add(Message(text: response));
messages.add(Message(text: 'Really?', isUser: true));
String response = await flutterGemma.getChatResponse(messages: messages);
print(response);
7.Generate chat response as a stream This method works properly only for instruction tuned (like gemma2b-it) models
final flutterGemma = FlutterGemmaPlugin.instance;
final messages = <Message>[];
messages.add(Message(text: 'Who are you?', isUser: true);
flutterGemma.getAsyncChatResponse(messages: messages).listen((String? token) => print(token));
8.Close
When you no longer need to perform any further inferences, call the close method to release resources:
await FlutterGemmaPlugin.instance.close();
If you need to use the inference again later, remember to call init() again before generating responses.
The full and complete example you can find in example
folder
Important Considerations
- Larger models (like 7b and 7b-it) may be too resource-intensive for on-device use.
- LoRA weights allow efficient customization without the need for full model retraining.