ml_algo 4.1.0
Machine learning algorithms written in native dart (without bindings to any popular ML libraries, pure Dart implementation)
Machine learning algorithms with dart #
The following algorithms are implemented:

- Linear regression:
  - gradient descent algorithm (batch, mini-batch, stochastic) with ridge regularization
  - lasso regression (feature selection model)
- Linear classifier:
  - logistic regression (with "one-vs-all" multinomial classification)
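To give a feel for the optimization at the heart of these models, here is a minimal sketch of a stochastic gradient descent step with ridge (L2) regularization for logistic regression. The library itself is pure Dart; this sketch is in Python only for brevity, and none of the names below belong to ml_algo's API.

```python
import math
import random

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def sgd_step(w, b, x, y, rate, l2):
    """One stochastic gradient descent update on a single (x, y) sample.

    w and x are lists of floats, y is 0 or 1, l2 is the ridge coefficient.
    """
    pred = sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)
    err = pred - y  # gradient of log-loss w.r.t. the linear output
    # Ridge regularization shrinks weights toward zero (intercept excluded).
    w = [wi - rate * (err * xi + l2 * wi) for wi, xi in zip(w, x)]
    b = b - rate * err
    return w, b

# Tiny demo: learn the rule "y = 1 iff x[0] > x[1]" from random samples.
random.seed(0)
w, b = [0.0, 0.0], 0.0
for _ in range(2000):
    x = [random.random(), random.random()]
    y = 1 if x[0] > x[1] else 0
    w, b = sgd_step(w, b, x, y, rate=0.5, l2=1e-4)
# After training, w[0] is positive and w[1] negative, matching the rule.
```

A mini-batch variant averages this gradient over a few samples per step, and batch gradient descent averages it over the whole dataset.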
Usage #
Real-life example #
Let's classify records from a well-known dataset, the Pima Indians Diabetes Database, using a logistic regressor.
Import all necessary packages:

```dart
import 'dart:async';
import 'dart:convert';
import 'dart:io';
import 'dart:typed_data';

import 'package:csv/csv.dart' as csv;
import 'package:ml_algo/ml_algo.dart';
import 'package:ml_linalg/linalg.dart';
```
Read the csv file `pima_indians_diabetes_database.csv` with the test data. You can use the csv file from the library's `datasets` directory:

```dart
final csvCodec = csv.CsvCodec(eol: '\n');
final input = File('datasets/pima_indians_diabetes_database.csv').openRead();
final fields = (await input.transform(utf8.decoder).transform(csvCodec.decoder).toList()).sublist(1);
```
The data in this file consists of 768 records with 8 features each. Let's do some data preprocessing.
First, let's extract the features from the data. Declare a utility function `extractFeatures`:

```dart
final extractFeatures = (List<Object> item) =>
    item.map((Object feature) => (feature as num).toDouble()).toList();
```
...and get all the features:

```dart
final features = fields
    .map((List item) => extractFeatures(item.sublist(0, item.length - 1)))
    .toList(growable: false);
```
For now, our features are just a list of lists. For further processing, we should create a matrix from the feature list:

```dart
final featuresMatrix = Float32x4Matrix.from(features);
```
To get more information about `Float32x4Matrix`, please see the ml_linalg repo.
We also need to extract the labels (the last element of each line):

```dart
final labels = Float32x4Vector.from(fields.map((List<dynamic> item) => (item.last as num).toDouble()));
```
Then we should create an instance of the `CrossValidator` class for fitting the hyperparameters of our model:

```dart
final validator = CrossValidator<Float32x4>.kFold(numberOfFolds: 7);
```
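K-fold cross-validation partitions the dataset into k disjoint folds; each fold serves once as the test set while the remaining folds are used for training, and the scores are then combined. A minimal sketch of that splitting logic (illustrative Python, not ml_algo's internals):

```python
def k_fold_indices(n_samples, n_folds):
    """Split range(n_samples) into n_folds disjoint test folds.

    Returns a list of (train_indices, test_indices) pairs; each sample
    appears in exactly one test fold.
    """
    indices = list(range(n_samples))
    # Distribute the remainder so fold sizes differ by at most one.
    fold_sizes = [n_samples // n_folds + (1 if i < n_samples % n_folds else 0)
                  for i in range(n_folds)]
    splits, start = [], 0
    for size in fold_sizes:
        test = indices[start:start + size]
        train = indices[:start] + indices[start + size:]
        splits.append((train, test))
        start += size
    return splits

splits = k_fold_indices(10, 3)
# 3 folds of sizes 4, 3, 3; every sample is tested exactly once
```

With 7 folds over the 768 records, each model is therefore trained 7 times on roughly 658 records and evaluated on the remaining ~110.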
All is set, so we can perform our classification. For better hyperparameter fitting, let's create a loop to try each value of a chosen hyperparameter in a defined range:
```dart
final step = 0.001;
final limit = 0.6;
double minError = double.infinity;
double bestLearningRate = 0.0;

for (double rate = step; rate < limit; rate += step) {
  // ...
}
```
In the loop's body, let's create a logistic regression classifier with a stochastic gradient descent optimizer (`batchSize: 1` makes the gradient descent stochastic):

```dart
final logisticRegressor = LogisticRegressor(
    iterationLimit: 100,
    learningRate: rate,
    batchSize: 1,
    learningRateType: LearningRateType.constant,
    fitIntercept: true);
```
Evaluate our model via the accuracy metric (the validator returns an error value, so the lower, the better):

```dart
final error = validator.evaluate(logisticRegressor, featuresMatrix, labels, MetricType.accuracy);
if (error < minError) {
  minError = error;
  bestLearningRate = rate;
}
```
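The error being minimized here is just the complement of accuracy: the fraction of misclassified samples. A tiny illustrative sketch (Python for brevity; not ml_algo code):

```python
def classification_error(predicted, actual):
    """Fraction of misclassified samples, i.e. 1 - accuracy."""
    wrong = sum(1 for p, a in zip(predicted, actual) if p != a)
    return wrong / len(actual)

# One of four predictions is wrong -> error 0.25 (accuracy 0.75).
err = classification_error([1, 0, 1, 1], [1, 0, 0, 1])
```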
Let's print the score:

```dart
print('best error on classification: ${(minError * 100).toStringAsFixed(2)}');
print('best learning rate: ${bestLearningRate.toStringAsFixed(3)}');
```
The search for the best model parameters takes quite a long time, so be patient. After the search is over, we will see something like this:
```
best error on classification: 35.5%
best learning rate: 0.155
```
All the code above put together:

```dart
import 'dart:async';
import 'dart:convert';
import 'dart:io';
import 'dart:typed_data';

import 'package:csv/csv.dart' as csv;
import 'package:ml_algo/ml_algo.dart';
import 'package:ml_linalg/linalg.dart';

Future<void> main() async {
  final csvCodec = csv.CsvCodec(eol: '\n');
  final input = File('datasets/pima_indians_diabetes_database.csv').openRead();
  final fields = (await input.transform(utf8.decoder).transform(csvCodec.decoder).toList()).sublist(1);

  final extractFeatures = (List<Object> item) =>
      item.map((Object feature) => (feature as num).toDouble()).toList();
  final features = fields
      .map((List item) => extractFeatures(item.sublist(0, item.length - 1)))
      .toList(growable: false);
  final featuresMatrix = Float32x4Matrix.from(features);
  final labels = Float32x4Vector.from(fields.map((List<dynamic> item) => (item.last as num).toDouble()));

  final validator = CrossValidator<Float32x4>.kFold(numberOfFolds: 7);

  final step = 0.001;
  final limit = 0.6;
  double minError = double.infinity;
  double bestLearningRate = 0.0;

  for (double rate = step; rate < limit; rate += step) {
    final logisticRegressor = LogisticRegressor(
        iterationLimit: 100,
        learningRate: rate,
        batchSize: 1,
        learningRateType: LearningRateType.constant,
        fitIntercept: true);
    final error = validator.evaluate(logisticRegressor, featuresMatrix, labels, MetricType.accuracy);
    if (error < minError) {
      minError = error;
      bestLearningRate = rate;
    }
  }

  print('best error on classification: ${(minError * 100).toStringAsFixed(2)}');
  print('best learning rate: ${bestLearningRate.toStringAsFixed(3)}');
}
```
For more examples, please see the examples folder.
Contacts #
If you have questions, feel free to write to me at