https://github.com/olegtarasov/fasttext.netwrapper
.NET Standard wrapper for fastText library. Now works on Windows, Linux and MacOs!
https://github.com/olegtarasov/fasttext.netwrapper
csharp fasttext machine-learning net nlp
Last synced: 8 months ago
JSON representation
.NET Standard wrapper for fastText library. Now works on Windows, Linux and MacOs!
- Host: GitHub
- URL: https://github.com/olegtarasov/fasttext.netwrapper
- Owner: olegtarasov
- License: mit
- Created: 2018-01-20T09:17:00.000Z (over 8 years ago)
- Default Branch: master
- Last Pushed: 2024-09-30T11:46:10.000Z (over 1 year ago)
- Last Synced: 2025-08-24T08:39:43.432Z (10 months ago)
- Topics: csharp, fasttext, machine-learning, net, nlp
- Language: C#
- Homepage:
- Size: 2.28 MB
- Stars: 76
- Watchers: 4
- Forks: 10
- Open Issues: 6
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
[](https://github.com/olegtarasov/FastText.NetWrapper/actions)
[](https://www.nuget.org/packages/FastText.NetWrapper)
[](https://www.nuget.org/packages/FastText.NetWrapper)
# FastText.NetWrapper
This is a cross-platform .NET Standard wrapper for Facebook's [FastText](https://github.com/facebookresearch/fastText) library.
The wrapper comes with bundled precompiled native binaries for all three platforms: Windows, Linux and MacOs.
Just add it to your project and start using it! No additional setup required. This library will unpack and call appropriate native
binary depending on target platform.
## Is this project dead or abandoned?
Of course not! It's just complete :) There are no major updates for fastText, and most bugs in this repository are fixed. All features
should work and if something doesn't — just ping me with an issue and I will try to get back to you.
## Usage
Library API closely follows fastText command-line interface, so you can jump right in.
### Supervised model training
The simplest use case is to train a supervised model with default parameters. We create a `FastTextWrapper` and call `Supervised()`.
```c#
var fastText = new FastTextWrapper();
fastText.Supervised("cooking.train.txt", "cooking");
```
Note the arguments:
1. We specify an input file with one labeled example per line. Here we use Stack Overflow cooking dataset from Facebook:
https://dl.fbaipublicfiles.com/fasttext/data/cooking.stackexchange.tar.gz. You can find extracted files split into training
and validation sets in `UnitTests` directory in this repository.
2. Your model will be saved to `cooking.bin` and `cooking.vec` with pretrained vectors will be placed if the same directory.
3. Here we use `Supervised()` overload with 2 arguments. This means that training will be done with default parameters.
It's a good starting point and is the same as calling fastText this way:
```bash
./fasttext supervised -input cooking.train.txt -output cooking
```
### Loading models
Call `LoadModel()` and specify path to the `.bin` model file:
```c#
var fastText = new FastTextWrapper();
fastText.LoadModel("model.bin");
```
### Using pretrained vectors
To use pretrained vectors for your supervised model, create an instance of `SupervisedArgs` and customize it:
**❗ Important ❗** It doesn't say this anywhere in the original documentation, but you must use preterained vectors in **text** format
(`.vec` file extension), and not in binary format. If you try to use binary vectors, you will get an error about your vectors having
the dimension 0.
```c#
var fastText = new FastTextWrapper();
var args = new SupervisedArgs
{
PretrainedVectors = "cooking.unsup.300.vec",
dim = 300
};
fastText.Supervised("cooking.train.txt", "cooking", args);
```
Here we get default training arguments, supply a path to pretrained vectors file and adjust vector dimension accordingly.
**❗ Important ❗** Be sure to always check the dimension of your pretrained vectors! Many vectors on the internet have dimension `300`,
but default dimension for fastText supervised model training is `100`.
### Testing the model
Now you can easily test a supervised model against a validation set. You can specify different values for `k` and `threshlod` as well.
```c#
var result = fastText.Test("cooking.valid.txt");
```
You will get an instance of `TestResult` where you can find aggregated or per-label metrics:
```c#
Console.WriteLine($"Results:\n\tPrecision: {result.GlobalMetrics.GetPrecision()}" +
$"\n\tRecall: {result.GlobalMetrics.GetRecall()}" +
$"\n\tF1: {result.GlobalMetrics.GetF1()}");
```
You can even get a precision-recall curve (aggregated or per-label)! Here is an example of exporting an SVG plot with cross-platform
[OxyPlot library](https://oxyplot.github.io):
```c#
var result = fastText.Test("cooking.valid.txt");
var curve = result.GetPrecisionRecallCurve();
var series = new LineSeries {StrokeThickness = 1};
series.Points.AddRange(curve.Select(x => new DataPoint(x.recall, x.precision)).OrderBy(x => x.X));
var plotModel = new PlotModel
{
Series = { series },
Axes =
{
new LinearAxis {Position = AxisPosition.Bottom, Title = "Recall"},
new LinearAxis {Position = AxisPosition.Left, Title = "Precision"}
}
};
using (var stream = new FileStream("precision-recall.svg", FileMode.Create, FileAccess.Write))
{
SvgExporter.Export(plotModel, stream, 600, 600, false);
}
```

### Supervised model quantization
You can train a new supervised model and quantize it immediatly by replacing `SupervisedArgs` with `QuantizedSupervisedArgs`:
```c#
var fastText = new FastTextWrapper();
fastText.Supervised("cooking.train.txt", "cooking", new QuantizedSupervisedArgs());
```
You can also load an existing model and quantize it:
```c#
var fastText = new FastTextWrapper();
fastText.LoadModel("model.bin");
fastText.Quantize();
```
### Training unsupervised models
Use `Unsupervised()` method specifying model type: Skipgram or Cbow:
```c#
var fastText = new FastTextWrapper();
fastText.Unsupervised(UnsupervisedModel.SkipGram, "cooking.train.nolabels.txt", "cooking");
```
You can use an optional `UnsupervisedArgs` argument to customize training.
### Automatic hyperparameter tuning
You can use fastText autotune to do an automatic hyperparameter search.
Refer to https://github.com/facebookresearch/fastText/blob/master/docs/autotune.md for complete parameter reference.
Use `AutotuneArgs` to control tuning:
```c#
var fastText = new FastTextWrapper();
var autotuneArgs = new AutotuneArgs
{
Duration = 30, // in seconds
Metric = "precisionAtRecall:30", // supports custom metrics
Predictions = 2, // Supports @k predictions
ModelSize = "10M", // Set this to train a quantized model and do an
// additional quantization hyperparameter search. Requires QuantizedSupervisedArgs.
ValidationFile = "cooking.valid.txt" // REQUIRED: path to a validation file
};
fastText.Supervised("cooking.train.txt", "cooking", new QuantizedSupervisedArgs(), autotuneArgs);
```
### Progress callbacks
You can get progress callbacks from the native library. To do so, add a handler to `(Un)SupervisedArgs.TrainProgressCallback` for
simple training, or to `AutotuneArgs.AutotuneProgressCallback` for hyperparameter tuning.
See `ConsoleTest` project for an example of using training callbacks with `ShellProgressBar` library:
```c#
using (var pBar = new ProgressBar(100, "Training"))
{
var ftArgs = new SupervisedArgs
{
// ... Other args
verbose = 0,
TrainProgressCallback = (progress, loss, wst, lr, eta) =>
{
pBar.Tick((int)Math.Ceiling(progress * 100), $"Loss: {loss}, words/thread/sec: {wst}, LR: {lr}, ETA: {eta}");
}
};
fastText.Supervised("cooking.train.txt", outPath, ftArgs);
}
```

### Stopping `stderr` output
Native FastText library reports training progress to `stderr` by default. You can turn off this output by setting
`(Un)SupervisedArgs.verbose = 0` for simple training and `AutotuneArgs.Verbose = 0` for hyperparameter tuning.
### Getting logs from the wrapper
`FastTextWrapper` can produce a small amount of logs mostly concerning native library management. You can turn logging on by providing an
instance of `Microsoft.Extensions.Logging.ILoggerFactory`. In this example we use Serilog with console sink.
You can also inject your standard `IloggerFactory` through .NET Core DI.
```c#
// Add the following Nuget packages to your project:
// * Serilog.Sinks.Console
// * Serilog.Extensions.Logging
Log.Logger = new LoggerConfiguration()
.MinimumLevel.Debug()
.WriteTo.Console(theme: ConsoleTheme.None)
.CreateLogger();
var fastText = new FastTextWrapper(loggerFactory: new SerilogLoggerFactory());
```
### Handling native exceptions
In version `1.1` I've added much better native error handling. Now in case of most native errors you will get a nice
`NativeLibraryException` which you can inspect for detailed error description.
## Windows Requirements
Since this wrapper uses native C++ binaries under the hood, you will need to have Visual C++ Runtime Version 140 installed when
running under Windows. Visit the MS Downloads page (https://support.microsoft.com/en-us/help/2977003/the-latest-supported-visual-c-downloads)
and select the appropriate redistributable.
## FastText C-style API
If you are interested in using FastText with C-style API, here is my fork of the official library: https://github.com/olegtarasov/fastText.
## Changelog
### `1.3.1`
* Updated fastText binaries with latest improvements from the Facebook repo.
### `1.3.0`
* Native libraries are now explicitly included in target project and copied to output directory. Hopefully,
this solves a couple of problems with the previous approach of dynamically extracting libraries from
resources.
### `1.2.5`
* Fixed progress callbacks for unsupervised model training.
### `1.2.4`
* Added progress callbacks for model training and autotuning.
### `1.2.3`
* Added supervised model quantization with `Quantize` method.
* Stable version released! 🎉
### `1.2.2-preview`
* Merged #20 with new `GetWordVector` method.
### `1.2.1-preview`
* Added model autotuning with quantization support.
* Fixed a horrible bug with `bool` marshalling.
### `1.2.0-preview`
Version 1.2.0 introduces a few breaking changes to library API. If you are not ready to migrate, use v. `1.1.2`.
* **❗️Breaking change:️** Removed both deprecated `Train()` methods.
* **❗️Breaking change:️** Removed deprecated `SupervisedArgs` class.
* **❗️Breaking change:️** Removed `FastTextArgs.SupervisedDefaults()` in favor of new `SupervisedArgs` with default constructor.
* **❗️Breaking change:️** `FastTextArgs` class can't be constructed directly, use new `SupervisedArgs` and `UnsupervisedArgs` classes.
* Added an `Unsupervised()` method to train Skipgram or Cbow models.
### `1.1.2`
* Fixed a horrible bug with `bool` marshalling on a `1.1.*` branch.
### `1.1.0`, `1.1.1`
* Added new `Supervised()` method as part of streamlining the API.
* Added new `Test()` method for testing supervised model.
* Deprecated both `Train()` methods. They will be removed in v. `1.2.0`.
### `1.0.38`
* Fixed a horrible bug with `bool` marshalling on a `1.0.*` branch.
## Version `1.2.0` migration guide
* Instead of old `Train()` methods use `Supervised()` and `Unsupervised()` methods.
* Instead of `FastTextArgs.SupervisedDefaults()` use `SupervisedArgs` or `Supervised()` overload with 2 arguments.