# Statistical Machine Intelligence & Learning Engine SMILE
[![Maven Central](https://img.shields.io/maven-central/v/com.github.haifengl/smile-core)](https://central.sonatype.com/artifact/com.github.haifengl/smile-core)

## Goal ##
SMILE (Statistical Machine Intelligence & Learning Engine) is
a fast and comprehensive machine learning framework in Java.
SMILE v5.x requires Java 25, v4.x requires Java 21, and all previous versions
require Java 8. SMILE also provides APIs in Scala and Kotlin with
corresponding language paradigms. With advanced data structures and
algorithms, SMILE delivers state-of-the-art performance.
SMILE covers a broad range of machine learning tasks, including deep learning,
large language models, classification, regression, clustering, association
rule mining, feature selection and extraction, manifold learning,
multidimensional scaling, genetic algorithms, missing value imputation,
and efficient nearest neighbor search. SMILE also provides
algorithms for graphs, linear algebra, numerical analysis,
interpolation, a computer algebra system for symbolic manipulation,
and data visualization.
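
The Java version requirements stated above can be checked at application startup. The following is a minimal sketch that uses only the JDK; the class `JavaVersionCheck` and the method `requiredJava` are hypothetical names chosen for illustration and are not part of SMILE.

```java
final class JavaVersionCheck {
    /**
     * Minimum Java feature release for a given SMILE major version,
     * as stated in this README (hypothetical helper, not a SMILE API).
     */
    static int requiredJava(int smileMajor) {
        if (smileMajor >= 5) return 25;
        if (smileMajor == 4) return 21;
        return 8;
    }

    public static void main(String[] args) {
        // Runtime.version().feature() is available since Java 10.
        int running = Runtime.version().feature();
        int needed = requiredJava(5);
        if (running < needed) {
            System.err.println("Java " + needed + " or newer is needed, found Java " + running);
        } else {
            System.out.println("Java " + running + " satisfies the requirement");
        }
    }
}
```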

## Features ##
SMILE implements the following major machine learning algorithms:

- **LLM:**
Native Java implementation of Llama 3.1, tiktoken tokenizer, high performance
LLM inference server with OpenAI-compatible APIs and SSE-based chat streaming,
fully functional frontend.

- **Deep Learning:**
Deep learning with CPU and GPU. EfficientNet model for image classification.

- **Classification:**
Support Vector Machines, Decision Trees, AdaBoost, Gradient Boosting,
Random Forest, Logistic Regression, Neural Networks, RBF Networks,
Maximum Entropy Classifier, KNN, Naïve Bayesian,
Fisher/Linear/Quadratic/Regularized Discriminant Analysis.

- **Regression:**
Support Vector Regression, Gaussian Process, Regression Trees,
Gradient Boosting, Random Forest, RBF Networks, OLS, LASSO, ElasticNet,
Ridge Regression.

- **Feature Selection:**
Genetic Algorithm based Feature Selection, Ensemble Learning based Feature
Selection, TreeSHAP, Signal-to-Noise Ratio, Sum of Squares Ratio.

- **Clustering:**
BIRCH, CLARANS, DBSCAN, DENCLUE, Deterministic Annealing, K-Means,
X-Means, G-Means, Neural Gas, Growing Neural Gas, Hierarchical
Clustering, Sequential Information Bottleneck, Self-Organizing Maps,
Spectral Clustering, Minimum Entropy Clustering.

- **Association Rule & Frequent Itemset Mining:**
FP-growth mining algorithm.

- **Manifold Learning:**
IsoMap, LLE, Laplacian Eigenmap, t-SNE, UMAP, PCA, Kernel PCA,
Probabilistic PCA, GHA, Random Projection, ICA.

- **Multi-Dimensional Scaling:**
Classical MDS, Isotonic MDS, Sammon Mapping.

- **Nearest Neighbor Search:**
BK-Tree, Cover Tree, KD-Tree, SimHash, LSH.

- **Sequence Learning:**
Hidden Markov Model, Conditional Random Field.

- **Natural Language Processing:**
Sentence Splitter and Tokenizer, Bigram Statistical Test, Phrase Extractor,
Keyword Extractor, Stemmer, POS Tagging, Relevance Ranking.

## License ##
SMILE employs a dual license model designed to meet the development
and distribution needs of both commercial distributors (such as OEMs,
ISVs and VARs) and open source projects. For details, please see
[LICENSE](https://github.com/haifengl/smile/blob/master/LICENSE).
To acquire a commercial license, please contact smile.sales@outlook.com.

## Issues/Discussions ##

* **Discussion/Questions**:
If you wish to ask questions about SMILE, we're active on
[GitHub Discussions](https://github.com/haifengl/smile/discussions) and
[Stack Overflow](http://stackoverflow.com/questions/tagged/smile).

* **Docs**:
SMILE is well documented and [our docs are available online](https://haifengl.github.io/), where you can find tutorials,
programming guides, and more information. If you'd like to help improve the docs, they're part of this repository
in the `web/src` directory. [Java Docs](https://haifengl.github.io/api/java/index.html),
[Scala Docs](https://haifengl.github.io/api/scala/index.html), [Kotlin Docs](https://haifengl.github.io/api/kotlin/index.html),
and [Clojure Docs](https://haifengl.github.io/api/clojure/index.html) are also available.

* **Issues/Feature Requests**:
Please report bugs and feature requests to our [issue tracker](https://github.com/haifengl/smile/issues/new).

## Installation ##
You can use the libraries through the Maven Central repository by adding the
following to your project's pom.xml file.
```
<dependency>
  <groupId>com.github.haifengl</groupId>
  <artifactId>smile-core</artifactId>
  <version>5.1.0</version>
</dependency>
```

For deep learning and NLP, use the artifactIds `smile-deep` and `smile-nlp`, respectively.

For the Scala API, add the following to your sbt build script.
```
libraryDependencies += "com.github.haifengl" %% "smile-scala" % "5.1.0"
```

For the Kotlin API, add the following to the `dependencies` section
of your Gradle build script.
```
implementation("com.github.haifengl:smile-kotlin:5.1.0")
```

Some algorithms rely on BLAS and LAPACK (e.g. manifold learning,
some clustering algorithms, Gaussian Process regression, MLP, etc.).
To use these algorithms in SMILE v5.x, you should install OpenBLAS and ARPACK
for optimized matrix computation. For Windows, you can find the pre-built
DLL files in the `bin` directory of the release packages. Make sure to add this
directory to the PATH environment variable.

To install on Linux (e.g., Ubuntu), run
```shell script
sudo apt update
sudo apt install libopenblas-dev libarpack2
```

On Mac, we use the BLAS library from the Accelerate framework provided by macOS.
But you should install ARPACK by running
```shell script
brew install arpack
```
However, macOS System Integrity Protection (SIP) significantly affects how
the JVM handles dynamic library loading: it purges dynamic linker (DYLD)
environment variables such as DYLD_LIBRARY_PATH when launching protected processes.
A simple workaround is to copy /opt/homebrew/lib/libarpack.dylib to your working
directory so that the JVM can load it.
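
To confirm that the copy described above is where you expect it, a small check can be run from the same working directory. This is only a sketch under assumptions: the class `NativeLibCheck` and the method `expectedInWorkingDir` are hypothetical names, and the check only tests that a file with the platform-specific library name exists. It does not verify that SMILE itself will load the library.

```java
import java.nio.file.Files;
import java.nio.file.Path;

final class NativeLibCheck {
    /**
     * Path at which a working-directory copy of a native library would sit.
     * System.mapLibraryName turns "arpack" into the platform-specific file
     * name, for example libarpack.dylib on macOS.
     */
    static Path expectedInWorkingDir(String libName) {
        return Path.of(System.getProperty("user.dir"), System.mapLibraryName(libName));
    }

    public static void main(String[] args) {
        Path arpack = expectedInWorkingDir("arpack");
        String status = Files.isRegularFile(arpack) ? "found" : "missing";
        System.out.println(arpack + ": " + status);
    }
}
```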

For SMILE v4.x, OpenBLAS and ARPACK libraries can be added to your project with
the following dependencies.
```
libraryDependencies ++= Seq(
  "org.bytedeco" % "javacpp" % "1.5.11" classifier "macosx-arm64" classifier "macosx-x86_64" classifier "windows-x86_64" classifier "linux-x86_64",
  "org.bytedeco" % "openblas" % "0.3.28-1.5.11" classifier "macosx-arm64" classifier "macosx-x86_64" classifier "windows-x86_64" classifier "linux-x86_64",
  "org.bytedeco" % "arpack-ng" % "3.9.1-1.5.11" classifier "macosx-x86_64" classifier "windows-x86_64" classifier "linux-x86_64"
)
```
In this example, we include all supported 64-bit platforms and filter out
32-bit platforms. You should include only the platforms you need to save
space.

## Studio ##
SMILE Studio is an interactive desktop application to help you be more
productive in building and serving models with SMILE. Similar to Jupyter
Notebooks, SMILE Studio is a REPL (Read-Evaluate-Print-Loop) containing
an ordered list of input/output cells.

Download the pre-packaged SMILE from the
[releases page](https://github.com/haifengl/smile/releases).
After unzipping the package, `cd` into the `bin` directory of SMILE
in a terminal and type
```shell script
./smile
```
to enter SMILE Studio. If you work in a headless environment without
graphical interface, you may run `./smile shell` to enter SMILE Shell
for Java, which pre-imports all major SMILE packages. If you prefer
Scala, type `./smile scala` to enter SMILE Shell for Scala.

By default, the Studio/Shell uses up to 4 GB of memory. If you need more memory
to handle large data, use the option `-J-Xmx` or `-XX:MaxRAMPercentage`.
For example,
```shell script
./smile -J-Xmx30G
```
You can also modify the configuration file `conf/smile.ini` for the
memory and other JVM settings.
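
To confirm that a memory setting took effect, you can ask the running JVM for its maximum heap size. The following is a minimal sketch using only the JDK; the class `HeapCheck` and the method `bytesToMiB` are hypothetical names for illustration.

```java
final class HeapCheck {
    /** Converts a byte count to whole mebibytes (rounds down). */
    static long bytesToMiB(long bytes) {
        return bytes / (1024L * 1024L);
    }

    public static void main(String[] args) {
        // maxMemory() reports the most memory the JVM will attempt to use;
        // it returns Long.MAX_VALUE when there is no inherent limit.
        long maxHeap = Runtime.getRuntime().maxMemory();
        System.out.println("Max heap: " + bytesToMiB(maxHeap) + " MiB");
    }
}
```

The same expression, `Runtime.getRuntime().maxMemory()`, can also be evaluated in any Java REPL session.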

## Model Serialization ##
Most models implement the Java `Serializable` interface (all classifiers
do) so that you can serialize a model
and ship it to a production environment for inference. You may also
use serialized models in other systems such as Spark.
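
The round trip behind this workflow can be sketched with standard Java object streams. In the example below, `ThresholdModel` is a hypothetical stand-in for a trained model and is not a SMILE class; the helpers `toBytes` and `fromBytes` are likewise illustrative names. With a real SMILE model that implements `Serializable`, the same `ObjectOutputStream` and `ObjectInputStream` calls should apply.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;
import java.io.UncheckedIOException;

final class SerializationSketch {
    /** Hypothetical stand-in for a trained model; not a SMILE class. */
    static final class ThresholdModel implements Serializable {
        private static final long serialVersionUID = 1L;
        private final double threshold;

        ThresholdModel(double threshold) { this.threshold = threshold; }

        int predict(double x) { return x > threshold ? 1 : 0; }
    }

    /** Serializes a model to bytes, e.g. to write to a file or send over a network. */
    static byte[] toBytes(Serializable model) {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(model);
        } catch (IOException e) {
            throw new UncheckedIOException(e);
        }
        return buffer.toByteArray();
    }

    /** Restores a model from bytes; the model's class must be on the classpath. */
    static <T> T fromBytes(byte[] bytes, Class<T> type) {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return type.cast(in.readObject());
        } catch (IOException | ClassNotFoundException e) {
            throw new IllegalStateException("Could not restore model", e);
        }
    }

    public static void main(String[] args) {
        ThresholdModel trained = new ThresholdModel(0.5);
        byte[] payload = toBytes(trained);
        ThresholdModel restored = fromBytes(payload, ThresholdModel.class);
        System.out.println("Prediction for 0.7: " + restored.predict(0.7));
    }
}
```

Note that Java serialization ties the payload to the class definition, so the production environment needs a compatible version of the model's class on its classpath.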

## Visualization ##
A picture is worth a thousand words. In machine learning, we usually handle
high-dimensional data, which cannot be drawn on a display directly.
But a variety of statistical plots are tremendously valuable for grasping
the characteristics of many data points. SMILE provides data visualization tools
such as plots and maps for researchers to understand information more easily and quickly.
To use `smile-plot`, add the following to your dependencies.
```
<dependency>
  <groupId>com.github.haifengl</groupId>
  <artifactId>smile-plot</artifactId>
  <version>5.1.0</version>
</dependency>
```

On Swing-based systems, the user may leverage the `smile.plot.swing` package to
create a variety of plots such as scatter plot, line plot, staircase plot,
bar plot, box plot, histogram, 3D histogram, dendrogram, heatmap, hexmap,
QQ plot, contour plot, surface, and wireframe.

This library also supports data visualization with a declarative approach.
With the `smile.plot.vega` package, we can create a specification
that describes visualizations as mappings from data to properties
of graphical marks (e.g., points or bars). The specification is
based on [Vega-Lite](https://vega.github.io/vega-lite/). In a web browser,
the Vega-Lite compiler automatically produces visualization components
including axes, legends, and scales. It then determines properties
of these components based on a set of carefully designed rules.
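
To make the idea of a declarative specification concrete, the sketch below holds a raw Vega-Lite JSON specification for a scatter plot in a Java text block. It deliberately does not use the `smile.plot.vega` API, whose builder methods are not shown in this README; the class `VegaLiteSpecSketch`, the method `scatterSpec`, and the inline data values are all assumptions made for illustration.

```java
final class VegaLiteSpecSketch {
    /**
     * A minimal Vega-Lite specification: inline data, a "point" mark,
     * and encodings that map the data fields x and y to the two axes.
     */
    static String scatterSpec() {
        return """
            {
              "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
              "data": {"values": [{"x": 1, "y": 2}, {"x": 2, "y": 3}, {"x": 3, "y": 5}]},
              "mark": "point",
              "encoding": {
                "x": {"field": "x", "type": "quantitative"},
                "y": {"field": "y", "type": "quantitative"}
              }
            }
            """;
    }

    public static void main(String[] args) {
        // The printed JSON can be pasted into any Vega-Lite renderer.
        System.out.print(scatterSpec());
    }
}
```

The specification names no axes, legends, or scales; the Vega-Lite compiler derives those from the `mark` and `encoding` entries.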

## Contributing ##
Please read [CONTRIBUTING.md](CONTRIBUTING.md) for how to build and test SMILE.

## Maintainers ##
- Haifeng Li (@haifengl)
- Karl Li (@kklioss)

## Gallery




Example plots (images not shown): Scatterplot Matrix, Scatter Plot, Line Plot,
Surface Plot, Bar Plot, Box Plot, Histogram Heatmap, Rolling Average, Geo Map,
UMAP, Text Plot, Heatmap with Contour, Hexmap, IsoMap, LLE, Kernel PCA,
Neural Network, SVM, Hierarchical Clustering, SOM, DBSCAN, Neural Gas, Wavelet,
Exponential Family Mixture, Teapot Wireframe, Grid Interpolation.