https://github.com/haifengl/smile
Statistical Machine Intelligence & Learning Engine
classification clustering computer-algebra-system computer-vision data-science dataframe deep-learning genetic-algorithm interpolation linear-algebra llm machine-learning manifold-learning multidimensional-scaling nearest-neighbor-search nlp regression statistics visualization wavelet
Last synced: about 2 months ago
- Host: GitHub
- URL: https://github.com/haifengl/smile
- Owner: haifengl
- License: other
- Created: 2014-11-20T16:28:12.000Z (over 11 years ago)
- Default Branch: master
- Last Pushed: 2025-05-02T03:03:49.000Z (10 months ago)
- Last Synced: 2025-05-06T19:52:13.510Z (10 months ago)
- Topics: classification, clustering, computer-algebra-system, computer-vision, data-science, dataframe, deep-learning, genetic-algorithm, interpolation, linear-algebra, llm, machine-learning, manifold-learning, multidimensional-scaling, nearest-neighbor-search, nlp, regression, statistics, visualization, wavelet
- Language: Java
- Homepage: https://haifengl.github.io
- Size: 245 MB
- Stars: 6,169
- Watchers: 266
- Forks: 1,140
- Open Issues: 6
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- Funding: .github/FUNDING.yml
- License: COPYING
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
- StarryDivineSky - haifengl/smile
- awesome-scala - **smile** (Table of Contents / Science and Data Analysis)
- awesome-java-zh - Smile
- fucking-awesome-java - Smile - Statistical Machine Intelligence and Learning Engine provides a set of machine learning algorithms and a visualization library. (Projects / Machine Learning)
- awesome-machine-learning - SmileMiner - Statistical Machine Intelligence & Learning Engine. (Java / Speech Recognition)
- fucking-awesome-scala - **smile** (Table of Contents / Science and Data Analysis)
- useful-java-links - Smile - State-of-the-art machine learning algorithms. Smile is self-contained and requires only the Java standard library. [Hello World examples](https://github.com/Vedenin/useful-java-links/tree/master/helloworlds/2.8-machine-learning/smile). License: [Apache 2](http://www.apache.org/licenses/LICENSE-2.0). (II. Databases, search engines, big data and machine learning / 8. Machine Learning)
- awesome-java - Smile - Statistical Machine Intelligence and Learning Engine provides a set of machine learning algorithms and a visualization library. (Projects / Machine Learning)
- awesome-java - Smile
- awesome-jvm - Smile - Statistical Machine Intelligence & Learning Engine. (Machine Learning)
- awesome-advanced-metering-infrastructure - SmileMiner - Statistical Machine Intelligence & Learning Engine. (Java / Speech Recognition)
- awesome-vega - Smile - Scala (Vega-Lite). (Wrappers / Papers)
- my-awesome - haifengl/smile - algebra-system,computer-vision,data-science,dataframe,deep-learning,genetic-algorithm,interpolation,linear-algebra,llm,machine-learning,manifold-learning,multidimensional-scaling,nearest-neighbor-search,nlp,regression,statistics,visualization,wavelet pushed_at:2026-02 star:6.3k fork:1.1k Statistical Machine Intelligence & Learning Engine (Java)
README
# Statistical Machine Intelligence & Learning Engine
[Maven Central](https://central.sonatype.com/artifact/com.github.haifengl/smile-core)
## Goal ##
SMILE (Statistical Machine Intelligence & Learning Engine) is
a fast and comprehensive machine learning framework in Java.
SMILE v5.x requires Java 25, v4.x requires Java 21, and all previous versions
require Java 8. SMILE also provides APIs in Scala and Kotlin with
corresponding language paradigms. With advanced data structures and
algorithms, SMILE delivers state-of-the-art performance.
SMILE covers every aspect of machine learning, including deep learning,
large language models, classification, regression, clustering, association
rule mining, feature selection and extraction, manifold learning,
multidimensional scaling, genetic algorithms, missing value imputation,
efficient nearest neighbor search, etc. Furthermore, SMILE also provides
advanced algorithms for graph, linear algebra, numerical analysis,
interpolation, computer algebra system for symbolic manipulations,
and data visualization.
## Features ##
SMILE implements the following major machine learning algorithms:
- **LLM:**
Native Java implementation of Llama 3.1, tiktoken tokenizer, high performance
LLM inference server with OpenAI-compatible APIs and SSE-based chat streaming,
fully functional frontend.
- **Deep Learning:**
Deep learning with CPU and GPU. EfficientNet model for image classification.
- **Classification:**
Support Vector Machines, Decision Trees, AdaBoost, Gradient Boosting,
Random Forest, Logistic Regression, Neural Networks, RBF Networks,
Maximum Entropy Classifier, KNN, Naïve Bayesian,
Fisher/Linear/Quadratic/Regularized Discriminant Analysis.
- **Regression:**
Support Vector Regression, Gaussian Process, Regression Trees,
Gradient Boosting, Random Forest, RBF Networks, OLS, LASSO, ElasticNet,
Ridge Regression.
- **Feature Selection:**
Genetic Algorithm based Feature Selection, Ensemble Learning based Feature
Selection, TreeSHAP, Signal Noise ratio, Sum Squares ratio.
- **Clustering:**
BIRCH, CLARANS, DBSCAN, DENCLUE, Deterministic Annealing, K-Means,
X-Means, G-Means, Neural Gas, Growing Neural Gas, Hierarchical
Clustering, Sequential Information Bottleneck, Self-Organizing Maps,
Spectral Clustering, Minimum Entropy Clustering.
- **Association Rule & Frequent Itemset Mining:**
FP-growth mining algorithm.
- **Manifold Learning:**
IsoMap, LLE, Laplacian Eigenmap, t-SNE, UMAP, PCA, Kernel PCA,
Probabilistic PCA, GHA, Random Projection, ICA.
- **Multi-Dimensional Scaling:**
Classical MDS, Isotonic MDS, Sammon Mapping.
- **Nearest Neighbor Search:**
BK-Tree, Cover Tree, KD-Tree, SimHash, LSH.
- **Sequence Learning:**
Hidden Markov Model, Conditional Random Field.
- **Natural Language Processing:**
Sentence Splitter and Tokenizer, Bigram Statistical Test, Phrase Extractor,
Keyword Extractor, Stemmer, POS Tagging, Relevance Ranking.
## License ##
SMILE employs a dual license model designed to meet the development
and distribution needs of both commercial distributors (such as OEMs,
ISVs and VARs) and open source projects. For details, please see
[LICENSE](https://github.com/haifengl/smile/blob/master/LICENSE).
To acquire a commercial license, please contact smile.sales@outlook.com.
## Issues/Discussions ##
* **Discussion/Questions**:
If you wish to ask questions about SMILE, we're active on
[GitHub Discussions](https://github.com/haifengl/smile/discussions) and
[Stack Overflow](http://stackoverflow.com/questions/tagged/smile).
* **Docs**:
SMILE is well documented and [our docs are available online](https://haifengl.github.io/), where you can find tutorials,
programming guides, and more information. If you'd like to help improve the docs, they're part of this repository
in the `web/src` directory. [Java Docs](https://haifengl.github.io/api/java/index.html),
[Scala Docs](https://haifengl.github.io/api/scala/index.html), [Kotlin Docs](https://haifengl.github.io/api/kotlin/index.html),
and [Clojure Docs](https://haifengl.github.io/api/clojure/index.html) are also available.
* **Issues/Feature Requests**:
Please report bugs and feature requests to our [issue tracker](https://github.com/haifengl/smile/issues/new).
## Installation ##
You can use the libraries through Maven central repository by adding the
following to your project pom.xml file.
```
<dependency>
  <groupId>com.github.haifengl</groupId>
  <artifactId>smile-core</artifactId>
  <version>5.1.0</version>
</dependency>
```
For deep learning and NLP, use the artifactId `smile-deep` and `smile-nlp`, respectively.
For Scala API, please add the below into your sbt script.
```
libraryDependencies += "com.github.haifengl" %% "smile-scala" % "5.1.0"
```
For Kotlin API, add the below into the `dependencies` section
of Gradle build script.
```
implementation("com.github.haifengl:smile-kotlin:5.1.0")
```
Some algorithms rely on BLAS and LAPACK (e.g. manifold learning,
some clustering algorithms, Gaussian Process regression, MLP, etc.).
To use these algorithms in SMILE v5.x, you should install OpenBLAS and ARPACK
for optimized matrix computation. For Windows, you can find the pre-built
DLL files in the `bin` directory of the release packages. Make sure to add this
directory to the PATH environment variable.
To install on Linux (e.g., Ubuntu), run
```shell script
sudo apt update
sudo apt install libopenblas-dev libarpack2
```
On Mac, we use the BLAS library from the Accelerate framework provided by macOS.
But you should install ARPACK by running
```shell script
brew install arpack
```
However, macOS System Integrity Protection (SIP) significantly impacts how
the JVM handles dynamic library loading: it purges dynamic linker (DYLD)
environment variables such as DYLD_LIBRARY_PATH when launching protected processes.
A simple workaround is to copy /opt/homebrew/lib/libarpack.dylib to your working
directory so that the JVM can successfully load it.
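
As a quick diagnostic for the workaround above, the following sketch prints the value of `DYLD_LIBRARY_PATH` as seen by the JVM and checks whether `libarpack.dylib` is present in the working directory. The class name `ArpackCheck` and the helper `hasArpack` are hypothetical and use only the Java standard library; the check says nothing about how SMILE itself locates the library.

```java
import java.nio.file.Files;
import java.nio.file.Path;

public class ArpackCheck {
    /** Returns true if a regular file named libarpack.dylib exists directly in dir. */
    static boolean hasArpack(Path dir) {
        return Files.isRegularFile(dir.resolve("libarpack.dylib"));
    }

    public static void main(String[] args) {
        // If SIP stripped the variable, getenv returns null here.
        System.out.println("DYLD_LIBRARY_PATH = " + System.getenv("DYLD_LIBRARY_PATH"));

        // user.dir is the directory the JVM was started from.
        Path workingDir = Path.of(System.getProperty("user.dir"));
        System.out.println("libarpack.dylib in " + workingDir + ": " + hasArpack(workingDir));
    }
}
```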
For SMILE v4.x, OpenBLAS and ARPACK libraries can be added to your project with
the following dependencies.
```
libraryDependencies ++= Seq(
"org.bytedeco" % "javacpp" % "1.5.11" classifier "macosx-arm64" classifier "macosx-x86_64" classifier "windows-x86_64" classifier "linux-x86_64",
"org.bytedeco" % "openblas" % "0.3.28-1.5.11" classifier "macosx-arm64" classifier "macosx-x86_64" classifier "windows-x86_64" classifier "linux-x86_64",
"org.bytedeco" % "arpack-ng" % "3.9.1-1.5.11" classifier "macosx-x86_64" classifier "windows-x86_64" classifier "linux-x86_64"
)
```
In this example, we include all supported 64-bit platforms and filter out
32-bit platforms. You should include only the platforms you need to save
space.
## Studio ##
SMILE Studio is an interactive desktop application to help you be more
productive in building and serving models with SMILE. Similar to Jupyter
Notebooks, SMILE Studio is a REPL (Read-Evaluate-Print-Loop) containing
an ordered list of input/output cells.
Download pre-packaged SMILE from the
[releases page](https://github.com/haifengl/smile/releases).
After unzipping the package, `cd` into the `bin` directory of SMILE
in a terminal and type
```shell script
./smile
```
to enter SMILE Studio. If you work in a headless environment without
graphical interface, you may run `./smile shell` to enter SMILE Shell
for Java, which pre-imports all major SMILE packages. If you prefer
Scala, type `./smile scala` to enter SMILE Shell for Scala.
By default, the Studio/Shell uses up to 4GB memory. If you need more memory
to handle large data, use the option `-J-Xmx` or `-XX:MaxRAMPercentage`.
For example,
```shell script
./smile -J-Xmx30G
```
You can also modify the configuration file `conf/smile.ini` for the
memory and other JVM settings.
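
To confirm that a memory setting took effect, you can ask the running JVM for its maximum heap size. The snippet below is a minimal sketch using only the Java standard library; the class name `HeapCheck` and the helper `toGiB` are hypothetical, and the same `Runtime` call should also work when typed into a Java shell session.

```java
public class HeapCheck {
    /** Converts a byte count to gibibytes (1 GiB = 2^30 bytes). */
    static double toGiB(long bytes) {
        return bytes / (1024.0 * 1024.0 * 1024.0);
    }

    public static void main(String[] args) {
        // maxMemory() reports the most memory the JVM will attempt to use for the heap.
        long maxBytes = Runtime.getRuntime().maxMemory();
        System.out.printf("Max heap: %.2f GiB%n", toGiB(maxBytes));
    }
}
```

The reported value is usually slightly below the configured `-Xmx` value, because some collectors exclude part of the heap from this figure.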
## Model Serialization ##
Most models implement the Java `Serializable` interface (all classifiers
do) so that you can serialize a model
and ship it to a production environment for inference. You may also
use serialized models in other systems such as Spark.
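
The round trip can be sketched with standard Java object streams. In the example below, `ThresholdModel` is a hypothetical stand-in for a trained model and is not a SMILE class; any object that implements `Serializable` could take its place. The helpers `toBytes` and `fromBytes` are likewise illustrative names.

```java
import java.io.ByteArrayInputStream;
import java.io.ByteArrayOutputStream;
import java.io.IOException;
import java.io.ObjectInputStream;
import java.io.ObjectOutputStream;
import java.io.Serializable;

public class SerializationSketch {
    /** Hypothetical stand-in for a trained model; not part of SMILE. */
    static final class ThresholdModel implements Serializable {
        private static final long serialVersionUID = 1L;
        private final double threshold;

        ThresholdModel(double threshold) {
            this.threshold = threshold;
        }

        /** Predicts class 1 when x reaches the threshold, otherwise class 0. */
        int predict(double x) {
            return x >= threshold ? 1 : 0;
        }
    }

    /** Serializes an object to a byte array. */
    static byte[] toBytes(Serializable model) throws IOException {
        ByteArrayOutputStream buffer = new ByteArrayOutputStream();
        try (ObjectOutputStream out = new ObjectOutputStream(buffer)) {
            out.writeObject(model);
        }
        return buffer.toByteArray();
    }

    /** Restores an object from bytes produced by toBytes. */
    static Object fromBytes(byte[] bytes) throws IOException, ClassNotFoundException {
        try (ObjectInputStream in = new ObjectInputStream(new ByteArrayInputStream(bytes))) {
            return in.readObject();
        }
    }

    public static void main(String[] args) throws Exception {
        ThresholdModel trained = new ThresholdModel(0.5);
        ThresholdModel restored = (ThresholdModel) fromBytes(toBytes(trained));
        System.out.println(restored.predict(0.7)); // prints 1
        System.out.println(restored.predict(0.2)); // prints 0
    }
}
```

To ship a model, the same streams can wrap a `FileOutputStream` and `FileInputStream` instead of the in-memory buffers. The reading side needs the model's class on its classpath in a compatible version.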
## Visualization ##
A picture is worth a thousand words. In machine learning, we usually handle
high-dimensional data, which is impossible to draw on display directly.
But a variety of statistical plots are tremendously valuable for us to grasp
the characteristics of many data points. SMILE provides data visualization tools
such as plots and maps for researchers to understand information more easily and quickly.
To use `smile-plot`, add the following to dependencies
```
<dependency>
  <groupId>com.github.haifengl</groupId>
  <artifactId>smile-plot</artifactId>
  <version>5.1.0</version>
</dependency>
```
On Swing-based systems, the user may leverage `smile.plot.swing` package to
create a variety of plots such as scatter plot, line plot, staircase plot,
bar plot, box plot, histogram, 3D histogram, dendrogram, heatmap, hexmap,
QQ plot, contour plot, surface, and wireframe.
This library also supports data visualization in a declarative approach.
With `smile.plot.vega` package, we can create a specification
that describes visualizations as mappings from data to properties
of graphical marks (e.g., points or bars). The specification is
based on [Vega-Lite](https://vega.github.io/vega-lite/). In a web browser,
the Vega-Lite compiler automatically produces visualization components
including axes, legends, and scales. It then determines properties
of these components based on a set of carefully designed rules.
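
To make the idea of a declarative specification concrete, the sketch below builds a minimal Vega-Lite scatter plot specification as plain JSON text. It deliberately does not use the `smile.plot.vega` API; the class `VegaLiteSpecSketch`, the method `scatterSpec`, and the three sample data points are made up for illustration. Only the Vega-Lite fields (`data`, `mark`, `encoding`) reflect the actual specification format.

```java
public class VegaLiteSpecSketch {
    /** Builds a minimal Vega-Lite scatter plot specification as JSON text. */
    static String scatterSpec(String xField, String yField) {
        // %1$s and %2$s are replaced with the x and y field names.
        return """
            {
              "$schema": "https://vega.github.io/schema/vega-lite/v5.json",
              "data": {"values": [
                {"%1$s": 1.0, "%2$s": 2.1},
                {"%1$s": 2.0, "%2$s": 3.9},
                {"%1$s": 3.0, "%2$s": 6.2}
              ]},
              "mark": "point",
              "encoding": {
                "x": {"field": "%1$s", "type": "quantitative"},
                "y": {"field": "%2$s", "type": "quantitative"}
              }
            }
            """.formatted(xField, yField);
    }

    public static void main(String[] args) {
        // The spec only names the data, the mark type, and the field-to-axis mapping.
        System.out.println(scatterSpec("height", "weight"));
    }
}
```

Note that the specification states nothing about axes, legends, or scales; those are the parts the Vega-Lite compiler derives on its own.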
## Contributing ##
Please read the [contributing.md](CONTRIBUTING.md) on how to build and test SMILE.
## Maintainers ##
- Haifeng Li (@haifengl)
- Karl Li (@kklioss)
## Gallery
- Scatterplot Matrix
- Scatter Plot
- Line Plot
- Surface Plot
- Bar Plot
- Box Plot
- Histogram Heatmap
- Rolling Average
- Geo Map
- UMAP
- Text Plot
- Heatmap with Contour
- Hexmap
- IsoMap
- LLE
- Kernel PCA
- Neural Network
- SVM
- Hierarchical Clustering
- SOM
- DBSCAN
- Neural Gas
- Wavelet
- Exponential Family Mixture
- Teapot Wireframe
- Grid Interpolation
