
https://github.com/lanl/dnmfk

A C++ framework of Distributed Non-Negative Matrix Factorization implementation to find Latent Dimensionality in Big Data

artificial-intelligence clustering latent-features machine-learning-algorithms non-negative-matrix-factorization non-negative-tensor-factorization pattern-recognition


# Distributed Non-Negative Matrix Factorization with Model Determination (DnMFkCPP)

The holistic analysis and understanding of latent (that is, not directly observable) variables and patterns buried in large datasets is crucial for data-driven science, decision making, and emergency response. Such exploratory analyses require unsupervised learning methods for data mining and latent feature extraction. Non-negative matrix factorization (NMF) is a prominent method of this kind: it extracts interpretable latent features and provides dimensionality reduction for data mining and blind source separation. NMF relies on compute-intensive, non-convex constrained minimization, which requires fast, distributed algorithms for large datasets. In practice, identifying the latent features is both difficult and important for pattern recognition and latent feature analysis, especially for large dense matrices. This software suite introduces DnMFkCPP, a distributed NMF algorithm coupled with distributed custom clustering followed by a stability analysis on dense data, to determine the number of latent variables.
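To make the factorization step concrete, here is a minimal single-process sketch of NMF: it approximates a non-negative matrix X by a product WH with W, H ≥ 0, reducing the Frobenius error ‖X − WH‖. It uses the classic Lee-Seung multiplicative updates, chosen here only for brevity. It is not the distributed algorithm shipped in this repository and does not include the clustering or stability analysis that selects the number of latent features. The `Matrix` helper, the function names, and the parameters (`k`, `iterations`, `seed`) are all hypothetical.

```cpp
#include <cassert>
#include <cmath>
#include <cstddef>
#include <random>
#include <vector>

// Minimal dense row-major matrix (illustrative helper, not part of DnMFkCPP).
struct Matrix {
    std::size_t rows, cols;
    std::vector<double> data;
    Matrix(std::size_t r, std::size_t c, double v = 0.0)
        : rows(r), cols(c), data(r * c, v) {}
    double& operator()(std::size_t i, std::size_t j) { return data[i * cols + j]; }
    double operator()(std::size_t i, std::size_t j) const { return data[i * cols + j]; }
};

Matrix multiply(const Matrix& a, const Matrix& b) {
    assert(a.cols == b.rows);
    Matrix c(a.rows, b.cols);
    for (std::size_t i = 0; i < a.rows; ++i)
        for (std::size_t p = 0; p < a.cols; ++p)
            for (std::size_t j = 0; j < b.cols; ++j)
                c(i, j) += a(i, p) * b(p, j);
    return c;
}

Matrix transpose(const Matrix& a) {
    Matrix t(a.cols, a.rows);
    for (std::size_t i = 0; i < a.rows; ++i)
        for (std::size_t j = 0; j < a.cols; ++j)
            t(j, i) = a(i, j);
    return t;
}

// Frobenius norm of the residual X - W*H.
double frobenius_error(const Matrix& x, const Matrix& w, const Matrix& h) {
    const Matrix wh = multiply(w, h);
    double sum = 0.0;
    for (std::size_t idx = 0; idx < x.data.size(); ++idx) {
        const double d = x.data[idx] - wh.data[idx];
        sum += d * d;
    }
    return std::sqrt(sum);
}

struct NmfResult {
    Matrix w;  // rows(X) x k
    Matrix h;  // k x cols(X)
};

// Lee-Seung multiplicative updates for a fixed, user-supplied rank k.
NmfResult nmf_multiplicative(const Matrix& x, std::size_t k, int iterations,
                             unsigned seed = 42) {
    assert(k > 0);
    std::mt19937 gen(seed);
    std::uniform_real_distribution<double> dist(0.1, 1.0);
    Matrix w(x.rows, k), h(k, x.cols);
    for (double& v : w.data) v = dist(gen);  // strictly positive random start
    for (double& v : h.data) v = dist(gen);

    const double eps = 1e-12;  // guards against division by zero
    for (int it = 0; it < iterations; ++it) {
        // H <- H .* (W^T X) ./ (W^T W H)
        const Matrix wt = transpose(w);
        const Matrix num_h = multiply(wt, x);
        const Matrix den_h = multiply(multiply(wt, w), h);
        for (std::size_t idx = 0; idx < h.data.size(); ++idx)
            h.data[idx] *= num_h.data[idx] / (den_h.data[idx] + eps);

        // W <- W .* (X H^T) ./ (W H H^T)
        const Matrix ht = transpose(h);
        const Matrix num_w = multiply(x, ht);
        const Matrix den_w = multiply(w, multiply(h, ht));
        for (std::size_t idx = 0; idx < w.data.size(); ++idx)
            w.data[idx] *= num_w.data[idx] / (den_w.data[idx] + eps);
    }
    return {w, h};
}
```

Because each update only multiplies by non-negative ratios, W and H stay non-negative without an explicit projection step. The rank `k` is an input in this sketch, whereas determining it is the purpose of the model-determination stage described above.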

* Chennupati, G., Vangara, R., Skau, E., Djidjev, H., & Alexandrov, B. (2020). Distributed non-negative matrix factorization with determination of the number of latent features. The Journal of Supercomputing, 76(9), 7458–7488. https://doi.org/10.1007/s11227-020-03181-6
* Bhattarai, M., Chennupati, G., Skau, E., Vangara, R., Djidjev, H., & Alexandrov, B. S. (2020). Distributed Non-Negative Tensor Train Decomposition. 2020 IEEE High Performance Extreme Computing Conference (HPEC), 1–10. https://doi.org/10.1109/HPEC43674.2020.9286234
* Nebgen, B., Vangara, R., Hombrados-Herrera, M. A., Kuksova, S., & Alexandrov, B. (2020). A neural network for determination of latent dimensionality in Nonnegative Matrix Factorization. Machine Learning: Science and Technology. https://doi.org/10.1088/2632-2153/aba372


## Contributors

* Gopinath Chennupati - Los Alamos National Laboratory
* Raviteja Vangara - Los Alamos National Laboratory
* Erik Skau - Los Alamos National Laboratory
* Namita Kharat - Los Alamos National Laboratory
* Phan Minh Duc Truong - Los Alamos National Laboratory
* Hristo Djidjev - Los Alamos National Laboratory
* Boian Alexandrov - Los Alamos National Laboratory

## Build Procedure and Experiments

* To build Distributed Non-negative Matrix Factorization **_k_**, refer to the [README](distnmfk/README.md) in the distnmfk directory.
* To run experiments for Distributed Non-negative Matrix Factorization **_k_**, refer to the [README Experiments](distnmfk/experiments/README_EXPS.md) in the distnmfk/experiments directory.

## Acknowledgements
This study was funded by the U.S. Department of Energy National Nuclear Security Administration under Contract No. DE-AC52-06NA25396 through Los Alamos National Laboratory's Laboratory Directed Research and Development (LDRD) grant 20190020DR.

This software is an extension of the distributed NMF repository [planc](https://github.com/ramkikannan/planc) by Ramakrishnan Kannan et al. The license and readme information can be found in [planc-master](planc-master/). Please find the related references below:

* Ramakrishnan Kannan, Grey Ballard, and Haesun Park. 2016. A high-performance parallel algorithm for nonnegative matrix factorization. In Proceedings of the 21st ACM SIGPLAN Symposium on Principles and Practice of Parallel Programming (PPoPP '16). ACM, New York, NY, USA, Article 9, 11 pages. DOI: http://dx.doi.org/10.1145/2851141.2851152
* James P. Fairbanks, Ramakrishnan Kannan, Haesun Park, David A. Bader, Behavioral clusters in dynamic graphs, Parallel Computing, Volume 47, August 2015, Pages 38-50, ISSN 0167-8191. DOI: http://dx.doi.org/10.1016/j.parco.2015.03.002
* Kannan, Ramakrishnan. "Scalable and Distributed Constrained Low Rank Approximations." (Doctoral Dissertation) (2016). https://smartech.gatech.edu/handle/1853/54962
* Ramakrishnan Kannan, Grey Ballard, Haesun Park: MPI-FAUN: An MPI-Based Framework for Alternating-Updating Nonnegative Matrix Factorization. IEEE Trans. Knowl. Data Eng. 30(3): 544-558 (2018). DOI: https://doi.org/10.1109/TKDE.2017.2767592
* Oguz Kaya, Ramakrishnan Kannan, Grey Ballard: Partitioning and Communication Strategies for Sparse Non-negative Matrix Factorization. ICPP 2018: 90:1-90:10. DOI: https://doi.org/10.1145/3225058.3225127
* Grey Ballard, Koby Hayashi, Ramakrishnan Kannan: Parallel Nonnegative CP Decomposition of Dense Tensors. 25th IEEE International Conference on High Performance Computing (HiPC) 2018. DOI: https://doi.org/10.1109/HiPC.2018.00012

## LANL C Number
LANL C number: C20028.

The Copyright and Licensing information is found in [license.dat](license.dat)
## Copyright
© (or copyright) 2020. Triad National Security, LLC. All rights reserved.
This program was produced under U.S. Government contract 89233218CNA000001 for Los Alamos National Laboratory (LANL), which is operated by Triad National Security, LLC for the U.S.
Department of Energy/National Nuclear Security Administration. All rights in the program are reserved by Triad National Security, LLC, and the U.S. Department of Energy/National Nuclear Security Administration. The Government is granted for itself and others acting on its behalf a nonexclusive, paid-up, irrevocable worldwide license in this material to reproduce, prepare derivative works, distribute copies to the public, perform publicly and display publicly, and to permit others to do so.
## License
This program is open source under the BSD-3 License.
Redistribution and use in source and binary forms, with or without modification, are permitted provided that the following conditions are met:
1. Redistributions of source code must retain the above copyright notice, this list of conditions and the following disclaimer.

2. Redistributions in binary form must reproduce the above copyright notice, this list of conditions and the following disclaimer in the documentation and/or other materials provided with the distribution.

3. Neither the name of the copyright holder nor the names of its contributors may be used to endorse or promote products derived from this software without specific prior written permission.

THIS SOFTWARE IS PROVIDED BY THE COPYRIGHT HOLDERS AND CONTRIBUTORS "AS IS" AND ANY EXPRESS OR IMPLIED WARRANTIES, INCLUDING, BUT NOT LIMITED TO, THE IMPLIED WARRANTIES OF MERCHANTABILITY AND FITNESS FOR A PARTICULAR PURPOSE ARE DISCLAIMED. IN NO EVENT SHALL THE COPYRIGHT HOLDER OR CONTRIBUTORS BE LIABLE FOR ANY DIRECT, INDIRECT, INCIDENTAL, SPECIAL, EXEMPLARY, OR CONSEQUENTIAL DAMAGES (INCLUDING, BUT NOT LIMITED TO, PROCUREMENT OF SUBSTITUTE GOODS OR SERVICES; LOSS OF USE, DATA, OR PROFITS; OR BUSINESS INTERRUPTION) HOWEVER CAUSED AND ON ANY THEORY OF LIABILITY, WHETHER IN CONTRACT, STRICT LIABILITY, OR TORT (INCLUDING NEGLIGENCE OR OTHERWISE) ARISING IN ANY WAY OUT OF THE USE OF THIS SOFTWARE, EVEN IF ADVISED OF THE POSSIBILITY OF SUCH DAMAGE.