https://github.com/mmourafiq/data-analysis

data analysis functions

math and data analysis functions
================================

*shingling*
- k-shingles generation
- minhashing
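
To illustrate the two items above, here is a minimal sketch of character k-shingling and a minhash signature. It is not taken from the repository: the function names, the use of `hashlib.blake2b` as the seeded hash family, and the default of 64 hash functions are all assumptions made for this example.

```python
import hashlib


def k_shingles(text, k):
    """Return the set of all length-k character substrings of text."""
    return {text[i:i + k] for i in range(len(text) - k + 1)}


def _seeded_hash(value, seed):
    # Deterministic 64-bit hash of a shingle; the seed acts as the salt,
    # so each seed behaves like a different hash function (an assumption
    # made for this sketch, not the repository's hashing scheme).
    digest = hashlib.blake2b(
        value.encode("utf-8"),
        digest_size=8,
        salt=seed.to_bytes(8, "little"),
    ).digest()
    return int.from_bytes(digest, "little")


def minhash_signature(shingles, num_hashes=64):
    """Keep the minimum hash value per hash function; shingles must be non-empty."""
    return [
        min(_seeded_hash(s, seed) for s in shingles)
        for seed in range(num_hashes)
    ]


def estimated_similarity(sig_a, sig_b):
    """Fraction of signature positions that agree, an estimate of Jaccard similarity."""
    return sum(a == b for a, b in zip(sig_a, sig_b)) / len(sig_a)
```

The fraction of matching signature positions approximates the Jaccard similarity of the two shingle sets, and the estimate tightens as `num_hashes` grows.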

*jaccard similarity*
- jaccard similarity calculation
- jaccard distance calculation
- jaccard conditional comparison
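
As a quick illustration of the first two items, the Jaccard similarity of two sets is the size of their intersection divided by the size of their union, and the distance is one minus that value. The sketch below uses hypothetical function names, and the handling of two empty sets is a convention chosen here rather than something the repository specifies.

```python
def jaccard_similarity(a, b):
    """|A ∩ B| / |A ∪ B| for two iterables treated as sets."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        # Convention chosen for this sketch: two empty sets count as identical.
        return 1.0
    return len(a & b) / len(union)


def jaccard_distance(a, b):
    """Jaccard distance, defined as 1 - similarity."""
    return 1.0 - jaccard_similarity(a, b)
```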

*adwords problem*
- greedy_adwords
- balance_adwords
- generalized_balance_adwords

*frequency problem*
- items frequency
- the algorithm of Savasere, Omiecinski and Navathe (SON)

*graph problem*
- graph construction
- shortest_path
- longest path
- centrality
- independent graphs detection
- clustering_coef
- dijkstra
- dijkstra with heap
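
For the last item, a minimal sketch of Dijkstra's algorithm backed by a binary heap might look like the following. The graph representation (a dict mapping each node to a dict of neighbor weights) and the function name are assumptions for this example, not the repository's API.

```python
import heapq


def dijkstra(graph, source):
    """Return shortest distances from source to every reachable node.

    graph: dict mapping node -> dict of {neighbor: non-negative weight}.
    Nodes should be mutually comparable (e.g. all strings) so that heap
    entries with equal distances can still be ordered.
    """
    dist = {source: 0}
    heap = [(0, source)]
    while heap:
        d, node = heapq.heappop(heap)
        if d > dist.get(node, float("inf")):
            continue  # stale entry: a shorter path was already found
        for neighbor, weight in graph.get(node, {}).items():
            candidate = d + weight
            if candidate < dist.get(neighbor, float("inf")):
                dist[neighbor] = candidate
                heapq.heappush(heap, (candidate, neighbor))
    return dist
```

Pushing a new entry instead of decreasing a key keeps the code short; outdated entries are simply skipped when popped.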

*recommendation problem*
- hamming distance
- euclidean distance
- pearson correlation
- tanimoto score
- euclidean similarity
- pearson similarity
- tanimoto similarity
- top similar
- top similar with map reduce
- user-filtered recommendation
- item-filtered recommendation
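
A few of the measures listed above can be sketched in plain Python as follows. The function names are hypothetical, and the mapping from Euclidean distance to a similarity score (`1 / (1 + d)`) is one common choice assumed here; the repository may define it differently.

```python
import math


def euclidean_distance(x, y):
    """Straight-line distance between two equal-length numeric vectors."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(x, y)))


def euclidean_similarity(x, y):
    # Assumed mapping of distance into (0, 1]; identical vectors score 1.0.
    return 1.0 / (1.0 + euclidean_distance(x, y))


def pearson_correlation(x, y):
    """Pearson correlation coefficient of two equal-length numeric vectors."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    cov = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y))
    var_x = sum((a - mean_x) ** 2 for a in x)
    var_y = sum((b - mean_y) ** 2 for b in y)
    denom = math.sqrt(var_x * var_y)
    # A constant vector has no variance; return 0.0 by convention in this sketch.
    return cov / denom if denom else 0.0
```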

*Radix tree*
- insert
- remove
- search
- longest prefix

*Decision tree*
- Divide data
- Gini impurity
- Entropy
- Variance
- Build tree
- Prune
- Classify
- Draw tree
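
The two classification impurity measures in this list can be sketched as below. Both take a list of class labels; the function names and the choice to return 0.0 for an empty list are assumptions made for this example.

```python
import math
from collections import Counter


def gini_impurity(labels):
    """Probability that two labels drawn at random (with replacement) differ."""
    n = len(labels)
    if n == 0:
        return 0.0  # convention for this sketch
    return 1.0 - sum((count / n) ** 2 for count in Counter(labels).values())


def entropy(labels):
    """Shannon entropy of the label distribution, in bits."""
    n = len(labels)
    if n == 0:
        return 0.0  # convention for this sketch
    return -sum(
        (count / n) * math.log2(count / n)
        for count in Counter(labels).values()
    )
```

When building a tree, a split is typically scored by how much it lowers the weighted impurity of the resulting subsets compared with the parent set.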

*Page Rank*

A very simple version/implementation of the page rank algorithm.
- Page rank
- Advanced, topic-sensitive version of page rank
- spam farms
- trust rank
- Hyperlink-induced topic search (HITS)
- Map reduce to efficiently calculate the page rank
- Jaccard similarity, to be found in the data analysis repo
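
As an illustration of the basic algorithm, the sketch below computes page rank by power iteration. It is not the repository's implementation: the function name, the link representation (a dict mapping each page to the list of pages it links to), the damping factor of 0.85, and the even redistribution of rank from pages without outgoing links are all assumptions.

```python
def page_rank(links, damping=0.85, iterations=50):
    """Return a dict of page -> rank; the ranks sum to 1."""
    nodes = set(links) | {t for targets in links.values() for t in targets}
    n = len(nodes)
    rank = {node: 1.0 / n for node in nodes}
    for _ in range(iterations):
        # Every page receives the "teleport" share first.
        new_rank = {node: (1.0 - damping) / n for node in nodes}
        for node in nodes:
            targets = links.get(node, [])
            if targets:
                share = damping * rank[node] / len(targets)
                for target in targets:
                    new_rank[target] += share
            else:
                # Dangling page: spread its rank evenly over all pages.
                share = damping * rank[node] / n
                for target in nodes:
                    new_rank[target] += share
        rank = new_rank
    return rank
```

A topic-sensitive variant would replace the uniform teleport share with one concentrated on a chosen set of pages.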

*Map-Reduce*

An implementation of map reduce, with some examples.
- Map Reduce class
- Estimation of the number pi
- Calculation of item frequencies across multiple files
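
To give an idea of the pi example, the sketch below expresses a Monte Carlo estimate as a map step and a reduce step. It runs in a single process with the built-in `map` and `functools.reduce`, so it only illustrates the shape of the computation; the function names and the task layout are assumptions, not the repository's Map Reduce class.

```python
import random
from functools import reduce


def map_count_hits(task):
    """Map step: sample points in the unit square, count those inside the quarter circle."""
    seed, samples = task
    rng = random.Random(seed)  # one seeded generator per task, for reproducibility
    hits = 0
    for _ in range(samples):
        x, y = rng.random(), rng.random()
        if x * x + y * y <= 1.0:
            hits += 1
    return hits, samples


def reduce_counts(left, right):
    """Reduce step: add up (hits, samples) pairs."""
    return left[0] + right[0], left[1] + right[1]


def estimate_pi(num_tasks=8, samples_per_task=20000):
    tasks = [(seed, samples_per_task) for seed in range(num_tasks)]
    hits, total = reduce(reduce_counts, map(map_count_hits, tasks))
    # The quarter circle covers pi/4 of the unit square.
    return 4.0 * hits / total
```

Because each task depends only on its own seed and sample count, the map step could be handed to separate workers without changing the reduce step.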