https://github.com/arun11299/mining-massive-datasets

Programs written as part of Coursera's MMDS course by Ullman-Rajaraman-Leskovec

Host: GitHub
URL: https://github.com/arun11299/mining-massive-datasets
Owner: arun11299
Created: 2014-11-09T08:27:51.000Z (over 10 years ago)
Default Branch: master
Last Pushed: 2014-11-09T17:56:34.000Z (over 10 years ago)
Last Synced: 2024-12-27T11:14:30.653Z (5 months ago)
Language: Python
Size: 2.12 MB
Stars: 0
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

Mining-Massive-Datasets
=======================

Programs written as part of Coursera's MMDS course by Ullman-Rajaraman-Leskovec.

adwords.py: Given a set of advertisers with their budgets and click-through rates, chooses which advertiser to show. When one advertiser's budget is exhausted, it picks the advertiser expected to bring in the most revenue, based on click-through rate, over a limited number of impressions (101).
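The selection rule described for adwords.py can be sketched roughly as below. This is a minimal illustration, not the repository's code: the `Advertiser` fields, the per-click `bid`, the function name `allocate_impressions`, and the choice to charge the expected cost (`bid * ctr`) per impression instead of simulating random clicks are all assumptions made for the example.

```python
from dataclasses import dataclass


@dataclass
class Advertiser:
    name: str
    bid: float     # hypothetical: amount charged per click
    ctr: float     # click-through rate, between 0 and 1
    budget: float  # total budget available


def allocate_impressions(advertisers, impressions=101):
    """Greedily assign each impression to the advertiser with the
    highest expected revenue (bid * ctr) that can still afford it.

    Assumption: each impression charges the expected cost bid * ctr,
    so the run is deterministic.
    """
    # Track remaining budgets separately so the inputs are not mutated.
    remaining = {a.name: a.budget for a in advertisers}
    shown = []
    for _ in range(impressions):
        eligible = [a for a in advertisers
                    if remaining[a.name] >= a.bid * a.ctr]
        if not eligible:
            break  # every budget is exhausted
        best = max(eligible, key=lambda a: a.bid * a.ctr)
        remaining[best.name] -= best.bid * best.ctr
        shown.append(best.name)
    return shown


if __name__ == "__main__":
    ads = [Advertiser("A", bid=1.0, ctr=0.5, budget=2.0),
           Advertiser("B", bid=2.0, ctr=0.125, budget=1.0)]
    print(allocate_impressions(ads, impressions=10))
```

Once advertiser A's budget runs out, the loop falls through to B, which is the "next best" behaviour the description refers to.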

lsh/lsh_test.py: Implements the min-hashing technique: each line of the document is shingled, and a signature matrix is built for the lines.
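As a rough sketch of the shingling and min-hashing step, the code below builds one signature per line. The word-level shingles of size `k`, the hash family `(a*x + b) mod p`, the use of `zlib.crc32` to map shingles to integers, and all function names are assumptions for illustration; the repository may do this differently.

```python
import random
import zlib

PRIME = (1 << 61) - 1  # a Mersenne prime used as the hash modulus


def shingles(line, k=2):
    """Return the set of k-word shingles of a line (assumed word-level)."""
    words = line.lower().split()
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def make_hash_params(num_hashes, seed=0):
    """Draw (a, b) coefficients for num_hashes hash functions."""
    rng = random.Random(seed)
    return [(rng.randrange(1, PRIME), rng.randrange(0, PRIME))
            for _ in range(num_hashes)]


def minhash_signature(shingle_set, params):
    """Signature entry i is the minimum of hash i over all shingles."""
    ids = [zlib.crc32(s.encode("utf-8")) for s in shingle_set]
    return [min(((a * x + b) % PRIME for x in ids), default=PRIME)
            for a, b in params]


def signature_matrix(lines, num_hashes=100, k=2, seed=0):
    """One signature (a column of the signature matrix) per line."""
    params = make_hash_params(num_hashes, seed)
    return [minhash_signature(shingles(line, k), params) for line in lines]


def estimated_similarity(sig_a, sig_b):
    """Fraction of agreeing positions, which estimates Jaccard similarity."""
    agree = sum(1 for x, y in zip(sig_a, sig_b) if x == y)
    return agree / len(sig_a)
```

The fraction of positions on which two signatures agree approximates the Jaccard similarity of the underlying shingle sets, which is why the signatures can stand in for the full sets in the next step.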

This signature matrix is then fed to the LSH (Locality-Sensitive Hashing) code, which finds the best-matching lines within the document. The Jaccard similarity threshold is kept around 0.8 (but the code just displays the best-matching lines that differ by one word).
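The LSH step can be sketched with the standard banding technique, shown below under stated assumptions: the function names, the dictionary input format, and the band and row counts are illustrative and not taken from the repository. With 10 bands of 10 rows each, the approximate similarity threshold is (1/10)^(1/10) ≈ 0.79, close to the 0.8 mentioned above.

```python
from collections import defaultdict
from itertools import combinations


def lsh_candidate_pairs(signatures, bands, rows):
    """Return pairs of items whose signatures agree on at least one band.

    signatures: dict mapping an item id to a list of bands * rows integers.
    """
    for sig in signatures.values():
        if len(sig) != bands * rows:
            raise ValueError("signature length must equal bands * rows")
    candidates = set()
    for band in range(bands):
        buckets = defaultdict(list)
        for item, sig in signatures.items():
            # Items with an identical slice in this band share a bucket.
            key = tuple(sig[band * rows:(band + 1) * rows])
            buckets[key].append(item)
        for items in buckets.values():
            candidates.update(combinations(sorted(items), 2))
    return candidates


def jaccard(a, b):
    """Exact Jaccard similarity of two sets, for checking candidates."""
    union = a | b
    return len(a & b) / len(union) if union else 1.0


if __name__ == "__main__":
    # Tiny hand-made signatures, only to show the call shape.
    sigs = {"a": [1, 2, 3, 4, 5, 6],
            "b": [1, 2, 3, 9, 9, 9],
            "c": [7, 7, 7, 8, 8, 8]}
    print(lsh_candidate_pairs(sigs, bands=2, rows=3))
```

Candidate pairs are only likely matches; checking each one with an exact measure such as `jaccard` on the original shingle sets filters out false positives.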
