https://github.com/louisguitton/twitter-mining
Code of the project for the course "TPT32: mining of massive datasets"
- Host: GitHub
- URL: https://github.com/louisguitton/twitter-mining
- Owner: louisguitton
- Created: 2015-11-18T14:05:59.000Z (almost 10 years ago)
- Default Branch: master
- Last Pushed: 2016-03-23T16:06:21.000Z (over 9 years ago)
- Last Synced: 2025-02-03T17:07:19.129Z (8 months ago)
- Language: Java
- Size: 3.1 MB
- Stars: 0
- Watchers: 3
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# twitter-mining
1. Data Cleaning and Preprocessing.
---------------
You will be given a collection of raw data from Twitter in the JSON format. This data is usually quite noisy: for example, there are many copies of the same tweet, and each tweet might contain text which is not relevant for our purposes. After cleaning the data, you will build a graph representing the co-occurrences of relevant “entities” in tweets.

2. Finding Dense Subgraphs
---------------
Extract dense subgraphs from the co-occurrence graph built in step 1. To this end, you should adapt the sequential greedy algorithm for finding dense subgraphs (which we presented during our class on Wednesday) so as to 1) deal with weighted graphs, 2) find “small” subgraphs, 3) find more than one dense subgraph, and 4) deal with large graphs; in particular, try to give an implementation of the algorithm that runs in time linear in the size of the input.

3. Data Analysis
------------------
Analyze the dense subgraphs that you found so as to find “interesting” information.
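
As a rough illustration of steps 1 and 2, here is a minimal Java sketch. It is not the code in this repository, and it rests on several assumptions: an “entity” is taken to be a lower-cased hashtag, only exact duplicate tweet texts are dropped, and the greedy algorithm is assumed to be the usual peeling heuristic (repeatedly remove the vertex of smallest weighted degree and keep the densest intermediate subgraph, with density measured as total edge weight divided by number of vertices). The names `DenseSubgraphSketch`, `entitiesOf`, `coOccurrenceGraph` and `densestSubgraph` are hypothetical.

```java
import java.util.*;

/** Illustrative sketch only; all names here are hypothetical, not taken from the repository. */
final class DenseSubgraphSketch {

    /** Assumption: an "entity" is a lower-cased hashtag token. */
    static Set<String> entitiesOf(String text) {
        Set<String> entities = new TreeSet<>();
        for (String token : text.toLowerCase(Locale.ROOT).split("\\s+")) {
            if (token.length() > 1 && token.startsWith("#")) {
                entities.add(token);
            }
        }
        return entities;
    }

    /** Edge weight = number of distinct tweets in which both entities occur. */
    static Map<String, Map<String, Double>> coOccurrenceGraph(List<String> tweetTexts) {
        Map<String, Map<String, Double>> graph = new HashMap<>();
        // LinkedHashSet drops exact duplicate texts (near-duplicates are not handled).
        for (String text : new LinkedHashSet<>(tweetTexts)) {
            List<String> entities = new ArrayList<>(entitiesOf(text));
            for (int i = 0; i < entities.size(); i++) {
                for (int j = i + 1; j < entities.size(); j++) {
                    addWeight(graph, entities.get(i), entities.get(j));
                    addWeight(graph, entities.get(j), entities.get(i));
                }
            }
        }
        return graph;
    }

    private static void addWeight(Map<String, Map<String, Double>> graph, String from, String to) {
        graph.computeIfAbsent(from, k -> new HashMap<>()).merge(to, 1.0, Double::sum);
    }

    /** Greedy peeling on a symmetric weighted adjacency map; returns the densest subgraph seen. */
    static Set<String> densestSubgraph(Map<String, Map<String, Double>> graph) {
        Map<String, Double> degree = new HashMap<>();
        double totalWeight = 0.0;
        for (Map.Entry<String, Map<String, Double>> e : graph.entrySet()) {
            double d = 0.0;
            for (double w : e.getValue().values()) {
                d += w;
            }
            degree.put(e.getKey(), d);
            totalWeight += d;
        }
        totalWeight /= 2.0; // every undirected edge was counted from both endpoints

        // Remaining vertices ordered by (weighted degree, name); ties are broken by name.
        TreeSet<String> remaining = new TreeSet<>((a, b) -> {
            int c = Double.compare(degree.get(a), degree.get(b));
            return c != 0 ? c : a.compareTo(b);
        });
        remaining.addAll(degree.keySet());

        List<String> removalOrder = new ArrayList<>();
        double bestDensity = -1.0;
        int bestPrefix = 0; // removals performed before the best subgraph was reached
        while (!remaining.isEmpty()) {
            double density = totalWeight / remaining.size();
            if (density > bestDensity) {
                bestDensity = density;
                bestPrefix = removalOrder.size();
            }
            String v = remaining.pollFirst();
            removalOrder.add(v);
            totalWeight -= degree.get(v);
            for (Map.Entry<String, Double> nb : graph.get(v).entrySet()) {
                String u = nb.getKey();
                if (remaining.remove(u)) { // true only if u is still in the subgraph
                    degree.merge(u, -nb.getValue(), Double::sum);
                    remaining.add(u);
                }
            }
        }
        return new TreeSet<>(removalOrder.subList(bestPrefix, removalOrder.size()));
    }

    public static void main(String[] args) {
        List<String> tweets = Arrays.asList(
                "#a #b #c", "#a #b #c", "#a #b", "#b #c hello", "#a #c", "#c #d");
        System.out.println(densestSubgraph(coOccurrenceGraph(tweets))); // prints [#a, #b, #c]
    }
}
```

Because it keeps the vertices in a `TreeSet`, this sketch runs in O((n + m) log n) time rather than the linear time the assignment asks for; a linear version would probably need bucketed degrees instead of a sorted set. One possible way to obtain several “small” subgraphs is to remove the vertices just returned and run the peeling again on what is left, but that is only a suggestion.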