https://github.com/louisguitton/twitter-mining

Code of the project for the course "TPT32: mining of massive datasets"

Host: GitHub
URL: https://github.com/louisguitton/twitter-mining
Owner: louisguitton
Created: 2015-11-18T14:05:59.000Z (almost 10 years ago)
Default Branch: master
Last Pushed: 2016-03-23T16:06:21.000Z (over 9 years ago)
Last Synced: 2025-02-03T17:07:19.129Z (8 months ago)
Language: Java
Homepage:
Size: 3.1 MB
Stars: 0
Watchers: 3
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md

README

# twitter-mining

1. Data Cleaning and Preprocessing
---------------

You will be given a collection of raw data from Twitter
in the JSON format. This data is usually quite noisy: for example, there are many copies of
the same tweet, and each tweet might contain text that is not relevant for our purposes.
After cleaning the data, you will build a graph representing the co-occurrences of relevant
“entities” in tweets.
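
The cleaning and graph-building step could look roughly like the sketch below. It is only an illustration under several assumptions: the tweet texts have already been extracted from the JSON, duplicates are detected by comparing lower-cased, whitespace-normalized text, and "entities" are taken to be hashtags and @mentions. The class name `CooccurrenceSketch` and its method `build` are hypothetical and do not come from the repository.

```java
import java.util.ArrayList;
import java.util.HashSet;
import java.util.List;
import java.util.Locale;
import java.util.Map;
import java.util.Set;
import java.util.TreeMap;
import java.util.TreeSet;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Illustrative sketch: class and method names are hypothetical, not taken from the repository.
class CooccurrenceSketch {

    // Assumption: an "entity" is a hashtag or an @mention; the assignment leaves this open.
    private static final Pattern ENTITY = Pattern.compile("[#@]\\w+");

    // Returns a symmetric weighted adjacency map. The weight of an edge is the
    // number of distinct tweets in which both entities appear.
    static Map<String, Map<String, Integer>> build(List<String> tweetTexts) {
        Set<String> seen = new HashSet<>();
        Map<String, Map<String, Integer>> graph = new TreeMap<>();
        for (String text : tweetTexts) {
            // Normalize case and whitespace so that trivial copies collapse to one key.
            String key = text.toLowerCase(Locale.ROOT).replaceAll("\\s+", " ").trim();
            if (!seen.add(key)) {
                continue; // duplicate tweet, already counted
            }
            // Collect the distinct entities mentioned in this tweet.
            Set<String> entities = new TreeSet<>();
            Matcher m = ENTITY.matcher(key);
            while (m.find()) {
                entities.add(m.group());
            }
            // Add one unit of weight to every unordered pair of entities.
            List<String> list = new ArrayList<>(entities);
            for (int i = 0; i < list.size(); i++) {
                for (int j = i + 1; j < list.size(); j++) {
                    increment(graph, list.get(i), list.get(j));
                    increment(graph, list.get(j), list.get(i));
                }
            }
        }
        return graph;
    }

    private static void increment(Map<String, Map<String, Integer>> graph, String u, String v) {
        graph.computeIfAbsent(u, k -> new TreeMap<>()).merge(v, 1, Integer::sum);
    }
}
```

Calling `CooccurrenceSketch.build(texts)` on a list of tweet texts returns a map from each entity to its neighbours and edge weights. Real data would likely need a stricter notion of duplicate (retweets, truncated copies) and a richer entity definition than this sketch uses.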

2. Finding Dense Subgraphs
---------------

Extract dense subgraphs from the co-occurrence graph built in step 1. To this end, you should adapt the
sequential greedy algorithm for finding dense subgraphs (which we presented during our
class on Wednesday) so as to 1) deal with weighted graphs, 2) find “small” subgraphs, 3)
find more than one dense subgraph, and 4) deal with large graphs; in particular, try to give
an implementation that runs in time linear in the size of the input.
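
As a starting point, the sketch below shows one common form of greedy peeling on a weighted graph, assuming the algorithm from class is the usual "repeatedly remove the node of minimum degree and keep the densest intermediate subgraph" procedure. Density is assumed to be total edge weight divided by the number of nodes. The class `GreedyDenseSubgraphSketch` and its methods are hypothetical names, and the heap-based version here runs in O((n + m) log(n + m)) time, so it does not meet the linear-time goal.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;
import java.util.PriorityQueue;
import java.util.Set;

// Illustrative sketch: class and method names are hypothetical, not taken from the repository.
class GreedyDenseSubgraphSketch {

    // Heap entry: a node together with the weighted degree it had when pushed.
    private static final class HeapEntry implements Comparable<HeapEntry> {
        final long degree;
        final String node;
        HeapEntry(long degree, String node) { this.degree = degree; this.node = node; }
        @Override public int compareTo(HeapEntry other) {
            int c = Long.compare(degree, other.degree);
            return c != 0 ? c : node.compareTo(other.node);
        }
    }

    // Adds an undirected edge of weight w, stored in both directions.
    static void addEdge(Map<String, Map<String, Integer>> graph, String u, String v, int w) {
        graph.computeIfAbsent(u, k -> new HashMap<>()).merge(v, w, Integer::sum);
        graph.computeIfAbsent(v, k -> new HashMap<>()).merge(u, w, Integer::sum);
    }

    // Returns the node set of highest density seen while peeling.
    // Assumes a symmetric adjacency map without self-loops.
    static Set<String> densest(Map<String, Map<String, Integer>> graph) {
        Map<String, Long> degree = new HashMap<>();
        PriorityQueue<HeapEntry> heap = new PriorityQueue<>();
        long totalWeight = 0;
        for (Map.Entry<String, Map<String, Integer>> e : graph.entrySet()) {
            long d = 0;
            for (int w : e.getValue().values()) d += w;
            degree.put(e.getKey(), d);
            heap.add(new HeapEntry(d, e.getKey()));
            totalWeight += d;
        }
        totalWeight /= 2; // every undirected edge was counted from both endpoints

        Set<String> alive = new LinkedHashSet<>(graph.keySet());
        List<String> removed = new ArrayList<>();
        double bestDensity = alive.isEmpty() ? 0.0 : (double) totalWeight / alive.size();
        int bestRemovedCount = 0;

        while (!heap.isEmpty()) {
            HeapEntry top = heap.poll();
            // Lazy deletion: skip entries for removed nodes or outdated degrees.
            if (!alive.contains(top.node) || top.degree != degree.get(top.node)) continue;
            alive.remove(top.node);
            removed.add(top.node);
            totalWeight -= top.degree;
            for (Map.Entry<String, Integer> nb : graph.get(top.node).entrySet()) {
                if (!alive.contains(nb.getKey())) continue;
                long d = degree.get(nb.getKey()) - nb.getValue();
                degree.put(nb.getKey(), d);
                heap.add(new HeapEntry(d, nb.getKey()));
            }
            if (!alive.isEmpty()) {
                double density = (double) totalWeight / alive.size();
                if (density > bestDensity) {
                    bestDensity = density;
                    bestRemovedCount = removed.size();
                }
            }
        }
        // The best subgraph is everything except the nodes peeled before the best point.
        Set<String> best = new LinkedHashSet<>(graph.keySet());
        best.removeAll(removed.subList(0, bestRemovedCount));
        return best;
    }
}
```

One simple way to obtain several subgraphs would be to delete the returned nodes from the graph and call `densest` again. The sketch does nothing to favour "small" subgraphs, and reaching linear time would require replacing the heap with a different structure.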

3. Data Analysis
------------------

Analyze the dense subgraphs that you found so as to find “interesting” information.
