https://github.com/yinleon/links-as-data

This is an example of using Links as data

# Links as Data
Leon Yin 2018-10-01

## Introduction
This repo contains a Jupyter notebook (viewable on GitHub or on NBViewer) that demonstrates how to extract URLs from Twitter data, preprocess and expand them, and perform rudimentary data analysis using supervised and unsupervised machine learning. The notebook and some auxiliary files are in the `nbs` (short for notebooks) directory: `congress-links.ipynb` is the main notebook, `download_data.py` is a script that downloads the raw and intermediate data used in the notebook, and `config.py` contains global variables specifying where data is stored locally.
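As a rough illustration of the extraction step, the sketch below pulls links out of tweet objects and tallies them by domain. It assumes Twitter API v1.1 JSON, where each tweet lists its links under `entities.urls` with an `expanded_url` field; the function names are hypothetical and this is not the notebook's own code. Note that `expanded_url` is whatever link the user posted, which may itself be a shortener such as bit.ly; resolving those redirects needs HTTP requests and is left out here.

```python
from collections import Counter
from urllib.parse import urlparse


def extract_urls(tweet):
    """Return the expanded URLs listed in a tweet's `entities` field.

    Assumes Twitter API v1.1 JSON, where each entry in
    tweet["entities"]["urls"] carries an "expanded_url" key.
    """
    entities = tweet.get("entities", {})
    return [u["expanded_url"] for u in entities.get("urls", []) if u.get("expanded_url")]


def domain_of(url):
    """Reduce a URL to its lowercase host, dropping a leading 'www.'."""
    netloc = urlparse(url).netloc.lower()
    return netloc[4:] if netloc.startswith("www.") else netloc


def count_domains(tweets):
    """Count how often each domain is linked across a collection of tweets."""
    return Counter(domain_of(url) for tweet in tweets for url in extract_urls(tweet))


if __name__ == "__main__":
    # Hypothetical tweets reduced to the fields this sketch reads.
    sample = [
        {"entities": {"urls": [{"expanded_url": "https://www.example.com/a"}]}},
        {"entities": {"urls": [{"expanded_url": "http://example.com/b"},
                               {"expanded_url": "https://news.example.org/c"}]}},
        {"entities": {"urls": []}},
    ]
    print(count_domains(sample).most_common())
    # → [('example.com', 2), ('news.example.org', 1)]
```

Collapsing links to domains like this is one common way to turn URLs into features for the kind of supervised and unsupervised analysis described above.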

The notebook can be viewed as [slides](http://www.leonyin.org/presentations/congress-links-slides.slides.html).

## Requirements
This notebook is intended to be run with Python 3.6 or above, using the standard library as well as the open source packages listed in `requirements.txt`.

Please install them using:
```pip install -r requirements.txt```

These resources can be run either on NYU HPC or locally on your personal machine.
You may want to take a look at `config.py` for an idea of where the data comes from and where it will be stored. From the `nbs` directory, run `python download_data.py` to download the data needed for this tutorial.
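The pattern of keeping data locations in one module can be sketched as follows. This is a hypothetical illustration only: the `LINKS_AS_DATA_DIR` environment variable, the `data/raw` and `data/intermediate` layout, and `ensure_dirs` are assumptions, not the actual contents of the repository's `config.py`.

```python
import os
from pathlib import Path

# Hypothetical: the environment variable and directory names below are
# illustrative assumptions, not the values in the repo's config.py.
DATA_DIR = Path(os.environ.get("LINKS_AS_DATA_DIR", "data"))
RAW_DIR = DATA_DIR / "raw"                    # downloaded, unprocessed data
INTERMEDIATE_DIR = DATA_DIR / "intermediate"  # preprocessed outputs


def ensure_dirs():
    """Create the data directories if they do not already exist."""
    for d in (RAW_DIR, INTERMEDIATE_DIR):
        d.mkdir(parents=True, exist_ok=True)
    return RAW_DIR, INTERMEDIATE_DIR
```

A download script and the notebook can then both import these names, so changing where data lives means editing a single file.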

## Installation
This is a public repository, so you can clone it directly.

If you use the Terminal or another command line client, run the following from whatever working directory you choose:
```git clone git@github.com:yinleon/links-as-data.git```