https://github.com/archivesunleashed/twut

An open-source toolkit for analyzing line-oriented JSON Twitter archives with Apache Spark.


Host: GitHub
URL: https://github.com/archivesunleashed/twut
Owner: archivesunleashed
License: apache-2.0
Created: 2019-11-29T14:52:12.000Z (over 6 years ago)
Default Branch: main
Last Pushed: 2024-12-11T21:11:59.000Z (over 1 year ago)
Last Synced: 2025-02-02T01:31:56.725Z (over 1 year ago)
Topics: apache-spark, spark, spark-packages, tweets, twitter-data, twitter-json
Language: Scala
Homepage:
Size: 457 KB
Stars: 9
Watchers: 4
Forks: 2
Open Issues: 0
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md

Awesome Lists containing this project

webarchiving-awesome-graph - Tweet Archives Unleashed Toolkit - An open-source toolkit for analyzing line-oriented JSON Twitter archives with Apache Spark. 💽 ⭐ 10 👀 3 (Tools & Software / Analysis)
awesome-web-archiving - Tweet Archives Unleashed Toolkit - An open-source toolkit for analyzing line-oriented JSON Twitter archives with Apache Spark. *(In Development)* (Tools & Software / Analysis)

README

# Tweet Archives Unleashed Toolkit (twut)

[![Maven Central](https://maven-badges.herokuapp.com/maven-central/io.archivesunleashed/twut/badge.svg)](https://maven-badges.herokuapp.com/maven-central/io.archivesunleashed/twut)

[![LICENSE](https://img.shields.io/badge/license-Apache-blue.svg?style=flat)](https://www.apache.org/licenses/LICENSE-2.0)

[![Contribution Guidelines](http://img.shields.io/badge/CONTRIBUTING-Guidelines-blue.svg)](./CONTRIBUTING.md)

An open-source toolkit for analyzing line-oriented JSON data from the Twitter v1.1 API or flattened line-oriented JSON data from the Twitter v2 API using Apache Spark.

## Dependencies

- Java 8 or 11

- Python 3

- [Apache Spark](https://spark.apache.org/downloads.html)
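
As a rough sketch, the following shell snippet reports which of these dependencies are visible on your `PATH`. The helper name `have` is hypothetical, and the snippet assumes that a Spark installation exposes the `spark-shell` and `pyspark` launchers on `PATH`. It only checks that each command is present, not its version, so run `java -version` separately to confirm Java 8 or 11.

```shell
# Hypothetical helper: succeeds if the named command is on PATH.
have() { command -v "$1" >/dev/null 2>&1; }

# Report each dependency without stopping at the first missing one.
for cmd in java python3 spark-shell pyspark; do
  if have "$cmd"; then
    echo "found:   $cmd"
  else
    echo "missing: $cmd"
  fi
done
```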

## Getting Started

To get started with `twut`, you can either pull the package directly from Maven Central or download the release files: the JAR for Spark, or the ZIP for PySpark.

### Using the Spark Shell

To use `twut` in the Spark shell, pass the package coordinates with `--packages`:

```
$ spark-shell --packages "io.archivesunleashed:twut:1.1.0"
```

Alternatively, you can download the JAR file from the [latest release](https://github.com/archivesunleashed/twut/releases) and include it manually:

```
$ spark-shell --jars /path/to/twut-1.1.0-fatjar.jar
```

### Using PySpark

For Python users, download the ZIP file from the [latest release](https://github.com/archivesunleashed/twut/releases) and include it in your PySpark environment:

```
$ pyspark --py-files /path/to/twut-1.1.0.zip
```

You will also need to set the `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` environment variables so that PySpark runs with your Python 3 interpreter.
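
A minimal sketch of setting both variables in a POSIX shell before launching `pyspark`. It assumes your Python 3 interpreter is available on `PATH` as `python3`; that value is an assumption, so substitute the full path to your own interpreter if it differs.

```shell
# Assumption: the interpreter is reachable on PATH as `python3`.
# Replace with an absolute path (e.g. from `command -v python3`) if needed.
export PYSPARK_PYTHON=python3
export PYSPARK_DRIVER_PYTHON=python3

# Confirm both variables are set before launching pyspark.
echo "PYSPARK_PYTHON=$PYSPARK_PYTHON"
echo "PYSPARK_DRIVER_PYTHON=$PYSPARK_DRIVER_PYTHON"
```

The exports only last for the current shell session; add them to your shell profile if you want them to persist.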

## Documentation and Tutorials

Once you have built or downloaded `twut`, see the [usage documentation](https://github.com/archivesunleashed/twut/tree/main/docs/usage.md) for a basic set of recipes and tutorials.

## License

Licensed under the [Apache License, Version 2.0](http://www.apache.org/licenses/LICENSE-2.0).

## Acknowledgments

This work is primarily supported by the [Andrew W. Mellon Foundation](https://mellon.org/). Other financial and in-kind support comes from the [Social Sciences and Humanities Research Council](http://www.sshrc-crsh.gc.ca/), [Compute Canada](https://www.computecanada.ca/), the [Ontario Ministry of Research, Innovation, and Science](https://www.ontario.ca/page/ministry-research-innovation-and-science), [York University Libraries](https://www.library.yorku.ca/web/), [Start Smart Labs](http://www.startsmartlabs.com/), and the [Faculty of Arts](https://uwaterloo.ca/arts/) and [David R. Cheriton School of Computer Science](https://cs.uwaterloo.ca/) at the [University of Waterloo](https://uwaterloo.ca/).

Any opinions, findings, and conclusions or recommendations expressed are those of the researchers and do not necessarily reflect the views of the sponsors.
