{"id":15208919,"url":"https://github.com/archivesunleashed/twut","last_synced_at":"2025-10-29T12:31:41.504Z","repository":{"id":56327999,"uuid":"224873214","full_name":"archivesunleashed/twut","owner":"archivesunleashed","description":"An open-source toolkit for analyzing line-oriented JSON Twitter archives with Apache Spark.","archived":false,"fork":false,"pushed_at":"2024-12-11T21:11:59.000Z","size":468,"stargazers_count":9,"open_issues_count":0,"forks_count":2,"subscribers_count":4,"default_branch":"main","last_synced_at":"2025-02-02T01:31:56.725Z","etag":null,"topics":["apache-spark","spark","spark-packages","tweets","twitter-data","twitter-json"],"latest_commit_sha":null,"homepage":"","language":"Scala","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/archivesunleashed.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2019-11-29T14:52:12.000Z","updated_at":"2024-12-11T21:12:03.000Z","dependencies_parsed_at":"2023-10-20T17:31:31.078Z","dependency_job_id":null,"html_url":"https://github.com/archivesunleashed/twut","commit_stats":null,"previous_names":[],"tags_count":3,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/archivesunleashed%2Ftwut","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/archivesunleashed%2Ftwut/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/archivesunleashed%2Ftwut/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/archivesunleashed%2Ftwut/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/archivesunleashed","download_url":"https://codeload.github.com/archivesunleashed/twut/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":238825755,"owners_count":19537119,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apache-spark","spark","spark-packages","tweets","twitter-data","twitter-json"],"created_at":"2024-09-28T07:04:46.916Z","updated_at":"2025-10-29T12:31:41.062Z","avatar_url":"https://github.com/archivesunleashed.png","language":"Scala","funding_links":[],"categories":["Tools \u0026 Software"],"sub_categories":["Analysis"],"readme":"# Tweet Archives Unleashed Toolkit (twut)\n\n[![Maven Central](https://maven-badges.herokuapp.com/maven-central/io.archivesunleashed/twut/badge.svg)](https://maven-badges.herokuapp.com/maven-central/io.archivesunleashed/twut)\n[![LICENSE](https://img.shields.io/badge/license-Apache-blue.svg?style=flat)](https://www.apache.org/licenses/LICENSE-2.0)\n[![Contribution Guidelines](http://img.shields.io/badge/CONTRIBUTING-Guidelines-blue.svg)](./CONTRIBUTING.md)\n\nAn open-source toolkit for analyzing line-oriented JSON data from the Twitter v1.1 API or flattened line-oriented JSON data from the Twitter v2 API using Apache Spark.\n\n## Dependencies\n\n- Java 8 or 11\n- Python 3\n- [Apache Spark](https://spark.apache.org/downloads.html)\n\n## Getting Started\n\nTo get started with `twut`, you can either use it directly from Maven or download the JAR and ZIP files for Spark or PySpark.\n\n### Using the Spark Shell\n\nTo use `twut` with Apache Spark, you can use the following command to include the package:\n\n```\n$ spark-shell --packages \"io.archivesunleashed:twut:1.1.0\"\n```\n\nAlternatively, you can download the JAR file from the [latest release](https://github.com/archivesunleashed/twut/releases) and include it manually:\n\n```\n$ spark-shell --jars /path/to/twut-1.1.0-fatjar.jar\n```\n\n### Using PySpark\n\nFor Python users, download the ZIP file from the [latest release](https://github.com/archivesunleashed/twut/releases) and include it in your PySpark environment:\n\n```\n$ pyspark --py-files /path/to/twut-1.1.0.zip\n```\n\nYou will also need to set the `PYSPARK_PYTHON` and `PYSPARK_DRIVER_PYTHON` environment variables.\n\n## Documentation and Tutorials\n\nAfter you have `twut` built or downloaded, you can follow the basic set of recipes and tutorials [here](https://github.com/archivesunleashed/twut/tree/main/docs/usage.md).\n\n## License\n\nLicensed under the [Apache License, Version 2.0](http://www.apache.org/licenses/LICENSE-2.0).\n\n## Acknowledgments\n\nThis work is primarily supported by the [Andrew W. Mellon Foundation](https://mellon.org/). Other financial and in-kind support comes from the [Social Sciences and Humanities Research Council](http://www.sshrc-crsh.gc.ca/), [Compute Canada](https://www.computecanada.ca/), the [Ontario Ministry of Research, Innovation, and Science](https://www.ontario.ca/page/ministry-research-innovation-and-science), [York University Libraries](https://www.library.yorku.ca/web/), [Start Smart Labs](http://www.startsmartlabs.com/), and the [Faculty of Arts](https://uwaterloo.ca/arts/) and [David R. Cheriton School of Computer Science](https://cs.uwaterloo.ca/) at the [University of Waterloo](https://uwaterloo.ca/).\n\nAny opinions, findings, and conclusions or recommendations expressed are those of the researchers and do not necessarily reflect the views of the sponsors.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farchivesunleashed%2Ftwut","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Farchivesunleashed%2Ftwut","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Farchivesunleashed%2Ftwut/lists"}