https://github.com/dsfsi/project-state-capture
Zondo Commission or State Capture Commission Transcripts
dsfsi-datasets natural-language-processing nlp south-africa
- Host: GitHub
- URL: https://github.com/dsfsi/project-state-capture
- Owner: dsfsi
- License: cc-by-sa-4.0
- Created: 2021-03-18T06:44:39.000Z (over 4 years ago)
- Default Branch: main
- Last Pushed: 2023-10-26T07:19:14.000Z (almost 2 years ago)
- Last Synced: 2025-01-21T11:17:23.617Z (9 months ago)
- Topics: dsfsi-datasets, natural-language-processing, nlp, south-africa
- Homepage:
- Size: 50.3 MB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 3
Metadata Files:
- Readme: README.md
- License: LICENSE
# South African State Capture Commission Transcripts - Zondo Commission
Give Feedback 📑: [DSFSI Resource Feedback Form](https://docs.google.com/forms/d/e/1FAIpQLSf7S36dyAUPx2egmXbFpnTBuzoRulhL5Elu-N1eoMhaO7v10w/formResponse)
## About the State Capture Commission
The Judicial Commission of Inquiry into Allegations of State Capture, Corruption and Fraud in the Public Sector including Organs of State, better known as the Zondo Commission or State Capture Commission, is a public inquiry established in January 2018 by former President Jacob Zuma to investigate allegations of state capture, corruption, and fraud in the public sector in South Africa.[2][3]
Source: [https://en.wikipedia.org/wiki/Zondo_Commission](https://en.wikipedia.org/wiki/Zondo_Commission)
## About Dataset
We extracted plaintext versions of the published transcripts (from [https://www.statecapture.org.za/site/transcripts](https://www.statecapture.org.za/site/transcripts)). Cleaning is minimal, but we believe the transcripts can be used for textual analysis.
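As a starting point for textual analysis, here is a minimal sketch that reads every transcript from the zip archive in `data/` and counts the most frequent words. It assumes the repository is cloned locally and that the archive holds UTF-8 `.txt` files; the archive's internal layout is not documented here. `load_transcripts` and `top_words` are hypothetical helpers, not part of this repository. The whitespace collapsing is an illustrative light cleanup, not the authors' extraction pipeline.

```python
import re
import zipfile
from collections import Counter
from pathlib import Path

# Assumed location of the archive inside a local clone; adjust as needed.
ARCHIVE = Path("data/state-capture-transcripts-day-1-399.txt.zip")


def load_transcripts(archive_path):
    """Return {member name: text} for every .txt member of the archive.

    Hypothetical helper. Assumes UTF-8 text; undecodable bytes are replaced.
    """
    transcripts = {}
    with zipfile.ZipFile(archive_path) as zf:
        for name in zf.namelist():
            if not name.lower().endswith(".txt"):
                continue  # skip directory entries and non-text members
            raw = zf.read(name).decode("utf-8", errors="replace")
            # Collapse runs of whitespace (line breaks, padding) into single spaces.
            transcripts[name] = re.sub(r"\s+", " ", raw).strip()
    return transcripts


def top_words(texts, n=10):
    """Count lowercase alphabetic tokens across texts (a deliberately crude tokenizer)."""
    counts = Counter()
    for text in texts:
        counts.update(re.findall(r"[a-z]+", text.lower()))
    return counts.most_common(n)


if __name__ == "__main__" and ARCHIVE.exists():
    docs = load_transcripts(ARCHIVE)
    print(f"{len(docs)} transcript files loaded")
    print(top_words(docs.values()))
```

Stop words such as "the" will dominate the raw counts; for real analysis you would filter them or swap in a proper tokenizer.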
| File/folder | Description | URL |
|-------------|-------------|-----|
| data/interim | Folder with individual *.txt* files of extracted transcripts, one per day. | [/data/interim/](/data/interim/) |
| state-capture-transcripts-day-1-399.txt.zip | Zip file with all transcripts (days 1 to 399). | [state-capture-transcripts-day-1-399.txt.zip](/data/state-capture-transcripts-day-1-399.txt.zip) |

## TODOs
* Clean up the data
* Extract sentences
* Tag conversations by who is talking (speaker)

## Authors
* **Tsholofelo Gomba**
* **Vukosi Marivate** - [@vukosi](https://twitter.com/vukosi)

See also the list of [contributors](https://github.com/dsfsi/project-state-capture/contributors) who participated in this project.
## Citation
TBA
## License
Data is licensed under CC BY-SA 4.0.
Code is licensed under the MIT License.