{"id":21905038,"url":"https://github.com/textcorpuslabs/getting-started","last_synced_at":"2025-03-22T07:14:51.139Z","repository":{"id":174805652,"uuid":"285281071","full_name":"TextCorpusLabs/getting-started","owner":"TextCorpusLabs","description":"Getting started at Text Corpus Labs","archived":false,"fork":false,"pushed_at":"2020-11-19T17:16:07.000Z","size":10,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-27T07:27:29.164Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":"","language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/TextCorpusLabs.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-08-05T12:26:27.000Z","updated_at":"2020-11-19T18:49:01.000Z","dependencies_parsed_at":null,"dependency_job_id":"da47a529-6b9d-4054-922b-edb005c7918b","html_url":"https://github.com/TextCorpusLabs/getting-started","commit_stats":null,"previous_names":["textcorpuslabs/getting-started"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TextCorpusLabs%2Fgetting-started","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TextCorpusLabs%2Fgetting-started/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TextCorpusLabs%2Fgetting-started/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/TextCorpusLabs%2Fgetting-started/manifests","owner_url":"https://repos.ecosyste.ms/api/
v1/hosts/GitHub/owners/TextCorpusLabs","download_url":"https://codeload.github.com/TextCorpusLabs/getting-started/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244918710,"owners_count":20531686,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-28T16:20:23.877Z","updated_at":"2025-03-22T07:14:51.133Z","avatar_url":"https://github.com/TextCorpusLabs.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"![MIT license](https://img.shields.io/badge/License-MIT-green.svg)\n\nThis document explains what we work on and how we work at Text Corpus Labs.\nIt serves as a reminder of the team's global research goals.\n\n# Raison d'être\n\nWe are a collection of researchers focused on _collecting_ different modes of human communication through text.\nWe want to share our work and ways of working with the broader academic community.\n\nTo this end we:\n\n* Create guidance on how to standardize the format of a text corpus.\n  All the members of our lab have come to an understanding as to how a text corpus should look _prior_ to being analyzed.\n* Create processes to automate the collection of text corpora from _existing_ resources.\n  Scraping and parsing can be challenging at times.\n  Our goal is to allow the reuse of a text corpus with the lowest possible barrier to entry for a new analysis.\n* Curate _unique_ corpora.\n  It has been well known for quite some time that humans have different modes of communication.\n  Text corpora reflect this difference.\n  When a new mode of 
communication is believed to exist, we try to capture a sample of that mode.\n* Provide a \"Methods and Materials\" boilerplate describing how the corpus was collected.\n* Provide a citable [DOI](https://guides.github.com/activities/citable-code/) for the _process_.\n  For _unique_ corpora, we provide the [DOI](https://www.doi.org/) to the _article_ where the text corpus was introduced.\n\nSo that you can:\n\n* Get a text corpus on your local device with as little effort as possible.\n\n# Citations\n\nIt is always nice to see others build upon your efforts.\nIf you use our work, please cite it using the provided [DOI](https://www.doi.org/).\n\n# Getting the code to work\n\nAs of now, all members of our lab work on Windows PCs and program in Python.\nIf that changes in the future, we will likely update this section to include other methods.\n\n## Prerequisites\n\nThe following packages need to be installed.\nYou can use any method to install the prerequisites.\nOn a Windows device, we recommend using [Chocolatey](https://chocolatey.org/install).\nIf you decide to use Chocolatey, open an _admin_ PowerShell prompt, and run the code snippet below.\n\n* [7-zip](https://www.7-zip.org/)\n* [Python](https://www.python.org/downloads/)\n  \n```{ps1}\nif('Unrestricted' -ne (Get-ExecutionPolicy)) { Set-ExecutionPolicy Bypass -Scope Process -Force }\niex ((New-Object System.Net.WebClient).DownloadString('https://chocolatey.org/install.ps1'))\nrefreshenv\n\nchoco install 7zip -y\nchoco install python3 -y\n```\n\n## Python\n\nUnless otherwise noted in the repository directly, all scripts have been tested on Python 3.9.x.\nIn addition to the steps below, each repository's `README.md` will contain a list of any special instructions.\nAfter running the steps here, run the special instructions.\n\n1. Clone this repository, then open an _Admin_ shell to the `~/code` directory.\n2. 
Install the required modules.\n   ```{shell}\n   pip install -r requirements.txt\n   ```\n  \nWhen writing any code that uses an external dependency, the version of that dependency needs to be declared.\nAll the version information can be found in the repository’s `~/code/requirements.txt` file.\nYou _may_ be able to run different versions, especially if it is just a _minor_ revision, but if the _exact_ version is not used, YMMV.\n\n## Steps\n\nAll the repositories contain a \"Steps\" section in the `README.md`.\nPlease follow those guides to retrieve the text corpus.\n\nYou will likely want to perform additional text processing _after_ retrieving the text corpus.\nOur goal is to provide you with a clean _base_ to perform an analysis, **not** to be opinionated on what you do next.\nWhen completing your study, please remember to keep track of this difference.\nDoing so will better allow you to write your \"Methods and Materials\" section; using (and citing) our steps, then applying (and highlighting) your unique contribution.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftextcorpuslabs%2Fgetting-started","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftextcorpuslabs%2Fgetting-started","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftextcorpuslabs%2Fgetting-started/lists"}