{"id":19074358,"url":"https://github.com/howardyclo/digestant","last_synced_at":"2025-06-17T16:32:57.744Z","repository":{"id":37070613,"uuid":"97578565","full_name":"howardyclo/Digestant","owner":"howardyclo","description":"Modules for effectively digesting data from Twitter and Reddit using ML, NLP and statistics.","archived":false,"fork":false,"pushed_at":"2022-12-08T00:44:05.000Z","size":172692,"stargazers_count":8,"open_issues_count":16,"forks_count":5,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-04-29T12:58:42.347Z","etag":null,"topics":["machine-learning","named-entity-recognition","natural-language-processing","reddit","summarization","topic-modeling"],"latest_commit_sha":null,"homepage":"","language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/howardyclo.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-07-18T09:15:06.000Z","updated_at":"2022-12-31T08:28:25.000Z","dependencies_parsed_at":"2023-01-25T01:31:32.685Z","dependency_job_id":null,"html_url":"https://github.com/howardyclo/Digestant","commit_stats":null,"previous_names":[],"tags_count":1,"template":false,"template_full_name":null,"purl":"pkg:github/howardyclo/Digestant","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/howardyclo%2FDigestant","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/howardyclo%2FDigestant/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/howardyclo%2FDigestant/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/howardyclo%2FDigestant/manifests","owner_url":"https://repos.ecosy
ste.ms/api/v1/hosts/GitHub/owners/howardyclo","download_url":"https://codeload.github.com/howardyclo/Digestant/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/howardyclo%2FDigestant/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260398654,"owners_count":23003014,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["machine-learning","named-entity-recognition","natural-language-processing","reddit","summarization","topic-modeling"],"created_at":"2024-11-09T01:50:40.197Z","updated_at":"2025-06-17T16:32:52.728Z","avatar_url":"https://github.com/howardyclo.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Digestant\nFor more, see the [introduction slides](https://docs.google.com/presentation/d/18flIvwADXwQum-8xY6I3nSSQmWzqJCRrtQa9y23zVcM/edit?usp=sharing), [project survey](https://hackmd.io/s/rkh_rJY4-), and [demo](https://github.com/YuChunLOL/Digestant/blob/master/demo/demo_howard.ipynb).\n\n## Dev Environment\n- Python 3.x\n\n## Setup\n- We recommend creating a new virtual environment to manage your Python project.\n- Install the Python packages listed in `requirements.txt`: `$ pip install -r requirements.txt`.\n- Download NLTK data: `$ python -m nltk.downloader all`.\n- Download the spaCy **`en_core_web_md`** model: `$ python -m spacy download en_core_web_md`.\n- Download the Stanford NER model (`stanford-ner-xxxx-xx-xx` zip file):\n  1. Download it from [the official website](https://nlp.stanford.edu/software/stanford-ner-2017-06-09.zip).\n  2. 
Unzip it and place the `stanford-ner-xxxx-xx-xx` folder in the project root. The folder must be named **`stanford-ner/`**.\n\n## Usage\n1. Create Twitter and Reddit accounts and follow the accounts that you are interested in.\n2. Copy `config-sample.json` to `config.json` in the same directory, then fill in the keys in `config.json`. (Go to your Twitter/Reddit developer console, create an application, and get the keys.)\n3. Run the script `crawlers/twitter_crawler.py` to crawl Twitter data. It automatically crawls data and saves it to `dataset/twitter/` by default.\n4. You can customize data entities by modifying `domains.json` and `types.json`. (See [demo](https://github.com/YuChunLOL/Digestant/blob/master/demo/demo_howard.ipynb))\n5. Currently, you can execute [`demo/demo_howard.ipynb`](https://github.com/YuChunLOL/Digestant/blob/master/demo/demo_howard.ipynb) or other notebooks to see the daily digest.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhowardyclo%2Fdigestant","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fhowardyclo%2Fdigestant","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fhowardyclo%2Fdigestant/lists"}