{"id":22846570,"url":"https://github.com/jsonkao/computational-journalism","last_synced_at":"2025-03-31T05:41:23.953Z","repository":{"id":114470304,"uuid":"162889184","full_name":"jsonkao/computational-journalism","owner":"jsonkao","description":"Notes on computational journalism.","archived":false,"fork":false,"pushed_at":"2019-03-15T20:34:38.000Z","size":7,"stargazers_count":2,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-02-06T10:15:24.841Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":null,"has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/jsonkao.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-12-23T12:34:12.000Z","updated_at":"2021-09-25T05:29:11.000Z","dependencies_parsed_at":"2023-06-08T07:00:37.212Z","dependency_job_id":null,"html_url":"https://github.com/jsonkao/computational-journalism","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jsonkao%2Fcomputational-journalism","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jsonkao%2Fcomputational-journalism/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jsonkao%2Fcomputational-journalism/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/jsonkao%2Fcomputational-journalism/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/jsonkao","download_url":"ht
tps://codeload.github.com/jsonkao/computational-journalism/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":246423494,"owners_count":20774796,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-12-13T03:29:38.635Z","updated_at":"2025-03-31T05:41:23.933Z","avatar_url":"https://github.com/jsonkao.png","language":null,"funding_links":[],"categories":[],"sub_categories":[],"readme":"| Table of Contents |\n|-------------|\n| [Potential Projects](#potential-projects) |\n| [General Resources](#general-resources) |\n| [High Dimensional Data](#high-dimensional-data) |\n\n## Potential Projects\n\n- Debiasing word embeddings (see [Man is to Computer Programmer as Woman is to Homemaker?](https://arxiv.org/pdf/1607.06520.pdf))\n\n## General Resources\n\n- [Jonathan Stray's Frontiers of Computational Journalism course](http://www.compjournalism.com/)\n- [Jun Yang's publications](https://users.cs.duke.edu/~junyang/)\n- [Information Retrieval Book](https://nlp.stanford.edu/IR-book/information-retrieval-book.html)\n\n_Reporting on society, using computation, and reporting on computation in society._\n\n## High Dimensional Data\n\nVectorizing data, then projecting it into K \u003c\u003c R dimensions (typically K=2 or K=3)\n\n**Text analysis in Journalism**\n- Clustering, classification\n- Document Vector Space Model: what is this document about?\n  - finding important words, topic analysis, key component for filtering\n  - features = words works fine. 
One dimension = one word in the vocabulary, value of a dimension = # of times the word appears in the document\n    - Each entry becomes term frequency: tf(t,d)\n  - distance metric for text (how similar? clustering?)\n    - Looking for overlapping terms: Cosine similarity. similarity(a,b) = (a \\cdot b) / (mag(a) \\times mag(b)) = cos(theta). Cosine distance is just 1 - similarity(a,b).\n  - also ignore stopwords and \"de-weight\" common words (e.g. \"car\" in car reviews). Document frequency `df(t,D)` = fraction of docs containing term\n    - Inverse document frequency idf(t, D) = log(|D| / |{d \\in D : t \\in d}|)\n  - [TF-IDF](https://planspace.org/20150524-tfidf_is_about_what_matters/) = tf(t,d) * idf(t, D) = term frequency * log(1 / document frequency)\n\n## Text Analysis\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjsonkao%2Fcomputational-journalism","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjsonkao%2Fcomputational-journalism","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjsonkao%2Fcomputational-journalism/lists"}