{"id":23690512,"url":"https://github.com/torinriley/bayesian-causal-inference","last_synced_at":"2025-06-24T13:37:19.747Z","repository":{"id":270017810,"uuid":"909149579","full_name":"torinriley/Bayesian-Causal-Inference","owner":"torinriley","description":" Bayesian causal inference model using BERT embeddings to estimate the causal effect of review length on sentiment polarity. ","archived":false,"fork":false,"pushed_at":"2024-12-27T21:42:15.000Z","size":0,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-12-27T22:24:45.555Z","etag":null,"topics":["bayesian-statistics","causal-inference"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/torinriley.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2024-12-27T21:30:02.000Z","updated_at":"2024-12-27T21:43:19.000Z","dependencies_parsed_at":"2024-12-27T22:44:15.120Z","dependency_job_id":null,"html_url":"https://github.com/torinriley/Bayesian-Causal-Inference","commit_stats":null,"previous_names":["torinriley/bayesian-causal-inference"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/torinriley%2FBayesian-Causal-Inference","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/torinriley%2FBayesian-Causal-Inference/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/torinriley%2FBayesian-Causal-Inference/releases","manifests_url":
"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/torinriley%2FBayesian-Causal-Inference/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/torinriley","download_url":"https://codeload.github.com/torinriley/Bayesian-Causal-Inference/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":239753733,"owners_count":19691162,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["bayesian-statistics","causal-inference"],"created_at":"2024-12-30T02:33:35.226Z","updated_at":"2025-02-20T00:17:15.874Z","avatar_url":"https://github.com/torinriley.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Bayesian Causal Inference\n\n## Overview\nThis project demonstrates the use of **Bayesian causal inference** to investigate the relationship between **review length** (measured in words) and **sentiment polarity** (positive or negative) in Yelp reviews. 
The model adjusts for confounding variables in the review's content using **BERT embeddings**, so that the estimated effect of review length is separated from the effect of what the review says.\n\n## Key Features\n\n- **Causal Inference**:\n  - Estimates the causal effect of review length on sentiment polarity.\n  - Uses the **NUTS (No-U-Turn Sampler)** algorithm to perform posterior inference.\n- **Natural Language Processing (NLP)**:\n  - Extracts semantic features from reviews using pretrained **BERT embeddings**.\n  - Adjusts for confounding factors in textual content.\n- **Bayesian Modeling**:\n  - Implements the model in Pyro, quantifying uncertainty in each parameter through its full posterior distribution.\n\n## Methodology\n1. **Data Preparation**:\n   - The **Yelp Polarity Dataset** is used, with a random 1% sample of the training data.\n   - Review content is tokenized and embedded using **BERT** (\"bert-base-uncased\").\n   - Features include:\n     - `X`: Review lengths (word count).\n     - `Z`: BERT embeddings (high-dimensional semantic representations).\n     - `Y`: Sentiment labels (binary).\n\n2. **Causal Model**:\n   - The Bayesian model includes:\n     - **β (beta)**: The causal effect of review length on sentiment.\n     - **σ (sigma)**: Noise parameter accounting for variability in sentiment.\n     - **Z Weights**: Contributions of BERT embeddings to sentiment prediction.\n\n3. **Inference**:\n   - The model uses the **NUTS algorithm** to sample from the posterior distribution of the parameters.\n   - Posterior samples for `α`, `σ`, and `β` are analyzed to estimate the causal effect and its uncertainty.\n\n## Results\n- **Posterior Distributions**:\n  - Visualized the posterior distributions of the model parameters (`α`, `σ`, `β`).\n  - Insights include:\n    - **β (causal effect)**: The location and spread of its posterior indicate whether review length influences sentiment polarity and how certain that conclusion is.\n    - **σ (noise)**: Captures unexplained variability in sentiment.\n\n- **Key Findings**:\n  - After adjusting for semantic content (via BERT embeddings), textual content is a stronger predictor of sentiment than review length.\n\n## Requirements\n- Python 3.8+\n- Libraries:\n  - `numpy`\n  - `torch`\n  - `pyro-ppl` (Pyro)\n  - `datasets`\n  - `transformers`\n  - `seaborn`\n  - `matplotlib`\n\nInstall dependencies using:\n```bash\npip install numpy torch pyro-ppl datasets transformers seaborn matplotlib\n```\n\n## Visualization\n- The script generates a plot of the posterior distributions of the model parameters (`α`, `σ`, `β`), which can be used to interpret the model's outputs.\n\n\u003cimg width=\"500\" alt=\"Posterior distributions of α, σ, and β\" src=\"https://github.com/user-attachments/assets/4e514eb0-8f3c-449f-8adb-4ae11bc09482\" /\u003e\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftorinriley%2Fbayesian-causal-inference","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Ftorinriley%2Fbayesian-causal-inference","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Ftorinriley%2Fbayesian-causal-inference/lists"}