{"id":27656403,"url":"https://github.com/andrewmsilva/insightoverflow","last_synced_at":"2025-08-08T01:04:53.021Z","repository":{"id":53500443,"uuid":"256264142","full_name":"andrewmsilva/InsightOverflow","owner":"andrewmsilva","description":"A bachelor's thesis focusing on making an exploratory analysis from Stack Overflow posts, making general and user-centric analyses on discussed topics.","archived":false,"fork":false,"pushed_at":"2021-06-15T02:35:46.000Z","size":204,"stargazers_count":3,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"develop","last_synced_at":"2025-04-24T06:29:31.376Z","etag":null,"topics":["author-topic-model","extraction","latent-dirichlet-allocation","machine-learning","natural-language-processing","nlp","stack-overflow-posts","topic-modeling"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/andrewmsilva.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2020-04-16T16:03:55.000Z","updated_at":"2021-08-10T21:38:26.000Z","dependencies_parsed_at":"2022-08-24T23:52:10.128Z","dependency_job_id":null,"html_url":"https://github.com/andrewmsilva/InsightOverflow","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/andrewmsilva/InsightOverflow","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrewmsilva%2FInsightOverflow","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrewmsilva%2FInsightOverflow/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrewmsilva%2FInsightOverflo
w/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrewmsilva%2FInsightOverflow/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/andrewmsilva","download_url":"https://codeload.github.com/andrewmsilva/InsightOverflow/tar.gz/refs/heads/develop","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/andrewmsilva%2FInsightOverflow/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":269348099,"owners_count":24401885,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-07T02:00:09.698Z","response_time":73,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["author-topic-model","extraction","latent-dirichlet-allocation","machine-learning","natural-language-processing","nlp","stack-overflow-posts","topic-modeling"],"created_at":"2025-04-24T06:19:54.264Z","updated_at":"2025-08-08T01:04:52.975Z","avatar_url":"https://github.com/andrewmsilva.png","language":"Python","readme":"# Insight Overflow\n\nAn exploratory analysis employing topic modeling: Tracking evolution and loyalty from Stack Overflow users' interests\n\nRunning this experiment requires downloading the Stack Overflow posts from the [data dump](https://archive.org/download/stackexchange/stackoverflow.com-Posts.7z) and extracting the `.7z` file into `src/data/`. Because the extraction step uses a Redis database, Redis must be installed, configured, and running beforehand (see the [Redis quickstart tutorial](https://redis.io/topics/quickstart)).\n\n## Extraction\n\n```sh\nExtraction started\n  Extracted: 49598818\n  Ignored: 739023\n  Total: 50337841\nExecution time: 04:11:27.56\n```\n\n## Pre-processing\n\n```\nPre-processing started\nExecution time: 102:39:36.14\n```\n\n## Topic modeling\n\n```\nTopic modeling started\n  Corpus built: 00:00:01.65\n  Experiment done: k=20 i=10 | p=4133.9019, cv=0.4946\n  Experiment done: k=20 i=100 | p=1433.5471, cv=0.6330\n  Experiment done: k=20 i=200 | p=1388.5460, cv=0.6343\n  Experiment done: k=20 i=500 | p=1365.3670, cv=0.6341\n  Experiment done: k=40 i=10 | p=5503.5514, cv=0.5449\n  Experiment done: k=40 i=100 | p=1448.7289, cv=0.6046\n  Experiment done: k=40 i=200 | p=1379.5958, cv=0.6051\n  Experiment done: k=40 i=500 | p=1330.4556, cv=0.6072\n  Experiment done: k=60 i=10 | p=6675.3963, cv=0.5221\n  Experiment done: k=60 i=100 | p=1448.0626, cv=0.5874\n  Experiment done: k=60 i=200 | p=1349.6507, cv=0.5940\n  Experiment done: k=60 i=500 | p=1290.6926, cv=0.5880\n  Experiment done: k=80 i=10 | p=7576.2664, cv=0.5115\n  Experiment done: k=80 i=100 | p=1457.7716, cv=0.5800\n  Experiment done: k=80 i=200 | p=1351.4062, cv=0.5866\n  Experiment done: k=80 i=500 | p=1288.1277, cv=0.5892\n  Experiment done: k=100 i=10 | p=8093.3122, cv=0.5114\n  Experiment done: k=100 i=100 | p=1448.3062, cv=0.5762\n  Experiment done: k=100 i=200 | p=1341.3547, cv=0.5787\n  Experiment done: k=100 i=500 | p=1272.4512, cv=0.5794\nExecution time: 00:54:22.32\n```\n\n## Post-processing\n\n```\nPost-processing started\n  Extracting topics\n  Creating coherence chart\n  Creating perplexity chart\n  Computing general popularity\n    Posts covered: 49573604\n    Number of posts with empty topics: 36085\n    Computed metrics: 4410\n  Creating general popularity charts\n  Computing user popularity\n    Posts covered: 49573604\n    Number of users: 4943206\n    Number of posts with empty topics: 36085\n    Computed metrics: 534554010\n  Creating user popularity charts\nExecution time: 12:57:57.99\n```","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandrewmsilva%2Finsightoverflow","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fandrewmsilva%2Finsightoverflow","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fandrewmsilva%2Finsightoverflow/lists"}