{"id":40137796,"url":"https://github.com/lwgray/upvotes","last_synced_at":"2026-01-19T14:06:16.965Z","repository":{"id":26095779,"uuid":"107331233","full_name":"lwgray/upvotes","owner":"lwgray","description":"Predicting Upvotes based on your Reddit Post title","archived":false,"fork":false,"pushed_at":"2022-12-08T00:39:32.000Z","size":1126,"stargazers_count":4,"open_issues_count":12,"forks_count":0,"subscribers_count":3,"default_branch":"master","last_synced_at":"2024-05-01T23:28:54.290Z","etag":null,"topics":["idf","linear-regression","machine-learning","prediction","reddit","subreddit","text-classification","upvote"],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/lwgray.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2017-10-17T22:32:37.000Z","updated_at":"2023-09-08T17:31:21.000Z","dependencies_parsed_at":"2023-01-14T04:01:34.451Z","dependency_job_id":null,"html_url":"https://github.com/lwgray/upvotes","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/lwgray/upvotes","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lwgray%2Fupvotes","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lwgray%2Fupvotes/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lwgray%2Fupvotes/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lwgray%2Fupvotes/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/lwgray","download_url":"https://codeload.github.com/lwgray/upvotes/tar.
gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/lwgray%2Fupvotes/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":28571560,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-01-19T12:50:50.164Z","status":"ssl_error","status_checked_at":"2026-01-19T12:50:42.704Z","response_time":67,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["idf","linear-regression","machine-learning","prediction","reddit","subreddit","text-classification","upvote"],"created_at":"2026-01-19T14:06:16.099Z","updated_at":"2026-01-19T14:06:16.952Z","avatar_url":"https://github.com/lwgray.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"What does it do?\n-------------------\nSay you want to submit a post on Reddit and you want a lot of people to like it.  I believe (and I think you would agree) that the title of your post is important in determining how many upvotes (likes) you will receive.  This tool predicts how many upvotes your title is likely to get.\n\n\nWhere did this idea come from?\n------------------------------\nThis idea was made concrete at [hackumbc](http://hackumbc.org) with a prototype hosted [here](http://54.197.149.55). ***IMPORTANT*** Only enter \"Title\". 
Do not enter \"subreddit\"\n[Read more about how this project came together](https://devpost.com/software/calzone-trhmwb)\n\nSo how did you do it?\n---------------------\nTo explore this idea, I turned to machine learning to predict how many upvotes you would get based on your post's title.  I have built a simple machine learning model in Python, mainly using the scikit-learn package.  The model is currently trained on 38,000 posts from the /r/politics subreddit.\n\nThe workflow is a simple 2-step process.  First, I apply a TF-IDF vectorizer, which handles word tokenization, n-gram extraction, and IDF weighting.  Second, I fit a Ridge linear regression to predict the upvotes.\n\nTake a look at the Code\n-----------------------\nThe code is in upvotes.ipynb\n\nWhat is the problem?\n--------------------\nThis method alone doesn't provide a great prediction.  I want to add new features, but I am not sure how to.  I think adding the following would improve the prediction:\n\n  1. Length of the title (maybe shorter titles are better?)\n  2. Age of the post (I would expect older posts to have more upvotes)\n\nTechnically, how do I do this?  Help me out: post an issue.\n\nWhat solutions or suggestions have others made?\n-----------------------------------------------\n  1. Try the simpler problem of classifying which posts are within the top 20%\n  2. Check if your data is normally distributed - Short Answer, it's not :(\n  3. Add in features such as time of day, day of week, and whether the title is a question (yes/no)\n  4. Use logistic regression for binary classification\n  5. Normalize votes to a metric of votes/time\n\nThese comments came from Reddit discussions:\n  1. [/r/machinelearning](https://www.reddit.com/r/MachineLearning/comments/772hik/d_i_need_help_predicting_the_number_of_subreddit/)\n  2. 
[/r/datascience](https://www.reddit.com/r/datascience/comments/772e7q/i_need_help_predicting_the_number_of_subreddit/)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flwgray%2Fupvotes","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Flwgray%2Fupvotes","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Flwgray%2Fupvotes/lists"}