{"id":20847369,"url":"https://github.com/johnsutor/dsga-1011-project","last_synced_at":"2025-08-31T23:33:34.055Z","repository":{"id":207236600,"uuid":"716829176","full_name":"johnsutor/dsga-1011-project","owner":"johnsutor","description":"NYU DSGA 1011 Project","archived":false,"fork":false,"pushed_at":"2023-12-05T03:29:24.000Z","size":3715,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2025-06-03T05:34:47.367Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Jupyter Notebook","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/johnsutor.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-11-10T00:47:10.000Z","updated_at":"2023-11-10T00:47:26.000Z","dependencies_parsed_at":"2025-03-12T11:44:38.343Z","dependency_job_id":"3bc0d97b-32f6-4cfa-84da-9a86479fb8a2","html_url":"https://github.com/johnsutor/dsga-1011-project","commit_stats":null,"previous_names":["johnsutor/dsga-1011-project"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/johnsutor/dsga-1011-project","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/johnsutor%2Fdsga-1011-project","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/johnsutor%2Fdsga-1011-project/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/johnsutor%2Fdsga-1011-project/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/johnsutor%2Fdsga-1011-project/manifests
","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/johnsutor","download_url":"https://codeload.github.com/johnsutor/dsga-1011-project/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/johnsutor%2Fdsga-1011-project/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":273052850,"owners_count":25037295,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-31T02:00:09.071Z","response_time":79,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-18T02:19:45.660Z","updated_at":"2025-08-31T23:33:33.760Z","avatar_url":"https://github.com/johnsutor.png","language":"Jupyter Notebook","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Predicting High-Engagement Social Media Posts Based on Internet Trends\r\n\r\n## Motivation\r\nThe motivation for our project stems from the intertwining of social media and daily life, alongside the rapid proliferation of online news which significantly impacts public discourse. As social media platforms like Instagram become primary channels for information dissemination, understanding and leveraging current internet trends from reputable sources becomes crucial. 
Our project aims to bridge the gap between real-world events and online social interaction by recommending Instagram posts based on web-scraped trends from trustworthy news outlets. This endeavor is not only timely but imperative, as it fosters a more informed social media landscape, enriching user engagement with topical, credible content. By incorporating a thorough analysis of news source trustworthiness, comment sentiment, and temporal dynamics, we strive to curate a more nuanced, insightful social media experience. This initiative resonates with a broader effort to mitigate the spread of misinformation while promoting more enlightening, responsible social media interaction. By leveraging advanced NLP tools such as sentiment analysis algorithms, topic modeling techniques (e.g., Latent Dirichlet Allocation), and Named Entity Recognition (NER) to identify and categorize current news trends, we aspire to contribute to the ongoing discourse on how data science and NLP can be harnessed to enhance digital literacy and foster a more informed citizenry in the digital age.\r\n\r\n## Goal\r\nIn our project, we aim to create an intelligent recommendation system that suggests content to Instagram posters based on real-time trends harvested from credible news sources. By leveraging advanced Natural Language Processing (NLP) tools and methodologies, we aim to bridge the gap between trending real-world events and social media content, thus fostering a more informed and engaging social media landscape. The motivation for this goal is rooted in the works of Zhou et al. and Vaswani et al., which respectively highlight the potential of personalized recommendation systems and the transformative impact of the Transformer architecture in processing sequential data efficiently. 
Our endeavor addresses a significant gap in current literature by not only harnessing trending information but also scrutinizing the trustworthiness of the news sources and analyzing the sentiment encapsulated in discussions surrounding these trends.\r\n\r\nThe primary challenges en route to achieving this goal encompass accurately identifying and extracting trending topics from a plethora of online news sources, ensuring the credibility of these sources, and effectively analyzing the sentiment and temporal dynamics surrounding these trends. Moreover, devising a robust recommendation model that can tailor content suggestions to individual Instagram posters based on these extracted trends, while ensuring relevance and engagement, presents a complex challenge.\r\n\r\nAdditionally, should time permit, we envision integrating other social media platforms to provide a more comprehensive content recommendation system. By solving these challenges and potentially achieving our stretch goals, we aspire to significantly contribute to the domain of data-driven social media content creation and dissemination, ultimately enhancing the digital literacy and interactive experience of the online community.\r\n\r\n## Methodology\r\nOur approach breaks down into five meta-steps:\r\n\r\n1. **Get the current trends for a geographic region of interest (in our case, we will use trends from the USA).** This will be done at a refresh rate set at 24 hours.\r\n\r\n2. **Use the trends to extract relevant posts from different media houses (from the past 2 days) and rank them based on relevance to the topic and credibility of the source.** This will help us form the corpus (later converted to embeddings for each trending item).\r\n\r\n3. 
**Obtain historical posts from a content publisher and create embeddings/vector representations of that content.** This gives us the embeddings we use for matching/comparison, to better understand which trending content we should post.\r\n\r\n4. **Calculate scores of current trends with respect to the historical posts from the given content publisher.** To calculate scores, we will experiment with simpler approaches like TF-IDF embeddings and more complex approaches that use pre-trained models like BERT.\r\n\r\n5. **Rank the content based on the best scores and come up with the \"ideal post\" depending on the rank.**\r\n\r\n\u003cb\u003e In summary \u003c/b\u003e, we will look at trending pieces and tailor them to suit our audience. The idea stems from popular websites like \u003ci\u003eBuzzFeed\u003c/i\u003e but without the need for any content writers.\r\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjohnsutor%2Fdsga-1011-project","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fjohnsutor%2Fdsga-1011-project","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fjohnsutor%2Fdsga-1011-project/lists"}