{"id":16942074,"url":"https://github.com/samouri/kinja-api","last_synced_at":"2025-03-21T08:15:57.056Z","repository":{"id":29073366,"uuid":"32601407","full_name":"samouri/kinja-api","owner":"samouri","description":null,"archived":false,"fork":false,"pushed_at":"2015-05-28T20:49:43.000Z","size":9056,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":4,"default_branch":"master","last_synced_at":"2025-01-26T04:44:23.147Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/samouri.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2015-03-20T18:53:55.000Z","updated_at":"2021-03-12T19:39:37.000Z","dependencies_parsed_at":"2022-08-22T17:40:31.195Z","dependency_job_id":null,"html_url":"https://github.com/samouri/kinja-api","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/samouri%2Fkinja-api","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/samouri%2Fkinja-api/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/samouri%2Fkinja-api/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/samouri%2Fkinja-api/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/samouri","download_url":"https://codeload.github.com/samouri/kinja-api/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":244759961,"owners_count":20505716,"icon_url":"https://github.
com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-10-13T21:11:02.626Z","updated_at":"2025-03-21T08:15:57.032Z","avatar_url":"https://github.com/samouri.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Predicting Gawker\n\nTo follow is an outline of the directory structure and a description of its components. Detailed descriptions of individual files are provided at the top of each script:\n\n1. Dataset\n  1. Input\n  1. Output\n  1. Scripts\n2. Preprocessing and Dataset Analysis\n3. Machine Learning\n\n## Dataset\n\nThis folder contains three folders within it. All of the files within it are related to generating the dataset or the graphs. This means downloading html, scraping, etc.\n\n### Input\n\nContains article url files used as input to the scripts.\n\n### Output\n\nContains the json representation of our dataset (articles_labeled.json), and other files used to create the full dataset. These are the results of the script files. Files in the Output folder are generally used for processing in our attempts in Machine Learning. \n\n### Scripts\n\nContains bash and ruby scripts used to build the dataset and graphs.\n\n#### htmls.sh\n\nDownloads all of the html when given the url-list.\n\n#### htmlsWatir.rb\n\n(Deprecated file) Was a proposed solution to the viewcounts issue. 
It used a Chromium webdriver to run the AJAX and wait for viewcounts to exist.\n\n#### urls.sh\n\nGenerates a list of article urls from the sitemap.\n\n#### scrape.rb\n\nScrapes article html for relevant article characteristics.\n\n#### viewcounts.sh\n\nDownloads viewcounts for every article and prints them to stdout in json format.\n\n#### generate_linkgraph.rb\n\nReads through htmls and generates a directed graph where nodes are articles and edges are links between them.\n\n#### add_viewcounts.rb\n\nAdds article view counts to the article dataset. Each article is given the number of views it received. Uses the result from viewcounts.sh.\n\n#### add_weekend_hour.rb\n\nAdds time information to the article dataset. Article objects are given a boolean describing whether they were published on a weekend and the hour in which they were published on a 24-hour clock.\n\n## Preprocessing and Dataset Analysis\n\nContains the Python files that were used to create and analyze tag and link networks and to preprocess the data. A detailed description of this component is provided in a README.txt inside the folder.\n\n## Machine Learning\n\nContains scripts used in performing Supervised Learning with the dataset. Using a classifier, we predict the number of views an article will receive based on its features.\n\n### Scripts\n\n#### dataset.py\n\nDefines an ArticleDataset class used to simplify importing the dataset and extracting features. Included in any script using the dataset for analysis.\n\n#### pipeline.py\n\nCombines article features to train a classifier and tests the accuracy of its predictions. This script is our primary attempt at article classification.\n\n#### transformers.py\n\nDefines custom transformers used in pipeline.py.\n\n#### label.py\n\nLabels the dataset. Can be set to divide articles into either 3 or 5 popularity classes.\n(3: Unpopular, Average, Popular; 5: VeryUnpopular, Unpopular, Average, Popular, VeryPopular)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsamouri%2Fkinja-api","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsamouri%2Fkinja-api","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsamouri%2Fkinja-api/lists"}