{"id":18984148,"url":"https://github.com/abdullah0297445/jyi","last_synced_at":"2026-05-06T17:34:22.394Z","repository":{"id":186027276,"uuid":"176518228","full_name":"Abdullah0297445/JYI","owner":"Abdullah0297445","description":"This repository holds python script to scrape research articles from jyi.org and find which articles are most similar to each other.","archived":false,"fork":false,"pushed_at":"2019-03-19T15:52:02.000Z","size":162,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":0,"default_branch":"master","last_synced_at":"2025-08-24T17:48:05.022Z","etag":null,"topics":["cosine-similarity","nltk","pandas","python3","selenium","selenium-python","webscraping"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/Abdullah0297445.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2019-03-19T13:29:54.000Z","updated_at":"2019-06-27T10:11:57.000Z","dependencies_parsed_at":null,"dependency_job_id":"d7d13b70-624a-4852-8383-0322a854406f","html_url":"https://github.com/Abdullah0297445/JYI","commit_stats":null,"previous_names":["abdullah0297445/jyi"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/Abdullah0297445/JYI","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Abdullah0297445%2FJYI","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Abdullah0297445%2FJYI/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Abdullah0297445%2FJYI/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Abdulla
h0297445%2FJYI/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/Abdullah0297445","download_url":"https://codeload.github.com/Abdullah0297445/JYI/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/Abdullah0297445%2FJYI/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":32704510,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-06T08:33:17.875Z","status":"ssl_error","status_checked_at":"2026-05-06T08:33:17.221Z","response_time":117,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.6:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cosine-similarity","nltk","pandas","python3","selenium","selenium-python","webscraping"],"created_at":"2024-11-08T16:19:55.475Z","updated_at":"2026-05-06T17:34:22.378Z","avatar_url":"https://github.com/Abdullah0297445.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# JYI\n\nThis repository holds python script to scrape research articles from jyi.org and find which articles are most similar to each other. \u003cbr/\u003e\nCosine similarity has been used to measure the similarity between articles. TF-IDF model has been used. 
\n\n# About JYI\n\nJYI is a student-led initiative to broaden the undergraduate scientific experience, allowing students to participate in the scientific review and publication processes of its peer-reviewed undergraduate journal. Incorporated as a non-profit, student-run corporation, JYI represents over 50 different academic institutions from over half a dozen countries.\n\n# Requirements:\nInstall the following Python packages, which this project depends on. \n\n 1. Pandas\n```python\npip install pandas\n```\n 2. NLTK \n```python\npip install nltk\n```\nAfter installing NLTK, download the text data it provides, such as stopwords.\nYou can do that in 3 simple steps:\n\n1. Open CMD \u003cbr/\u003e\n2. Type 'python' at the prompt to start a Python environment. \u003cbr/\u003e\n3. Enter these two lines of code at the prompt. \n\n```python\nimport nltk\nnltk.download('all')\n```\nWait until the download finishes.\n\n 3. BeautifulSoup\n```python\npip install beautifulsoup4\n```\n 4. Scikit-Learn\n```python\npip install scikit-learn\n```\n 5. Selenium\n```python\npip install selenium\n```\nAfter installing Selenium, download its Chrome WebDriver from:\nhttp://chromedriver.chromium.org/downloads\nChoose the ChromeDriver version that matches the version of your Google Chrome browser. \nAdd the directory containing the downloaded ChromeDriver executable to your PATH variable. \n\nThat's all for the dependencies.\n\n# Usage\n\nusage: similarityscript.py [-h] [-i INPUTFILE] [-s SHEET] [-o OUTPUTFILE]\n\noptional arguments: \u003cbr/\u003e -h, --help show this help message and exit \u003cbr/\u003e -i INPUTFILE, --inputfile INPUTFILE \u003cbr/\u003e Specify the input xlsx file path. E.g. C:\\user\\downloads\\Excel.xlsx \u003cbr/\u003e -s SHEET, --sheet SHEET \u003cbr/\u003e Specify the sheet name in xlsx file. E.g. 
Dataset1  \u003cbr/\u003e -o OUTPUTFILE, --outputfile OUTPUTFILE \u003cbr/\u003e Specify the directory in which to save the output xlsx file. E.g. C:\\user\\downloads\\\n\n![](img/Example%20Usage.jpg)\n\n\u003cbr/\u003e\nIf no input file is specified, the script looks for a file named \"input.xlsx\" in the script's directory (the current directory). \u003cbr/\u003e\nIf no sheet name is specified, \"Dataset1\" is used as the default sheet name of the input xlsx file. \u003cbr/\u003e\nIf no output path is specified, the output file is placed in the same folder as the script. \u003cbr/\u003e\n\u003cbr/\u003e\n\n###### Example input and output XLSX files have been added along with the Python script. \u003cbr/\u003e This script has been tested on Windows 10 with Python 3.6.2 \n\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabdullah0297445%2Fjyi","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fabdullah0297445%2Fjyi","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fabdullah0297445%2Fjyi/lists"}