{"id":19347275,"url":"https://github.com/sarthakjshetty/red","last_synced_at":"2026-05-15T11:09:43.173Z","repository":{"id":75476069,"uuid":"241039423","full_name":"SarthakJShetty/Red","owner":"SarthakJShetty","description":"Developing a database of species threats and stresses from the IUCN Red List. Published in Conservation Letters 2021.","archived":false,"fork":false,"pushed_at":"2024-03-05T19:21:13.000Z","size":3362,"stargazers_count":0,"open_issues_count":0,"forks_count":1,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-01-06T14:35:31.590Z","etag":null,"topics":["beautifulsoup","bots","iucn-red-list","python3","scrapper","selenium"],"latest_commit_sha":null,"homepage":"https://github.com/SarthakJShetty/Red","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/SarthakJShetty.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2020-02-17T06:54:16.000Z","updated_at":"2024-03-04T05:23:34.000Z","dependencies_parsed_at":"2024-03-05T20:43:23.983Z","dependency_job_id":null,"html_url":"https://github.com/SarthakJShetty/Red","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SarthakJShetty%2FRed","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SarthakJShetty%2FRed/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SarthakJShetty%2FRed/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/SarthakJShetty%
2FRed/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/SarthakJShetty","download_url":"https://codeload.github.com/SarthakJShetty/Red/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":240457940,"owners_count":19804489,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["beautifulsoup","bots","iucn-red-list","python3","scrapper","selenium"],"created_at":"2024-11-10T04:15:14.495Z","updated_at":"2026-05-15T11:09:43.102Z","avatar_url":"https://github.com/SarthakJShetty.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Red\n\n:warning: **Code is buggy** :warning:\n\n## 1.0 Introduction:\n\n+ The aim of the project is to analyze correlations between the threat status of species tracked on the [IUCN Red List](https://www.iucnredlist.org/ \"IUCN Red List\") and the threats and stresses they face.\n\n+ This repository is dedicated to scraping the data fields from the [IUCN Red List](https://www.iucnredlist.org/ \"IUCN Red List\") needed to examine such correlations.\n\n+ This project is a collaboration with [Uttara Mendiratta](https://www.researchgate.net/profile/Uttara_Mendiratta \"Uttara\") and [Anand M Osuri](https://www.ncf-india.org/author/675623/anand-osuri-2 \"Anand\") from the [Nature Conservation Foundation, India](http://ncf-india.org/ \"NCF-India\").\n\n## 2.0 Implementation\n\n1. 
The [```birds.csv```](https://github.com/SarthakJShetty/Red/tree/master/data/birds.csv) and [```mammals.csv```](https://github.com/SarthakJShetty/Red/tree/master/data/mammals.csv) files list the species for which data has to be scraped.\n\n2. The [```start.sh```](https://github.com/SarthakJShetty/Red/blob/master/start.sh) script has to be made executable before the first run of the code.\n\n        user@computer:~/Red chmod +x start.sh\n\n3. The pipeline is triggered using the [```start.sh```](https://github.com/SarthakJShetty/Red/blob/master/start.sh) script, which in turn runs the [```scraper.py```](https://github.com/SarthakJShetty/Red/tree/master/scraper.py) code.\n\n        user@computer:~/Red ./start.sh\n\n4. The scraped data is written to disk as an ```X_WORKING.csv``` file, a copy of the original ```.csv```, so that the originals are not tampered with.\n\n## 3.0 Model Overview:\n\n+ The model is made of two components: 1. [```interface.py```](https://github.com/SarthakJShetty/Red/tree/master/interface.py) and 2. [```scraper.py```](https://github.com/SarthakJShetty/Red/tree/master/scraper.py).\n\n![alt text](assets/RedPipeline.png \"Scraping Pipeline\")\n\u003ci\u003eFigure 2.1 Model to scrape data from IUCN Red List\u003c/i\u003e\n\n### 3.1 Interface\n\n1. Disk write/read operations are handled by the [```interface.py```](https://github.com/SarthakJShetty/Red/tree/master/interface.py) code.\n\n2. The [```pandas```](https://pandas.pydata.org/) dataframe is saved to disk by the [```interface.py```](https://github.com/SarthakJShetty/Red/tree/master/interface.py) code after each run.\n\n### 3.2 Scraper\n\n1. The [```scraper.py```](https://github.com/SarthakJShetty/Red/tree/master/scraper.py) interacts with the webpage using the [Selenium](https://www.selenium.dev/) browser automation framework.\n\n2. 
The ```HTML``` ```tags``` contained in the ```page_source``` gathered by the [```Selenium```](https://www.selenium.dev/) middleware code are made searchable using [```BeautifulSoup```](https://www.crummy.com/software/BeautifulSoup/).\n\n3. The [```scraper.py```](https://github.com/SarthakJShetty/Red/tree/master/scraper.py) pipeline collects the prescribed ```HTML``` tags from the queried website and updates a [```pandas```](https://pandas.pydata.org/) dataframe with the information.\n\n4. The ```speciesCounter()``` of the [```scraper.py```](https://github.com/SarthakJShetty/Red/tree/master/scraper.py) script returns the ```sno``` of the last species that is missing the ```stable```, ```unknown``` or ```decline``` population trend tags, which all scraped species must have.\n\n## 4.0 Known Issues:\n\n1. While writing elements to the [```pandas```](https://pandas.pydata.org/) dataframe, an element may be shifted one or more columns to the right. This error may lead to a [```pandas```](https://pandas.pydata.org/) memory warning, since entities of multiple datatypes then occupy the same column.\n\n2. Some species are not indexed by the [IUCN Red List](https://www.iucnredlist.org/ \"IUCN Red List\"). 
This may cause the [```start.sh```](https://github.com/SarthakJShetty/Red/blob/master/start.sh) script to loop while trying to collect the species ```URL``` from the search page.\n\n## Citation:\n\nIf you decide to use our client, scraper or cleaner for your project, or as a means to interface with the IUCN database, please cite our [2021 Conservation Letters](https://conbio.onlinelibrary.wiley.com/doi/full/10.1111/conl.12815) paper!\n\n```\n@article{mendiratta2021mammal,\n  title={Mammal and bird species ranges overlap with armed conflicts and associated conservation threats},\n  author={Mendiratta, Uttara and Osuri, Anand M and Shetty, Sarthak J and Harihar, Abishek},\n  journal={Conservation Letters},\n  volume={14},\n  number={5},\n  pages={e12815},\n  year={2021},\n  publisher={Wiley Online Library}\n}\n```","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsarthakjshetty%2Fred","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fsarthakjshetty%2Fred","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fsarthakjshetty%2Fred/lists"}