https://github.com/prak112/data4wildlife
Instagram scraping algorithm for collecting json and images to identify wildlife trade of Slow Loris
- Host: GitHub
- URL: https://github.com/prak112/data4wildlife
- Owner: prak112
- License: mit
- Created: 2022-01-30T10:29:43.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-02-04T19:37:20.000Z (about 3 years ago)
- Last Synced: 2025-01-15T01:42:01.438Z (4 months ago)
- Topics: data-collection, dataset, instagram-scraper, web-scraping, wildlife-conservation
- Language: HTML
- Homepage:
- Size: 4.45 MB
- Stars: 0
- Watchers: 1
- Forks: 1
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Data 4 Wildlife Hackathon 🛠️
**Hackathon (29-30 Jan 2022) focused on developing a digital solution to prevent illegal wildlife trade (IWT) on online social platforms.**

* **Team** - Sean P. Rogers, Gabriela Youngken 👩🎓 👨🎓
* **Mentor** - Alastair Jamieson 👨🏫 (also API-keys holder 👛)

### **Challenge**
* To build a **benchmark dataset** of possible instances of IWT and related information from online social platforms that can also be searched and analyzed 🔚
* According to the challenge guidelines: [Challenge1_Guidelines](https://github.com/prak112/data4wildlife/files/8005154/Challenge.1.Guidance.Document.pdf)
* _A benchmark dataset is a public dataset which is designed and collected for studying real-world data science/research problems._
* _The benchmark dataset should be social media platform agnostic, as IWT happens across multiple platforms such as Instagram and YouTube._

## Our Task
* **Collect Instagram posts with images related to _Slow Loris_ hashtags (slowloris, slowlorisforsale) to build a benchmark dataset** 🏛️
* **Task Duration** - 26 hours 🏃⏲️
## Our Approach 🏗️
- Manually identify _Slow Loris_ hashtags 🐵 for example data
- Call the Instagram API (RapidAPI, instagram85) for the hashtag-related feed
- Collect JSON (first page only), extract images, and label them by user ID
- Save images in folders labelled by language (_see **Future Prospects**_)
- Iterate API calls and collect more images
- Import the JSON into a webpage, [index.html](updated_(code-webpage)/index.html), for human validation of images
- Manually validate images and export a CSV file with information from the comments (a minimal sketch of this loop follows below)
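
The collection loop above might look roughly like the following Python sketch. It is a minimal illustration, not the hackathon code: the RapidAPI host, endpoint path, and response fields (`posts`, `image_url`, `user_id`) are assumptions, since the actual instagram85 schema is not documented in this README.

```python
import json
import os

import requests

# Hypothetical RapidAPI endpoint and response schema; the real
# instagram85 API may use different paths and field names.
API_HOST = "instagram85.p.rapidapi.com"
HEADERS = {
    "X-RapidAPI-Key": os.environ["RAPIDAPI_KEY"],  # key held by the mentor
    "X-RapidAPI-Host": API_HOST,
}
HASHTAGS = ["slowloris", "slowlorisforsale"]

def fetch_first_page(hashtag):
    """Fetch the first page of posts for a hashtag (first page only)."""
    url = f"https://{API_HOST}/hashtag/{hashtag}/feed"
    response = requests.get(url, headers=HEADERS, timeout=30)
    response.raise_for_status()
    return response.json()

def save_post_images(feed, out_dir="images/en"):
    """Download each post image, naming the file by user ID."""
    os.makedirs(out_dir, exist_ok=True)
    for post in feed.get("posts", []):         # assumed field name
        image_url = post.get("image_url")      # assumed field name
        user_id = post.get("user_id", "unknown")
        if not image_url:
            continue
        image = requests.get(image_url, timeout=30)
        with open(os.path.join(out_dir, f"{user_id}.jpg"), "wb") as f:
            f.write(image.content)

for tag in HASHTAGS:
    feed = fetch_first_page(tag)
    with open(f"{tag}.json", "w") as f:
        json.dump(feed, f, indent=2)           # raw JSON for the webpage step
    save_post_images(feed)
```

Keeping the raw JSON per hashtag means the webpage validation step stays decoupled from the image downloads.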
### Future Prospects 👀
- Call the API recursively with `next_page_id` to collect all pages (see the sketch after this list)
- Depending on image volume, the project can evolve into image recognition for automation
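
Recursive paging could be bolted onto the same sketch by following a cursor, assuming the feed response carries a `next_page_id` field as suggested above (the endpoint and parameter names remain hypothetical):

```python
def fetch_all_pages(hashtag):
    """Follow the next_page_id cursor until the feed is exhausted."""
    # API_HOST and HEADERS as in the earlier sketch
    url = f"https://{API_HOST}/hashtag/{hashtag}/feed"
    posts, page_id = [], None
    while True:
        params = {"next_page_id": page_id} if page_id else {}
        response = requests.get(url, headers=HEADERS, params=params, timeout=30)
        response.raise_for_status()
        feed = response.json()
        posts.extend(feed.get("posts", []))   # assumed field name
        page_id = feed.get("next_page_id")    # assumed cursor field
        if not page_id:                       # no cursor left: last page reached
            return posts
```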
## _Key Takeaways_

* _Focus on the bigger picture_ 🌄
* _Build one-block-at-a-time_ 🧱
* _Have consistent breaks_ 😌