https://github.com/usrrname/aryscraper
A series of python3 scripts written for creating an image data set
python python3 scraper
Last synced: 29 days ago
- Host: GitHub
- URL: https://github.com/usrrname/aryscraper
- Owner: usrrname
- Created: 2022-01-30T20:51:12.000Z (almost 3 years ago)
- Default Branch: main
- Last Pushed: 2022-07-14T18:05:00.000Z (over 2 years ago)
- Last Synced: 2024-04-18T06:21:23.872Z (7 months ago)
- Topics: python, python3, scraper
- Language: Python
- Homepage:
- Size: 600 KB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
Aryan scraper
============
### WORK IN PROGRESS
This is one of several tools I'm building as a _Black Mirror_-style proof of concept to demonstrate the ongoing harm of racial profiling and how face recognition and machine learning may perpetuate it.
This repository shows a series of scripts I used to:
- [x] scrape the names of the SS high command from Wikipedia and output them as a .csv (`scrape-names.py`)
- [x] use them to create folders (`create-named-folders.py`)
- [x] find portraits of their faces with the SerpApi Google Images search API (`scraper.py`)
- [x] extract metadata for each name and save it all as a .json file (`wiki.py`)
- [x] perform some elementary face extraction (`haar_cascade.py`)
## About
AryScraper is the scraper that created the male and female "Aryan" face data sets using photos of Holocaust perpetrators and the SS high command. Note: This whole exercise is arcane and not rooted in science; the interest was in using AI to see whether we could perform some hyperbolic "categorization" or averaging on samples of self-proclaimed "Aryans".
Public gestures of this kind take after "tactical media" or "tactical technology".
Although the images were compiled from publicly available sources, neither the dataset nor the model is available here, to prevent harmful misuse.
## Folder Structure
```
.
├── Makefile
├── .github // github actions
├── README.md
├── requirements.txt
├── __pycache__
├── test
├── automate_scraping.py
├── create_folders.py
├── haar_cascade.py
├── names.py
├── scrape_names.py
├── scraper.py
├── ss-ranks.csv
├── util.py
├── wiki.py
├── men // images of male faces
└── women // images of female faces
```
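The folder-creation step listed earlier (one folder per name read from a .csv) could look roughly like the sketch below. This is not the code in `create_folders.py`. It is a minimal standard-library illustration that assumes the .csv has a header row with a column called `name`. The column name, the `slugify` helper, and the `create_named_folders` function are all hypothetical.

```python
from __future__ import annotations

import csv
import re
from pathlib import Path


def slugify(name: str) -> str:
    """Turn a display name into a filesystem-safe folder name (hypothetical helper)."""
    # Drop punctuation, keep letters, digits, underscores, hyphens and whitespace.
    cleaned = re.sub(r"[^\w\s-]", "", name).strip()
    # Collapse each run of whitespace into a single underscore.
    return re.sub(r"\s+", "_", cleaned)


def create_named_folders(csv_path: Path, root: Path, column: str = "name") -> list[Path]:
    """Create one folder under `root` for each non-empty value in `column`.

    Assumes the CSV has a header row; `column` is an assumed header name.
    """
    created = []
    with csv_path.open(newline="", encoding="utf-8") as handle:
        for row in csv.DictReader(handle):
            value = (row.get(column) or "").strip()
            if not value:
                continue  # skip rows with no usable name
            folder = root / slugify(value)
            # exist_ok=True makes the script safe to re-run.
            folder.mkdir(parents=True, exist_ok=True)
            created.append(folder)
    return created
```

Calling `create_named_folders(Path("names.csv"), Path("men"))` would then create one subfolder per row under `men`. The file name `names.csv` is a placeholder here.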