https://github.com/lopcode/peep
A work-in-progress project to classify and tag species (and other) information automatically from free text input on social media websites 🦊👀🐰
Last synced: 10 months ago
- Host: GitHub
- URL: https://github.com/lopcode/peep
- Owner: lopcode
- License: mit
- Created: 2023-11-06T18:13:38.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-02-18T12:20:40.000Z (almost 2 years ago)
- Last Synced: 2025-03-26T23:03:20.638Z (10 months ago)
- Language: Kotlin
- Homepage:
- Size: 75.2 KB
- Stars: 1
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE.md
README
# peep
peep is a work-in-progress project to classify and tag species (and other) information automatically from free text
input on social media websites 🦊👀🐰.
For example, on [barq.app](https://barq.app) there's a free text field for species which could result in:
* `Lop-eared bunny` -> `species:rabbit`, `family:leporidae`, `order:lagomorpha`, `characterisation:lop-eared`
* `Arctic fox` -> `species:arctic fox`, `species:fox`, `species:vulpes lagopus`, `genus:vulpes`, `family:canidae`
* `Blue sparkleyote` -> `species:coyote`, `species:canis latrans`, `genus:canis`, `family:canidae`, `color:blue`, `color:sparkle`
This is useful for data analysis, including monitoring species popularity over time, whilst also letting users express
themselves freely.
If this sounds interesting to you, please star the repo - thanks ⭐️!
## How does it work
The initial plan is to do this by:
* Tidying/normalising input data
* Using some form of natural language processing
  * Probably with some cultural additions that tend not to appear in popular language
* Nearest-matching to a configurable taxonomy, using an appropriate algorithm
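As a rough illustration of the steps above, here is a minimal Kotlin sketch that normalises input and nearest-matches each word against a tiny taxonomy. It is not peep's implementation: the function names (`normalise`, `levenshtein`, `classify`), the alias-to-tags shape of `taxonomy`, the choice of Levenshtein distance as the "appropriate algorithm", and the length-based threshold are all assumptions made for the example.

```kotlin
// Hypothetical sketch: none of these names come from the peep codebase.

/** Lowercase, turn every run of non-letter/non-digit characters into one space, then trim. */
fun normalise(input: String): String =
    input.lowercase()
        .replace(Regex("[^\\p{L}\\p{N}]+"), " ")
        .trim()

/** Levenshtein edit distance, computed row by row with dynamic programming. */
fun levenshtein(a: String, b: String): Int {
    var previous = IntArray(b.length + 1) { it }
    for (i in 1..a.length) {
        val current = IntArray(b.length + 1)
        current[0] = i
        for (j in 1..b.length) {
            val cost = if (a[i - 1] == b[j - 1]) 0 else 1
            current[j] = minOf(previous[j] + 1, current[j - 1] + 1, previous[j - 1] + cost)
        }
        previous = current
    }
    return previous[b.length]
}

/** Assumed taxonomy shape: a known alias mapped to the tags it implies. */
val taxonomy: Map<String, List<String>> = mapOf(
    "rabbit" to listOf("species:rabbit", "family:leporidae", "order:lagomorpha"),
    "bunny" to listOf("species:rabbit", "family:leporidae", "order:lagomorpha"),
    "fox" to listOf("species:fox", "family:canidae"),
    "coyote" to listOf("species:coyote", "family:canidae"),
)

/**
 * Tag each word of the input with its nearest alias, if it is close enough.
 * The allowed distance grows with word length (one edit per four characters),
 * so short words such as "lop" must match exactly.
 */
fun classify(input: String): List<String> {
    val tags = linkedSetOf<String>()
    for (token in normalise(input).split(" ")) {
        if (token.isEmpty()) continue
        val best = taxonomy.keys.minByOrNull { levenshtein(token, it) } ?: continue
        if (levenshtein(token, best) <= token.length / 4) {
            tags += taxonomy.getValue(best)
        }
    }
    return tags.toList()
}

fun main() {
    println(classify("Lop-eared bunny"))
    println(classify("Arctic fox"))
    println(classify("rabit")) // a one-letter typo still resolves to the rabbit tags
}
```

Matching word by word keeps the sketch short, but it cannot handle multi-word aliases, characterisations such as `lop-eared`, or blends such as `sparkleyote`; those would need the cultural additions mentioned in the plan.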
Some other goals / non-goals:
* Take some samples from an existing dataset to do classification performance measurement
* Try combining different methods and benchmark each
* Make an API
  * Including batch classification
* Not interested in ML/AI classification to begin with
## Examples
Run the project with Gradle, passing the file you want to parse as an argument, or omit the argument to use the provided default (`data.csv`):
```
🥕 carrot 🗂 ~/git/peep 🐙 main $ ./gradlew run
> Task :app:run
hello, peep
loading data from file: "data.csv"
read 5 rows (not including a header)
top 10 uncategorised:
rabbit - 2
bunny - 1
bunny rabbit - 1
lop eared rabbit - 1
top 10, accounting for categories and their aliases:
rabbit - 5
normalised stats:
4 unique entries (reduction of 1 by normalisation)
with an uncategorised count of 5
and 1 resulting categories (reduction of 3 by categorisation)
with 5 categorisations (1.00 average categories per entry)
```
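To make the figures in that output easier to follow, here is a small Kotlin sketch of how such counts could be derived. It is only an illustration: the five input rows, the `aliases` table, and the names `normaliseEntry`, `Stats` and `summarise` are hypothetical, chosen so that the counts line up with the sample run, and are not taken from peep's source or its `data.csv`.

```kotlin
// Hypothetical input and alias table, picked to mirror the sample output above.
val sampleRows = listOf("Rabbit", "rabbit", "Bunny", "bunny rabbit", "Lop-eared rabbit")
val aliases = mapOf(
    "bunny" to "rabbit",
    "bunny rabbit" to "rabbit",
    "lop eared rabbit" to "rabbit",
)

/** Lowercase and replace punctuation runs with single spaces (assumed normalisation). */
fun normaliseEntry(raw: String): String =
    raw.lowercase().replace(Regex("[^a-z0-9]+"), " ").trim()

data class Stats(
    val uncategorised: Map<String, Int>, // normalised entry -> count
    val categorised: Map<String, Int>,   // category -> count, after resolving aliases
    val uniqueEntries: Int,
    val categories: Int,
    val categorisations: Int,
)

fun summarise(rows: List<String>, aliases: Map<String, String>): Stats {
    val normalised = rows.map(::normaliseEntry)
    val uncategorised = normalised.groupingBy { it }.eachCount()
    // An entry with no alias is treated as its own category.
    val categorised = normalised.map { aliases[it] ?: it }.groupingBy { it }.eachCount()
    return Stats(
        uncategorised = uncategorised,
        categorised = categorised,
        uniqueEntries = uncategorised.size,
        categories = categorised.size,
        categorisations = categorised.values.sum(),
    )
}

fun main() {
    val stats = summarise(sampleRows, aliases)
    println("unique entries: ${stats.uniqueEntries}")
    println("reduction by normalisation: ${sampleRows.distinct().size - stats.uniqueEntries}")
    println("categories: ${stats.categories}")
    println("reduction by categorisation: ${stats.uniqueEntries - stats.categories}")
    println("average categories per entry: ${stats.categorisations.toDouble() / sampleRows.size}")
}
```

Under these assumptions, "reduction by normalisation" is the number of distinct raw rows minus the number of distinct normalised entries, and "reduction by categorisation" is the number of distinct normalised entries minus the number of resulting categories.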