https://github.com/eggsyntax/py-user-knowledge
Predicting demographics of users with GPT based on text they've written
Last synced: 12 months ago
- Host: GitHub
- URL: https://github.com/eggsyntax/py-user-knowledge
- Owner: eggsyntax
- Created: 2024-04-17T16:59:40.000Z (almost 2 years ago)
- Default Branch: master
- Last Pushed: 2024-05-09T18:10:32.000Z (over 1 year ago)
- Last Synced: 2025-02-15T13:42:57.206Z (12 months ago)
- Language: Python
- Size: 12.5 MB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README
README
Series of experiments testing how well LLMs (mainly GPT-3.5) can predict
demographics from text (mainly OkCupid profiles).
To run from the command line:
- Make sure that OPENAI_API_KEY is defined in your environment
- Install packages (untested as yet; please let me know if you encounter difficulties): `conda install --file conda_requirements.txt`
- `python test-demographics.py`
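The first step above can be checked before launching the script. This is a minimal sketch, assuming only that the key is read from the `OPENAI_API_KEY` environment variable as stated; the helper name `require_api_key` is hypothetical and not part of this repo:

```python
import os


def require_api_key(env=os.environ, name="OPENAI_API_KEY"):
    """Return the API key from the environment, or fail with a clear message.

    Hypothetical helper for illustration; not taken from test-demographics.py.
    """
    key = env.get(name, "").strip()
    if not key:
        raise RuntimeError(
            f"{name} is not set; export it before running test-demographics.py"
        )
    return key


if __name__ == "__main__":
    try:
        require_api_key()
        print("OPENAI_API_KEY found")
    except RuntimeError as err:
        print(err)
```

Failing early like this avoids spending time loading data only to hit an authentication error on the first API call.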
NOTE: this is HORRIBLE CODE. This was my experiment with letting GPT-4 generate
most of the individual functions, and then it's just patches on patches
from there.
It suffers further from my initial naivete about typical ML conventions for,
e.g., data representation, so I'm munging data back and forth in a bunch of places.
Ideally I will rewrite it when I get time, but also when do I ever get time?
Caveat emptor.
Note that despite using temperature=0, the probability distribution predicted
by GPT will vary somewhat between runs, so results will differ.
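One way to put a number on that run-to-run variation is to compare the predicted distributions directly. The following is a rough sketch and not the code in this repo: it assumes each run yields a mapping from answer token to log-probability (for example, from the API's top log-probabilities), and the labels, values, and function names are made up for illustration.

```python
import math


def normalize_logprobs(top_logprobs, labels):
    """Convert token log-probabilities into a distribution over `labels`.

    Tokens outside `labels` are ignored; labels missing from the run get 0.
    """
    probs = {
        label: math.exp(top_logprobs[label])
        for label in labels
        if label in top_logprobs
    }
    total = sum(probs.values())
    if total == 0:
        raise ValueError("none of the labels appear in the log-probabilities")
    return {label: probs.get(label, 0.0) / total for label in labels}


def total_variation(p, q):
    """Total variation distance between two distributions (0 = identical)."""
    keys = set(p) | set(q)
    return 0.5 * sum(abs(p.get(k, 0.0) - q.get(k, 0.0)) for k in keys)


# Hypothetical log-probabilities from two runs on the same profile.
LABELS = ["male", "female"]
run_a = {"male": math.log(0.6), "female": math.log(0.3), "The": math.log(0.1)}
run_b = {"male": math.log(0.5), "female": math.log(0.4), "The": math.log(0.1)}

dist_a = normalize_logprobs(run_a, LABELS)
dist_b = normalize_logprobs(run_b, LABELS)
drift = total_variation(dist_a, dist_b)

if __name__ == "__main__":
    print(f"run-to-run drift (total variation): {drift:.3f}")
```

A small drift between repeated runs on the same input gives a noise floor; differences between experiments that are smaller than that floor probably should not be read as real effects.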