Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/eliahkagan/bed
Text embeddings via OpenAI - assorted Python examples
- Host: GitHub
- URL: https://github.com/eliahkagan/bed
- Owner: EliahKagan
- License: 0BSD
- Created: 2023-01-23T07:29:50.000Z (almost 2 years ago)
- Default Branch: main
- Last Pushed: 2024-08-09T19:23:15.000Z (3 months ago)
- Last Synced: 2024-08-09T20:40:49.474Z (3 months ago)
- Language: Jupyter Notebook
- Size: 738 KB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# Text embeddings via OpenAI - assorted Python examples
These are some examples of embeddings using OpenAI models.
Examples are here because I find each to be of some interest, but this is not
intended as a tutorial on how to use embeddings. For instructive examples, see
the official OpenAI repository
[**openai-cookbook**](https://github.com/openai/openai-cookbook).

This repository, *bed*, is like [*bedj*](https://github.com/EliahKagan/bedj),
but this one is in Python (with Jupyter notebooks) and is more extensive.

## License
[0BSD](https://spdx.org/licenses/0BSD.html). See [**`LICENSE`**](LICENSE).
## Contents
*Summary forthcoming.* For now, look at the descriptions at the top of each
notebook.

## Notes
The examples are written to assume your API key is in a file called `.api_key`.
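As a minimal sketch of loading it (the file creation and placeholder key below are illustrative only; in practice `.api_key` already holds your real key):

```python
from pathlib import Path

# Illustrative only: create the file with a made-up key. In practice,
# .api_key already exists and contains your real (secret) key.
Path('.api_key').write_text('sk-example\n', encoding='utf-8')

# Read the key, stripping the trailing newline.
api_key = Path('.api_key').read_text(encoding='utf-8').strip()
```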
Do not commit it to Git! The `.gitignore` file excludes it, to help avoid that.

## Matrix multiplication
One interesting technique shown here is storing the embeddings as rows of a
matrix, then finding similarities with matrix multiplication.

The second operand can be an embedding, in which case we are multiplying a
matrix by a column vector, which is the same as taking the dot products of all
of the matrix's rows with the vector (those dot products become the
coordinates of the resulting vector).

If the second operand is a matrix whose *columns* are embeddings, then each
*(i, j)* entry of the resulting matrix is the dot product of the *i*th row of
the first matrix with the *j*th column of the second matrix, i.e., the
similarity of those embeddings.

With OpenAI embeddings, the dot products are the cosine similarities, and the
same holds for some (but not all) other models, because many embedding models,
including all OpenAI models, produce embeddings that are already normalized
(length 1).
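The technique can be sketched with NumPy, using made-up unit-length vectors in place of real model output:

```python
import numpy as np

# Made-up unit-length "embeddings" standing in for real model output.
# Each row of the matrix is one embedding.
matrix = np.array([
    [1.0, 0.0, 0.0],
    [0.0, 1.0, 0.0],
    [0.6, 0.8, 0.0],
])

# Matrix times a single embedding (a column vector): each entry of the
# result is the dot product of one row with the query, i.e. its cosine
# similarity to the query, since every vector has length 1.
query = np.array([0.6, 0.8, 0.0])
similarities = matrix @ query        # shape (3,)

# Matrix times a matrix whose *columns* are embeddings: entry (i, j) is
# the cosine similarity of embedding i and embedding j.
all_pairs = matrix @ matrix.T        # shape (3, 3)
```

The diagonal of `all_pairs` is all ones, since each embedding has cosine similarity 1 with itself.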