Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/transitive-bullshit/bens-bites-ai-search
AI search for all the best resources in AI – powered by Ben's Bites 💯
ai beehiiv ml newsletters search semantic-search
Last synced: 3 months ago
- Host: GitHub
- URL: https://github.com/transitive-bullshit/bens-bites-ai-search
- Owner: transitive-bullshit
- License: mit
- Created: 2023-01-18T08:52:13.000Z (about 2 years ago)
- Default Branch: main
- Last Pushed: 2024-02-17T09:16:49.000Z (11 months ago)
- Last Synced: 2024-10-23T07:53:32.633Z (3 months ago)
- Topics: ai, beehiiv, ml, newsletters, search, semantic-search
- Language: TypeScript
- Homepage: https://search.bensbites.co
- Size: 3.82 MB
- Stars: 113
- Watchers: 5
- Forks: 22
- Open Issues: 0
Metadata Files:
- Readme: readme.md
- Funding: .github/funding.yml
- License: license
Awesome Lists containing this project
- awesome - transitive-bullshit/bens-bites-ai-search - AI search for all the best resources in AI – powered by Ben's Bites 💯 (TypeScript)
README
# Ben's Bites Link Search
Search across all of the AI-related links in the Ben's Bites newsletter – using AI-powered semantic search.

- [Intro](#intro)
- [How it works](#how-it-works)
- [Semantic Search](#semantic-search)
- [TODO](#todo)
- [License](#license)

## Intro
The goal of this app is to provide a highly curated search for staying up-to-date with the latest AI resources and news.
All search results are extracted from [Ben's Bites AI Newsletter](https://www.bensbites.co/), which is used as a highly curated data source.
## How it works
A cron job is run every 24 hours to update the database.
The steps involved include:
1. Crawling the source [Beehiiv newsletter](https://www.bensbites.co/)
2. Converting each post to markdown
3. Extracting and resolving unique links
4. Fetching opengraph metadata for each link
5. Fetching provider-specific metadata for some links (e.g. tweet text)
6. Generating vector embeddings for each link using OpenAI
7. Upserting all links into a Pinecone vector database

We're using [IFramely](https://iframely.com/) to extract opengraph metadata for each link, and we also special-case tweet links to extract the tweet text.
Once we have all of the links locally, we upsert them into a [Pinecone](https://www.pinecone.io/) vector database for semantic search.
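As a rough illustration of step 3 above (extracting and resolving unique links), here is a minimal sketch that pulls inline links out of a markdown post and de-duplicates them. The function name `extractUniqueLinks` and the normalization rule (dropping URL fragments) are assumptions made for this example, not the project's actual code; the real pipeline likely also resolves redirects and filters unwanted hosts.

```typescript
// Hypothetical helper, not taken from the project: collect the unique
// http(s) links that appear as inline markdown links in a post.
function extractUniqueLinks(markdown: string): string[] {
  // Matches inline links of the form [text](https://example.com/path).
  const linkPattern = /\[[^\]]*\]\((https?:\/\/[^\s)]+)\)/g
  const seen = new Set<string>()

  let match: RegExpExecArray | null
  while ((match = linkPattern.exec(markdown)) !== null) {
    const raw = match[1]
    if (raw === undefined) continue

    // Assumption: two URLs that differ only by fragment are the same link.
    const normalized = raw.replace(/#.*$/, '')
    seen.add(normalized)
  }

  // A Set preserves insertion order, so links come back in first-seen order.
  return Array.from(seen)
}
```

Calling `extractUniqueLinks(postMarkdown)` on each converted post would yield the list of candidate links to pass on to the metadata and embedding steps.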
### Semantic Search
Semantic search is powered by [OpenAI's `text-embedding-ada-002` embedding model](https://platform.openai.com/docs/guides/embeddings/) and [Pinecone's hosted vector database](https://www.pinecone.io/).
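To make the idea concrete, the sketch below ranks links by the cosine similarity between a query embedding and each link's embedding. In the real app this ranking happens inside Pinecone, so everything here is illustrative: the `EmbeddedLink` shape, the function names, and the choice of cosine similarity (a common metric for embedding models) are assumptions, and the vectors would come from the embedding model rather than being written by hand.

```typescript
// Illustrative shape for a link with a precomputed embedding (assumed).
interface EmbeddedLink {
  url: string
  embedding: number[]
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Vectors must have the same length')
  }
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0
    const y = b[i] ?? 0
    dot += x * y
    normA += x * x
    normB += y * y
  }
  if (normA === 0 || normB === 0) return 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Return the topK links most similar to the query, best match first.
function rankBySimilarity(
  query: number[],
  links: EmbeddedLink[],
  topK: number
): EmbeddedLink[] {
  return links
    .map((link) => ({ link, score: cosineSimilarity(query, link.embedding) }))
    .sort((a, b) => b.score - a.score)
    .slice(0, topK)
    .map((entry) => entry.link)
}
```

A brute-force scan like this only works for small collections; a hosted vector database does the same kind of nearest-neighbor lookup over a much larger index.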
## TODO
- better search UX so back button works
- show the number of posts / links on the home page so it's clear when it was last updated
- actually sort by recency instead of faking it
- set up cron to update the DB daily
- test on safari/firefox
- display which newsletter the post first appeared in
- explore hybrid search
- infinite scroll so you can keep scrolling results

## License
MIT © [Travis Fischer](https://transitivebullsh.it)
All link data is extracted from [Ben's Bites AI Newsletter](https://www.bensbites.co/) and is licensed under [CC BY-NC-ND 4.0](https://creativecommons.org/licenses/by-nc-nd/4.0/).
If you found this project interesting, please consider [sponsoring me](https://github.com/sponsors/transitive-bullshit) or following me on Twitter.