
https://github.com/transitive-bullshit/bens-bites-ai-search

AI search for all the best resources in AI – powered by Ben's Bites 💯

ai beehiiv ml newsletters search semantic-search




# Ben's Bites Link Search


Search across all of the AI-related links in the Ben's Bites newsletter – using AI-powered semantic search.



- [Intro](#intro)
- [How it works](#how-it-works)
  - [Semantic Search](#semantic-search)
- [TODO](#todo)
- [License](#license)

## Intro

The goal of this app is to provide a highly curated search for staying up-to-date with the latest AI resources and news.

All search results are extracted from [Ben's Bites AI Newsletter](https://www.bensbites.co/), which serves as the curated data source.

## How it works

A cron job is run every 24 hours to update the database.

The steps involved include:

1. Crawling the source [Beehiiv newsletter](https://www.bensbites.co/)
2. Converting each post to Markdown
3. Extracting and resolving unique links
4. Fetching Open Graph metadata for each link
5. Fetching provider-specific metadata for some links (e.g. tweet text)
6. Generating vector embeddings for each link using OpenAI
7. Upserting all links into a Pinecone vector database
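As a rough illustration of step 3, the TypeScript sketch below pulls `[text](url)` links out of a post's Markdown and deduplicates them. The helper names (`normalizeUrl`, `extractUniqueLinks`) and the normalization rules (dropping `www.`, fragments, `utm_*` parameters, and trailing slashes) are assumptions for illustration, not this project's actual code. Resolving redirects would require network access and is left out.

```typescript
// Hypothetical sketch of "extracting unique links" from a post that has
// already been converted to Markdown. Normalization rules are assumptions.

/** Normalizes a URL so trivial variants dedupe to the same key; null if unusable. */
function normalizeUrl(raw: string): string | null {
  let url: URL;
  try {
    url = new URL(raw);
  } catch {
    return null; // not an absolute URL
  }
  if (url.protocol !== "http:" && url.protocol !== "https:") return null;

  url.hash = ""; // fragments never change the linked resource
  // Keep every query parameter except utm_* tracking parameters (assumed list).
  const kept = new URLSearchParams();
  url.searchParams.forEach((value, key) => {
    if (!key.startsWith("utm_")) kept.append(key, value);
  });
  url.search = kept.toString(); // an empty string removes the "?" entirely
  url.hostname = url.hostname.replace(/^www\./, "");
  if (url.pathname.length > 1 && url.pathname.endsWith("/")) {
    url.pathname = url.pathname.slice(0, -1);
  }
  return url.toString();
}

/** Returns normalized links in order of first appearance, without duplicates. */
function extractUniqueLinks(markdown: string): string[] {
  // Matches inline Markdown links: [text](http://... or https://...)
  const linkPattern = /\[[^\]]*\]\((https?:\/\/[^\s)]+)\)/g;
  const seen = new Set<string>();
  const result: string[] = [];
  for (const match of markdown.matchAll(linkPattern)) {
    const normalized = normalizeUrl(match[1]);
    if (normalized !== null && !seen.has(normalized)) {
      seen.add(normalized);
      result.push(normalized);
    }
  }
  return result;
}
```

A regex is enough for a sketch, but it misses bare URLs and URLs that contain `)`; a fuller implementation would more likely walk a Markdown AST.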

We're using [Iframely](https://iframely.com/) to extract Open Graph metadata for each link, and we also special-case tweet links to extract the tweet text.
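A hypothetical sketch of the tweet special-casing: each link is classified by host and path before any metadata is fetched, so tweets can be routed to a tweet-text lookup and everything else to Iframely. `classifyLink` and the `MetadataSource` type are illustrative names, and treating `x.com` as a tweet host is an assumption. The actual fetch calls are omitted.

```typescript
// Hypothetical routing step: decide which metadata source a link needs.
type MetadataSource =
  | { kind: "tweet"; tweetId: string } // fetch the tweet text
  | { kind: "opengraph"; url: string }; // fetch Open Graph metadata (e.g. via Iframely)

// Hosts treated as tweet links (assumed list).
const TWEET_HOSTS = new Set(["twitter.com", "mobile.twitter.com", "x.com"]);

function classifyLink(rawUrl: string): MetadataSource {
  const url = new URL(rawUrl); // assumes links were validated earlier
  const host = url.hostname.replace(/^www\./, "");
  // Tweet permalinks look like /<handle>/status/<numeric id>
  const match = url.pathname.match(/^\/[^/]+\/status(?:es)?\/(\d+)/);
  if (TWEET_HOSTS.has(host) && match) {
    return { kind: "tweet", tweetId: match[1] };
  }
  return { kind: "opengraph", url: url.toString() };
}
```

Returning a tagged union keeps the fetch logic a simple `switch` on `kind`, which makes it easy to add more provider-specific cases later.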

Once we have all of the links locally, we upsert them into a [Pinecone](https://www.pinecone.io/) vector database for semantic search.
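The sketch below shows one way to prepare links for the upsert, assuming each link already carries its embedding. Records use the `id` / `values` / `metadata` shape that Pinecone upserts accept. Hashing the URL into the ID is an assumption, not something this README confirms; it makes the daily re-run idempotent, since an existing vector is overwritten rather than duplicated. The batch size of 100 is also an assumption (check Pinecone's current request limits), and the client call that sends each batch is not shown.

```typescript
import { createHash } from "node:crypto";

// Hypothetical shape of a link after metadata fetching and embedding.
interface LinkWithEmbedding {
  url: string;
  title?: string;
  description?: string;
  embedding: number[];
}

// Record shape sent to the vector database.
interface VectorRecord {
  id: string;
  values: number[];
  metadata: Record<string, string>;
}

function toRecord(link: LinkWithEmbedding): VectorRecord {
  // Deterministic ID derived from the URL, so re-upserting the same link
  // replaces the old vector instead of adding a duplicate.
  const id = createHash("sha256").update(link.url).digest("hex");
  const metadata: Record<string, string> = { url: link.url };
  if (link.title) metadata.title = link.title;
  if (link.description) metadata.description = link.description;
  return { id, values: link.embedding, metadata };
}

function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new RangeError("size must be positive");
  const batches: T[][] = [];
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size));
  }
  return batches;
}

// Assumed batch size per upsert request.
const UPSERT_BATCH_SIZE = 100;

function buildUpsertBatches(links: LinkWithEmbedding[]): VectorRecord[][] {
  return chunk(links.map(toRecord), UPSERT_BATCH_SIZE);
}
```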

### Semantic Search

Semantic search is powered by [OpenAI's `text-embedding-ada-002` embedding model](https://platform.openai.com/docs/guides/embeddings/) and [Pinecone's hosted vector database](https://www.pinecone.io/).
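Pinecone performs the nearest-neighbor lookup server-side, but the scoring it approximates can be sketched locally: embed the search query with the same model, then rank the stored link embeddings by cosine similarity. This brute-force `topK` is illustrative only and assumes the index uses the cosine metric. OpenAI's documentation notes that its embeddings are normalized to length 1, so a plain dot product would produce the same ranking.

```typescript
// Hypothetical stored item: a link plus its embedding vector.
interface IndexedLink {
  url: string;
  embedding: number[];
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0; // avoid dividing by zero
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

/** Scores every link against the query embedding and returns the k best. */
function topK(query: number[], links: IndexedLink[], k: number): { url: string; score: number }[] {
  return links
    .map((link) => ({ url: link.url, score: cosineSimilarity(query, link.embedding) }))
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

A linear scan costs one dot product per stored link for every query; with 1,536-dimensional `text-embedding-ada-002` vectors, that is the work a hosted approximate index like Pinecone's is designed to avoid.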

## TODO

- better search UX so the back button works
- show the number of posts / links on the home page so it's clear when it was last updated
- actually sort by recency instead of faking it
- set up cron to update the DB daily
- test on Safari/Firefox
- display which newsletter the post first appeared in
- explore hybrid search
- infinite scroll so you can keep scrolling results

## License

MIT © [Travis Fischer](https://transitivebullsh.it)

All link data is extracted from [Ben's Bites AI Newsletter](https://www.bensbites.co/) and is licensed under [CC BY-NC-ND 4.0](https://creativecommons.org/licenses/by-nc-nd/4.0/).

If you found this project interesting, please consider [sponsoring me](https://github.com/sponsors/transitive-bullshit) or following me on Twitter.