https://github.com/transitive-bullshit/bens-bites-ai-search

AI search for all the best resources in AI – powered by Ben's Bites 💯

Host: GitHub
URL: https://github.com/transitive-bullshit/bens-bites-ai-search
Owner: transitive-bullshit
License: mit
Created: 2023-01-18T08:52:13.000Z (over 2 years ago)
Default Branch: main
Last Pushed: 2024-02-17T09:16:49.000Z (over 1 year ago)
Last Synced: 2025-03-27T23:07:24.834Z (2 months ago)
Topics: ai, beehiiv, ml, newsletters, search, semantic-search
Language: TypeScript
Homepage: https://search.bensbites.co
Size: 3.82 MB
Stars: 114
Watchers: 4
Forks: 20
Open Issues: 0
Metadata Files:
- Readme: readme.md
- Funding: .github/funding.yml
- License: license

Awesome Lists containing this project

awesome - transitive-bullshit/bens-bites-ai-search - AI search for all the best resources in AI – powered by Ben's Bites 💯 (TypeScript)

README

        

  



# Ben's Bites Link Search

Search across all of the AI-related links in the Ben's Bites newsletter – using AI-powered semantic search.





  

  

  



- [Intro](#intro)

- [How it works](#how-it-works)

  - [Semantic Search](#semantic-search)

- [TODO](#todo)

- [License](#license)

## Intro

The goal of this app is to provide a highly curated search for staying up-to-date with the latest AI resources and news.

All search results are extracted from [Ben's Bites AI Newsletter](https://www.bensbites.co/), which is used as a highly curated data source.

## How it works

A cron job is run every 24 hours to update the database.

The steps involved include:

1. Crawling the source [Beehiiv newsletter](https://www.bensbites.co/)

2. Converting each post to markdown

3. Extracting and resolving unique links

4. Fetching opengraph metadata for each link

5. Fetching provider-specific metadata for some links (e.g. tweet text)

6. Generating vector embeddings for each link using OpenAI

7. Upserting all links into a Pinecone vector database
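Step 3 (extracting and resolving unique links) could look roughly like the sketch below. It is not taken from the project's source: the `extractUniqueLinks` name, the regular expression, and the decision to drop URL fragments and `utm_*` tracking parameters are all assumptions made for illustration.

```typescript
// Hypothetical helper (not from the project): collect unique http(s) links
// from a markdown post, resolving relative links against the post's base URL.
function extractUniqueLinks(markdown: string, baseUrl: string): string[] {
  // Matches inline links such as [text](url) or [text](url "title").
  const linkPattern = /\[[^\]]*\]\(\s*([^)\s]+)(?:\s+"[^"]*")?\s*\)/g
  const seen = new Set<string>()

  let match: RegExpExecArray | null
  while ((match = linkPattern.exec(markdown)) !== null) {
    const raw = match[1]
    if (!raw) continue

    // Skip images, which use the same syntax prefixed with "!".
    if (match.index > 0 && markdown.charAt(match.index - 1) === '!') continue

    let url: URL
    try {
      url = new URL(raw, baseUrl) // resolves relative links
    } catch {
      continue // ignore anything that is not a valid URL
    }
    if (url.protocol !== 'http:' && url.protocol !== 'https:') continue

    // Assumed normalization: drop the fragment and utm_* tracking params
    // so the same page is not stored twice.
    url.hash = ''
    const trackingKeys: string[] = []
    url.searchParams.forEach((_value, key) => {
      if (key.startsWith('utm_')) trackingKeys.push(key)
    })
    for (const key of trackingKeys) url.searchParams.delete(key)

    seen.add(url.toString())
  }

  return Array.from(seen)
}
```

A `Set` keeps insertion order, so links come back in the order they first appear in the post.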

We're using [IFramely](https://iframely.com/) to extract opengraph metadata for each link, and we also special-case tweet links to extract the tweet text.

Once we have all of the links locally, we upsert them into a [Pinecone](https://www.pinecone.io/) vector database for semantic search.
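Vector databases generally accept records in batches rather than one very large request. The sketch below shows one way the upsert step might be batched. The `LinkVector` shape, the batch size of 100, and the `upsertBatch` callback are hypothetical; the callback stands in for whatever Pinecone client call the project actually uses, which is not shown here.

```typescript
// Hypothetical record shape: an id, the embedding, and some link metadata.
interface LinkVector {
  id: string
  values: number[]
  metadata: { url: string; title?: string }
}

// Split an array into consecutive batches of at most `size` items.
function chunk<T>(items: T[], size: number): T[][] {
  if (size <= 0) throw new Error('size must be positive')
  const batches: T[][] = []
  for (let i = 0; i < items.length; i += size) {
    batches.push(items.slice(i, i + size))
  }
  return batches
}

// Upsert all vectors batch by batch. `upsertBatch` is supplied by the caller
// and is an assumption standing in for the real database client.
async function upsertInBatches(
  vectors: LinkVector[],
  upsertBatch: (batch: LinkVector[]) => Promise<void>,
  batchSize = 100
): Promise<number> {
  let upserted = 0
  for (const batch of chunk(vectors, batchSize)) {
    await upsertBatch(batch) // sequential, to keep request volume predictable
    upserted += batch.length
  }
  return upserted
}
```

Because an upsert overwrites records that share an id, re-running the job on links that were already indexed should be safe as long as ids are derived deterministically, for example from the normalized URL.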

### Semantic Search

Semantic search is powered by [OpenAI's `text-embedding-ada-002` embedding model](https://platform.openai.com/docs/guides/embeddings/) and [Pinecone's hosted vector database](https://www.pinecone.io/).
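To illustrate the idea behind vector search, here is a minimal brute-force sketch that ranks links by cosine similarity between a query embedding and each link's embedding. It is a conceptual stand-in only: in the app the nearest-neighbor lookup is done by Pinecone, the similarity metric configured for the project's index is not stated here, and the `IndexedLink` type and `searchLinks` function are hypothetical names. Real `text-embedding-ada-002` vectors have 1536 dimensions; any embeddings of equal length work with this code.

```typescript
// Hypothetical in-memory record: a link plus its embedding vector.
interface IndexedLink {
  url: string
  embedding: number[]
}

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error('Embedding dimensions must match')
  }
  let dot = 0
  let normA = 0
  let normB = 0
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i]
    normA += a[i] * a[i]
    normB += b[i] * b[i]
  }
  if (normA === 0 || normB === 0) return 0
  return dot / (Math.sqrt(normA) * Math.sqrt(normB))
}

// Score every link against the query and return the topK best matches,
// highest similarity first.
function searchLinks(
  queryEmbedding: number[],
  links: IndexedLink[],
  topK = 3
): Array<{ link: IndexedLink; score: number }> {
  return links
    .map((link) => ({
      link,
      score: cosineSimilarity(queryEmbedding, link.embedding)
    }))
    .sort((x, y) => y.score - x.score)
    .slice(0, topK)
}
```

A brute-force scan like this is linear in the number of links, which is one reason a hosted vector index is used once the collection grows.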

## TODO

- better search UX so back button works

- show the number of posts / links on the home page so it's clear when it was last updated

- actually sort by recency instead of faking it

- set up cron to update the DB daily

- test on safari/firefox

- display which newsletter the post first appeared in

- explore hybrid search

- infinite scroll so you can keep scrolling results

## License

MIT © [Travis Fischer](https://transitivebullsh.it)

All link data is extracted from [Ben's Bites AI Newsletter](https://www.bensbites.co/) and is licensed under [CC BY-NC-ND 4.0](https://creativecommons.org/licenses/by-nc-nd/4.0/).

If you found this project interesting, please consider [sponsoring me](https://github.com/sponsors/transitive-bullshit) or following me on Twitter.
