{"id":28965941,"url":"https://github.com/devflowinc/gymshark-scrape","last_synced_at":"2025-06-24T07:10:38.769Z","repository":{"id":227023358,"uuid":"770195551","full_name":"devflowinc/gymshark-scrape","owner":"devflowinc","description":null,"archived":false,"fork":false,"pushed_at":"2024-03-13T01:50:31.000Z","size":95,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"main","last_synced_at":"2025-06-22T05:17:05.385Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/devflowinc.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null}},"created_at":"2024-03-11T05:39:13.000Z","updated_at":"2024-04-02T17:19:13.000Z","dependencies_parsed_at":"2024-03-11T07:57:48.881Z","dependency_job_id":null,"html_url":"https://github.com/devflowinc/gymshark-scrape","commit_stats":null,"previous_names":["devflowinc/gymshark-scrape"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/devflowinc/gymshark-scrape","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devflowinc%2Fgymshark-scrape","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devflowinc%2Fgymshark-scrape/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devflowinc%2Fgymshark-scrape/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devflowinc%2Fgymshark-scrape/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/devflowinc","download_url":"https://codeload.gi
thub.com/devflowinc/gymshark-scrape/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/devflowinc%2Fgymshark-scrape/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":261624969,"owners_count":23186121,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2025-06-24T07:10:38.066Z","updated_at":"2025-06-24T07:10:38.749Z","avatar_url":"https://github.com/devflowinc.png","language":"TypeScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Trieve YC Company Directory Demo\n\nThis is a demonstration of Trieve's source-available and self-hostable infrastructure for enterprise search teams on the YC company directory dataset. Trieve combines search language models with tools for human fine-tuning. Find our main repository at [github.com/devflowinc/trieve](https://github.com/devflowinc/trieve).\n\n## Creating the dataset with all of the YC companies\n\n### 1. Scrape a list of YC company links from the official YC Company Directory page\n\nNavigate to [ycombinator.com/companies](https://ycombinator.com/companies) and paste the script from the following [gist](https://gist.github.com/skeptrunedev/0e389b6532020f8512180b4f131ceb2b) into the browser console.\n\nThe result will be JSON containing URLs for all public YC companies.\n\n### 2. Paste the JSON array of YC companies from the browser console script into `./bun-scraper/yc-company-links.json`\n\nThe ingest process will use this list to create chunks that are sent to the Trieve API.\n\n### 3. 
Get a `dataset_id` and API key from [dashboard.trieve.ai](https://dashboard.trieve.ai) and add them to the `.env` file for the scraping process\n\nWithin the root directory of this repository, run `cat ./bun-scraper/example.env \u003e ./bun-scraper/.env`.\n\n1. Navigate to [dashboard.trieve.ai](https://dashboard.trieve.ai) and sign in or create an account\n2. On the first page you see, click **create dataset**\n3. On the dataset creation page, copy your `dataset_id` and paste it into `./bun-scraper/.env` as the value for `DATASET_ID`\n4. Click the button to create an API key\n5. Create a Read+Write type API key, copy the value, and paste it into `./bun-scraper/.env` as the value for `API_KEY`\n\n### 4. Run the scraper and create your chunks!\n\n1. Run `cd ./bun-scraper` in the root of this repository\n2. If you have not already installed it, install [bun](https://bun.sh/) with `npm install -g bun`\n3. Run `bun install`\n4. Run `bun index.ts`\n\n## Running the frontend\n\n### 1. Set up the root env file for the frontend\n\n1. Run `cat .env.example \u003e .env` in the root of this repository\n2. Set `VITE_DATASET_ID` in the `.env` file to the ID of the dataset to which you added chunks in the dataset creation step\n3. Set `VITE_API_KEY` in the `.env` file to a read-only API key that you created on [dashboard.trieve.ai](https://dashboard.trieve.ai)\n\n### 2. Build the frontend with your environment variables\n\nRun `yarn build` in the root of this repository\n\n### 3. Start the packaged frontend\n\nRun `yarn serve` in the root of this repository\n\n## Final Notes\n\nYou can also navigate to [chat.trieve.ai](https://chat.trieve.ai) or [search.trieve.ai](https://search.trieve.ai) to explore your dataset in both a RAG and a search context.\n\nOn [search.trieve.ai](https://search.trieve.ai) you can experiment with manually editing chunks' content and relevance weight to adjust and fine-tune search results. 
A common use case is adding weight to top YC companies so that they rank higher in search.","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdevflowinc%2Fgymshark-scrape","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdevflowinc%2Fgymshark-scrape","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdevflowinc%2Fgymshark-scrape/lists"}