{"id":13618201,"url":"https://github.com/algolia/npm-search","last_synced_at":"2025-06-19T15:41:45.869Z","repository":{"id":37412734,"uuid":"76697114","full_name":"algolia/npm-search","owner":"algolia","description":"🗿 npm ↔️ Algolia replication tool :skier: :snail: :artificial_satellite:","archived":false,"fork":false,"pushed_at":"2025-06-16T00:02:49.000Z","size":10354,"stargazers_count":138,"open_issues_count":72,"forks_count":22,"subscribers_count":68,"default_branch":"master","last_synced_at":"2025-06-18T22:02:25.327Z","etag":null,"topics":["algolia","couchdb","crawler","npm","search","sync","yarn"],"latest_commit_sha":null,"homepage":"","language":"TypeScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/algolia.png","metadata":{"files":{"readme":"README.md","changelog":"CHANGELOG.md","contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2016-12-17T01:31:19.000Z","updated_at":"2025-06-10T13:40:54.000Z","dependencies_parsed_at":"2023-02-18T03:15:37.973Z","dependency_job_id":"e32482fe-6c82-4edb-a148-2f46041c6ba4","html_url":"https://github.com/algolia/npm-search","commit_stats":null,"previous_names":[],"tags_count":75,"template":false,"template_full_name":null,"purl":"pkg:github/algolia/npm-search","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/algolia%2Fnpm-search","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/algolia%2Fnpm-search/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/algolia%2Fnpm-search/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositori
es/algolia%2Fnpm-search/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/algolia","download_url":"https://codeload.github.com/algolia/npm-search/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/algolia%2Fnpm-search/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":260781346,"owners_count":23062220,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["algolia","couchdb","crawler","npm","search","sync","yarn"],"created_at":"2024-08-01T20:01:56.146Z","updated_at":"2025-06-19T15:41:40.856Z","avatar_url":"https://github.com/algolia.png","language":"TypeScript","funding_links":[],"categories":["TypeScript","npm"],"sub_categories":[],"readme":"# npm-search\n\n\u003ca href=\"https://www.npmjs.com/\"\u003enpm\u003c/a\u003e ↔️ \u003ca href=\"https://www.algolia.com/\"\u003eAlgolia\u003c/a\u003e replication tool.\nMaintained by \u003ca href=\"https://www.algolia.com/\"\u003eAlgolia\u003c/a\u003e and \u003ca href=\"https://www.jsdelivr.com/\"\u003ejsDelivr\u003c/a\u003e.\n\n\u003ch1 align=\"center\"\u003e\n  \u003cbr/\u003e\n  \u003ca href=\"https://www.algolia.com/\"\u003e\u003cimg src=\"algolia.png\" alt=\"Algolia logo\" height=\"40px\"/\u003e\u003c/a\u003e\n  \u0026nbsp;\u0026\u0026nbsp;\n  \u003ca href=\"https://www.jsdelivr.com/\"\u003e\u003cimg src=\"jsdelivr.png\" alt=\"jsDelivr logo\" height=\"40px\"/\u003e\u003c/a\u003e\n  \u003cbr/\u003e\n\u003c/h1\u003e\n\n[//]: # 
([![CircleCI]\u0026#40;https://circleci.com/gh/algolia/npm-search/tree/master.svg?style=svg\u0026#41;]\u0026#40;https://circleci.com/gh/algolia/npm-search/tree/master\u0026#41; \u003ca title=\"Public Status powered by Datadog\" href=\"https://p.datadoghq.com/sb/2b51baa8-c54a-11eb-a5a4-da7ad0900002-4973ed88f5be0d93c350fcb0ea2e7f0c\"\u003e)\n\n[//]: # (  \u003cimg width=\"100\" alt=\"Datadog Status\" src=\"https://www.datocms-assets.com/2885/1611308816-datadog-horizontal-rgb.png?fit=max\u0026fm=png\u0026q=80\" /\u003e)\n\n[//]: # (\u003c/a\u003e)\n\n[//]: # (---)\n\nThis is a failure resilient npm registry to Algolia index replication process.\nIt will replicate all npm packages to an Algolia index and keep it up to date.\nThe state of the replication is saved in Algolia index settings.\n\nThe replication should always be running. **Only one instance per Algolia index must run at the same time**.\nIf the process fails, restart it and the replication process will continue at the last point it remembers.\n\n\u003c!-- START doctoc generated TOC please keep comment here to allow auto update --\u003e\n\u003c!-- DON'T EDIT THIS SECTION, INSTEAD RE-RUN doctoc TO UPDATE --\u003e\n\n- [🗿 npm-search ⛷ 🐌 🛰](#-npm-search---)\n  - [Algolia Index](#algolia-index)\n    - [Using the public index](#using-the-public-index)\n    - [Schema](#schema)\n    - [Ranking](#ranking)\n      - [Textual relevance](#textual-relevance)\n        - [Searchable Attributes](#searchable-attributes)\n        - [Prefix Search](#prefix-search)\n        - [Typo-tolerance](#typo-tolerance)\n        - [Exact Boosting](#exact-boosting)\n      - [Custom/Business relevance](#custombusiness-relevance)\n        - [Number of downloads](#number-of-downloads)\n        - [Popular packages](#popular-packages)\n  - [Usage](#usage)\n    - [Production](#production)\n    - [Restart](#restart)\n  - [How does it work?](#how-does-it-work)\n  - [Contributing](#contributing)\n\n\u003c!-- END doctoc generated TOC please keep 
comment here to allow auto update --\u003e\n\n## Algolia Index\n\n### Using the public index\n\nThe Algolia index is currently used, for free, by a few selected projects (e.g. [yarnpkg.com](https://yarnpkg.com), [codesandbox.io](https://codesandbox.io), [jsdelivr.com](https://www.jsdelivr.com/), etc.).\n\nIf you want to include this index in your project, please create a support request here: [Algolia Support](https://support.algolia.com/hc/en-us/requests/new).\n\nThis product is an open source product for the community and not supported by Algolia.\n\nTo be eligible, your project must meet these requirements:\n\n- Publicly available: The project must be publicly usable and, if applicable, include documentation or instructions on how the community can use it.\n- Non-commercial: The project cannot be used to promote a product or service; it has to provide something of value to the community at no cost. Applications for non-commercial projects backed by commercial entities will be reviewed on a case-by-case basis.\n\n\nYou can also use the code or the [public docker image](https://hub.docker.com/r/algolia/npm-search) to run your own (as of September 2021 it will create ~3M records x4).\n\n### Schema\n\nFor every single npm package, we create a record in the Algolia index. 
The resulting records have the following schema:\n\n```json5\n{\n  name: 'babel-core',\n  downloadsLast30Days: 10978749,\n  downloadsRatio: 0.08310651682685861,\n  humanDownloadsLast30Days: '11m',\n  jsDelivrHits: 11684192,\n  popular: true,\n  version: '6.26.0',\n  versions: {\n    // [...]\n    '7.0.0-beta.3': '2017-10-15T13:12:35.166Z',\n  },\n  tags: {\n    latest: '6.26.0',\n    old: '5.8.38',\n    next: '7.0.0-beta.3',\n  },\n  description: 'Babel compiler core.',\n  dependencies: {\n    'babel-code-frame': '^6.26.0',\n    // [...]\n  },\n  devDependencies: {\n    'babel-helper-fixtures': '^6.26.0',\n    // [...]\n  },\n  repository: {\n    url: 'https://github.com/babel/babel/tree/master/packages/babel-core',\n    host: 'github.com',\n    user: 'babel',\n    project: 'babel',\n    path: '/tree/master/packages/babel-core',\n    branch: 'master',\n  },\n  readme: '# babel-core\\n\\n\u003e Babel compiler core.\\n\\n\\n [... truncated at 200kb]',\n  owner: {\n    // either GitHub owner or npm owner\n    name: 'babel',\n    avatar: 'https://github.com/babel.png',\n    link: 'https://github.com/babel',\n  },\n  deprecated: 'Deprecated', // This field will be removed, please use `isDeprecated` instead\n  isDeprecated: true,\n  deprecatedReason: 'Deprecated',\n  isSecurityHeld: false, // See https://github.com/npm/security-holder\n  badPackage: false,\n  homepage: 'https://babeljs.io/',\n  license: 'MIT',\n  keywords: [\n    '6to5',\n    'babel',\n    'classes',\n    'const',\n    'es6',\n    'harmony',\n    'let',\n    'modules',\n    'transpile',\n    'transpiler',\n    'var',\n    'babel-core',\n    'compiler',\n  ],\n  created: 1424009748555,\n  modified: 1508833762239,\n  lastPublisher: {\n    name: 'hzoo',\n    email: 'hi@henryzoo.com',\n    avatar: 'https://gravatar.com/avatar/851fb4fa7ca479bce1ae0cdf80d6e042',\n    link: 'https://www.npmjs.com/~hzoo',\n  },\n  owners: [\n    {\n      email: 'me@thejameskyle.com',\n      name: 'thejameskyle',\n      avatar: 
'https://gravatar.com/avatar/8a00efb48d632ae449794c094f7d5c38',\n      link: 'https://www.npmjs.com/~thejameskyle',\n    },\n    // [...]\n  ],\n  lastCrawl: '2017-10-24T08:29:24.672Z',\n  dependents: 3321,\n  types: {\n    ts: 'definitely-typed', // definitely-typed | included | false\n    definitelyTyped: '@types/babel__core',\n  },\n  moduleTypes: ['unknown'], // esm | cjs | none | unknown\n  styleTypes: ['none'], // file extensions like css, less, scss or none if no style files present\n  humanDependents: '3.3k',\n  changelogFilename: null, // if babel-core had a changelog, it would be the raw GitHub url here\n  objectID: 'babel-core',\n  // the following fields are considered internal and may change at any time\n  _downloadsMagnitude: 8,\n  _jsDelivrPopularity: 5,\n  _popularName: 'babel-core',\n  _searchInternal: {\n    alternativeNames: [\n      // alternative versions of this name, to show up on confused searches\n    ],\n  },\n}\n```\n\n### Ranking\n\nIf you want to learn more about how Algolia's ranking algorithm is working, you can read [this blog post](https://blog.algolia.com/search-ranking-algorithm-unveiled/).\n\n#### Textual relevance\n\n##### Searchable Attributes\n\nWe're restricting the search to use a subset of the attributes only:\n\n- `_popularName`\n- `name`\n- `description`\n- `keywords`\n- `owner.name`\n- `owners.name`\n\n##### Prefix Search\n\nAlgolia provides default prefix search capabilities (matching words with only the beginning). This is disabled for the `owner.name` and `owners.name` attributes.\n\n##### Typo-tolerance\n\nAlgolia provides default typo-tolerance.\n\n##### Exact Boosting\n\nUsing the `optionalFacetFilters` feature of Algolia, we're boosting exact matches on the name of a package to always be on top of the results.\n\n#### Custom/Business relevance\n\n##### Number of downloads\n\nFor each package, we use the number of downloads in the last 30 days as Algolia's `customRanking` setting. 
This is used to rank results that have the same textual relevance against each other.\n\nFor instance, a search for `babel` will match both `babel-core` and `babel-messages`. From a textual-relevance point of view, those 2 packages match in exactly the same way. In such a case, Algolia will rely on the `customRanking` setting and therefore put the package with the highest number of downloads in the past 30 days first.\n\n##### Popular packages\n\nSome packages will be considered popular if they have been downloaded \"more\" than others. We currently consider a package popular if it either:\n - has more than `0.005%` of the total number of npm downloads, or\n - is in the top thousand packages at [jsDelivr](https://github.com/jsdelivr/data.jsdelivr.com).\n\nThis `popular` flag is also used to boost some records over non-popular ones.\n\n## Usage\n\n### Production\n\n```sh\nyarn\napiKey=... yarn start\n```\n\n### Restart\nTo restart from a particular point (or from the beginning):\n\n```sh\nseq=0 apiKey=... 
yarn start\n```\n\nThis is useful when you want to completely resync the npm registry because:\n\n- you changed the way you format packages\n- you added more metadata (like GitHub stars)\n- you are in an unsure state and you just want to restart everything\n\n`seq` represents a [change sequence](http://docs.couchdb.org/en/2.0.0/json-structure.html#changes-information-for-a-database)\nin CouchDB lingo.\n\n## How does it work?\n\nOur goal with this project is to:\n\n- be able to quickly do a complete rebuild\n- be resilient to failures\n- clean the package data\n\nWhen the process starts with `seq=0`:\n\n- save the [current sequence](https://replicate.npmjs.com/) of the npm registry in the state (Algolia settings)\n- bootstrap the initial index content by using [/\\_all_docs](http://docs.couchdb.org/en/2.0.0/api/database/bulk-api.html)\n- replicate registry changes since the current sequence\n- watch for registry changes continuously and replicate them\n\nReplicate and watch are separated because:\n\n1. In replicate we want to replicate a batch of documents in a fast way\n2. In watch we want new changes as fast as possible, one by one. If watch was\n    asking for batches of 100, new packages would be added too late to the index\n\n## Contributing\n\nSee [CONTRIBUTING.md](./CONTRIBUTING.md)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falgolia%2Fnpm-search","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falgolia%2Fnpm-search","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falgolia%2Fnpm-search/lists"}