{"id":13577312,"url":"https://github.com/MarginaliaSearch/MarginaliaSearch","last_synced_at":"2025-04-05T11:32:26.736Z","repository":{"id":146034273,"uuid":"616505806","full_name":"MarginaliaSearch/MarginaliaSearch","owner":"MarginaliaSearch","description":"Internet search engine for text-oriented websites. Indexing the small, old and weird web. ","archived":false,"fork":false,"pushed_at":"2023-12-19T11:22:01.000Z","size":10689,"stargazers_count":603,"open_issues_count":18,"forks_count":13,"subscribers_count":5,"default_branch":"master","last_synced_at":"2023-12-19T17:22:03.948Z","etag":null,"topics":["alt-search","indexer","internet-search","language-processing","no-ai-used","no-cloud","search-engine","small-web","web-crawler"],"latest_commit_sha":null,"homepage":"https://search.marginalia.nu/","language":"HTML","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"other","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/MarginaliaSearch.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":".github/FUNDING.yml","license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null},"funding":{"github":"MarginaliaSearch","patreon":"marginalia_nu","open_collective":null,"ko_fi":null,"tidelift":null,"community_bridge":null,"liberapay":null,"issuehunt":null,"otechie":null,"lfx_crowdfunding":null,"custom":"https://www.buymeacoffee.com/marginalia.nu"}},"created_at":"2023-03-20T14:16:45.000Z","updated_at":"2024-01-26T12:20:32.968Z","dependencies_parsed_at":"2023-12-21T14:14:02.027Z","dependency_job_id":null,"html_url":"https://github.com/MarginaliaSearch/MarginaliaSearch","commit_stats":null,"previous_names":[],"tags_count":12,"template":null,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/ho
sts/GitHub/repositories/MarginaliaSearch%2FMarginaliaSearch","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MarginaliaSearch%2FMarginaliaSearch/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MarginaliaSearch%2FMarginaliaSearch/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/MarginaliaSearch%2FMarginaliaSearch/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/MarginaliaSearch","download_url":"https://codeload.github.com/MarginaliaSearch/MarginaliaSearch/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":223186529,"owners_count":17102479,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["alt-search","indexer","internet-search","language-processing","no-ai-used","no-cloud","search-engine","small-web","web-crawler"],"created_at":"2024-08-01T15:01:20.393Z","updated_at":"2025-04-05T11:32:26.722Z","avatar_url":"https://github.com/MarginaliaSearch.png","language":"HTML","funding_links":["https://github.com/sponsors/MarginaliaSearch","https://patreon.com/marginalia_nu","https://www.buymeacoffee.com/marginalia.nu"],"categories":["HTML","java","search-engine","数据库","Java"],"sub_categories":[],"readme":"# Marginalia Search\n\nThis is the source code for [Marginalia Search](https://search.marginalia.nu). \n\nThe aim of the project is to develop new and alternative discovery methods for the Internet. 
\nIt's an experimental workshop as much as it is a public service; the overarching goal is to\nelevate the more human, non-commercial sides of the Internet.\n\nA side-goal is to do this without requiring datacenters and enterprise hardware budgets, \nto be able to run this operation on affordable hardware with minimal operational overhead. \n\nThe long-term plan is to refine the search engine so that it provides enough public value \nthat the project can be funded through grants, donations and commercial API licenses \n(non-commercial share-alike is always free).\n\nThe system can be run either as a copy of Marginalia Search or as a white-label search engine\nfor your own data (either crawled or side-loaded).  At present the logic isn't very configurable, and a lot of the judgements\nmade are based on the Marginalia project's goals, but additional configurability is being\nworked on!\n\nHere's a demo of the set-up and operation of the self-hostable barebones mode of the search engine: [🌎\u0026nbsp;https://www.youtube.com/watch?v=PNwMkenQQ24](https://www.youtube.com/watch?v=PNwMkenQQ24)\n\n## Set up\n\nTo set up a local test environment, follow the instructions in [📄 run/readme.md](run/readme.md)!\n\nFurther documentation is available at [🌎\u0026nbsp;https://docs.marginalia.nu/](https://docs.marginalia.nu/).\n\nBefore compiling, it's necessary to run [⚙️ run/setup.sh](run/setup.sh). \nThis will download supplementary model data that is necessary to run the code. \nThe same data is also necessary to run the tests. \n\nIf you wish to hack on the code, check out [📄\u0026nbsp;doc/ide-configuration.md](doc/ide-configuration.md).\n\n## Hardware Requirements\n\nA production-like environment requires a lot of RAM and ideally enterprise SSDs for\nthe index, as well as some additional terabytes of slower hard drives for storing crawl\ndata. It can be made to run on smaller hardware by limiting the size of the index.  
\n\nThe system will definitely run on a 32 GB machine, possibly smaller, but at that size it may not perform\nvery well as it relies on disk caching to be fast. \n\nA local developer's deployment is possible with much smaller hardware (and index size). \n\n## Project Structure\n\n[📁 code/](code/) - The Source Code. See [📄 code/readme.md](code/readme.md) for a further breakdown of the structure and architecture.\n\n[📁 run/](run/) - Scripts and files used to run the search engine locally\n\n[📁 third-party/](third-party/) - Third-party code\n\n[📁 doc/](doc/) - Supplementary documentation\n\n[📄 CONTRIBUTING.md](CONTRIBUTING.md) - How to contribute\n\n[📄 LICENSE.md](LICENSE.md) - License terms\n\n## Contact\n\nYou can email \u003ckontakt@marginalia.nu\u003e with any questions or feedback.\n\n## License\n\nThe bulk of the project is available under AGPL 3.0, with exceptions. Some parts are co-licensed under MIT, and \nthird-party code may have different licenses. See the appropriate readme.md / license.md.\n\n## Versioning\n\nThe project uses modified Calendar Versioning, where the first two pairs of numbers are a year and month coinciding \nwith the latest crawling operation, and the third number is a patch number.\n\n```\n            version\n           --\n     yy.mm.VV\n     -----\n     crawl\n```\n\nFor example, `23.03.02` is a release with crawl data from March 2023 (released in May 2023).\nIt is the second patch for the 23.03 release.\n\nVersions with the same year and month are compatible with each other, or offer an upgrade path where the same \ndata set can be used, but across different crawl sets data format changes may be introduced, and you're generally\nexpected to re-crawl the data from scratch as crawler data has a shelf life approximately as long as the major release\ncycles of this project. After about 2-3 months it gets noticeably stale with many dead links.\n\nFor development purposes, crawling is discouraged and sample data is available. 
See [📄\u0026nbsp;run/readme.md](run/readme.md)\nfor more information. \n\n## Funding\n\n### Donations\n\nConsider [donating to the project](https://www.marginalia.nu/marginalia-search/supporting/).\n\n### Grants\n\nThis project was funded through the [NGI0 Entrust Fund](https://nlnet.nl/entrust), a fund established by [NLnet](https://nlnet.nl) with financial support from the European Commission's [Next Generation Internet](https://ngi.eu/) programme, under the aegis of DG Communications Networks, Content and Technology under grant agreement No 101069594.\n\n![NLnet Foundation](nlnet.png)\n![NGI0](NGI0Entrust_tag.svg)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMarginaliaSearch%2FMarginaliaSearch","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FMarginaliaSearch%2FMarginaliaSearch","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FMarginaliaSearch%2FMarginaliaSearch/lists"}