{"id":49876661,"url":"https://github.com/kyteproject/search-engine","last_synced_at":"2026-05-15T12:39:02.018Z","repository":{"id":104846763,"uuid":"368149903","full_name":"KyteProject/search-engine","owner":"KyteProject","description":"Search Engine implemented in Go using Agile. (WIP)","archived":false,"fork":false,"pushed_at":"2023-03-07T00:24:17.000Z","size":445,"stargazers_count":0,"open_issues_count":5,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2024-06-19T22:34:20.243Z","etag":null,"topics":["agile","clean-code","go","microservices","monolithic","solid-principles","work-in-progress"],"latest_commit_sha":null,"homepage":"https://omux.dev","language":"Go","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/KyteProject.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2021-05-17T10:46:43.000Z","updated_at":"2021-05-22T16:51:10.000Z","dependencies_parsed_at":"2023-07-04T22:31:34.844Z","dependency_job_id":null,"html_url":"https://github.com/KyteProject/search-engine","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/KyteProject/search-engine","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KyteProject%2Fsearch-engine","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KyteProject%2Fsearch-engine/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KyteProject%2Fsearch-engine/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositor
ies/KyteProject%2Fsearch-engine/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/KyteProject","download_url":"https://codeload.github.com/KyteProject/search-engine/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KyteProject%2Fsearch-engine/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":33067473,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-15T11:35:32.926Z","status":"ssl_error","status_checked_at":"2026-05-15T11:35:31.362Z","response_time":103,"last_error":"SSL_connect returned=1 errno=0 peeraddr=140.82.121.5:443 state=error: unexpected eof while reading","robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":false,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["agile","clean-code","go","microservices","monolithic","solid-principles","work-in-progress"],"created_at":"2026-05-15T12:38:58.284Z","updated_at":"2026-05-15T12:39:02.007Z","avatar_url":"https://github.com/KyteProject.png","language":"Go","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Introduction\n\nSearch Engine is a personal, educational project and functional search engine. 
It is a work-in-progress.\n\nThe purpose is to familiarise myself with some more technology and demonstrate an understanding of fundamental software\nengineering concepts.\n\nThe intention is to mimic a complete `Agile` process, create the `monolithic` application, including `tests`, and then\nsimulate a \"scale up\" scenario by splitting the `monolithic` application into `microservices`.\n\nInitially, the project employs a monolithic architecture. The components are built using the `proxy` design pattern that\nallows them to remain decoupled and communicate - this allows for extracting components into separate services when it\nneeds to scale, and simplifies the transition to a `microservice` architecture.\n\nThe project adheres to `SOLID` principles. \u003csmall\u003e_[3]_\u003c/small\u003e\n\n- **Single Responsibility Principle**\n\n  \u003e \"A class should have one, and only one, reason to change.\"\n\n- **Open-Closed Principle**\n\n  \u003e \"You should be able to extend a classes behaviour without modifying it.\"\n\n- **Liskov Substitution Principle**\n\n  \u003e \"Derived classes must be substitutable for their base classes.\"\n\n- **Interface Segregation Principle**\n\n  \u003e \"Make fine-grained interfaces that are client-specific.\"\n\n- **Dependency Inversion Principle**\n\n  \u003e \"Depend on abstractions, not on concretions.\"\n\nThe project adopts an `Agile` approach to quickly build components and work in small iterations to have a working\nprototype as fast as possible. `User Stories` were created to gather requirements and indicate the set of high-level\ncomponents needed; these were then turned into cards and organised on a `Kanban` board to aid development.\n\n# Requirements Analysis\n\nWe answer two things: *what* do we need to develop, and *how completely* would it meet our requirements. `User Stories`\nuse the template from Atlassian. 
\u003csmall\u003e_[2]_\u003c/small\u003e\n\n## Functional Requirements\n\n### **User Story - Link Submission**\n\n\u003e As an... **end-user**,\n\u003e I need to be able to... **submit new links**,\n\u003e So as to... **update the link graph and make their contents searchable**.\n\n**Acceptance criteria:**\n\n- A frontend or API endpoint for end-users to submit links.\n- Submitted links must be added to the graph and must be crawled by the system and indexed.\n- Already-submitted links should be accepted by the backend but not inserted twice into the graph.\n\n### **User Story - Search**\n\n\u003e As an... **end-user**,\n\u003e I need to be able to... **submit full-text search queries**,\n\u003e So as to... **retrieve a list of relevant results from the indexed content**.\n\n**Acceptance criteria:**\n\n- A frontend or API endpoint for end-users to submit a full-text query.\n- Paginate results when a query returns multiple matches.\n- Each entry in the result list must contain the following items: title or link description, the link to the content,\n  and a timestamp indicating when the link was last crawled.\n- When the query results in no matches, an appropriate response must be returned to the user.\n\n### **User Story - Crawl Link Graph**\n\n\u003e As a... **crawler backend system**,\n\u003e I need to be able to... **obtain a list of sanitised links**,\n\u003e So as to... 
**fetch and index their contents while also expanding the link graph with newly discovered links**.\n\n**Acceptance criteria:**\n\n- The crawler can query the link graph and receive a list of stale links that need to be crawled.\n- Links received by the crawler are retrieved from the remote hosts unless the remote server provides an ETag or\n  Last-Modified header that the crawler has already seen.\n- Retrieved content is scanned for links, and the link graph gets updated.\n- Retrieved content is indexed and added to the search corpus.\n\n### **User Story - Calculate PageRank Scores**\n\n\u003e As a... **PageRank calculator backend system**,\n\u003e I need to be able to... **access the link graph**,\n\u003e So as to... **calculate and persist the PageRank score for each link**.\n\n**Acceptance criteria:**\n\n- The PageRank calculator can obtain an immutable snapshot of the entire link graph.\n- A PageRank score is assigned to every link in the graph.\n- The search corpus entries are annotated with the updated PageRank scores.\n\n### **User Story - Monitor Service Health**\n\n\u003e As a... **member of the engineering team**,\n\u003e I need to be able to... **monitor the health of the application and services**,\n\u003e So as to... 
**detect and address application issues**.\n\n**Acceptance criteria:**\n\n- All services should periodically submit health- and performance-related metrics to a centralised metrics collection\n  system.\n- A monitoring dashboard is created for each service.\n- A high-level monitoring dashboard tracks the overall system health.\n- Metric-based alerts are defined and linked to a paging service.\n\n## Non-functional Requirements\n\n### Service-level Objectives\n\n| Metric                               | Expectation                   | Measurement Period | Notes                                                                                        |\n|--------------------------------------|-------------------------------|--------------------|----------------------------------------------------------------------------------------------|\n| Website availability                 | 99% uptime                    | Yearly             | Tolerates up to 3d 15h 39m of downtime per year                                              |\n| Index service availability           | 99.9% uptime                  | Yearly             | Tolerates up to 8h 45m of downtime per year                                                 |\n| PageRank service availability        | 70% uptime                    | Yearly             | Not a user-facing component of our system; the service can endure longer periods of downtime |\n| Search response time                 | 30% requests answered in 0.5s | Monthly            |                                                                                              |\n| Search response time                 | 70% requests answered in 1.2s | Monthly            |                                                                                              |\n| Search response time                 | 99% requests answered in 2.0s | Monthly            |                                                                                              |\n| CPU 
utilisation for PageRank service | 90%                           | Weekly             | Should not be paying for idle computing nodes                                                |\n\n### System Component Model\n\n![img.png](img.png)\n\u003csmall\u003eUML component diagram for the search engine. _[1]_\u003c/small\u003e\n\n# Data, Storage and Persistence\n\n## Link Graph\n\nAs this is a search engine that catalogues links and their connections to each other, a `graph-based model` is an\noptimum choice for the link store.\n\n![img_1.png](img_1.png)\n\u003csmall\u003eER diagram for the link graph component. _[1]_\u003c/small\u003e\n\nThe link graph is implemented as an `in-memory` store to aid with running tests on the link graph component; this keeps it\nself-contained and avoids spinning up additional database instances for testing or demonstration.\n\nAdditionally, it uses a database-backed graph implementation with `CockroachDB` as the primary persistence store.\n\nWhat is `CockroachDB`? Here is the official description as provided by the `CockroachDB` docs \u003csmall\u003e_[4]_\u003c/small\u003e:\n\u003e CockroachDB is a distributed SQL database built on a transactional and strongly consistent key-value store. 
It **scales horizontally**; survives disk, machine, rack, and even datacenter failures with minimal latency disruption and no manual intervention; **supports strongly-consistent ACID transactions**; and provides a **familiar SQL API** for structuring, manipulating, and querying data.\n\nThis means that the database scales horizontally with very little overhead and only requires adding additional nodes.\nClusters will automatically balance themselves to nodes with more capacity.\n\nEvery transaction in CockroachDB guarantees ACID semantics spanning arbitrary tables and rows, even when data is\ndistributed.\n\nTaken from [Wikipedia](https://en.wikipedia.org/wiki/ACID):\n\u003e ACID (atomicity, consistency, isolation, durability) is a set of properties of database transactions intended to guarantee data validity despite errors, power failures, and other mishaps. In the context of databases, a sequence of database operations that satisfies the ACID properties (which can be perceived as a single logical operation on the data) is called a transaction. For example, a transfer of funds from one bank account to another, even involving multiple changes such as debiting one account and crediting another, is a single transaction. \u003csmall\u003e_[5]_\u003c/small\u003e\n\nWe also get PostgreSQL-like query syntax, and the benefit of using the pure-Go Postgres package to connect to the\ndatabase.\n\n## Migrations\n\nMigrations will use the `golang-migrate` tool \u003csmall\u003e_[6]_\u003c/small\u003e. This tool helps to update or roll back the database schema\nfrom the command line.\n\n`CockroachDB` migrations are found in: `linkgraph/store/cockroachdb/migrations`\n\nIn order to run migrations you need an environment variable with the connection string. 
We can then run the migrate command manually or use the Makefile:\n```BASH\ndan@Sol:~/search-engine$ export CDB_MIGRATE='cockroachdb://root@localhost:26257/linkgraph?sslmode=disable'\n\ndan@Sol:~/search-engine$ make db-migrations-up\n1/u create_links_table (40.725ms)\n2/u create_edges_table (168.5279ms)\n\ndan@Sol:~/search-engine$ make db-migrations-down\nAre you sure you want to apply all down migrations? [y/N]\nApplying all down migrations\n2/d create_edges_table (161.7728ms)\n1/d create_links_table (275.748ms)\n```\n\n# Testing\n\nAll tests can be run by using the Makefile command:\n`make test`\n\n```BASH\ndan@Sol:~/github.com/kyteproject/search-engine$ make test\n[go test] running tests and collecting coverage metrics\n=== RUN   Test\nOK: 10 passed\n--- PASS: Test (7.38s)\nPASS\ncoverage: 83.1% of statements\nok      search-engine/linkgraph/store/cockroachdb\n7.398s  coverage: 83.1% of statements\n=== RUN   Test\nOK: 10 passed\n--- PASS: Test (0.31s)\nPASS\ncoverage: 100.0% of statements\nok      search-engine/linkgraph/store/memory\n0.327s  coverage: 100.0% of statements\n```\n\n# Acknowledgements\n\nThis project and information is sourced primarily from the book _Hands-On Software Engineering with Golang_ by Achilleas\nAnagnostopoulos. All copyrighted material referenced in this writeup is for educational use and should be taken as my\npersonal notes. No infringement is intended.\n\n# References\n\n1. [Hands-On Software Engineering with Golang - Achilleas Anagnostopoulos [Packt Publishing]](https://www.amazon.co.uk/Hands-Software-Engineering-Golang-programming/dp/1838554491)\n2. [https://www.atlassian.com/agile/project-management/user-stories](https://www.atlassian.com/agile/project-management/user-stories)\n3. [https://team-coder.com/solid-principles/](https://team-coder.com/solid-principles/)\n4. 
[https://www.cockroachlabs.com/docs/v20.2/frequently-asked-questions.html#what-is-cockroachdb](https://www.cockroachlabs.com/docs/v20.2/frequently-asked-questions.html#what-is-cockroachdb)\n5. [https://en.wikipedia.org/wiki/ACID](https://en.wikipedia.org/wiki/ACID)\n6. [https://github.com/golang-migrate/migrate](https://github.com/golang-migrate/migrate)","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkyteproject%2Fsearch-engine","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkyteproject%2Fsearch-engine","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkyteproject%2Fsearch-engine/lists"}