{"id":18585556,"url":"https://github.com/postman-open-technologies/openapi-web-search","last_synced_at":"2025-04-10T13:31:16.700Z","repository":{"id":190992615,"uuid":"613285152","full_name":"postman-open-technologies/openapi-web-search","owner":"postman-open-technologies","description":"OpenAPI Web Search: Revolutionizing the Way Developers find API Definitions 🚀","archived":false,"fork":false,"pushed_at":"2024-04-18T14:58:21.000Z","size":5871,"stargazers_count":22,"open_issues_count":5,"forks_count":4,"subscribers_count":7,"default_branch":"main","last_synced_at":"2025-03-24T21:42:18.261Z","etag":null,"topics":["crawler","dataset","gsoc","gsoc-2023","openapi","search-engine","swagger"],"latest_commit_sha":null,"homepage":"https://github.com/postman-open-technologies/gsoc-2023/issues/7","language":"JavaScript","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/postman-open-technologies.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-03-13T09:20:08.000Z","updated_at":"2025-02-11T17:25:40.000Z","dependencies_parsed_at":"2023-11-14T08:38:58.971Z","dependency_job_id":"f5309da2-ee29-434c-82da-eeef1d73b08a","html_url":"https://github.com/postman-open-technologies/openapi-web-search","commit_stats":null,"previous_names":["postman-open-technologies/openapi-web-search"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/postman-open-technologies%2Fopenapi-web-search","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/
repositories/postman-open-technologies%2Fopenapi-web-search/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/postman-open-technologies%2Fopenapi-web-search/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/postman-open-technologies%2Fopenapi-web-search/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/postman-open-technologies","download_url":"https://codeload.github.com/postman-open-technologies/openapi-web-search/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248225707,"owners_count":21068078,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["crawler","dataset","gsoc","gsoc-2023","openapi","search-engine","swagger"],"created_at":"2024-11-07T00:34:38.765Z","updated_at":"2025-04-10T13:31:11.685Z","avatar_url":"https://github.com/postman-open-technologies.png","language":"JavaScript","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003cdiv align='center'\u003e\n\u003cimg src='https://cdn.worldvectorlogo.com/logos/openapi-1.svg' height='20%' width='20%'/\u003e\n\u003ch1\u003eOpen API Web Search\u003c/h1\u003e\n\u003c/div\u003e\n\n## Background\nThe [Postman Open Technologies](https://blog.postman.com/announcing-postman-open-technologies/) team maintains a [project](https://github.com/postman-open-technologies/knowledge-base) dedicated to mining and extracting knowledge from the API universe. 
There is a wealth of knowledge in the OpenAPI, Swagger, Postman Collections, Spectral, and other API artifacts available on GitHub and across the open web.\n\nTo expand the current [knowledge base](https://github.com/postman-open-technologies/knowledge-base), we want to develop an open-source approach for finding Swagger and OpenAPI definitions on the open web: crawling web pages looking for API definitions, validating them, and then consuming and indexing them as part of an ongoing search.\n\nThere are already known sources like [GitHub](https://github.com/), [SwaggerHub](https://swagger.io/tools/swaggerhub/), and [APIs.guru](https://apis.guru/) for finding OpenAPI/Swagger specifications, but we want to focus on **extracting API definitions from lesser-known sources and presenting them to the world**. The dataset can later be used to [analyze the specifications to obtain insights](https://www.wittern.net/blog/analyzing-api-specs) into some of the practices common among APIs.\n\n## What’s Open API Web Search?\n\nThe Open API Web Search project is all about providing a simple way for developers to find existing Swagger and OpenAPI definitions on the open web, mostly from lesser-known sources. The ultimate goal of this project is to build a search engine for APIs where API consumers and producers can discover APIs using keywords. The search engine abstracts away the complexity of searching the web for specific terms, helping identify APIs in a sea of web pages. Learn how Open API Web Search can help [unleash the power of open APIs](https://vinitshahdeo.dev/open-api-web-search).\n\nThe goal of this project can be achieved with the following milestones:\n\n1. **Crawling**: Crawl web pages looking for **valid** API definitions, mostly from lesser-known sources.\n2. **Indexing**: Validate and store the indexed crawl results.\n3. 
**Implementing a search algorithm**: Using this large dataset of OpenAPI/Swagger specifications, expose an API that abstracts away the complexity of searching the web for specific terms when looking for APIs.\n4. **Providing an interface**: Design a UI for API consumers and producers to initiate a search for APIs. Initially, the search can be done using metadata from the info object of the [OpenAPI document](https://spec.openapis.org/oas/latest.html#info-object).\n5. **Updating dataset**: Regularly update the crawl results and re-index them for better search results.\n\n\n## Running the Server\n\n\u003e Fork and/or clone the OpenAPI Web Search repo and change directory into it:\n\n```sh\ngit clone https://github.com/\u003cusername\u003e/openapi-web-search.git\ncd openapi-web-search/src/server\n```\n\n\u003e Install dependencies via yarn:\n\n```sh\nyarn install\n```\n\n\u003e Start the local server:\n\n```sh\nyarn run dev\n```\n\n\u003e After launching the local server, we can use Postman to begin sending HTTP requests to the specified endpoints. A Postman collection is included in the root of the project to get you started.\n\n\u003e After importing that collection into Postman, run the following endpoints in the specified order:\n\n```text\n1. http://localhost:1337/api/v1/run/crawler?latest=true\n2. http://localhost:1337/api/v1/process/index-files?skip=0\u0026limit=20\u0026sort=aes\n3. http://localhost:1337/api/v1/indexing\n4. http://localhost:1337/api/v1/search?q=\u003cquery\u003e\n```\n\n\u003e Explanation:\n\n1. The first endpoint crawls the Common Crawl website to fetch files that list the paths to index files, which are then converted into the appropriate endpoints.\n2. The second endpoint starts a background process that downloads the index files, processes them, and stores the results (validated OpenAPI definitions) in MongoDB.\n3. The third endpoint indexes the results previously stored in MongoDB into Elasticsearch.\n4. 
The last endpoint submits a search query to retrieve the matching API definitions.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpostman-open-technologies%2Fopenapi-web-search","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fpostman-open-technologies%2Fopenapi-web-search","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fpostman-open-technologies%2Fopenapi-web-search/lists"}