https://github.com/sourcegraph/codesearch.ai
codesearch.ai semantic code search engine
- Host: GitHub
- URL: https://github.com/sourcegraph/codesearch.ai
- Owner: sourcegraph
- License: apache-2.0
- Archived: true
- Created: 2022-07-03T07:53:49.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2023-03-24T23:20:15.000Z (almost 3 years ago)
- Last Synced: 2025-03-01T10:24:09.419Z (11 months ago)
- Language: Go
- Homepage: https://codesearch.ai
- Size: 714 KB
- Stars: 37
- Watchers: 6
- Forks: 9
- Open Issues: 1
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# codesearch.ai
[codesearch.ai](https://codesearch.ai) is a semantic code search engine. It lets you search GitHub functions and StackOverflow answers using natural language queries. It uses HuggingFace Transformers under the hood, and the training procedure is inspired by OpenAI's paper [Text and Code Embeddings by Contrastive Pre-Training](https://arxiv.org/pdf/2201.10005.pdf). The [CodeSearchNet project](https://github.com/github/CodeSearchNet) served as a basis for data collection and cleaning.
The project is split into two sub-projects: data collection and model training. The `codesearch-ai-data` folder contains the data collection code, written in Go. The `codesearch_ai_ml` folder contains the model training code, written in Python.
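To illustrate the contrastive training idea mentioned above, here is a minimal sketch in plain Python of a symmetric in-batch contrastive loss and of ranking by cosine similarity. It is not taken from this repository: the function names (`cosine`, `in_batch_contrastive_loss`, `rank`), the toy 2-dimensional vectors, and the `temperature` value are all assumptions chosen for illustration, and the actual training code relies on HuggingFace Transformers rather than hand-written loops.

```python
import math


def cosine(a, b):
    """Cosine similarity between two equal-length vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


def in_batch_contrastive_loss(query_vecs, code_vecs, temperature=0.05):
    """Symmetric cross-entropy over a similarity matrix.

    query_vecs[i] and code_vecs[i] are assumed to be a matching pair;
    every other item in the batch acts as a negative example.
    The temperature value is an illustrative assumption.
    """
    logits = [
        [cosine(q, c) / temperature for c in code_vecs] for q in query_vecs
    ]

    def cross_entropy(rows):
        # The correct "class" for row i is column i (the diagonal).
        total = 0.0
        for i, row in enumerate(rows):
            peak = max(row)  # subtract the max for numerical stability
            log_sum = peak + math.log(sum(math.exp(v - peak) for v in row))
            total += log_sum - row[i]
        return total / len(rows)

    transposed = [list(col) for col in zip(*logits)]
    # Average the query-to-code and code-to-query directions.
    return 0.5 * (cross_entropy(logits) + cross_entropy(transposed))


def rank(query_vec, code_vecs):
    """Return indices of code_vecs, most similar to query_vec first."""
    return sorted(
        range(len(code_vecs)),
        key=lambda i: cosine(query_vec, code_vecs[i]),
        reverse=True,
    )
```

With this sketch, a batch whose pairs line up gives a lower loss than the same batch with the pairs swapped, which is the signal that pushes matching text and code embeddings together. At query time, `rank` orders candidate embeddings by similarity to the query embedding.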
## Requirements
- Go >= 1.18
- Python >= 3.7
- CUDA (for GPU model training)
- Postgres
## Code walkthrough
We prepared a detailed code walkthrough in the form of a [Sourcegraph Notebook](https://sourcegraph.com/notebooks/Tm90ZWJvb2s6MTM0Mw==).