https://github.com/audy/nearproteins
:mag: :book: Amino Acids Database with Fuzzy Querying
- Host: GitHub
- URL: https://github.com/audy/nearproteins
- Owner: audy
- License: mit
- Created: 2014-09-29T16:28:22.000Z (about 10 years ago)
- Default Branch: master
- Last Pushed: 2016-07-26T19:57:35.000Z (about 8 years ago)
- Last Synced: 2023-03-11T01:21:34.811Z (over 1 year ago)
- Language: Python
- Homepage:
- Size: 2.6 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 2
Metadata Files:
- Readme: readme.md
- License: LICENSE
# nearproteins
Store for finding similar amino acid sequences using Locality Sensitive Hashing
and Approximate Nearest Neighbors.

## Installation

Python, with dependencies:

- Annoy
- BioPython

```
pip install -r requirements.txt
```

Redis
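
As background for the Locality Sensitive Hashing mentioned in the introduction, here is a minimal, dependency-free sketch of the general idea. It is an illustration only: this README does not say how nearproteins turns sequences into vectors or which hash family it uses, so the k-mer counting, the random-hyperplane hash, and the names `kmer_vector`, `make_hasher` and `hamming` are all assumptions made for this example.

```python
import random
from collections import Counter
from itertools import product

# The 20 standard amino acid letters; k-mers with other letters are ignored.
AMINO_ACIDS = "ACDEFGHIKLMNPQRSTVWY"


def kmer_vector(seq, k=2):
    """Count overlapping k-mers and return a fixed-length count vector."""
    counts = Counter(seq[i:i + k] for i in range(len(seq) - k + 1))
    return [counts["".join(p)] for p in product(AMINO_ACIDS, repeat=k)]


def make_hasher(dim, n_bits=16, seed=0):
    """Build a random-hyperplane hash: one sign bit per hyperplane."""
    rng = random.Random(seed)
    planes = [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(n_bits)]

    def signature(vec):
        # Each bit records which side of a hyperplane the vector falls on.
        return tuple(
            int(sum(p * v for p, v in zip(plane, vec)) >= 0)
            for plane in planes
        )

    return signature


def hamming(a, b):
    """Number of positions where two signatures differ."""
    return sum(x != y for x, y in zip(a, b))


# Hypothetical sequences, for illustration only.
hasher = make_hasher(dim=len(AMINO_ACIDS) ** 2)
sig_a = hasher(kmer_vector("MKTAYIAKQRQISFVKSHFSRQ"))
sig_b = hasher(kmer_vector("MKTAYIAKQRQISFVKSHFSRA"))
print(hamming(sig_a, sig_b))
```

Vectors that point in similar directions tend to agree on most bits, so candidates can be grouped by signature instead of being compared against every stored sequence. An approximate nearest neighbor library such as Annoy plays a comparable role with a tree-based index.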
## Instructions
1. Load proteins into database

```sh
$ ./load-proteins < data/proteins.fasta
```

2. Query database

```sh
$ ./query-proteins < data/proteins.fasta # returns JSON for each record
```

## Python API
Very basic. I plan to add more configuration.
### Loading data into store
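```python
from Bio import SeqIO
import nearproteins

store = nearproteins.SimilarStringStore()

# remove any buckets left over from a previous run
store.engine.clean_all_buckets()

# add every FASTA record to the store, keyed by its record ID
with open('data/proteins.fasta') as handle:
    for record in SeqIO.parse(handle, 'fasta'):
        store.add(str(record.seq), record.id)
```

### Retrieving records from store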
```python
# returns array of vectors, match IDs, similarities
results = store.get(str(record.seq))
```

## Use as a server
You can query and add records to the database using simple sockets.
```sh
$ ./server # start the server, listens on port 1234
```

In another window:
```sh
$ nc 127.0.0.1 1234 # connect
SET 1 AUSTIN
SET 2 BOSTON
GET AUSTIN
{"1": 0.0}
GET BOSTON
{"2": 0.0}
```

You can use this to build a simple client in another language, such as Ruby:

```ruby
require 'dna'
require 'socket'
require 'json'

HOSTNAME = '127.0.0.1'
PORT = 1234

socket = TCPSocket.open HOSTNAME, PORT

File.open('proteins.fasta') do |handle|
  records = Dna.new handle, :format => :fasta

  records.each do |record|
    # send one query per sequence and print the parsed JSON reply
    socket.puts "GET #{record.sequence}"
    resp = JSON.parse(socket.gets)
    p resp
  end
end
```
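
The same protocol can be spoken from Python using only the standard library. The sketch below assumes what the `nc` session above suggests: commands are newline-terminated, `SET` produces no reply, and `GET` produces a single line of JSON. The helper names `send_set`, `send_get` and `query_server` are hypothetical and not part of nearproteins.

```python
import json
import socket


def send_set(stream, record_id, sequence):
    """Store a sequence under an ID (assumes the server sends no reply)."""
    stream.write(f"SET {record_id} {sequence}\n")
    stream.flush()


def send_get(stream, sequence):
    """Query a sequence and parse the one-line JSON reply."""
    stream.write(f"GET {sequence}\n")
    stream.flush()
    return json.loads(stream.readline())


def query_server(sequences, host="127.0.0.1", port=1234):
    """Connect to a running ./server and return {sequence: reply}."""
    with socket.create_connection((host, port)) as sock, \
            sock.makefile("rw", encoding="utf-8", newline="\n") as stream:
        return {seq: send_get(stream, seq) for seq in sequences}
```

With `./server` running, `query_server(["AUSTIN", "BOSTON"])` would send one `GET` per sequence over a single connection. Passing any object with `write`, `flush` and `readline` to `send_get` makes the helpers easy to exercise without a live server.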