https://github.com/jorgechato/word-search-engine
Word search engine based on scraping the HTML source code, integrated with CI/CD and k8s orchestration
- Host: GitHub
- URL: https://github.com/jorgechato/word-search-engine
- Owner: jorgechato
- Created: 2019-03-27T09:30:11.000Z (over 7 years ago)
- Default Branch: master
- Last Pushed: 2019-04-02T22:24:20.000Z (about 7 years ago)
- Last Synced: 2026-01-03T14:35:25.895Z (6 months ago)
- Topics: docker, firebase, flask, k8s, public, swagger, travis
- Language: Python
- Homepage:
- Size: 18.6 KB
- Stars: 2
- Watchers: 2
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Word search engine
[Build Status](https://travis-ci.com/jorgechato/word-search-engine)
[Docker Hub](https://hub.docker.com/r/jorgechato/word-search-engine)
Input:
- Query to search for (a word).
- Source where to search (webpage).
Output:
- Number of times the word exists in the html source.
Constraints:
- Count only the given word, not other words that contain it.
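The input/output contract and the constraint above can be sketched in a few lines of Python. This is a minimal illustration, not the project's implementation: `fetch_html` and `count_word` are hypothetical names, the page is fetched with the standard library's `urllib.request`, matching is case-sensitive, and "another word containing it" is assumed to mean an occurrence glued to a letter, digit, or underscore on either side.

```python
import re
import urllib.request


def fetch_html(url: str, timeout: float = 10.0) -> str:
    """Download the raw HTML of a single page (hypothetical helper)."""
    with urllib.request.urlopen(url, timeout=timeout) as response:
        # Fall back to UTF-8 when the server does not declare a charset.
        charset = response.headers.get_content_charset() or "utf-8"
        return response.read().decode(charset, errors="replace")


def count_word(html: str, word: str) -> int:
    """Count standalone, case-sensitive occurrences of `word` in the raw HTML.

    Assumption: an occurrence only counts when it is not directly preceded
    or followed by a word character (\\w), so "cat" is not counted inside
    "catalog" or "concat".
    """
    if not word:
        raise ValueError("word must not be empty")
    pattern = re.compile(r"(?<!\w)" + re.escape(word) + r"(?!\w)")
    return len(pattern.findall(html))


if __name__ == "__main__":
    sample = "<p>cat catalog concat cat.</p>"
    print(count_word(sample, "cat"))  # → 2
```

Note that this counts matches anywhere in the source, including inside tags and attributes, which is what "the html source" suggests; whether that is desired is the first FAQ question below.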
## Architecture
TODO: architecture
## API
The base API contract is stored in the [doc](/doc/contract.json) folder.
You can see the UI at `http://<host>:<port>/` and the live documentation at
`http://<host>:<port>/swagger.json`.
The request body can be either strict or use limiters.
When the query is strict, the search engine looks for a perfect match.
When `strict` is `false`, you need to provide limiters; in this case
the word can be enclosed between those limiters.
You can provide more than one limiter.
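One possible reading of the strict/limiters rules, sketched in Python. Only `strict` is named above; the field names `query`, `source`, and `limiters` are assumptions (the real schema is in `doc/contract.json`), and so is the interpretation: in strict mode a match is a whitespace-separated token equal to the word, while in non-strict mode every limiter is treated as an extra separator, so the word may be enclosed between limiters.

```python
from typing import Optional, Sequence

# Hypothetical request body; check doc/contract.json for the real field names.
EXAMPLE_BODY = {
    "query": "cat",
    "source": "https://example.com",
    "strict": False,
    "limiters": ["<", ">", "."],
}


def count_matches(html: str, word: str, strict: bool = True,
                  limiters: Optional[Sequence[str]] = None) -> int:
    """Count occurrences of `word` under the assumed strict/limiters semantics."""
    text = html
    if not strict:
        if not limiters:
            # The README states limiters must be provided when strict is false.
            raise ValueError("limiters are required when strict is false")
        for limiter in limiters:
            if limiter:  # an empty limiter would insert spaces everywhere
                # Treat the limiter as an extra separator around the word.
                text = text.replace(limiter, " ")
    # str.split() with no argument splits on any run of whitespace.
    return sum(1 for token in text.split() if token == word)


if __name__ == "__main__":
    page = "<p>cat</p> cat dog cat."
    print(count_matches(page, "cat", strict=True))  # → 1
    print(count_matches(page, "cat", strict=False,
                        limiters=EXAMPLE_BODY["limiters"]))  # → 3
```

Replacing limiters with spaces keeps the logic to plain string operations, which also works for multi-character limiters; a regex with variable-width boundaries would be harder to read.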
## Run
```bash
$ FLASK_APP=src/app flask run
# or
$ python src/app.py
```
```bash
$ curl -X PUT http://<host>:<port>/search \
-H 'content-type: application/json' \
-d @'base.template.json'
```
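The same `PUT /search` request can be built from Python with only the standard library, for example from another service in the stack. This is a sketch: `build_search_request` and `search` are hypothetical helpers, host and port become parameters (5000 below is only Flask's default for `flask run`), and the body fields other than `strict` are assumptions, so compare them with `base.template.json`.

```python
import json
import urllib.request


def build_search_request(host: str, port: int, body: dict) -> urllib.request.Request:
    """Build (but do not send) the PUT /search request from the curl example."""
    return urllib.request.Request(
        url=f"http://{host}:{port}/search",
        data=json.dumps(body).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="PUT",
    )


def search(host: str, port: int, body: dict, timeout: float = 10.0) -> dict:
    """Send the request and decode the response (assumes the API replies with JSON)."""
    request = build_search_request(host, port, body)
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.loads(response.read().decode("utf-8"))


if __name__ == "__main__":
    # Hypothetical body; the real shape is defined by base.template.json.
    req = build_search_request(
        "localhost", 5000,
        {"query": "cat", "source": "https://example.com", "strict": True},
    )
    print(req.get_method(), req.full_url)  # → PUT http://localhost:5000/search
```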
## Deploy
The deployment is automated by the CI/CD pipeline, but you can always run it on
your local machine.
```bash
# Build the Docker image
$ docker build -t word-search-engine:latest .
# Run it, publishing container port 8000 on host port 8000
$ docker run -p 8000:8000 -e PORT=8000 --name word-search-engine word-search-engine:latest
```
Pull the latest version from [Docker Hub](https://hub.docker.com/r/jorgechato/word-search-engine) on any machine with Docker installed.
You can automate the process with Terraform and a CI/CD pipeline if you are
using ECS, or create a deployment/rollback in K8s.
```bash
$ docker pull jorgechato/word-search-engine:latest
# example with k8s
# do not forget to export the ENV_VARIABLES for the DB connection first
$ kubectl apply -f deploy/k8s.yml
```
---
## Requirements
### Must have
- [python 3.x](https://www.python.org/downloads/)
- pip3
### Recommended for development
- [anaconda](https://anaconda.org/anaconda/python)
#### Install dependencies
```bash
# with anaconda
$ conda env create -f environment.yml # create virtual environment
$ conda activate backend # enter VE
# or, on conda versions older than 4.4
$ source activate backend
(backend) $ conda deactivate # exit VE
```
---
## FAQ
**Can the word be part of any HTML tag, CSS, or JS embedded in the page's source
code?**
See also:
* [business] MVP Questions ([#1][i1])
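This question is still open ([#1][i1]). If the answer turns out to be that only visible text counts, one way to exclude tags, attributes, comments, and embedded CSS/JS is to collect text nodes with the standard library's `html.parser` before counting. The sketch assumes that rule plus case-sensitive whole-word matching; `VisibleTextParser`, `count_visible`, and the set of skipped elements are hypothetical choices.

```python
import re
from html.parser import HTMLParser

# Elements whose content is code or metadata rather than visible text (assumption).
SKIPPED_TAGS = {"script", "style", "noscript", "template"}


class VisibleTextParser(HTMLParser):
    """Collect text nodes, ignoring tags, attributes, comments, and SKIPPED_TAGS content."""

    def __init__(self):
        super().__init__(convert_charrefs=True)
        self._skip_depth = 0
        self.chunks = []

    def handle_starttag(self, tag, attrs):
        # Attributes are never collected, so class="cat" does not count.
        if tag in SKIPPED_TAGS:
            self._skip_depth += 1

    def handle_endtag(self, tag):
        if tag in SKIPPED_TAGS and self._skip_depth > 0:
            self._skip_depth -= 1

    def handle_data(self, data):
        if self._skip_depth == 0:
            self.chunks.append(data)


def count_visible(html: str, word: str) -> int:
    """Count standalone occurrences of `word` in the visible text only."""
    parser = VisibleTextParser()
    parser.feed(html)
    parser.close()  # flush any buffered text
    # Join with spaces so text from adjacent elements never fuses into one word.
    text = " ".join(parser.chunks)
    pattern = re.compile(r"(?<!\w)" + re.escape(word) + r"(?!\w)")
    return len(pattern.findall(text))
```

For example, in `<style>.cat{}</style><p>cat and catalog</p><div class='cat'>Cat</div>` only the `cat` inside the paragraph would be counted.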
**Does the scraping search cover the whole sitemap of the domain?**
No, the search engine only searches the endpoint provided by the requester.
**Why K8s and not AWS Lambda?**
**P 1**: On serverless platforms, the first invocation of a function takes some
time because the code needs to be initialized. We need fast responses, since this
service will be integrated with a stack of microservices (MS).
**P 2**: Kubernetes might provide better scalability than some serverless
platforms, since it is more mature and even provides HA (high availability)
across zones, which not all serverless platforms offer yet.
We plan to expand our business to different zones.
**P 3**: It might be easier to use Kubernetes for more complex applications because
the platform is more mature. Since we plan to use a database to store the outcome
of the logic, this makes sense.
**P 4**: Serverless doesn’t automatically mean lower costs, for example when your
applications need to run 24/7. There can also be hidden costs, such as API
management or the function invocations for tests.
**P 5**: The monitoring capabilities of Kubernetes applications are much more
mature.
[i1]: https://github.com/jorgechato/word-search-engine/issues/1