https://github.com/commonsearch/cosr-back

Backend of Common Search. Analyses webpages and sends them to the index.
https://github.com/commonsearch/cosr-back

Last synced: 2 months ago
JSON representation

Backend of Common Search. Analyses webpages and sends them to the index.

Host: GitHub
URL: https://github.com/commonsearch/cosr-back
Owner: commonsearch
License: apache-2.0
Created: 2016-02-17T04:44:17.000Z (over 9 years ago)
Default Branch: master
Last Pushed: 2017-05-31T13:54:26.000Z (about 8 years ago)
Last Synced: 2024-11-09T20:37:45.926Z (8 months ago)
Language: HTML
Homepage: https://about.commonsearch.org
Size: 4.35 MB
Stars: 123
Watchers: 23
Forks: 24
Open Issues: 42
Metadata Files:
- Readme: README.md
- Contributing: CONTRIBUTING.md
- License: LICENSE

Awesome Lists containing this project

awesome-repositories - commonsearch/cosr-back - Backend of Common Search. Analyses webpages and sends them to the index. (HTML)

README

        # cosr-back

[![Chat with us on Slack](https://slack.commonsearch.org/badge.svg)](https://slack.commonsearch.org) [![Build Status](https://travis-ci.org/commonsearch/cosr-back.svg?branch=master)](https://travis-ci.org/commonsearch/cosr-back) [![Coverage Status](https://coveralls.io/repos/github/commonsearch/cosr-back/badge.svg?branch=master)](https://coveralls.io/github/commonsearch/cosr-back?branch=master) [![Apache License 2.0](https://img.shields.io/github/license/commonsearch/cosr-back.svg)](LICENSE)

This repository contains the main components of the [Common Search](https://about.commonsearch.org) backend.

Your help is welcome! We have a complete guide on [how to contribute](CONTRIBUTING.md).

## Understand the project

This repository has 4 components:

 - **cosrlib**: Python code for parsing, analyzing and indexing documents

 - **spark**: Spark jobs using cosrlib.

 - **urlserver**: A service for getting metadata about URLs from static databases

 - **explainer**: A web service for explaining and debugging results, hosted at [explain.commonsearch.org](https://explain.commonsearch.org/)

Here is how they fit in our [general architecture](https://about.commonsearch.org/developer/architecture):

![General technical architecture of Common Search](https://about.commonsearch.org/images/developer/architecture-2016-02.svg)

## Local install

A complete guide available in [INSTALL.md](INSTALL.md).

## Launching the tests

See [tests/README.md](tests/README.md).

## Using plugins

Common Search supports the insertion of user-provided plugins in its processing pipeline. Some are included by default, for instance:

```

make docker_shell

spark-submit spark/jobs/pipeline.py --source url:https://about.commonsearch.org/ --plugin plugins.grep.Words:words="common search",output=/tmp/grep_result

```

See the [plugins/](plugins/) directory for more examples and [Analyzing the web with Spark](https://about.commonsearch.org/developer/tutorials/analyzing-the-web-with-spark-on-ec2) for a complete tutorial.

## Launching the explainer

The explainer allows you to debug results easily. Just run:

```

make docker_explainer

```

Then open [http://192.168.99.100:9703](http://192.168.99.100:9703) in your browser (Assuming `192.168.99.100` is the IP of your Docker host)

## Launching an index job

```

make docker_shell

spark-submit spark/jobs/pipeline.py --source commoncrawl:limit=1 --plugin plugins.filter.Homepages:index_body=1 --profile

```

After this, if you have a `cosr-front` instance connected to the same Elasticsearch service, you will see the results!

A tutorial is currently being written on this topic.

ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/commonsearch/cosr-back

Awesome Lists containing this project

README