https://github.com/afroisalreadyinu/stashpy

A Logstash replacement in Python
https://github.com/afroisalreadyinu/stashpy
Last synced: 3 months ago
JSON representation
A Logstash replacement in Python
Host: GitHub
URL: https://github.com/afroisalreadyinu/stashpy
Owner: afroisalreadyinu
License: gpl-3.0
Created: 2016-02-21T06:33:05.000Z (over 9 years ago)
Default Branch: master
Last Pushed: 2019-03-07T08:40:36.000Z (over 6 years ago)
Last Synced: 2025-03-21T12:06:32.584Z (3 months ago)
Language: Python
Homepage:
Size: 109 KB
Stars: 110
Watchers: 11
Forks: 5
Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project

starred-awesome - stashpy - A Logstash replacement in Python (Python)
README

        [![Build Status](https://travis-ci.org/afroisalreadyinu/stashpy.svg?branch=master)](https://travis-ci.org/afroisalreadyinu/stashpy)

[![PyPI version](https://badge.fury.io/py/stashpy.svg)](https://badge.fury.io/py/stashpy)

## What is this?

Stashpy aims to be a slimmed-down Python 3 replacement for

[Logstash](https://www.elastic.co/products/logstash), a log

aggregator. Stashpy accepts connections on a TCP port, parses messages

passed to it over this connection, and indexes them on an

ElasticSearch cluster.

Stashpy is still in development.

## Installing and running

Stashpy requires Python 3. All Linux distros have a relatively new

version in their official repositories. On Mac OS, the Homebrew

version is recommended.

Among the various methods of installing package dependencies,

virtualenv is recommended. Python 3.5 comes with pyvenv, which is a

built-in equivalent of virtualenv. For earlier versions, you will need

to manually install virtualenv with `sudo pip install virtualenv`.  If

you already have a 2.* version of Python installed together with

virtualenv, you can install Python 3 and call virtualenv with the

`--python` argument to use the new version, as in ``virtualenv stashpy

--python `which python3` ``.

It is possible to install Stashpy simply as a dependency or with `pip

install stashpy` as it is [on

PyPI](https://pypi.python.org/pypi/stashpy). For development purposes,

you can simply create a virtualenv wherever you prefer, and run

Stashpy from there. The entry point is named `stashpy`, and accepts a

configuration file as the only argument. A sample configuration file

`sample-config.yml` is provided. See the Usage section for

documentation on different configuration options.

The recommended way to run Stashpy as a service is by checking it out

into `/opt`, and also creating a virtualenv there. To run as an

upstart-managed service, consult the provided

`stashpy.conf`. Unfortunately, we haven't moved on to systemd yet, so

no sample configuration for that, but it shouldn't be too difficult.

## Usage

As can be seen from the extension of `sample-config.yml`, the

configuration format is YAML. Configuration options are the following:

* `address`, `port`: The address and port on which Stashpy should listen.

* `indexer_config`: Configuration options for the ElasticSearch

  cluster to index on. If this key is excluded, nothing will be

  indexed, which is a useful setup for debugging purposes. Must have

  the following keys:

  - `host`: ES host

  - `port`: ES port

  - `index_pattern`: The base pattern to be used for determining index

    name. This pattern will be passed on to

    [`datetime.strftime`](https://docs.python.org/3/library/datetime.html#datetime.date.strftime),

    and will then be formatted with the parsed values dictionary.

* `logging`: This option will be passed on as-is to the

  `logging.config.dictConfig` method. If it is not supplied,

  `stashpy.main.DEFAULT_LOGGING`, which simply logs to stdout, will be

  used.

* `heartbeat_count`: The number of messages at which a heartbeat log

  is written.

* `processor_spec`: The parsing specification. See the next section

  for details.

## Parsing Specification

Stashpy turns log lines (i.e. strings that end with a newline)

supplied through a TCP connection into JSON documents, and indexes

these in ElasticSearch. The log lines are parsed using regular

expressions, which can be specified in one of two different formats:

the **parse format**, or **Oniguruma-flavored named regular

expressions**. The first of these is a normal Python 3 formattable

string that is processed using the [parse

library](https://pypi.python.org/pypi/parse). An example for such an

expression is the following:

* `My name is {name} and I'm {age:d} years old`

Parsing the sentence `My name is Afro and I'm 40 years old` would lead

to the JSON document `{"age": 40, "name": "Afro"}`.

The second format, Oniguruma-flavored regular expressions, uses the

[regex library](https://pypi.python.org/pypi/regex) to provide an

experience similar to that of the [grok plugin for

Logstash](https://www.elastic.co/guide/en/logstash/current/plugins-filters-grok.html). In

an Oniguruma RE, you can build complex regular expressions by

combining simpler components. For example, take the following regular

expression:

* `My name is %{USERNAME:name} and I'm %{INT:age} years old`

This expression, when used as a specification, leads to the following

regular expression:

* `My name is (?P[a-zA-Z0-9._-]+) and I'm (?P(?:[+-]?(?:[0-9]+))) years old`

See the file `stashpy/patterns/grok_patterns.txt` for a list of the

various components you can use in your regular expressions.

For both of these parsing options, it is possible to specify a type to

which the parsed string should be converted. `parse` does this

automatically, converting e.g. a `age` above to integer. With

Oniguruma, you can specify the type the parsed value should be

converted to with a constructor appended to the pattern name,

separated by a colon. For the above example, converting `age` to an

integer would work with the following expression:

* `My name is %{USERNAME:name} and I'm %{INT:age:int} years old`

The type conversion key is used for looking up a function from the

`__builtins__` dictionary; any constructor available there can be

used.

## Specifying the parsing pipeline

There are two alternative ways of providing the order of processing.

### By `processor_spec`

The first method is by providing the `processor_spec` option in the

configuration file. This option can have two keys:

* `to_dict`: A list of expressions that are turned into JSON documents

  without further processing. These expressions can be a parse string

  or an Oniguruma RE.

* `to_format`: A list of dictionaries whose keys are specifications

  and values are dictionariers that are to be formatted based on

  parsed values.

Here's the relevant part from `sample-config.yml`:

```yml

processor_spec:

  to_dict:

    - "My name is {name} and I'm {age:d} years old."

  to_format:

    "Her name is {name} and she's {age:d} years old.":

      name_line: "Name is {name}"

      age_line: "Age is {age:d}"

```

Passing the log line `My name is Afro and I'm 40 years old.` to

Stashpy with this configuration will result in the JSON document

`{"age": 40, "name": "Afro"}`. The log line `Her name is Luna and

she's 4 years old.` will, however, activate the `to_format` section

and lead to the JSON document `{"age_line": "Age is 4", "name_line":

"Name is Luna"}`.

### By custom class

The second method of processing log lines is by specifying a class

that is responsible for accepting them and returning dictionaries. The

path of this class can be passed using the `processor_class`

option. This class must `stashpy.LineProcessor` and implement the

method `for_line(self, line)`, which will be called for each log

line. Two useful methods from the parent class that can be used for

more specialized processing are `do_dict_specs(self, line)`, and

`do_format_specs(self, line)`. The first method returns the result for

the first match from the `self.dict_specs` list, while the second does

the same for the `self.format_specs` attribute. Both return `None` if

there are no matches. If your class has the class attributes `TO_DICT`

or `TO_FORMAT`, these will be used to populate the instance

attributes. The following custom class is equivalent to the

`processor_spec` example above:

```python

class KitaHandlerTwo(stashpy.LineProcessor):

    TO_DICT = ["My name is {name} and I'm {age:d} years old."]

    TO_FORMAT = {"Her name is {name} and she's {age:d} years old.":

                 {'name_line':"Name is {name}", 'age_line':"Age is {age}"}}

    def for_line(self, line):

        dict_result = self.do_dict_specs(line)

        if dict_result:

            return dict_result

        format_result = self.do_format_specs(line)

        if format_result:

            return format_result

        return None

```

### Processing pipeline

The order of processing in Stashpy is relatively straightforward.

First, the `to_dict` specs are applied; if any of the patterns match,

the resulting dictionary is returned. If there are no such matches,

the `to_format` specs are applied, and the result from the first match

is returned. If you are using a custom class for processing, you can

introduce your own ordering and logic.

## Testing

One thing that really annoyed me about Logstash was that testing

patterns was incredibly difficult; the only reliable test could be

done on the live system. Stashpy aims to make testing patterns

simpler. In order to test a parsing specification, simply subclass

`stashpy.tests.PatternTest`. This class has three methods that can be

used to process a logline with a parsing specification:

* `process_to_dict(self, to_dict_spec, logline)`: Processes a list of

  to-dictionary parsing specifications, and returns result.

* `process_to_format(self, to_format_spec, logline)`: Processes a list

  of to-format parsing specifications, and returns result.

Here is a sample test:

```python

import unittest

from stashpy.tests import PatternTest

class SampleTest(PatternTest):

    def test_pattern(self):

        self.assertDictEqual(

            self.process_to_dict(["My name is {name} and I'm {age:d} years old."],

                                 "My name is Yuri and I'm 3 years old."),

            {'name': 'Yuri', 'age': 3})

if __name__ == '__main__':

    unittest.main()

```
ecosyste.ms

Data

Tools

Indexes

Applications

Experiments

Awesome

https://github.com/afroisalreadyinu/stashpy

Awesome Lists containing this project

README