Ecosyste.ms: Awesome

An open API service indexing awesome lists of open source software.

Awesome Lists | Featured Topics | Projects

https://github.com/LibtraceTeam/libprotoident

Network traffic classification library that requires minimal application payload
https://github.com/LibtraceTeam/libprotoident

Last synced: about 2 months ago
JSON representation

Network traffic classification library that requires minimal application payload

Awesome Lists containing this project

README

        

libprotoident 2.0.15

---------------------------------------------------------------------------
Copyright (c) 2011-2020 The University of Waikato, Hamilton, New Zealand.
All rights reserved.

This code has been developed by the University of Waikato WAND
research group. For further information please see http://www.wand.net.nz/.
---------------------------------------------------------------------------

See the file COPYING and COPYING.LESSER for full licensing details for this
software.

Report and bugs, questions or comments to [email protected]

NEW: You can now lodge bugs by filing an issue on the libprotoident github:
https://github.com/wanduow/libprotoident

Authors:
Shane Alcock

With contributions from:
Donald Neal
Aaron Murrihy
Paweł Foremski
Fabian Weisshaar
Jeroen Roovers
Jiri Havranek
Romain Fontugne
Jacob van Walraven

Introduction
============
Libprotoident is a library designed to perform application protocol
identification using a very limited form of deep packet inspection, i.e. using
the first four bytes of application payload sent in each direction. The
library provides a simple API that will enable programmers to develop their own
tools that utilise application protocol information and we have also included
some tools that can be used to perform simple analysis of traffic flows.

Required Libraries
==================
libtrace 4.0.1 or later
* available from https://github.com/LibtraceTeam/libtrace

libflowmanager 3.0.0 or later
* optional, but required to build the tools
* available from https://github.com/LibtraceTeam/libflowmanager

Installation
============
After having installed the required libraries, running the following series
of commands should install libprotoident

./bootstrap.sh (only if you've cloned the source from GitHub)
./configure
make
make install

By default, libprotoident installs to /usr/local - this can be changed by
appending the --prefix= option to ./configure.

The libprotoident tools are built by default - this can be changed by using the
--with-tools=no option with ./configure.

Protocols Supported
===================
A full list of supported protocols can be found at
https://github.com/wanduow/libprotoident/wiki/SupportedProtocols

Libprotoident also currently has rules for several "mystery" protocols. These
are patterns that commonly occur in our trace sets that we cannot tie to an
actual protocol. It would be nice to know what these protocols actually are -
if you have any suggestions please feel free to email us at [email protected].

In addition, a flow can be assigned into a "category" based on the protocol
determined by libprotoident, enabling broader analysis. For example,
BitTorrent, Gnutella and eMule all fall into the P2P category, whereas SMTP,
POP3 and IMAP are part of the Mail category.

Tools
=====
There are three tools included with libprotoident.

* lpi_protoident

Description:

This tool attempts to identify each individual flow within the provided
trace. Identification only occurs when the flow has concluded or
expired, so it is not very effective for real-time applications.

Usage:
lpi_protoident

The input trace must be a valid libtrace URI.
See https://github.com/LibtraceTeam/libtrace/wiki/Supported-Trace-Formats
to learn more about libtrace URIs. Note that a URI may be a live
source, such as a network interface.

Output:
For each flow in the input trace, a single line is printed to stdout
describing the flow. The line contains the following fields separated
by spaces (in order):

* Application protocol (as reported by libprotoident)
* IP address of the first endpoint
* IP address of the second endpoint
* Port used by the first endpoint
* Port used by the second endpoint
* Transport protocol (6 = TCP, 17 = UDP)
* Unix timestamp when the flow began
* Unix timestamp when the flow ended
* Total bytes sent from first endpoint to second endpoint
* Total bytes sent from second endpoint to first endpoint
* First four bytes of payload sent from first endpoint (in hex)
* First four bytes of payload sent from first endpoint (ASCII)
* Size of first payload-bearing packet sent from first endpoint
* First four bytes of payload sent from second endpoint (in hex)
* First four bytes of payload sent from second endpoint (ASCII)
* Size of first payload-bearing packet sent from second endpoint

* lpi_find_unknown

Description:

This tool reports all the flows in a trace which libprotoident
was unable to identify. Identification only occurs when the flow has
concluded or expired, so it is not very effective for real-time
applications.

This is mainly intended as a tool to aid development of new protocol
identifiers.

Usage:
lpi_find_unknown

The input trace must be a valid libtrace URI.
See https://github.com/LibtraceTeam/libtrace/wiki/Supported-Trace-Formats
to learn more about libtrace URIs. Note that a URI may be a live
source, such as a network interface.

Output:
For each unknown flow in the input trace, a single line is printed to
stdout describing the flow. The line contains the following fields
separated by spaces (in order):

* IP address of the first endpoint
* IP address of the second endpoint
* Port used by the first endpoint
* Port used by the second endpoint
* Transport protocol (6 = TCP, 17 = UDP)
* Unix timestamp when the flow began
* Total bytes sent from first endpoint to second endpoint
* Total bytes sent from second endpoint to first endpoint
* First four bytes of payload sent from first endpoint (in hex)
* First four bytes of payload sent from first endpoint (ASCII)
* Size of first payload-bearing packet sent from first endpoint
* First four bytes of payload sent from second endpoint (in hex)
* First four bytes of payload sent from second endpoint (ASCII)
* Size of first payload-bearing packet sent from second endpoint

* lpi_arff

Description:
This tool is similar to lpi_protoident except that it writes its
output in the ARFF format so that it is compatible with the Weka
machine learning software (http://www.cs.waikato.ac.nz/ml/weka/).

This tool was contributed by Paweł Foremski .

Usage:
lpi_arff

The input trace must be a valid libtrace URI.
See https://github.com/LibtraceTeam/libtrace/wiki/Supported-Trace-Formats
to learn more about libtrace URIs. Note that a URI may be a live
source, such as a network interface.

Output:
The output begins with a series of lines describing each feature that
will be used to describe each flow. Following that,
for each flow in the input trace, a single line is printed to stdout
describing the flow. The line contains the following fields separated
by commas (in order):

* Application protocol (as reported by libprotoident)
* ID number for the application protocol
* Total number of packets sent from first endpoint to second endpoint
* Total number of bytes sent from first endpoint to second endpoint
* Total number of packets sent from second endpoint to first endpoint
* Total number of bytes sent from second endpoint to first endpoint
* Minimum payload size sent from first endpoint to second endpoint
* Mean payload size sent from first endpoint to second endpoint
* Maximum payload size sent from first endpoint to second endpoint
* Standard deviation of payload size sent from first endpoint to
second endpoint
* Minimum payload size sent from second endpoint to first endpoint
* Mean payload size sent from second endpoint to first endpoint
* Maximum payload size sent from second endpoint to first endpoint
* Standard deviation of payload size sent from second endpoint to
first endpoint
* Minimum packet interarrival time for packets sent from first
endpoint to second endpoint
* Mean packet interarrival time for packets sent from first
endpoint to second endpoint
* Maximum packet interarrival time for packets sent from first
endpoint to second endpoint
* Standard deviation of packet interarrival time for packets sent from
first endpoint to second endpoint
* Minimum packet interarrival time for packets sent from second
endpoint to first endpoint
* Mean packet interarrival time for packets sent from second
endpoint to first endpoint
* Maximum packet interarrival time for packets sent from second
endpoint to first endpoint
* Standard deviation of packet interarrival time for packets sent from
second endpoint to first endpoint
* Flow duration (in microseconds)
* Flow start time (as a Unix timestamp)

API
===

If you want to develop your own tools based on libprotoident, you'll need to
use the libprotoident API. The API is very simple and the best way to learn it
is to examine how the existing tools work. The source for the tools is
located in the tools/ directory.

The tools use libflowmanager to do the flow tracking, using an instance of a
FlowManager class. You will probably want to incorporate this into your own
tool. Usage of libprotoident itself is through functions beginning with 'lpi_'.

The libprotoident API functions themselves are documented in
lib/libprotoident.h if you need further guidance.

Further documentation of the API can also be found at
https://github.com/LibtraceTeam/libflowmanager

If all else fails, drop me a line at [email protected]