https://github.com/rouanw/flamongo

Helps your MongoDB queries fly by finding the best indexes for your data
Host: GitHub
URL: https://github.com/rouanw/flamongo
Owner: rouanw
License: mit
Created: 2017-06-27T13:33:19.000Z (over 7 years ago)
Default Branch: master
Last Pushed: 2022-04-10T00:55:30.000Z (over 2 years ago)
Last Synced: 2024-04-25T13:21:20.474Z (7 months ago)
Language: JavaScript
Homepage:
Size: 303 KB
Stars: 10
Watchers: 2
Forks: 2
Open Issues: 4
Metadata Files:
- Readme: README.md
- License: LICENSE
README

# Flamongo

> A tool that helps you find the most efficient indexes for your MongoDB collections



[![Build Status](https://travis-ci.org/rouanw/flamongo.svg?branch=master)](https://travis-ci.org/rouanw/flamongo)

[![GitHub license](https://img.shields.io/github/license/rouanw/flamongo.svg)](https://github.com/rouanw/flamongo/blob/master/LICENSE)

[![npm version](https://badge.fury.io/js/flamongo.svg)](https://badge.fury.io/js/flamongo)

## What does Flamongo do?

If you want to figure out how to make your MongoDB queries more performant by finding the best indexes, you've come to the right place.

`flamongo explain` provides you with helpful, human-readable stats on how your MongoDB indexes perform for your queries.

`flamongo best-index` finds the best MongoDB index for each of your queries.

- Flamongo pumps a test collection full of stub data

- Depending on the command you're running, it either creates the indexes you've specified or creates every possible index in turn

- It runs your queries against the test collection

- It prints out useful information and statistics, which will help you decide which indexes are best for your needs

## Install

```sh

$ npm install -g flamongo

```

## API

### `explain`

Explain how your queries perform against the provided indexes.

```sh

$ flamongo explain input.json

```

Where `input.json` looks something like:

```json

{

  "indexKeys": [

    { "name.first": 1 },

    { "birthday": 1 }

  ],

  "queries": [

    { "name.first": "Richard" },

    { "name.first": "John", "vegan": false, "name.last": { "$nin": ["Smith"] } }

  ],

  "schema": {

    "name": {

      "first": "first",

      "last": "last"

    },

    "vegan": "bool",

    "birthday": "date",

    "friends": [{

      "name": {

        "first": "first",

        "last": "last"

      }

    }]

  }

}

```

Example output:

```sh

Connection open

Inserting 90000 random documents

Created 3 indexes

Results for query:

{"name.first":"Richard"}

The query took 1 milliseconds to run and returned 167 documents.

167 keys and 167 documents were examined.

Stages:

2. Retrieve documents (FETCH):

Took around 10  milliseconds and returned 167 documents

Examined 167 documents

1. Scan index keys (IXSCAN):

Took around 10  milliseconds and returned 167 documents

Examined 167 keys on index name.first_1_vegan_1 in a forward direction

Results for query:

{"name.first":"John","vegan":false,"name.last":{"$nin":["Smith"]}}

The query took 2 milliseconds to run and returned 83 documents.

83 keys and 83 documents were examined.

Stages:

2. Retrieve documents (FETCH):

Took around 0  milliseconds and returned 83 documents

Examined 83 documents, using filter {"$not":{"name.last":{"$in":["Smith"]}}}

1. Scan index keys (IXSCAN):

Took around 0  milliseconds and returned 83 documents

Examined 83 keys on index name.first_1_vegan_1 in a forward direction

```

#### Programmatic API

```js

const flamongo = require('flamongo');

flamongo.explain(input, options, /* optional logging function */)

  .then((results) => { /* use results */ });

```

### `best-index`

Find the fastest index based on your queries and data.

```sh

$ flamongo best-index input.json

```

Where `input.json` looks something like:

```json

{

  "queries": [

    { "name.first": "Richard", "vegan": true },

    { "name.first": "John", "vegan": false, "name.last": { "$nin": ["Smith"] } }

  ],

  "schema": {

    "name": {

      "first": "first",

      "last": "last"

    },

    "vegan": "bool",

    "birthday": "date",

    "friends": [{

      "name": {

        "first": "first",

        "last": "last"

      }

    }]

  }

}

```

Example output:

```sh

Connection open

---------------------------------------------------------------------------------------------

Running the following query against 4 indexes to find the best index:

{"name.first":"Richard","vegan":true}

The best index based on your input appears to be vegan_1_name.first_1, which took 0 milliseconds. (102 documents were returned.)

You can create the index using this key: {"vegan":1,"name.first":1}

Rank  |  Index                 |  Time (ms)  |  Documents examined  |  Keys examined

1     |  vegan_1_name.first_1  |  0          |  102                 |  102

2     |  name.first_1_vegan_1  |  1          |  102                 |  102

3     |  name.first_1          |  1          |  182                 |  182

4     |  vegan_1               |  53         |  45141               |  45141

---------------------------------------------------------------------------------------------

Running the following query against 15 indexes to find the best index:

{"name.first":"John","vegan":false,"name.last":{"$nin":["Smith"]}}

The best index based on your input appears to be vegan_1_name.first_1, which took 1 milliseconds. (86 documents were returned.)

You can create the index using this key: {"vegan":1,"name.first":1}

Rank  |  Index                             |  Time (ms)  |  Documents examined  |  Keys examined

1     |  vegan_1_name.first_1              |  1          |  86                  |  86

2     |  name.first_1_vegan_1_name.last_1  |  1          |  86                  |  87

3     |  name.first_1                      |  1          |  161                 |  161

4     |  name.first_1_name.last_1          |  1          |  161                 |  162

5     |  name.last_1_name.first_1          |  9          |  161                 |  1160

6     |  name.first_1_vegan_1              |  11         |  86                  |  86

7     |  vegan_1                           |  52         |  44859               |  44859

8     |  vegan_1_name.last_1               |  91         |  44761               |  44763

9     |  name.last_1_vegan_1               |  94         |  44761               |  45261

10    |  name.last_1                       |  164        |  89811               |  89812

```

#### Alias

`flamongo best`

#### Programmatic API

```js

const flamongo = require('flamongo');

flamongo.bestIndex(input, options, /* optional logging function */)

  .then((results) => { /* use results */ });

```

#### Performance

`flamongo` is not yet clever enough to guess which indexes to try first, so it just tries every possible one. Every ordered selection of a query's fields is a candidate index, so the number of [permutations](https://en.wikipedia.org/wiki/Permutation) to test grows very quickly with the number of search terms (see [combinatorics](https://en.wikipedia.org/wiki/Enumerative_combinatorics)). As the example output above shows, 2 search terms test 4 indexes and 3 test 15. 4 search terms will test 64 indexes, which will be slow but fine. 5 terms will need to check 325 indexes, which is a really long wait. Anything more than that is not worth doing.
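The counts quoted above can be reproduced with a short sketch. It is not taken from flamongo's source: `countCandidateIndexes` and `candidateIndexKeys` are hypothetical names, and the sketch assumes that only ascending (`1`) keys are tried, which is what the index names in the example output suggest.

```js
// Number of ordered, non-empty selections of n distinct fields:
// the sum over k = 1..n of n! / (n - k)!
function countCandidateIndexes(n) {
  let total = 0;
  let arrangements = 1; // holds n! / (n - k)! for the current k
  for (let k = 1; k <= n; k += 1) {
    arrangements *= n - k + 1;
    total += arrangements;
  }
  return total;
}

// Enumerates the candidate index keys for a list of field names.
// Assumption: ascending direction only.
function candidateIndexKeys(fields) {
  const results = [];
  const extend = (prefix, remaining) => {
    remaining.forEach((field, i) => {
      const next = [...prefix, field];
      results.push(Object.fromEntries(next.map((f) => [f, 1])));
      // Recurse with the chosen field removed from the remaining pool.
      extend(next, [...remaining.slice(0, i), ...remaining.slice(i + 1)]);
    });
  };
  extend([], fields);
  return results;
}

console.log([2, 3, 4, 5].map(countCandidateIndexes)); // [ 4, 15, 64, 325 ]
console.log(candidateIndexKeys(['name.first', 'vegan']).length); // 4
```

Going from 5 to 6 search terms raises the count from 325 to 1956, which is why larger queries are not practical to test exhaustively.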

## Input format

- `queries`: An array of queries to run against your Mongo collection. Flamongo will output stats for each one in turn.

- `schema`: A schema that Flamongo will use to fill a test collection with data. The type descriptions map to [Chance.js](http://chancejs.com/) generators. Optional if the `preserveData` option is specified (see below).

- `indexKeys`: An array of indexes to create. Only used for `flamongo explain`.

If you need to do something more complex, you can also specify a plain `.js` file that exports (i.e. `module.exports =`) a similar object.

### Schema

The schema __flamongo__ uses (via [`randoc`](https://www.npmjs.com/package/randoc)) loosely maps to functions offered by [Chance.js](http://chancejs.com/), with a few additional options. Here's an example schema that showcases some of the available functionality:

```js

schema: {

  widget: {

    name: 'string',

    inventor: 'name',

    dateObtained: 'date',

    discountable: 'bool',

    warehouseId: {

      _type: 'enum',

      options: [543, 999, 1232, 110],

    },

    deleted: {

      _type: 'bool',

      args: {

        likelihood: 5,

      },

    },

    outOfStock: {

      _type: 'bool',

      args: {

        likelihood: 10,

      },

    },

  },

  status: {

    _type: 'enum',

    options: ['new', 'active', 'cancelled', ''],

  },

},

```
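To make the `_type` entries above more concrete, here is a minimal sketch of how such a field description could be turned into a value. It is not how `randoc` or Chance.js are implemented. `sketchValue` is a hypothetical function, it handles only `enum` and `bool`, and it assumes that enum options are picked uniformly. It does rely on one documented Chance.js behaviour: `likelihood` is the percentage chance of `true`, defaulting to 50.

```js
// Illustrative only: inside flamongo, randoc and Chance.js do the real work.
// `rng` is any function returning a number in [0, 1), such as Math.random.
function sketchValue(spec, rng = Math.random) {
  if (spec._type === 'enum') {
    // Assumption: each listed option is equally likely.
    return spec.options[Math.floor(rng() * spec.options.length)];
  }
  if (spec._type === 'bool') {
    // `likelihood` is a percentage chance of `true` (50 when omitted).
    const hasLikelihood = spec.args && spec.args.likelihood !== undefined;
    const likelihood = hasLikelihood ? spec.args.likelihood : 50;
    return rng() * 100 < likelihood;
  }
  throw new Error(`Unsupported _type in this sketch: ${spec._type}`);
}

// With `likelihood: 5`, roughly 5% of generated documents get `deleted: true`.
const deleted = sketchValue({ _type: 'bool', args: { likelihood: 5 } });
const warehouseId = sketchValue({ _type: 'enum', options: [543, 999, 1232, 110] });
console.log({ deleted, warehouseId });
```

Skewed fields like `deleted` matter for index selection because an index on a field where almost every document shares one value filters out very little.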

## Options

Option | Description | Default
---|---|---
`url` | URL of Mongo server. Note that Flamongo is meant for testing. See the Mongo [Connection String](https://docs.mongodb.com/manual/reference/connection-string/) docs for the URL format. By default, Flamongo is destructive. Use `preserveData` and `preserveIndexes` and be careful if you're planning to point it at your production server. | `mongodb://localhost:27017`
`databaseName` | Name of database to use | `test_indexes_db`
`collectionName` | Name of collection to use | `test_indexes_collection`
`preserveData` | When `true`, Flamongo will not insert or remove data | `false`
`preserveIndexes` | When `true`, Flamongo will not create or drop indexes. Note that this option will __not__ be honoured by `flamongo best-index`, which will always drop and create indexes. | `false`
`numberOfRecords` | Number of stub records to insert, using the specified `schema` | `90000`
`verbose` | Print out the full output of MongoDB's [explain results](https://docs.mongodb.com/manual/reference/explain-results/) | `false`

You can pass options on the command line. Example: `flamongo explain --numberOfRecords 240000 --preserveData true flamongo.json`
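Because the defaults are destructive, it can help to spell out what a given combination of command and options will modify before pointing Flamongo at a shared server. The sketch below encodes only the rules stated in the table above. `describeSideEffects` is a hypothetical helper, not part of flamongo.

```js
// Hypothetical helper, not part of flamongo: lists what a run would modify,
// based solely on the option descriptions in the table above.
function describeSideEffects(command, options = {}) {
  const effects = [];
  if (!options.preserveData) {
    effects.push('inserts and removes data');
  }
  // `best-index` always drops and creates indexes, even with preserveIndexes.
  if (command === 'best-index' || !options.preserveIndexes) {
    effects.push('creates and drops indexes');
  }
  return effects;
}

const careful = { preserveData: true, preserveIndexes: true };
console.log(describeSideEffects('explain', {}));         // both effects
console.log(describeSideEffects('explain', careful));    // []
console.log(describeSideEffects('best-index', careful)); // [ 'creates and drops indexes' ]
```

The same options object could be passed as the second argument to `flamongo.explain` or `flamongo.bestIndex` in the programmatic API.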

## Further reading

Reading the Mongo Docs on [Index Strategies](https://docs.mongodb.com/manual/applications/indexes/) will help you understand what factors influence how performant an index is. Some queries will actually be faster without any indexes!

## Helpful queries

Here are some queries you can run against your collection to see how your indexes are being used:

- `db.collection.aggregate( [ { $indexStats: { } } ] )` - usage info for each index

- `db.collection.stats().indexSizes` - size of each index, in bytes

- `db.collection.totalIndexSize()` - total size of all indexes, in bytes

---

Logo: By Creative Tail [CC BY 4.0], via Wikimedia Commons