https://github.com/nlink-jp/lookup
A powerful Go-based CLI tool to enrich JSON/JSONL data streams by looking up values in CSV or JSON files. Inspired by Splunk's lookup command.
https://github.com/nlink-jp/lookup
cli csv data-enrichment data-pipeline devpos-tools go golang json jsonl lookup splunk
Last synced: 2 months ago
JSON representation
A powerful Go-based CLI tool to enrich JSON/JSONL data streams by looking up values in CSV or JSON files. Inspired by Splunk's lookup command.
- Host: GitHub
- URL: https://github.com/nlink-jp/lookup
- Owner: nlink-jp
- License: mit
- Created: 2025-08-26T09:31:49.000Z (10 months ago)
- Default Branch: main
- Last Pushed: 2026-03-29T06:16:43.000Z (3 months ago)
- Last Synced: 2026-04-04T11:40:02.329Z (3 months ago)
- Topics: cli, csv, data-enrichment, data-pipeline, devpos-tools, go, golang, json, jsonl, lookup, splunk
- Language: Go
- Homepage:
- Size: 69.3 KB
- Stars: 0
- Watchers: 0
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- Changelog: CHANGELOG.md
- License: LICENSE
Awesome Lists containing this project
README
# lookup: A Powerful CLI Lookup Tool
`lookup` is a command-line utility inspired by Splunk's powerful `lookup` command. It enriches JSON data streams by adding fields based on matching values in an external data source (like a CSV or JSON file). It's designed to be a flexible and high-performance tool for data enrichment pipelines.
The tool reads JSON objects (either as a JSON Array or as JSON Lines) from standard input, performs lookups based on sophisticated rules, and outputs the enriched JSON objects to standard output.
---
## Features
- **Multiple Data Sources**: Use either **CSV** or **JSON** files as your lookup table.
- **Advanced Matching Methods**:
- `exact`: Case-sensitive or insensitive exact string matching.
- `wildcard`: Glob-style wildcard matching (e.g., `bot-*`).
- `regex`: Powerful matching using regular expressions.
- `cidr`: Match IP addresses against CIDR blocks (e.g., `10.0.0.0/8`).
- **Flexible Configuration**: A central JSON configuration file separates lookup logic from your data, allowing for complex matching rules.
- **Built-in DNS Lookup**: Perform forward (`A` record) or reverse (`PTR` record) DNS lookups as a native feature.
- Optionally specify a custom DNS server for queries.
- **Flexible Field Mapping**: Intuitive syntax (` as OUTPUT out1 as new1, ...`) to control which fields are matched and how new fields are named.
- **Handles Multiple Input Formats**: Automatically detects and processes both **JSON Array** and **JSON Lines (JSONL)** from stdin.
- **Cross-Platform**: Written in Go, it compiles to a single binary with no external dependencies, running on Linux, macOS, and Windows.
---
## Installation
Pre-compiled binaries for macOS, Windows, and Linux are available on the [Releases](https://github.com/nlink-jp/lookup/releases) page.
---
## Configuration
The configuration file defines where your data is and how to match against it.
### Configuration File (`config.json`)
```json
{
"data_source": "./path/to/your/data.csv",
"matchers": [
{
"input_field": "field_from_stdin",
"lookup_field": "column_in_data_source",
"method": "exact",
"case_sensitive": false
},
{
"input_field": "another_field_from_stdin",
"lookup_field": "another_column",
"method": "regex"
}
]
}
```
- **`data_source`**: (string) The relative or absolute path to your lookup data file (CSV or JSON).
- **`matchers`**: (array) A list of objects, where each object defines a specific matching rule.
- **`input_field`**: A name for this lookup rule. This is what you refer to in the `-m` flag.
- **`lookup_field`**: The column/key name in your `data_source` file to match against.
- **`method`**: The matching algorithm to use. Supported values:
- `"exact"` (default)
- `"wildcard"`
- `"regex"`
- `"cidr"`
- **`case_sensitive`**: (boolean, optional) If `true`, the match will be case-sensitive. Defaults to `false`. This applies to `exact`, `wildcard`, and `regex` methods.
### Configuration Helper (`generate-config`)
To make setup easier, `lookup` provides a helper command to generate a configuration template from your data file. It scans your CSV, JSON, or JSONL file and creates a valid `config.json` structure based on the headers or keys it finds.
```sh
./lookup generate-config -file
```
- **`-file `**: The path to your data source file (e.g., `users.csv` or `data.jsonl`).
**Example:** Given a `users.csv` file with columns `username`, `department`, and `role`:
```sh
./lookup generate-config -file users.csv
```
Output (save as `config.json` and edit as needed):
```json
{
"data_source": "users.csv",
"matchers": [
{
"input_field": "username",
"lookup_field": "username",
"method": "exact",
"case_sensitive": false
},
{
"input_field": "department",
"lookup_field": "department",
"method": "exact",
"case_sensitive": false
},
{
"input_field": "role",
"lookup_field": "role",
"method": "exact",
"case_sensitive": false
}
]
}
```
---
## Usage
The basic command structure is:
```sh
cat input.json | ./lookup -c -m ""
```
### Command-Line Flags
| Flag | Description | Required |
| :------------- | :--------------------------------------------------------------------------------------------------------------------------------------- | :------- |
| `-c ` | Path to the JSON configuration file that defines the data source and matching rules. | Yes |
| `-m ` | The mapping rule that specifies how to link input data to the lookup table. (See [Mapping Syntax](#mapping-syntax) below). | Yes |
| `--dns` | Enables DNS lookup mode. When used, the `-c` flag is ignored. | No |
| `--dns-server` | (Optional) Specifies a custom DNS server for DNS lookups (e.g., `8.8.8.8` or `1.1.1.1:53`). If not set, the system's default resolver is used. | No |
### Mapping Syntax
The `-m` flag defines the link between the input stream and the lookup table, and controls the output.
```
"CONFIG_REF_FIELD as INPUT_FIELD [OUTPUT original_name1 as new_name1, original_name2 as new_name2]"
```
- **`CONFIG_REF_FIELD as INPUT_FIELD`**: (Required)
- `CONFIG_REF_FIELD` must match an `input_field` in one of your matchers in `config.json`.
- `INPUT_FIELD` is the name of the field in the incoming JSON object to use for the lookup.
- **`OUTPUT ...`**: (Optional)
- Controls which fields from the lookup file are added to the output and allows you to rename them.
- If omitted, all columns from the matched row are added with their original names.
### Examples
Let's use the following files for our examples.
**`users.csv`** (Data Source)
```csv
username,department,role,building,ip_range
jdoe,Sales,Manager,A,192.168.1.10
asmith,Engineering,Developer,B,192.168.1.25
b-*,Engineering,QA,B,10.0.0.0/8
^scanner-.*$,IT,Service,A,
```
**`lookup_config.json`** (Configuration)
```json
{
"data_source": "./users.csv",
"matchers": [
{
"input_field": "user_lookup",
"lookup_field": "username",
"method": "exact",
"case_sensitive": false
},
{
"input_field": "host_lookup",
"lookup_field": "username",
"method": "wildcard"
},
{
"input_field": "process_lookup",
"lookup_field": "username",
"method": "regex"
},
{
"input_field": "ip_lookup",
"lookup_field": "ip_range",
"method": "cidr"
}
]
}
```
**`input.jsonl`** (Input Data)
```json
{"timestamp": "2023-10-28T11:00:00Z", "user": "JDOE", "event": "login"}
{"timestamp": "2023-10-28T11:01:00Z", "hostname": "b-jones", "event": "connect"}
{"timestamp": "2023-10-28T11:02:00Z", "process": "scanner-01", "event": "scan"}
{"timestamp": "2023-10-28T11:03:00Z", "client_ip": "10.20.30.40", "event": "access"}
{"timestamp": "2023-10-28T11:04:00Z", "client_ip": "8.8.8.8", "event": "external_access"}
```
#### Example 1: Case-Insensitive `exact` Match
```sh
cat input.jsonl | ./lookup \
-c lookup_config.json \
-m "user_lookup as user OUTPUT department as dept, role"
```
**Output (for the first line):**
```json
{"dept":"Sales","event":"login","role":"Manager","timestamp":"2023-10-28T11:00:00Z","user":"JDOE"}
```
#### Example 2: `cidr` Match
```sh
cat input.jsonl | ./lookup \
-c lookup_config.json \
-m "ip_lookup as client_ip"
```
**Output (for the fourth line):**
```json
{"building":"B","client_ip":"10.20.30.40","department":"Engineering","event":"access","ip_range":"10.0.0.0/8","role":"QA","timestamp":"2023-10-28T11:03:00Z","username":"b-*"}
```
#### Example 3: DNS Lookup
**Command (using system resolver):**
```sh
cat input.jsonl | ./lookup \
--dns \
-m "dns_reverse_lookup as client_ip OUTPUT hostname as resolved_host"
```
**Command (using a custom DNS server):**
```sh
cat input.jsonl | ./lookup \
--dns \
--dns-server "8.8.8.8" \
-m "dns_reverse_lookup as client_ip OUTPUT hostname as resolved_host"
```
**Output (for the last line, may vary):**
```json
{"client_ip":"8.8.8.8","event":"external_access","resolved_host":"dns.google","timestamp":"2023-10-28T11:04:00Z"}
```
---
## Building
To build from source, you need Go and Make installed.
```sh
# Build for the current platform
make build
# Cross-compile for all platforms (Linux amd64/arm64, macOS amd64/arm64, Windows amd64)
make build-all
# Build all binaries and create .zip archives in dist/
make package
```
The build artifacts will be placed in the `dist/` directory.
---
## Documentation
- [Architecture](docs/en/architecture.md) — Module structure, data flow, testing strategy
- [アーキテクチャ](docs/ja/architecture.ja.md) — Japanese version
- [日本語ドキュメント](README.ja.md)
## License
This project is licensed under the [MIT License](LICENSE).