Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/pxyup/fitter
New way for collect information from the API's/Websites
https://github.com/pxyup/fitter
api go golang json parsing scraper scrawler
Last synced: 3 days ago
JSON representation
New way for collect information from the API's/Websites
- Host: GitHub
- URL: https://github.com/pxyup/fitter
- Owner: PxyUp
- License: mit
- Created: 2023-02-05T19:18:13.000Z (almost 2 years ago)
- Default Branch: master
- Last Pushed: 2024-12-12T01:12:20.000Z (14 days ago)
- Last Synced: 2024-12-15T16:05:07.582Z (10 days ago)
- Topics: api, go, golang, json, parsing, scraper, scrawler
- Language: Go
- Homepage:
- Size: 44.3 MB
- Stars: 120
- Watchers: 5
- Forks: 9
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
- Code of conduct: CODE_OF_CONDUCT.md
Awesome Lists containing this project
README
# [Fitter + Fitter CLI](https://github.com/PxyUp/fitter)
Fitter - new way for collect information from the API's/Websites
Fitter CLI - small cli command which provide result from Fitter for test/debug/home usage
Fitter Lib - library which provide functional of fitter CLI as a library
![](https://github.com/PxyUp/fitter/blob/master/demo.gif)
# Way to collect information
1. **Server** - parsing response from some API's or http request(usage of http.Client)
2. **Browser** - emulate real browser using chromium + docker + playwright/cypress and get DOM information
3. **Static** - parsing static string as data# Format which can be parsed
1. **JSON** - parsing JSON to get specific information
2. **XML** - parsing xml tree to get specific information
3. **HTML** - parsing dom tree to get specific information
4. **XPath** - parsing dom tree to get specific information but by xpath# Use like a library
```bash
go get github.com/PxyUp/fitter
``````go
package mainimport (
"fmt"
"github.com/PxyUp/fitter/lib"
"github.com/PxyUp/fitter/pkg/config"
"log"
"net/http"
)func main() {
res, err := lib.Parse(&config.Item{
ConnectorConfig: &config.ConnectorConfig{
ResponseType: config.Json,
Url: "https://random-data-api.com/api/appliance/random_appliance",
ServerConfig: &config.ServerConnectorConfig{
Method: http.MethodGet,
},
},
Model: &config.Model{
ObjectConfig: &config.ObjectConfig{
Fields: map[string]*config.Field{
"my_id": {
BaseField: &config.BaseField{
Type: config.Int,
Path: "id",
},
},
"generated_id": {
BaseField: &config.BaseField{
Generated: &config.GeneratedFieldConfig{
UUID: &config.UUIDGeneratedFieldConfig{},
},
},
},
"generated_array": {
ArrayConfig: &config.ArrayConfig{
RootPath: "@this|@keys",
ItemConfig: &config.ObjectConfig{
Field: &config.BaseField{
Type: config.String,
},
},
},
},
},
},
},
}, nil, nil, nil, nil)
if err != nil {
log.Fatal(err)
}
fmt.Println(res.ToJson())
}```
Output:
```json
{
"generated_array": ["id","uid","brand","equipment"],
"my_id": 6000,
"generated_id": "26b08b73-2f2e-444d-bcf2-dac77ac3130e"
}
```# How to use Fitter
[Download latest version from the release page](https://github.com/PxyUp/fitter/releases)
or locally:
```bash
go run cmd/fitter/main.go --path=./examples/config_api.json
```### Arguments
1. **--path** - string[""] - path for the configuration of the Fitter
2. **--url** - string[""] - url for the configuration of the Fitter
3. **--verbose** - bool[false] - enable logging
4. **--plugins** - string[""] - [path for plugins for Fitter](https://github.com/PxyUp/fitter/blob/master/examples/plugin/README.md)
5. **--log-level** - enum["info", "error", "debug", "fatal"] - set log level(only if verbose set to true)# How to use Fitter_CLI
[Download latest version from the release page](https://github.com/PxyUp/fitter/releases)
or locally:
```bash
go run cmd/cli/main.go --path=./examples/cli/config_cli.json
```### Arguments
1. **--path** - string[""] - path for the configuration of the Fitter_CLI
2. **--url** - string[""] - url for the configuration of the Fitter_CLI
3. **--copy** - bool[false] - copy information into clipboard
4. **--pretty** - bool[true] - make readable result(also affect on copy)
5. **--verbose** - bool[false] - enable logging
6. **--omit-error-pretty** - bool[false] - Provide pure value if pretty is invalid
7. **--plugins** - string[""] - [path for plugins for Fitter](https://github.com/PxyUp/fitter/blob/master/examples/plugin/README.md)
8. **--log-level** - enum["info", "error", "debug", "fatal"] - set log level(only if verbose set to true)
9. **--input** - string[""] - specify input value for [formatting](#placeholder-list). Examples: `--input=\""124"\"` `--input=124` `--input='{"test": 5}'````bash
./fitter_cli_${VERSION} --path=./examples/cli/config_cli.json --copy=true
```Examples:
1. **Server version** [HackerNews + Quotes + Guardian News](https://github.com/PxyUp/fitter/blob/master/examples/cli/config_cli.json) - using API + HTML + XPath parsing
2. **Chromium version** [Guardian News + Quotes](https://github.com/PxyUp/fitter/blob/master/examples/cli/config_browser.json) - using HTML parsing + browser emulation
3. **Docker version** [Docker version: Guardian News + Quotes](https://github.com/PxyUp/fitter/blob/master/examples/cli/config_docker.json) - using HTML parsing + browser from Docker image
4. **Playwright version** [Playwright version: Guardian News + Quotes](https://github.com/PxyUp/fitter/blob/master/examples/cli/config_playwright.json) - using HTML parsing + browser from Playwright framework
5. **Playwright version** [Playwright version: England Cities + Weather](https://github.com/PxyUp/fitter/blob/master/examples/cli/config_weather.json) - using HTML + XPath parsing + browser from Playwright framework
6. **JSON version** [Generate pagination](https://github.com/PxyUp/fitter/blob/master/examples/cli/config_static_connector.json) - using static connector for generate pagination array
7. **Server version** [Get current time](https://github.com/PxyUp/fitter/blob/master/examples/cli/config_current_time.json) - get time from url and format it# Configuration
## Connector
It is the way how you fetch the data```go
type ConnectorConfig struct {
ResponseType ParserType `json:"response_type" yaml:"response_type"`
Url string `json:"url" yaml:"url"`
Attempts uint32 `json:"attempts" yaml:"attempts"`
NullOnError bool `yaml:"null_on_error" json:"null_on_error"`
StaticConfig *StaticConnectorConfig `json:"static_config" yaml:"static_config"`
IntSequenceConfig *IntSequenceConnectorConfig `json:"int_sequence_config" yaml:"int_sequence_config"`
ServerConfig *ServerConnectorConfig `json:"server_config" yaml:"server_config"`
BrowserConfig *BrowserConnectorConfig `yaml:"browser_config" json:"browser_config"`
PluginConnectorConfig *PluginConnectorConfig `json:"plugin_connector_config" yaml:"plugin_connector_config"`
ReferenceConfig *ReferenceConnectorConfig `yaml:"reference_config" json:"reference_config"`
FileConfig *FileConnectorConfig `json:"file_config" yaml:"file_config"`
}
```- NullOnError[false] - if set to true then all errors a ignored
- ResponseType - enum["HTML", "json","xpath"] - in which format data comes from the connector
- Attempts - how many attempts to use for fetch data by connector
- Url - define which address to request. Important: can be with [inject of the parent value as a string](#placeholder-list)
`https://api.open-meteo.com/v1/forecast?latitude={{{latitude}}}&longitude={{{longitude}}}&hourly=temperature_2m&forecast_days=1`Config can be one of:
- [ServerConfig](#serverconnectorconfig)
- [BrowserConfig](#browserconnectorconfig)
- [StaticConfig](#staticconnectorconfig)
- [PluginConnectorConfig](#pluginconnectorconfig)
- [ReferenceConfig](#referenceconnectorconfig)
- [IntSequenceConfig](#intsequenceconnectorconfig)
- [FileConfig](#fileconnectorconfig)Example:
```json
{
"response_type": "xpath",
"attempts": 3,
"url": "https://openweathermap.org/find?q={PL}",
"browser_config": {
"playwright": {
"timeout": 30,
"wait": 30,
"install": false,
"browser": "Chromium"
}
}
}
```### PluginConnectorConfig
Connector can be defined via plugin system. For use that you need apply next flags to Fitter/Cli(location of the plugins):
```bash
... --plugins=./examples/plugin
```--plugins - looking for all files with ".so" extension in provided folder(subdirs excluded)
```go
type PluginConnectorConfig struct {
Name string `json:"name" yaml:"name"`
Config json.RawMessage `json:"config" yaml:"config"`
}
``````json
{
"name": "connector",
"config": {
"name": "Elon"
}
}
```- Name - name of the plugin
- Config - json config of the plugin#### How to build plugin
Build plugin
```bash
go build -buildmode=plugin -gcflags="all=-N -l" -o examples/plugin/connector.so examples/plugin/hardcoder/connector.go
```Make sure you export **Plugin** variable which implements **pl.ConnectorPlugin** interface
Example for CLI:
https://github.com/PxyUp/fitter/blob/master/examples/cli/config_plugin.json#L5
Plugin example:
```go
package mainimport (
"encoding/json"
"fmt"
"github.com/PxyUp/fitter/pkg/config"
"github.com/PxyUp/fitter/pkg/logger"
"github.com/PxyUp/fitter/pkg/builder"
pl "github.com/PxyUp/fitter/pkg/plugins/plugin"
)var (
_ pl.ConnectorPlugin = &plugin{}Plugin plugin
)type plugin struct {
log logger.Logger
Name string `json:"name" yaml:"name"`
}func (pl *plugin) Get(parsedValue builder.Interfacable, index *uint32, input builder.Interfacable) ([]byte, error) {
return []byte(fmt.Sprintf(`{"name": "%s"}`, pl.Name)), nil
}func (pl *plugin) SetConfig(cfg *config.PluginConnectorConfig, logger logger.Logger) {
pl.log = loggerif cfg.Config != nil {
err := json.Unmarshal(cfg.Config, pl)
if err != nil {
pl.log.Errorw("cant unmarshal plugin configuration", "error", err.Error())
return
}
}
}
```### ReferenceConnectorConfig
Connector which allow get prefetched data from [references](#references)```go
type ReferenceConnectorConfig struct {
Name string `yaml:"name" json:"name"`
}
```Example
https://github.com/PxyUp/fitter/blob/master/examples/cli/config_ref.json#L66
- Name - reference name from [references](#references) map
### IntSequenceConnectorConfig
Improved version of static connector which generate int sequence as result```go
type IntSequenceConnectorConfig struct {
Start int `json:"start" yaml:"start"`
End int `json:"end" yaml:"end"`
Step int `json:"step" yaml:"step"`
}
```
- Start[0] - start point for generation(**included**)
- End[0] - end point for generation(**excluded** from final result like range in any lang)
- Step[1] - interval for sequenceExample
```json
{
"start": 0,
"end": 2
// Generate [0, 1]
}
```[Config example](https://github.com/PxyUp/fitter/blob/master/examples/cli/config_seq.json)
### FileConnectorConfig
Connector type which fetch data from provided file```go
type FileConnectorConfig struct {
Path string `yaml:"path" json:"path"`
UseFormatting bool `yaml:"use_formatting" json:"use_formatting"`
}
```- Path - file path. Support [formatting](#placeholder-list)
- UseFormatting[false] - use [formatting](#placeholder-list) file content or not### StaticConnectorConfig
Connector type which fetch data from provided string
```go
type StaticConnectorConfig struct {
Value string `json:"value" yaml:"value"`
Raw json.RawMessage `json:"raw" yaml:"raw"`
}
```- Value - static string as data, can be html, json
- Raw - accept raw json. [Example](https://github.com/PxyUp/fitter/blob/master/examples/config_static.json). Also support [formatting](#placeholder-list)Example:
https://github.com/PxyUp/fitter/blob/master/examples/cli/config_static_connector.json#L5
```json
{
"value": "[1,2,3,4,5]"
}
```### ServerConnectorConfig
Connector type which fetch data using golang http.Client(server side request like curl)```go
type ServerConnectorConfig struct {
Method string `json:"method" yaml:"method"`
Headers map[string]string `yaml:"headers" json:"headers"`
Timeout uint32 `yaml:"timeout" json:"timeout"`
JsonRawBody json.RawMessage `json:"json_raw_body" yaml:"json_raw_body"`
Body string `yaml:"body" json:"body"`
Proxy *ProxyConfig `yaml:"proxy" json:"proxy"`
}
```- Method - supported all http methods: GET, POST, PUT, DELETE, PATCH, OPTIONS, HEAD
- Headers - predefine headers for using during request [can be injected into key/value](#placeholder-list)
- Timeout[sec] - default 60sec timeout or used provided
- Body - body of the request, parsed value [can be injected](#placeholder-list)
- JsonRawBody - body of the request in json format; value [can be injected](#placeholder-list)
- Proxy - setup proxy for request [config](#proxy-config)Example:
```json
{
"method": "GET",
"proxy": {
"server": "http://localhost:8080",
"username": "pyx"
}
}
```Right now default timeout it is 10 sec.
##### Proxy config
```go
type ProxyConfig struct {
// Proxy to be used for all requests. HTTP and SOCKS proxies are supported, for example
// `http://myproxy.com:3128` or `socks5://myproxy.com:3128`. Short form `myproxy.com:3128`
// is considered an HTTP proxy.
Server string `json:"server" yaml:"server"`
// Optional username to use if HTTP proxy requires authentication.
Username string `json:"username" yaml:"username"`
// Optional password to use if HTTP proxy requires authentication.
Password string `json:"password" yaml:"password"`
}
```- Server - address with schema of proxy server. Also support [formatting](#placeholder-list)
- Username - username for proxy(can be empty). Also support [formatting](#placeholder-list)
- Password - password for proxy(can be empty). Also support [formatting](#placeholder-list)```json
{
"server": "http://localhost:8080",
"username": "pyx"
}
```##### Environment variables
1. **FITTER_HTTP_WORKER** - int[1000] - default concurrent HTTP workers### BrowserConnectorConfig
Connector type which emulate fetching of data via browser```go
type BrowserConnectorConfig struct {
Chromium *ChromiumConfig `json:"chromium" yaml:"chromium"`
Docker *DockerConfig `json:"docker" yaml:"docker"`
Playwright *PlaywrightConfig `json:"playwright" yaml:"playwright"`
}
```Config can be one of:
- [Chromium](#chromium) - use local installed Chromium for fetch data
- [Docker](#docker) - use docker as service for spin up container for fetch data
- [Playwright](#playwright) - use playwright framework for fetch dataExample:
```json
{
"docker": {
"wait": 10000,
"image": "docker.io/zenika/alpine-chrome:with-node",
"entry_point": "chromium-browser",
"purge": true
}
}
```#### Chromium
Use locally installed Chromium for fetch the data```go
type ChromiumConfig struct {
Path string `yaml:"path" json:"path"`
Timeout uint32 `yaml:"timeout" json:"timeout"`
Wait uint32 `yaml:"wait" json:"wait"`
Flags []string `yaml:"flags" json:"flags"`
}
```- Path - path to binary of Chromium
- Timeout[sec] - timeout for execution of the chromium
- Wait[msec] - timeout of page loading
- Flags - flags for Chromium default: "--headless", "--proxy-auto-detect", "--temp-profile", "--incognito", "--disable-logging", "--disable-extensions", "--no-sandbox"Example:
```json
{
"path": "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
"wait": 10000
}
```#### Docker
Use Docker for spin up container for fetch data```go
type DockerConfig struct {
Image string `yaml:"image" json:"image"`
EntryPoint string `json:"entry_point" yaml:"entry_point"`
Timeout uint32 `yaml:"timeout" json:"timeout"`
Wait uint32 `yaml:"wait" json:"wait"`
Flags []string `yaml:"flags" json:"flags"`
Purge bool `json:"purge" yaml:"purge"`
NoPull bool `yaml:"no_pull" json:"no_pull"`
PullTimeout uint32 `yaml:"pull_timeout" json:"pull_timeout"`
}
```**Docker default image**: docker.io/zenika/alpine-chrome
- Image - image for the docker registry(provide with registry host)
- EntryPoint - cmd which will be run inside container
- Timeout[sec] - timeout for run container(without pulling image)
- Wait[msec] - timeout of page loading (works just for Chromium based containers)
- Flags - cmd arguments for run containers, default for Chromium based: "--no-sandbox","--headless", "--proxy-auto-detect", "--temp-profile", "--incognito", "--disable-logging", "--disable-gpu"
- Purge - should we remove container after work done(like docker rm)
- NoPull - prevent pulling of the image
- PullTimeout - define timeout for pull contains##### Environment variables
1. **DOCKER_HOST** - string - (EnvOverrideHost) to set the URL to the docker server.
2. **DOCKER_API_VERSION** - string - (EnvOverrideAPIVersion) to set the version of the API to use, leave empty for latest.
3. **DOCKER_CERT_PATH** - string - (EnvOverrideCertPath) to specify the directory from which to load the TLS certificates (ca.pem, cert.pem, key.pem).
4. **DOCKER_TLS_VERIFY** - bool - (EnvTLSVerify) to enable or disable TLS verification (off by default)Example:
```json
{
"wait": 10000,
"image": "docker.io/zenika/alpine-chrome:with-node",
"entry_point": "chromium-browser",
"purge": true
}
```#### Playwright
Run browsers via playwright framework```go
type PlaywrightConfig struct {
Browser PlaywrightBrowser `json:"browser" yaml:"browser"`
Install bool `yaml:"install" json:"install"`
Timeout uint32 `yaml:"timeout" json:"timeout"`
Wait uint32 `yaml:"wait" json:"wait"`
TypeOfWait *playwright.WaitUntilState `json:"type_of_wait" yaml:"type_of_wait"`
PreRunScript string `json:"pre_run_script" yaml:"pre_run_script"`
Stealth bool `json:"stealth" yaml:"stealth"`
Proxy *ProxyConfig `yaml:"proxy" json:"proxy"`
}
```- Browser - enum["Chromium", "FireFox", "WebKit"] - which browser to use
- Install - should we install browser
- Timeout[sec] - timeout to run playwright
- Wait[sec] - timeout of page loading
- TypeOfWait - enum["load", "domcontentloaded", "networkidle", "commit"] which state of page we waiting, default is "load"
- PreRunScript[""] - script which will be executed before reading content of the page. Also support placeholder [{PL}](#placeholder-list)
- Stealth[false] - add script for trying passing bot defends
- Proxy - setup proxy for request [config](#proxy-config)Example
```json
{
"timeout": 30,
"wait": 30,
"install": false,
"browser": "Chromium"
}
```## Model
With model we define result of the scrapping```go
type Model struct {
ObjectConfig *ObjectConfig `yaml:"object_config" json:"object_config"`
ArrayConfig *ArrayConfig `json:"array_config" yaml:"array_config"`
BaseField *BaseField `json:"base_field" yaml:"base_field"`
IsArray bool `json:"is_array" yaml:"is_array"`
}
```Config can be one of:
- [ObjectConfig](#objectconfig) - configuration of object format
- [ArrayConfig](#arrayconfig) - configuration of array format
- [BaseField](#basefield) - configuration of single/generated field
- IsArray - bool[false] - force indicate that field is array(usable in case of [model field](#model-field) with [base field](#basefield))Example:
```json
{
"object_config": {}
}
```### ObjectConfig
Configuration of the object and fields```go
type ObjectConfig struct {
Fields map[string]*Field `json:"fields" yaml:"fields"`
Field *BaseField `json:"field" yaml:"field"`
ArrayConfig *ArrayConfig `json:"array_config" yaml:"array_config"`
}
```Config can be one of:
- [Fields](#field) - map of each field definition; key - field name, value - configuration
- [Field](#basefield) - used for element of array; fields which will be deserialized like basic type like "string", "int" and etc (used here for case array of basic types)
- [ArrayConfig](#arrayconfig) - used for element of array; deserialization array of arrayExample:
```json
{
"fields": {
"title": {
"base_field": {
"type": "string",
"path": "type"
}
}
}
}
```### ArrayConfig
Configuration of the array and fields```go
type ArrayConfig struct {
RootPath string `json:"root_path" yaml:"root_path"`
Reverse bool `yaml:"reverse" json:"reverse"`
ItemConfig *ObjectConfig `json:"item_config" yaml:"item_config"`
LengthLimit uint32 `json:"length_limit" yaml:"length_limit"`
StaticConfig *StaticArrayConfig `json:"static_array" yaml:"static_array"`
}
```- RootPath - selector for find root element of the array or repeated element in case of html parsing, size of array will be amount of children element under the root
- Reverse - bool[false] - indicate that need use reverse iteration(n to 1)
- LengthLimit - for define size of array only for generated(not working for static)Config can be one of:
- [ItemConfig](#objectconfig) - configuration of each element of the array
- [StaticConfig](#static-array-config) - configuration of the static arrayExample:
```json
{
"root_path": "#content dt.quote > a",
"item_config": {
"field": {
"type": "string"
}
}
}
```#### Field
Common of the field```go
type Field struct {
BaseField *BaseField `json:"base_field" yaml:"base_field"`
ObjectConfig *ObjectConfig `json:"object_config" yaml:"object_config"`
ArrayConfig *ArrayConfig `json:"array_config" yaml:"array_config"`FirstOf []*Field `json:"first_of" yaml:"first_of"`
}
```Config can be one of:
- [BaseField](#basefield) - fields which will be deserialized like basic type like "string", "int" and etc
- [ObjectConfig](#objectconfig) - in case our field in nested object
- [ArrayConfig](#arrayconfig) - in case our field in array
- [FirstOf](#field) - first not empty resolved field will be selectedExample:
```json
{
"base_field": {
"type": "string",
"path": "div.current-temp span.heading"
}
}
```#### BaseField
In case we want get some static information or generate new one```go
type BaseField struct {
Type FieldType `yaml:"type" json:"type"`
Path string `yaml:"path" json:"path"`HTMLAttribute string `json:"html_attribute" yaml:"html_attribute"`
Generated *GeneratedFieldConfig `yaml:"generated" json:"generated"`
FirstOf []*BaseField `json:"first_of" yaml:"first_of"`
}
```- FieldType - enum["null", "boolean", "string", "int", "int64", "float", "float64", "array", "object", "html", "raw_string"] - static field for parse. **Important**: type html will only works from connector which return HTML (HTMLAttribute - have no effect in this case). [Example](https://github.com/PxyUp/fitter/blob/master/examples/cli/config_ref.json#L25)
- Path - selector(relative in case it is array child) for parsing
- HTMLAttribute - extra value which have effect only in HTML parsing via **goquery**. Here you can specify which attribute need to be parsed.**Important**: by default "string" type trimmed and all special chars is replaced, if you need plain string use "raw_string"
Config can be one of or empty:
- [Generated](#generatedfieldconfig) - field can be generated one which custom configuration
- [FirstOf](#basefield) - first not empty resolved field will be selectedExamples
```json
{
"generated": {
"uuid": {}
}
}
``````json
{
"type": "string",
"path": "text()"
}
```#### GeneratedFieldConfig
Provide functionality of generating field on the flight```go
type GeneratedFieldConfig struct {
UUID *UUIDGeneratedFieldConfig `yaml:"uuid" json:"uuid"`
Static *StaticGeneratedFieldConfig `yaml:"static" json:"static"`
Formatted *FormattedFieldConfig `json:"formatted" yaml:"formatted"`
Plugin *PluginFieldConfig `yaml:"plugin" json:"plugin"`
Calculated *CalculatedConfig `yaml:"calculated" json:"calculated"`
File *FileFieldConfig `yaml:"file" json:"file"`
Model *ModelField `yaml:"model" json:"model"`
FileStorageField *FileStorageField `json:"file_storage" yaml:"file_storage"`
}
```Config can be one of:
- [UUID](#uuid) - generate random UUID V4
- [Static](#static) - generate static field
- [Formatted](#formatted-field-config) - format field
- [Model](#model-field) - model generated from the other connector and model
- [Plugin](#plugin-field) - plugin field
- [Calculated](#calculated-field) - calculated field
- [File](#file-field) - file field (for download file from server)
- [FileStorage](#file-storage-field) - file field which can be saved to local fileExamples:
```json
{
"uuid": {}
}
```https://github.com/PxyUp/fitter/blob/master/examples/cli/config_cli.json#L58
```json
{
"model": {
"type": "array",
"model": {
"array_config": {
"root_path": "#content dt.quote > a",
"item_config": {
"field": {
"type": "string"
}
}
}
},
"connector_config": {
"response_type": "HTML",
"attempts": 3,
"browser_config": {
"url": "http://www.quotationspage.com/random.php",
"chromium": {
"path": "/Applications/Google Chrome.app/Contents/MacOS/Google Chrome",
"wait": 10000
}
}
}
}
}
```#### UUID
Generate random UUID V4 on the flight, can be used for generate uniq id```go
type UUIDGeneratedFieldConfig struct {
Regexp string `yaml:"regexp" json:"regexp"`
}
```- Regexp - provide matcher which can be used for get part of generated uuid
#### Static
Generate static field```go
type StaticGeneratedFieldConfig struct {
Type FieldType `yaml:"type" json:"type"`
Value string `json:"value" yaml:"value"`
Raw json.RawMessage `json:"raw" yaml:"raw"`
}
```- Type - enum["null", "boolean", "string", "int","int64","float","float64", "array", "object"] - type of the field
- Value - string value of the field
- Raw - pure json value of the fieldExample
```json
{
"type": "int",
"value": "65"
}
``````json
{
"type": "array",
"value": "[65,45]"
}
``````json
{
"type": "array",
"raw": [65,45]
}
```#### Formatted Field Config
Generate formatted field which will pass value from parent [base field](#basefield)```go
type FormattedFieldConfig struct {
Template string `yaml:"template" json:"template"`
}
```- Template - template in with placeholder [{PL}](#placeholder-list) where [parent](#basefield) value will be injected like string
Example:
https://github.com/PxyUp/fitter/blob/master/examples/cli/config_cli.json#L98
```json
{
"template": "https://news.ycombinator.com/item?id={PL}"
}
```#### File Storage Field
Field can be used for store field result as local file
```go
type FileStorageField struct {
Content string `json:"content" yaml:"content"`
Raw json.RawMessage `yaml:"raw" yaml:"raw"`
FileName string `json:"file_name" yaml:"file_name"`
Path string `json:"path" yaml:"path"`
Append bool `json:"append" yaml:"append"`
}
```- Content - template string for content. Important: can be with [inject of the parent value as a string](#placeholder-list)
- Raw - raw json content of field. Important: can be with [inject of the parent value as a string](#placeholder-list)
- FileName - local file name for storing file. By default, it is try get FileName from header, after that from url. Important: can be with [inject of the parent value as a string](#placeholder-list).
- Path - local file parent directory for storing file. Default path it is process directory. Important: can be with [inject of the parent value as a string](#placeholder-list)
- Append[false] - append to file or not```json
{
"content": "{{{id}}}, {{{message}}}\n",
"append": true,
"file_name": "{{{id}}}.csv",
"path": "/Users/pxyup/fitter/examples/cli/test/csv"
}
```#### File Field
Field can be used for download file from server locally
```go
type FileFieldConfig struct {
Config *ServerConnectorConfig `yaml:"config" json:"config"`Url string `yaml:"url" json:"url"`
FileName string `json:"file_name" yaml:"file_name"`
Path string `json:"path" yaml:"path"`
}
```- Config - [ServerConfig](#serverconnectorconfig) use default fitter http.Client for send request
- Url - url of the image. Important: URL in the connector can be with [inject of the parent value as a string](#placeholder-list)
- FileName - local file name for storing file. By default, it is try get FileName from header, after that from url. Important: can be with [inject of the parent value as a string](#placeholder-list).
- Path - local file parent directory for storing file. Default path it is process directory. Important: can be with [inject of the parent value as a string](#placeholder-list)Result of the field will be local file path as string
```json
{
"url": "https://images.shcdn.de/resized/w680/p/dekostoff-gobelinstoff-panel-oriental-cat-46-x-46_P19-KP_2.jpg",
"path": "/Users/pxyup/fitter/bin",
"config": {
"method": "GET"
}
}
```With propagated URL ([inject of the parent value as a string](#placeholder-list))
```json
{
"url": "https://picsum.photos{PL}",
"path": "/Users/pxyup/fitter/bin",
"config": {
"method": "GET"
}
}
```Config example:
https://github.com/PxyUp/fitter/blob/master/examples/cli/config_image.json
https://github.com/PxyUp/fitter/blob/master/examples/cli/config_image_multiple.json
#### Calculated field
Field can generate different types depends from expression
```go
type CalculatedConfig struct {
Type FieldType `yaml:"type" json:"type"`
Expression string `yaml:"expression" json:"expression"`
}
```- Type - resulting type of expression\
- Expression - expression for calculation (we use [this lib](https://github.com/expr-lang/expr) for calculated expression)##### Predefined values
**FNull** - alias for builder.Nullvalue
**FNil** - alias for nil
**isNull(value T)** - function for check is value is FNull
**fRes** - it is raw(with proper type) result from the parsing [base field](#basefield)
**fIndex** - it is index in parent array(only if parent was array field)
**fResJson** - it is JSON string representation of the raw result
**fResRaw** - result in bytes format
**FNewLine** - new line separator
```json
{
"type": "bool",
"expression": "fRes > 500"
}
```#### Plugin field
Field can be some external plugin for fitter
[More](https://github.com/PxyUp/fitter/blob/master/examples/plugin/README.md)
```go
type PluginFieldConfig struct {
Name string `json:"name" yaml:"name"`
Config json.RawMessage `json:"config" yaml:"config"`
}
```- Name - name of the plugin(without extension just name)
- Config - json config of the plugin#### Model Field
Field type which can be generated on the flight by news [model](#model) and [connector](#connector)```go
type ModelField struct {
// Type of parsing
ConnectorConfig *ConnectorConfig `yaml:"connector_config" json:"connector_config"`
// Model of the response
Model *Model `yaml:"model" json:"model"`Type FieldType `yaml:"type" json:"type"`
Path string `yaml:"path" json:"path"`Expression string `yaml:"expression" json:"expression"`
}
```- [ConnectorConfig](#connector) - which connector to use. Important: URL in the connector can be with [inject of the parent value as a string](#placeholder-list)
- [Model](#model) - configuration of the underhood model
- Type - enum["null", "boolean", "string", "int", "int64", "float", "float64", "array", "object"] - type of generated field
- Path - in case we cant extract some information from generated field we can use json selector for extract
- Expression - string which can be used for post processing of the Model (**ignoring path field**)Examples:
https://github.com/PxyUp/fitter/blob/master/examples/cli/config_cli.json#L60
```json
{
"type": "array",
"model": {
"array_config": {
"root_path": "#content dt.quote > a",
"item_config": {
"field": {
"type": "string"
}
}
}
}
}
```https://github.com/PxyUp/fitter/blob/master/examples/cli/config_weather.json#L37
```json
{
"type": "string",
"path": "temp.temp",
"model": {
"object_config": {
"fields": {
"temp": {
"base_field": {
"type": "string",
"path": "//div[@id='forecast_list_ul']//td/b/a/@href",
"generated": {
"model": {
"type": "string",
"model": {
"object_config": {
"fields": {
"temp": {
"base_field": {
"type": "string",
"path": "div.current-temp span.heading"
}
}
}
}
},
"connector_config": {
"response_type": "HTML",
"attempts": 4,
"url": "https://openweathermap.org{PL}",
"browser_config": {
"playwright": {
"timeout": 30,
"wait": 30,
"install": false,
"browser": "FireFox",
"type_of_wait": "networkidle"
}
}
}
}
}
}
}
}
}
},
"connector_config": {
"response_type": "xpath",
"attempts": 3,
"url": "https://openweathermap.org/find?q={PL}",
"browser_config": {
"playwright": {
"timeout": 30,
"wait": 30,
"install": false,
"browser": "Chromium"
}
}
}
}
```#### Static Array Config
Provide static(fixed length) array generation```go
type StaticArrayConfig struct {
Items map[uint32]*Field `yaml:"items" json:"items"`
Length uint32 `yaml:"length" json:"length"`
}
```
- [Items](#field) - map[uint32]*[Field](#field) - key is index in array, value is field definition
- Length - if set(1+) can be used for define custom length of arrayExamples:
```json
{
"0": {
"base_field": {
"type": "string",
"path": "div.current-temp span.heading"
}
}
}
``````json
{
"length": 4,
"0": {
"base_field": {
"type": "string",
"path": "div.current-temp span.heading"
}
}
}
``````json
{
"length": 4,
"2": {
"base_field": {
"type": "string",
"path": "div.current-temp span.heading"
}
}
}
```#### Placeholder list
1. {PL} - for inject value
2. {INDEX} - for inject index in parent array
3. {HUMAN_INDEX} - for inject index in parent array in human way
4. {{{json_path}}} - will get information from propagated "object"/"array" field
5. {{{RefName=SomeName}}} - get [reference](#references) value by name. [Example](https://github.com/PxyUp/fitter/blob/master/examples/cli/config_ref.json#L67)
6. {{{RefName=SomeName json.path}}} - get [reference](#references) value by name and extract value by json path. [Example](https://github.com/PxyUp/fitter/blob/master/examples/cli/config_ref.json#L67)
7. {{{FromEnv=ENV_KEY}}} - get value from environment variable
8. {{{FromExp=fRes + 5 + fIndex}}} - get value from the [expression](https://github.com/expr-lang/expr). [Predefined values](#predefined-values)
9. {{{FromInput=.}}} or {{{FromInput=json.path}}} - get value from input of trigger or library
10. {{{FromFile=./test_file.log}}} - get value from file by path. Content of file also can contain placeholders
11. {{{FromURL=http://localhost:8081}}} - get response from urlExamples:
```text
{{{FromExp="{{{FromEnv=TEST_VAL}}}" + "hello"}}}
```
```text
Current time is: {PL} with token from TokenRef={{{RefName=TokenRef}}} and TokenObjectRef={{{RefName=TokenObjectRef token}}}
```
```text
Current time is: {PL} with token from TokenRef={{{RefName=TokenRef}}} and TokenObjectRef={{{RefName=TokenObjectRef token}}}
```
```text
TokenRef={{{RefName=TokenRef}}} and TokenObjectRef={{{RefName=TokenObjectRef token}}} Object={{{value}}} {PL} Env={{{FromEnv=TEST_VAL}}} {INDEX} {HUMAN_INDEX}
```## References
Special map which **prefetched**(before any processing) and can be user for [connector](#referenceconnectorconfig) or for [placeholder](#placeholder-list)Can be used for:
1. Cache jwt token and use them in headers
2. Cache values
3. Etc### Reference
```go
type Reference struct {
*ModelField
Expire uint32 `yaml:"expire" json:"expire"`
}
```- [ModelField](#model-field) - is embedded struct, you can use same fields
- Expire[sec] - duration when reference is expired after fetching. Not set => **forever cached**. Set to 0 => **every time re-fetch**. Set to n > 0 => **cached for n second**For [Fitter](#how-to-use-fitter)
```go
type RefMap map[string]*Referencetype Config struct {
// Other Config FieldsLimits *Limits `yaml:"limits" json:"limits"`
References RefMap `json:"references" yaml:"references"`
}
```For [Fitter Cli](#how-to-use-fittercli)
```go
type RefMap map[string]*Referencetype CliItem struct {
// Other Config FieldsLimits *Limits `yaml:"limits" json:"limits"`
References RefMap `json:"references" yaml:"references"`
}
```- References - map[string]*Reference - object where is key if ReferenceName (can be user for [connector](#referenceconnectorconfig) or [placeholder](#placeholder-list)) and value is [Reference](#reference)
- [Limits](#limits)Example
https://github.com/PxyUp/fitter/blob/master/examples/cli/config_ref.json#L2
```json
{
"references": {
"TokenRef": {
"expire": 10,
"connector_config": {
"response_type": "json",
"static_config": {
"value": "\"plain token\""
}
},
"model": {
"base_field": {
"type": "string"
}
}
},
"TokenObjectRef": {
"connector_config": {
"response_type": "json",
"static_config": {
"value": "{\"token\":\"token from object\"}"
}
},
"model": {
"object_config": {
"fields": {
"token": {
"base_field": {
"type": "string",
"path": "token"
}
}
}
}
}
}
}
}
```[Example](https://github.com/PxyUp/fitter/blob/master/examples/cli/config_ref.json)
## Limits
Provide limitation for prevent DDOS, big usage of memory```go
type Limits struct {
HostRequestLimiter HostRequestLimiter `yaml:"host_request_limiter" json:"host_request_limiter"`
ChromiumInstance uint32 `yaml:"chromium_instance" json:"chromium_instance"`
DockerContainers uint32 `yaml:"docker_containers" json:"docker_containers"`
PlaywrightInstance uint32 `yaml:"playwright_instance" json:"playwright_instance"`
}
```- HostRequestLimiter - map[string]int64 - limitation per host name, key is host, value is amount of parallel request(usage for [server connector](#serverconnectorconfig))
- ChromiumInstance - amount of parallel [chromium](#chromium) instance
- DockerContainers - amount of parallel [docker](#docker) instance
- PlaywrightInstance - amount of parallel [playwright](#playwright) instancehttps://github.com/PxyUp/fitter/blob/master/examples/cli/config_cli.json#L2
```json
{
"limits": {
"host_request_limiter": {
"hacker-news.firebaseio.com": 5
},
"chromium_instance": 3,
"docker_containers": 3,
"playwright_instance": 3
}
}
```# Roadmap
1. Add browser scenario for preparing, after parsing
2. Add scrolling support for scenario
3. Add pagination support for scenario
4. Add notification methods for Fitter: Webhook/Queue