An open API service indexing awesome lists of open source software.

https://github.com/kyleburton/abstract-tables

Ruby ETL Utilities
https://github.com/kyleburton/abstract-tables

Last synced: 9 months ago
JSON representation

Ruby ETL Utilities

Awesome Lists containing this project

README

          

h1. Abstract Table Library

In the spirit of the standard Unix utilites (like cat, grep, cut, etc.), this library creates an abstraction over tabular data. The library implements a series of drivers for sources with different encodings that act like sequences.

h1. Command Line Utilities

h2. atcat


# dump a postgres table to the console (default is tab delimited)
atcat dbi://user:pass@Pg/localhost/db_name/table_name | atview | less
# if you create an $HOME/.abtab.dbrc (see below), you can omit the user/pass:
atcat dbi://pg/localhost/database_name/table_name

# export a table to a tab file:
atcat dbi://pg/localhost/database_name/table_name tab://table_name.tab

# you can omit the schema for files, the schema will be guessed based on the file extension
atcat dbi://pg/localhost/database_name/table_name table_name.tab

# dump to csv
atcat dbi://pg/localhost/database_name/table_name csv://table_name.csv
# you can omit the scehma
atcat dbi://pg/localhost/database_name/table_name table_name.csv

# convert from csv to tab
atcat csv://some-file.csv tab://some-file.tab

atcat some-file.csv some-file.tab

# convert form csv to pipe
atcat csv://some-file.csv 'tab://some-file.tab?col_sep=|'

h2. atview


atview tab://some-file.tab | less
atview csv://some-file.csv | less
atview dbi://pg/localhost/database_name/table_name | less
# view a list of tables:
atview dbi://pg/localhost/database_name | less

h2. atgrep

Filter a table source by passing an expression, which has access to the record via the var @r@:


atgrep -e 'r[0] == "this"' test/fixtures/files/file1.csv | atview

# ruby is mutable, so if you feel like being naughty, you can modify during the grep:
atgrep -e 'r[0] = r[0].upcase; r[0] == "THIS"' test/fixtures/files/file1.csv

# but that's what atmod is for...

h2. atcut

Filter a source, selecting specific columns.


# pull only the first and 3rd column:
bin/atcut -f 1,3 test/fixtures/files/file1.tab

h1. Suported Drivers

h2. tab

Native Ruby for now.

h3. col_sep

Defaults to a tab character.

h2. csv

Via the "FasterCSV":http://fastercsv.rubyforge.org/ ruby gem.

h3. col_sep

Override the default ',' column seperator.

h3. quote_char

Override the default quote_char (").

h2. dbi

(only tested with PostgreSQL)

h3. @$HOME/.abtab.dbrc@

With the dbi driver, you can create the file @.abtab.dbrc@ in your home directory and include the connection credentails for the databases you commonly use. It must be a YAML file, you can create it from a ruby script or irb with:


require 'yaml'

cfg = {
"localhost" => {
"database_name" => {
"user" => "username",
"pass" => "password"
}
}
}
File.open("#{ENV['HOME']}/.abtab.dbrc","w") do |f|
f.puts cfg.to_yaml
end

h1. License

h1. Authors

Kyle Burton