Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/orangesi/tinypandas
Easy-to-use data structures and data analysis tools (still in draft, inspired by Python Pandas)
- Host: GitHub
- URL: https://github.com/orangesi/tinypandas
- Owner: orangeSi
- License: mit
- Created: 2019-10-01T01:07:22.000Z (over 5 years ago)
- Default Branch: master
- Last Pushed: 2020-09-27T05:41:20.000Z (over 4 years ago)
- Last Synced: 2023-10-19T16:43:50.113Z (over 1 year ago)
- Topics: crystal, crystal-lang, csv, dataframe, pandas-python, vcf
- Language: Crystal
- Homepage:
- Size: 1.72 MB
- Stars: 10
- Watchers: 3
- Forks: 2
- Open Issues: 0
Metadata Files:
- Readme: README.md
- License: LICENSE
README
# tinypandas
TODO: Write a description here
## Installation
1. Add the dependency to your `shard.yml`:
```yaml
dependencies:
  tinypandas:
    github: orangeSi/tinypandas
```

2. Run `shards install`
## Features
```
1. Supports tab-separated, CSV, and VCF format files
```
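As a rough illustration of the tab-separated case, the sketch below parses text laid out like the demo file from the usage section into nested hashes, which resembles the `DataFrame(@dict=...)` structure printed by the test program. It uses only the Crystal standard library. `parse_table` is a hypothetical helper and not how tinypandas is implemented; it assumes integer cells, a header line with the column names, and comment lines starting with `#`.

```crystal
# Hypothetical helper, not part of tinypandas: parse delimited text whose first
# non-comment line holds the column names and whose first field on every later
# line is the row label. Returns column name => (row label => value).
def parse_table(text : String, sep : String = "\t", comment : String = "#") : Hash(String, Hash(String, Int32))
  # Drop blank lines and comment lines before looking for the header.
  lines = text.lines.reject { |line| line.strip.empty? || line.starts_with?(comment) }

  # The header may or may not start with an empty field above the row labels.
  columns = lines.first.split(sep).reject(&.empty?)

  table = Hash(String, Hash(String, Int32)).new
  columns.each { |name| table[name] = Hash(String, Int32).new }

  lines.skip(1).each do |line|
    fields = line.split(sep)
    label = fields.first
    columns.each_with_index do |name, i|
      # Field 0 is the row label, so column i lives at position i + 1.
      table[name][label] = fields[i + 1].to_i
    end
  end
  table
end

text = "# note\n\tA1\tA3\tA2\nB1\t1\t3\t2\nB2\t7\t2\t8\nB3\t4\t9\t5\n"
table = parse_table(text)
puts table["A2"]["B3"] # prints 5
```

Storing columns first and row labels second mirrors the `df["A2"]["B3"]` access order used in the test code.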
## Usage
Test code is in `example/test.cr`, like this:
```crystal
require "tinypandas"

# assumption: the input path comes from the command line, e.g. ./test demo.xls
ifile = ARGV[0]

pd = Tinypandas.new
## support tab-separated files
# def read_table(filepath_or_buffer : String, sep = "\t", delimiter : String = "\n", header : HeaderType = 0, index_col : IndexColType = 0, comment : String|Regex = "#", skiprows : SkiprowsType = false, skip_blank_lines : Bool = true)
df = pd.read_table(ifile, sep: "\t")

puts "df is #{df}\n"
puts "df.to_str is\n#{df.to_str}\n"
puts "df[A2][B3] is #{df["A2"]["B3"]}\n"
puts "df[df[A2]>=5].to_str is"
puts df[df["A2"]>=5].to_str

puts "df[df[A3]==9][A2].to_str is "
puts df[df["A3"]==9]["A2"].to_str

puts "df[df[A3]>=3][A2].to_str is "
puts df[df["A3"]>=3]["A2"].to_str

t = df["A2"]
puts "t = df[A2]is #{t}"
puts "t>2 is #{t>2}"

puts "df.t.to_str is\n#{df.t.to_str}"
puts "df.t[B3][A1] is "
puts df.t["B3"]["A1"]

## support vcf format file
df = pd.load_vcf("demo.vcf")
puts "df.head(1).to_s is\n"
puts df.head(1).to_s
puts "\n"

## support csv format file
df = pd.load_csv("sample.csv")
puts "df is #{df}\n"
puts "df.to_str is\n#{df.to_str}\n"
puts "df[col2][2] is #{df["col2"]["2"]}\n"

## convert Array(Array) to DataFrame
data = [[1,2,3],[4,5,6],[6,7,8]]
df = DataFrame.new(data, columns: ["c1","c2","c3"]) # read_array_by_row: true
puts "\nArray(Array()):#{data} to DataFrame:\n#{df.to_s}"

## read Hash(String, Array()) as DataFrame
data = {"c1"=>[1,2,3], "c2"=>[4,5,6], "c3"=>[6,7,8]}
df = DataFrame.new(data)
puts "\nHash(String, Array()):#{data} to DataFrame:\n#{df.to_s}"
```
Then go to the example directory and build it: `cd example; crystal build test.cr --release`
```
$ cat demo.xls
# note
A1 A3 A2
B1 1 3 2
B2 7 2 8
B3 4 9 5
```
Then run `./test demo.xls` or `./test demo.xls.gz`, which prints:
```
## support seprate by tab format file
intpu file demo.xls

df is DataFrame(@dict={"A1" => Series(@dict={"B1" => 1, "B2" => 7, "B3" => 4}), "A3" => Series(@dict={"B1" => 3, "B2" => 2, "B3" => 9}), "A2" => Series(@dict={"B1" => 2, "B2" => 8, "B3" => 5})}, @index=["B1", "B2", "B3"], @columns=["A1", "A3", "A2"])
df.to_str is
A1 A3 A2
B1 1 3 2
B2 7 2 8
B3 4 9 5

df[A2][B3] is 5
df[df[A2]>=5].to_str is
A1 A3 A2
B2 7 2 8
B3 4 9 5

df[df[A3]==9][A2].to_str is
B3 5

df[df[A3]>=3][A2].to_str is
B1 2
B3 5
t = df[A2]is Series(@dict={"B1" => 2, "B2" => 8, "B3" => 5})
t>2 is Series(@dict={"B2" => 8, "B3" => 5})

df.t.to_str is
B1 B2 B3
A1 1 7 4
A3 3 2 9
A2 2 8 5

df.t[B3][A1] is
4

## support vcf format file
df.head(1).to_s is
#CHROM POS ID REF ALT QUAL FILTER INFO FORMAT HG00096 HG00097 HG00099
0 MT 10 . T C 100 fa VT=S;AC=3 GT 0 0 0

## support csv format file
df is DataFrame(@dict={"date" => Series(@dict={"0" => "2020-02-01 12:00:02", "1" => "2020-02-01 12:00:07", "2" => "2020-02-01 12:00:12", "3" => "2020-02-01 12:00:17", "4" => "2020-02-01 12:00:22", "5" => "2020-02-01 12:00:27", "6" => "2020-02-01 12:00:32", "7" => "2020-02-01 12:00:37"}), "col1" => Series(@dict={"0" => 66808, "1" => 66873, "2" => 66875, "3" => 66874, "4" => 66881, "5" => 66858, "6" => 66905, "7" => 66885}), "col2" => Series(@dict={"0" => 0.68, "1" => 0.67, "2" => 0.65, "3" => 0.67, "4" => 0.67, "5" => 0.66, "6" => 0.64, "7" => 0.66}), "col3" => Series(@dict={"0" => "TRUE", "1" => "FALSE", "2" => "TRUE", "3" => "FALSE", "4" => "TRUE", "5" => "FALSE", "6" => "TRUE", "7" => "FALSE"}), "col4" => Series(@dict={"0" => "str1", "1" => "str2", "2" => "str3", "3" => "str4", "4" => "str5", "5" => "str6", "6" => "str7", "7" => "str8"})}, @index=["0", "1", "2", "3", "4", "5", "6", "7"], @columns=["date", "col1", "col2", "col3", "col4"])
df.to_str is
date col1 col2 col3 col4
0 2020-02-01 12:00:02 66808 0.68 TRUE str1
1 2020-02-01 12:00:07 66873 0.67 FALSE str2
2 2020-02-01 12:00:12 66875 0.65 TRUE str3
3 2020-02-01 12:00:17 66874 0.67 FALSE str4
4 2020-02-01 12:00:22 66881 0.67 TRUE str5
5 2020-02-01 12:00:27 66858 0.66 FALSE str6
6 2020-02-01 12:00:32 66905 0.64 TRUE str7
7 2020-02-01 12:00:37 66885 0.66 FALSE str8

df[col2][2] is 0.65
Array(Array()):[[1, 2, 3], [4, 5, 6], [6, 7, 8]] to DataFrame:
c1 c2 c3
0 1 2 3
1 4 5 6
2 6 7 8

Hash(String, Array()):{"c1" => [1, 2, 3], "c2" => [4, 5, 6], "c3" => [6, 7, 8]} to DataFrame:
c1 c2 c3
0 1 4 6
1 2 5 7
2 3 6 8
```
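In the output above, `t>2` prints only the entries `B2` and `B3`, which suggests that comparing a Series yields the matching entries rather than a boolean mask. The sketch below reproduces that behaviour, and the row filtering of `df[df["A2"]>=5]`, with plain `Hash` values from the Crystal standard library. It is only a way to picture the semantics, assuming they are as the output implies; the variable names are illustrative and none of this is the library's own code.

```crystal
# Plain-Hash stand-in for the A2 column of demo.xls (not the library's Series type).
a2 = {"B1" => 2, "B2" => 8, "B3" => 5}

# Keep only the entries whose value passes the comparison, as `t>2` appears to do.
kept = a2.select { |_label, value| value > 2 }
puts kept # prints {"B2" => 8, "B3" => 5}

# Row filtering in the style of df[df["A2"]>=5]:
# find the surviving row labels once, then apply them to every column.
table = {
  "A1" => {"B1" => 1, "B2" => 7, "B3" => 4},
  "A3" => {"B1" => 3, "B2" => 2, "B3" => 9},
  "A2" => a2,
}
labels = a2.select { |_label, value| value >= 5 }.keys
filtered = table.transform_values do |column|
  column.select { |label, _value| labels.includes?(label) }
end
puts filtered["A1"] # prints {"B2" => 7, "B3" => 4}
```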
TODO: Write usage instructions here
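The run step above accepts both `demo.xls` and `demo.xls.gz`. One way a reader could treat the two alike is sketched here, assuming a Crystal version whose standard library provides the `Compress::Gzip` module. `read_text` is a hypothetical helper chosen for illustration; how tinypandas itself detects and reads compressed input is not shown in this README.

```crystal
require "compress/gzip"

# Hypothetical helper (not a tinypandas method): return the text of a file,
# decompressing it first when the name ends in ".gz".
def read_text(path : String) : String
  if path.ends_with?(".gz")
    Compress::Gzip::Reader.open(path) { |gz| gz.gets_to_end }
  else
    File.read(path)
  end
end

# Write the same content as a plain file and as a gzip file, then read both back.
content = "# note\n\tA1\tA3\tA2\nB1\t1\t3\t2\n"
plain_path = File.tempname("demo", ".xls")
gz_path = plain_path + ".gz"
File.write(plain_path, content)
Compress::Gzip::Writer.open(gz_path) { |gz| gz.print(content) }

same = read_text(plain_path) == read_text(gz_path)
puts same

# Clean up the temporary files.
File.delete(plain_path)
File.delete(gz_path)
```

Dispatching on the file extension keeps the parsing code unaware of compression, since it only ever receives a `String`.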
## Development
TODO: Write development instructions here
## Contributing
1. Fork it ()
2. Create your feature branch (`git checkout -b my-new-feature`)
3. Commit your changes (`git commit -am 'Add some feature'`)
4. Push to the branch (`git push origin my-new-feature`)
5. Create a new Pull Request
## Contributors
- [orangeSi](https://github.com/orangeSi) - creator and maintainer