https://github.com/xiaodaigh/julia-data-science-base-docker-img

# There is now a better way to do this with PackageCompiler.jl

# Intro
Julia Data Science Docker with data science packages compiled for instant loading!

Time-to-first-plot (TTFP) is often regarded as one of Julia's main pain points. The PackageCompiler.jl package can compile the packages you use and alleviate the pain. It works by pre-compiling the packages and baking them into the Julia sysimage, so that `using Pkg1` is fast, just like loading a base package.
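As a rough illustration of the idea, here is a minimal sketch that builds the Julia expression for `create_sysimage` from a list of package names and prints the two commands involved. The helper `sysimage_expr`, the variable `SYSIMAGE_PATH`, and the file name `ds_sysimage.so` are hypothetical names chosen for this example; it also assumes the listed packages are already installed in the active Julia project. This is not how the image in this repository was built.

```bash
#!/bin/sh
# Sketch only: prints the commands instead of running them, so it is safe
# to execute on a machine without Julia installed.

SYSIMAGE_PATH="${SYSIMAGE_PATH:-ds_sysimage.so}"   # hypothetical output file

# Build the Julia expression that bakes the given packages into a sysimage.
# Usage: sysimage_expr DataFrames CSV
sysimage_expr() {
  pkgs=""
  for p in "$@"; do
    pkgs="${pkgs:+$pkgs, }:$p"    # DataFrames CSV -> :DataFrames, :CSV
  done
  printf 'using PackageCompiler; create_sysimage([%s]; sysimage_path="%s")\n' \
    "$pkgs" "$SYSIMAGE_PATH"
}

# Step 1: compile the packages into a sysimage (slow, done once).
printf '%s\n' "julia -e '$(sysimage_expr DataFrames CSV)'"
# Step 2: start Julia with that sysimage so `using DataFrames` loads quickly.
printf '%s\n' "julia --sysimage $SYSIMAGE_PATH"
```

Copy the printed commands into a terminal to run them for real. Expect the first step to take several minutes for large packages such as Plots.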

This is an experimental first attempt at building a Docker image in which the data science packages I use are pre-compiled.

## Usage

First, install Docker. If you are running Windows, I recommend installing Git so that you have access to Git Bash.

On Windows, you can find your IP address using `ipconfig`; on Linux, use `ifconfig`. You need it if you wish to do plotting from the Docker image.
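To make the link between the IP address and the `DISPLAY` setting concrete, here is a small sketch. The helpers `first_ip` and `display_for` are hypothetical names for this example, and the address used is made up. On Linux you could pipe in the output of `hostname -I` instead, assuming that command is available on your system.

```bash
#!/bin/sh
# Take the first address from a whitespace-separated list,
# such as the output of `hostname -I` on Linux.
first_ip() {
  awk '{ print $1; exit }'
}

# Format an IP address as the value passed to `-e DISPLAY=...`.
display_for() {
  printf '%s:0.0\n' "$1"
}

# Example with a made-up address list; replace it with your real address.
host_ip=$(printf '192.168.1.10 172.17.0.1\n' | first_ip)
printf '%s\n' "-e DISPLAY=$(display_for "$host_ip")"
```

The printed flag is what replaces `-e DISPLAY=YOUR_IP:0.0` in the `docker run` command.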

**Basic: Windows**
```bash
docker run --rm \
-e DISPLAY=YOUR_IP:0.0 \
-e JUPYTER_ENABLE_LAB=yes \
-v "$PWD":/home/jovyan/work \
-it -p 8888:8888 \
xiaodaidocker2019/julia-data-science-base
```

Often you will want to save data somewhere on your hard drive. You can do this by attaching (mounting) a local folder to the directory `somedir`.
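A minimal sketch of such a mount, assuming Docker's `-v HOST_PATH:CONTAINER_PATH` syntax. `HOST_DATA_DIR`, `CONTAINER_DIR`, and the helper `volume_arg` are hypothetical names, and `/path/to/somedir` is only a placeholder: Docker expects an absolute path inside the container, so substitute the real location of `somedir`. The script prints the command rather than running it.

```bash
#!/bin/sh
# Hypothetical paths: adjust both to your setup.
HOST_DATA_DIR="${HOST_DATA_DIR:-$PWD/data}"          # local folder on your drive
CONTAINER_DIR="${CONTAINER_DIR:-/path/to/somedir}"   # placeholder for `somedir`

# Build the value of the -v flag: HOST_PATH:CONTAINER_PATH
volume_arg() {
  printf '%s:%s\n' "$1" "$2"
}

vol=$(volume_arg "$HOST_DATA_DIR" "$CONTAINER_DIR")

# Print the full command instead of executing it.
cat <<EOF
docker run --rm \\
  -e JUPYTER_ENABLE_LAB=yes \\
  -v "$vol" \\
  -it -p 8888:8888 \\
  xiaodaidocker2019/julia-data-science-base
EOF
```

Files written to the container directory then show up in the local folder and persist after the container exits, even though `--rm` deletes the container itself.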

## Packages

The packages below are compiled into the image using PackageCompiler.jl:

| Package | Type | Notes |
| ------------------- | ------------------------------- | ------------------------------- |
| CategoricalArrays | Foundation | |
| Clustering | Unsupervised learning | |
| CSV | Data IO | |
| DataConvenience | Data Manipulation/Convenience | |
| DataFrames | Data Manipulation | |
| DataFramesMeta | Data Manipulation | |
| DecisionTree | Supervised learning | |
| FastGroupBy | Data Manipulation/Convenience | |
| Feather | Data IO | |
| FreqTables | Foundation/Statistics | |
| GLM | Supervised learning | |
| JDF | Data IO | For reading/writing JDF files |
| JLBoost | Supervised learning | |
| Lazy | Data Manipulation/Convenience | |
| Missings | Foundation | |
| Parquet | Data IO | ParquetFiles is quite broken at the moment |
| Plots | Plotting | |
| RDatasets | Data | |
| SortingLab | Data Manipulation/Convenience | |
| StatsBase | Foundation/Statistics | |
| StatsPlots | Plotting | |
| Tables | Data Manipulation/Convenience | |
| TableView | Data Viewing | |
| XGBoost | Supervised learning | |

The packages below are included but not compiled:

| Package | Type | Notes |
| -- | -- | -- |
| Pipe | Data Manipulation/Convenience | If compiled into base, Pipe produces a warning message |
| TableView | Data Viewing | If compiled then doesn't work with JupyterLab |