{"id":13937809,"url":"https://github.com/mratsim/Arch-Data-Science","last_synced_at":"2025-07-20T00:31:09.320Z","repository":{"id":85263703,"uuid":"80167332","full_name":"mratsim/Arch-Data-Science","owner":"mratsim","description":"Archlinux PKGBUILDs for Data Science, Machine Learning, Deep Learning, NLP and Computer Vision","archived":true,"fork":false,"pushed_at":"2019-11-09T09:35:46.000Z","size":157,"stargazers_count":96,"open_issues_count":0,"forks_count":4,"subscribers_count":14,"default_branch":"master","last_synced_at":"2025-07-19T19:53:18.362Z","etag":null,"topics":["archlinux","cuda","cudnn","data-science","deep-learning","lightgbm","machine-learning","mkl","mxnet","natural-language-processing","natural-language-understanding","nervana","opencv","package","pandas","pytorch","scikit-learn","spacy","tensorflow","xgboost"],"latest_commit_sha":null,"homepage":"","language":"Shell","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mratsim.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2017-01-26T23:54:59.000Z","updated_at":"2025-07-07T13:20:15.000Z","dependencies_parsed_at":"2023-05-25T04:30:10.961Z","dependency_job_id":null,"html_url":"https://github.com/mratsim/Arch-Data-Science","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/mratsim/Arch-Data-Science","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mratsim%2FArch-Data-Science","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mratsim%2FArch-Data-Science/tags","rel
eases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mratsim%2FArch-Data-Science/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mratsim%2FArch-Data-Science/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mratsim","download_url":"https://codeload.github.com/mratsim/Arch-Data-Science/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mratsim%2FArch-Data-Science/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266048493,"owners_count":23868738,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["archlinux","cuda","cudnn","data-science","deep-learning","lightgbm","machine-learning","mkl","mxnet","natural-language-processing","natural-language-understanding","nervana","opencv","package","pandas","pytorch","scikit-learn","spacy","tensorflow","xgboost"],"created_at":"2024-08-07T23:03:55.589Z","updated_at":"2025-07-20T00:31:08.608Z","avatar_url":"https://github.com/mratsim.png","language":"Shell","funding_links":[],"categories":["Shell"],"sub_categories":[],"readme":"# Data Science packages for Archlinux\n\nWelcome to my repo to build Data Science, Machine Learning, Computer Vision, Natural Language Processing and Deep Learning packages from source.\n\n## Performance considerations\n\nMy aim is to squeeze maximum performance out of my current configuration (Skylake-X i9-9980XE + 2x RTX 2080Ti), so:\n\n* All packages are built with -O3 -march=native if the package ignores /etc/makepkg.conf config.\n* I do not use 
fast-math unless it is the upstream default (for example, OpenCV). You might want to enable it for GCC and NVCC (Nvidia compiler).\n* All CUDA packages are built with CUDA 10.1, cuDNN 7.6 and compute capability 7.5 (Turing).\n* PyTorch is built\n  * with MAGMA support. MAGMA is a linear algebra library for heterogeneous computing (CPU + GPU hybridization).\n  * with MKLDNN support. MKLDNN is an optimized x86 backend for deep learning.\n* BLAS library is MKL except for Tensorflow (Eigen).\n* Parallel library is Intel OpenMP except for Tensorflow (Eigen), PyTorch (because linking is buggy) and OpenCV (Intel TBB, because linking is buggy as well).\n* OpenCV is further optimized with Intel IPP (Integrated Performance Primitives).\n* Nvidia libraries (CuBLAS, CuFFT, CuSPARSE ...) are used wherever possible.\n\nIf running in an LXC container, bazel (necessary to build Tensorflow) must be built with its auto-sandboxing disabled.\n\n## Caveats\n\nPlease note that the current mxnet and lightgbm packages work but need improvement: they put their libraries in /usr/mxnet and /usr/lightgbm.\n\nPackages included are those not available by default in Archlinux AUR or that needed substantial modifications. Check Archlinux AUR for standard packages like Numpy or Pandas.\n\n## Suggestions\n\nBeyond the packages provided here, these tools are useful:\n* CSV manipulation from the command line\n    * [xsv](https://github.com/BurntSushi/xsv) - The fastest, multi-processing CSV library. Written in Rust.\n* Geographical data (combine them with a clustering algorithm)\n    * Geopy\n    * Shapely\n* GPU computation\n    * Nvidia's RAPIDS (to be wrapped)\n      * [GPU Dataframes](https://github.com/rapidsai/cudf)\n      * [Sklearn-like on GPU](https://github.com/rapidsai/cuml)\n* Monitoring\n    * htop - Monitor CPU, RAM, load, kill programs\n    * [nvtop](https://github.com/Syllo/nvtop) - Monitor Nvidia GPU\n    * nvidia-smi - Monitor Nvidia GPU (included with nvidia driver)\n        1. 
nvidia-smi -q -g 0 -d TEMPERATURE,POWER,CLOCK,MEMORY -l #Flags can be UTILIZATION, PERFORMANCE (on Tesla) ...\n        2. nvidia-smi dmon\n        3. nvidia-smi -l 1\n* Rapid prototyping, Research\n    * Jupyter - Code Python, R, Haskell, Julia with direct feedback in your browser\n    * jupyter_contrib_nbextensions - Extensions for jupyter (commenting code, ...)\n* Text\n    * gensim - word2vec\n* Time data\n    * Workalendar - Business calendar for multiple countries\n* Video\n    * Vapoursynth - Frameserver for video pre-processing\n* Visualization\n    * The [Vega ecosystem](https://vega.github.io/)\n      * [Altair](https://github.com/altair-viz/altair) - declarative data visualization\n      * [Voyager](https://github.com/vega/voyager) - Automatic Exploratory Data Analysis\n      * [Lyra](https://github.com/vega/lyra) - Tableau-like data visualization design\n    * [Seaborn](https://github.com/mwaskom/seaborn)\n    * [Plot.ly](https://github.com/plotly/plotly.py)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmratsim%2FArch-Data-Science","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmratsim%2FArch-Data-Science","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmratsim%2FArch-Data-Science/lists"}