{"id":26032267,"url":"https://github.com/apache/madlib","last_synced_at":"2025-04-12T21:36:21.943Z","repository":{"id":37602257,"uuid":"42763345","full_name":"apache/madlib","owner":"apache","description":"Mirror of Apache MADlib","archived":false,"fork":false,"pushed_at":"2024-05-11T01:10:09.000Z","size":21155,"stargazers_count":465,"open_issues_count":10,"forks_count":148,"subscribers_count":41,"default_branch":"master","last_synced_at":"2025-04-05T00:03:30.356Z","etag":null,"topics":["madlib"],"latest_commit_sha":null,"homepage":"https://madlib.apache.org/","language":"C++","has_issues":false,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/apache.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2015-09-19T07:00:06.000Z","updated_at":"2025-03-06T08:08:17.000Z","dependencies_parsed_at":"2024-05-11T02:35:58.700Z","dependency_job_id":null,"html_url":"https://github.com/apache/madlib","commit_stats":null,"previous_names":[],"tags_count":82,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Fmadlib","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Fmadlib/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Fmadlib/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/apache%2Fmadlib/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/apache","download_url":"https://codeload.github.com/apache/madlib/tar.gz/refs/heads/master","host":{"name":"GitHub","
url":"https://github.com","kind":"github","repositories_count":248521006,"owners_count":21117974,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["madlib"],"created_at":"2025-03-06T21:55:26.824Z","updated_at":"2025-04-12T21:36:21.911Z","avatar_url":"https://github.com/apache.png","language":"C++","funding_links":[],"categories":[],"sub_categories":[],"readme":"![](doc/imgs/magnetic-icon.png?raw=True) ![](doc/imgs/agile-icon.png?raw=True) ![](doc/imgs/deep-icon.png?raw=True)\n=================================================\n**MADlib\u003csup\u003e\u0026reg;\u003c/sup\u003e** is an open-source library for scalable in-database analytics.\nIt provides data-parallel implementations of mathematical, statistical and\nmachine learning methods for structured and unstructured data.\n\n[![Build Status](https://ci-builds.apache.org/job/Madlib/job/madlib-build/job/master/badge/icon)](https://ci-builds.apache.org/job/Madlib/job/madlib-build/job/master/)\n\nInstallation and Contribution\n==============================\nSee the project website  [MADlib Home](http://madlib.apache.org/) for links to the\nlatest binary and source packages.\n\nWe appreciate all forms of project contributions to MADlib including bug reports, providing help to new users, documentation, or code patches. 
Please refer to [Contribution Guidelines](https://cwiki.apache.org/confluence/display/MADLIB/Contribution+Guidelines) for instructions.\n\nFor more installation and contribution guides,\nplease refer to the [MADlib Wiki](https://cwiki.apache.org/confluence/display/MADLIB/).\n\n[Compiling from source on Linux](https://cwiki.apache.org/confluence/display/MADLIB/Installation+Guide#InstallationGuide-CompileFromSourceCompilingFromSource) details are\nalso on the wiki.\n\n\nDevelopment with Docker\n=======================\nWe provide a Docker image with the dependencies required to compile and test MADlib on PostgreSQL 10.5. You can view the Dockerfile for these dependencies at ./tool/docker/base/Dockerfile_ubuntu16_postgres10. The image is hosted on Docker Hub at madlib/postgres_10:latest. Later we will provide a similar Docker image for Greenplum Database.\n\nWe provide a script at ./tool/docker_start.sh to quickly run this Docker image. It mounts your local madlib directory, builds MADlib, and runs install check in the container. At the end, it will `docker exec` as the postgres user. Note that you have to run this script from inside your madlib directory, and you can specify your docker CONTAINER_NAME (default is madlib) and IMAGE_TAG (default is latest). Here is an example:\n\n```\nCONTAINER_NAME=my_madlib IMAGE_TAG=LaTex ./tool/docker_start.sh\n```\nNote that this script only needs to be run once. After that, you will have a local docker container with CONTAINER_NAME running. 
To get access to the container, run the following command and you can keep working on it.\n\n```\ndocker exec -it CONTAINER_NAME bash\n```\n\nTo kill this docker container, run:\n\n```\ndocker kill CONTAINER_NAME\ndocker rm CONTAINER_NAME\n```\n\nYou can also manually run those commands to do the same thing:\n\n```\n## 1) Pull down the `madlib/postgres_10:latest` image from docker hub:\ndocker pull madlib/postgres_10:latest\n\n## 2) Launch a container corresponding to the MADlib image, name it\n##    madlib, mounting the source code folder to the container:\ndocker run -d -it --name madlib \\\n    -v (path to madlib directory):/madlib/ madlib/postgres_10\n# where madlib is the directory where the MADlib source code resides.\n\n################################# * WARNING * #################################\n# Please be aware that when mounting a volume as shown above, any changes you\n# make in the \"madlib\" folder inside the Docker container will be\n# reflected on your local disk (and vice versa). 
This means that deleting data\n# in the mounted volume from a Docker container will delete the data from your\n# local disk also.\n###############################################################################\n\n## 3) When the container is up, connect to it and build MADlib:\ndocker exec -it madlib bash\nmkdir /madlib/build_docker\ncd /madlib/build_docker\ncmake ..\nmake\nmake doc\nmake install\n\n## 4) Install MADlib:\nsrc/bin/madpack -p postgres -c postgres/postgres@localhost:5432/postgres install\n\n## 5) Several other commands can now be run, such as:\n# Run install check, on all modules:\nsrc/bin/madpack -p postgres -c postgres/postgres@localhost:5432/postgres install-check\n# Run install check, on a specific module, say svm:\nsrc/bin/madpack -p postgres -c postgres/postgres@localhost:5432/postgres install-check -t svm\n# Reinstall MADlib:\nsrc/bin/madpack -p postgres -c postgres/postgres@localhost:5432/postgres reinstall\n\n## 6) Kill and remove containers (after exiting the container):\ndocker kill madlib\ndocker rm madlib\n```\n\nInstructions for building the design PDF on Docker:\n\nTo build the design PDF, make sure you use the `IMAGE_TAG=LaTex` parameter when running the script. After launching your docker container, run the following to get `design.pdf`:\n\n```\ncd /madlib/build_docker\nmake design_pdf\ncd doc/design\n```\n\nDetailed build instructions are available in [`ReadMe_Build.txt`](ReadMe_Build.txt).\n\nUser and Developer Documentation\n==================================\nThe latest documentation of MADlib modules can be found at [`MADlib\nDocs`](http://madlib.apache.org/docs/latest/index.html).\n\n\nArchitecture\n=============\nThe following block diagram gives a high-level overview of MADlib's\narchitecture.\n\n\n![MADlib Architecture](doc/imgs/architecture.png?raw=True)\n\n\nThird Party Components\n======================\nMADlib incorporates software from the following third-party components.  Bundled with source code:\n\n1. 
[`libstemmer`](http://snowballstem.org/) \"small string processing language\"\n2. [`m_widen_init`](licenses/third_party/_M_widen_init.txt) \"allows compilation with recent versions of gcc with runtime dependencies from earlier versions of libstdc++\"\n3. [`argparse 1.2.1`](http://code.google.com/p/argparse/) \"provides an easy, declarative interface for creating command line tools\"\n4. [`PyYAML 3.10`](http://pyyaml.org/wiki/PyYAML) \"YAML parser and emitter for Python\"\n5. [`UseLATEX.cmake`](https://github.com/kmorel/UseLATEX/blob/master/UseLATEX.cmake) \"CMAKE commands to use the LaTeX compiler\"\n\nDownloaded at build time (or supplied as build dependencies):\n\n6. [`Boost 1.61.0 (or newer)`](http://www.boost.org/) \"provides peer-reviewed portable C++ source libraries\"\n7. [`PyXB 1.2.6`](http://pyxb.sourceforge.net/) \"Python library for XML Schema Bindings\"\n8. [`Eigen 3.2.2`](http://eigen.tuxfamily.org/index.php?title=Main_Page) \"C++ template library for linear algebra\"\n\nLicensing\n==========\nLicensed to the Apache Software Foundation (ASF) under one or more contributor license agreements. See the [`NOTICE`](NOTICE) file distributed with this work for additional information regarding copyright ownership. The ASF licenses this project to You under the Apache License, Version 2.0 (the \"License\"); you may not use this project except in compliance with the License. You may obtain a copy of the License at [`LICENSE`](LICENSE).\n\nUnless required by applicable law or agreed to in writing, software distributed under the License is distributed on an \"AS IS\" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. 
See the License for the specific language governing permissions and limitations under the License.\n\nAs specified in [`LICENSE`](LICENSE), additional license information regarding included third-party libraries can be\nfound inside the [`licenses`](licenses) directory.\n\nRelease Notes\n=============\nChanges between MADlib versions are described in the\n[`ReleaseNotes.txt`](RELEASE_NOTES) file.\n\nPapers and Talks\n=================\n* [`MAD Skills : New Analysis Practices for Big Data (VLDB 2009)`](http://db.cs.berkeley.edu/papers/vldb09-madskills.pdf)\n* [`Hybrid In-Database Inference for Declarative Information Extraction (SIGMOD 2011)`](https://amplab.cs.berkeley.edu/publication/hybrid-in-database-inference-for-declarative-information-extraction/)\n* [`Towards a Unified Architecture for In-Database Analytics (SIGMOD 2012)`](http://www.cs.stanford.edu/~chrismre/papers/bismarck-full.pdf)\n* [`The MADlib Analytics Library or MAD Skills, the SQL (VLDB 2012)`](http://www.eecs.berkeley.edu/Pubs/TechRpts/2012/EECS-2012-38.html)\n\n\nRelated Software\n=================\n* [`PivotalR`](https://github.com/pivotalsoftware/PivotalR) - PivotalR\nlets the user run the functions of the open-source big-data machine learning\npackage `MADlib` directly from R.\n* [`PyMADlib`](https://github.com/pivotalsoftware/pymadlib) - PyMADlib is a Python\nwrapper for MADlib, which brings you the power and flexibility of Python\nwith the number-crunching power of `MADlib`.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapache%2Fmadlib","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fapache%2Fmadlib","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fapache%2Fmadlib/lists"}