{"id":15032418,"url":"https://github.com/alibaba/alink","last_synced_at":"2025-05-14T02:06:08.333Z","repository":{"id":38307944,"uuid":"150938406","full_name":"alibaba/Alink","owner":"alibaba","description":"Alink is the Machine Learning algorithm platform based on Flink, developed by the PAI team of Alibaba computing platform. ","archived":false,"fork":false,"pushed_at":"2024-06-07T14:56:29.000Z","size":18863,"stargazers_count":3605,"open_issues_count":59,"forks_count":800,"subscribers_count":137,"default_branch":"master","last_synced_at":"2025-05-09T10:45:58.325Z","etag":null,"topics":["apriori","classification","clustering","data-mining","feature-engineering","flink","flink-machine-learning","flink-ml","fm","graph-algorithms","graph-embedding","kafka","machine-learning","recommender","recommender-system","regression","statistics","word2vec","xgboost"],"latest_commit_sha":null,"homepage":"","language":"Java","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/alibaba.png","metadata":{"files":{"readme":"README.en-US.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2018-09-30T06:36:11.000Z","updated_at":"2025-05-08T15:33:47.000Z","dependencies_parsed_at":"2024-01-27T05:38:23.309Z","dependency_job_id":"e5149196-258d-43af-813a-7ab1e19681cd","html_url":"https://github.com/alibaba/Alink","commit_stats":{"total_commits":280,"total_committers":18,"mean_commits":"15.555555555555555","dds":0.6285714285714286,"last_synced_commit":"fe7798ca95bade691cf39bcf568a975cd5fd028d"},"previous_names":[],"tags_count":111,"template":false,"template_full_name":null,"repository_url":"https://rep
os.ecosyste.ms/api/v1/hosts/GitHub/repositories/alibaba%2FAlink","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alibaba%2FAlink/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alibaba%2FAlink/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/alibaba%2FAlink/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/alibaba","download_url":"https://codeload.github.com/alibaba/Alink/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":254052696,"owners_count":22006716,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["apriori","classification","clustering","data-mining","feature-engineering","flink","flink-machine-learning","flink-ml","fm","graph-algorithms","graph-embedding","kafka","machine-learning","recommender","recommender-system","regression","statistics","word2vec","xgboost"],"created_at":"2024-09-24T20:18:20.556Z","updated_at":"2025-05-14T02:06:03.321Z","avatar_url":"https://github.com/alibaba.png","language":"Java","readme":"\u003cfont size=7\u003eEnglish| [简体中文](README.md)\u003c/font\u003e\n\n# Alink\n\nAlink is the Machine Learning algorithm platform based on Flink, developed by the PAI team of Alibaba computing platform.\nWelcome everyone to join the Alink open source user group to communicate.\n \n \n\u003cdiv align=center\u003e\n\u003cimg src=\"https://img.alicdn.com/tfs/TB1kQU0sQY2gK0jSZFgXXc5OFXa-614-554.png\" height=\"25%\" width=\"25%\"\u003e\n\u003c/div\u003e\n\n#### List of 
Algorithms\n\n\u003cdiv align=center\u003e\n\u003cimg src=\"https://img.alicdn.com/tfs/TB1AEOeoBr0gK0jSZFnXXbRRXXa-1320-1048.png\" height=\"60%\" width=\"60%\"\u003e\n\u003c/div\u003e\n\n#### PyAlink\n\n\u003cdiv align=center\u003e\n\u003cimg src=\"https://img.alicdn.com/tfs/TB1TmKloAL0gK0jSZFxXXXWHVXa-2070-1380.png\" height=\"60%\" width=\"60%\"\u003e\n\u003c/div\u003e\n\n# Quick start\n\n## PyAlink Manual\n\n### Preparation before use:\n---------\n\n\n#### About package names and versions:\n  - PyAlink provides a separate Python package for each Flink version that Alink supports: \n  the `pyalink` package always maintains the Alink Python API against the latest Flink version, which is 1.13, \n  while the `pyalink-flink-***` packages support older Flink versions; these are currently `pyalink-flink-1.12`, `pyalink-flink-1.11`, `pyalink-flink-1.10` and `pyalink-flink-1.9`. \n  - The version of the Python packages always follows the Alink Java version, e.g. `1.6.2`.\n  \n#### Installation steps:\n\n1. Make sure the Python 3 version on your computer is 3.6, 3.7 or 3.8.\n2. Make sure Java 8 is installed on your computer.\n3. Use pip to install one of the packages:\n  `pip install pyalink`, `pip install pyalink-flink-1.12`, `pip install pyalink-flink-1.11`, `pip install pyalink-flink-1.10` or `pip install pyalink-flink-1.9`.\n\n\n#### Potential issues:\n\n1. `pyalink` and `pyalink-flink-***` cannot be installed at the same time, and only one version may be installed.\nIf `pyalink` or `pyalink-flink-***` is already installed, use `pip uninstall pyalink` or `pip uninstall pyalink-flink-***` to remove it first.\n\n2. 
If `pip install` is slow or fails, refer to [this article](https://segmentfault.com/a/1190000006111096) to change the pip source, or use the following download links:\n    - Flink 1.13: [Link](https://alink-release.oss-cn-beijing.aliyuncs.com/v1.6.2.post0/pyalink-1.6.2.post0-py3-none-any.whl) (MD5: d4b7b1fe6474b11ca7f45d0fb0daf5bc)\n    - Flink 1.12: [Link](https://alink-release.oss-cn-beijing.aliyuncs.com/v1.6.2.post0/pyalink_flink_1.12-1.6.2.post0-py3-none-any.whl) (MD5: 527b9ac24383ccc8593cd61b06cc610d)\n    - Flink 1.11: [Link](https://alink-release.oss-cn-beijing.aliyuncs.com/v1.6.2.post0/pyalink_flink_1.11-1.6.2.post0-py3-none-any.whl) (MD5: 7e59ba00b3739386996cf55d8f522ed2)\n    - Flink 1.10: [Link](https://alink-release.oss-cn-beijing.aliyuncs.com/v1.6.2.post0/pyalink_flink_1.10-1.6.2.post0-py3-none-any.whl) (MD5: 6d5d9048c9a44f27285467c5117e8deb)\n    - Flink 1.9: [Link](https://alink-release.oss-cn-beijing.aliyuncs.com/v1.6.2.post0/pyalink_flink_1.9-1.6.2.post0-py3-none-any.whl) (MD5: e89ac35a6a1c63c0426f3d9ca1025880)\n3. If multiple versions of Python exist, you may need to use a specific version of `pip`, such as `pip3`.\nIf Anaconda is used, run the command in the Anaconda prompt. \n\n\n#### Download file system and Catalog dependency jar files:\n\nAfter PyAlink is installed, you can run ```download_pyalink_dep_jars``` to download the dependency jars for file systems and Hive.\n(If the command cannot be found, you can run the Python command ```python3 -c 'from pyalink.alink.download_pyalink_dep_jars import main;main()'``` directly.)\n\nAfter running the command, you'll see a prompt asking which dependencies and versions to download. 
\nThe following dependency jars and versions are supported:\n\n- OSS: 3.4.1\n- Hadoop: 2.8.3\n- Hive: 2.3.4\n- MySQL: 5.1.27\n- Derby: 10.6.1.0\n- SQLite: 3.19.3\n- S3-hadoop: 1.11.788\n- S3-presto: 1.11.788\n- odps: 0.36.4-public\n\nThese jars will be installed to the ```lib/plugins``` folder of PyAlink. \nNote that this command requires write access to that folder.\n\nYou can also pass the argument ```-d``` when running the command, i.e. ```download_pyalink_dep_jars -d```.\nThis installs all dependency jars.\n\n### Start using: \n-------\nYou can use PyAlink with Jupyter Notebook for a better experience.\n\nSteps for usage: \n\n1. Start Jupyter by running ```jupyter notebook``` in a terminal, and create a Python 3 notebook.\n\n2. Import the pyalink package: ```from pyalink.alink import *```.\n\n3. Use this command to create a local runtime environment:\n\n   ```useLocalEnv(parallelism, flinkHome=None, config=None)```.\n\n   Here, the parameter ```parallelism``` is the degree of parallelism used for execution; ```flinkHome``` is the full path of Flink and usually does not need to be set; ```config``` is the configuration accepted by Flink. After running the command, the following output indicates that the runtime environment was initialized successfully.\n```\nJVM listening on ***\nPython listening on ***\n```\n4. 
Start writing PyAlink code, for example:\n```python\nsource = CsvSourceBatchOp()\\\n    .setSchemaStr(\"sepal_length double, sepal_width double, petal_length double, petal_width double, category string\")\\\n    .setFilePath(\"https://alink-release.oss-cn-beijing.aliyuncs.com/data-files/iris.csv\")\nres = source.select([\"sepal_length\", \"sepal_width\"])\ndf = res.collectToDataframe()\nprint(df)\n```\n\n### Write code: \n------\nIn PyAlink, algorithm components expose basically the same interface as the Java API: create a component with its default constructor, set its parameters through ```setXXX```, and connect it to other components through ```link / linkTo / linkFrom```.\n\nJupyter Notebook's auto-completion makes writing this code more convenient.\n\nFor batch jobs, you can trigger execution through methods such as ```print / collectToDataframe / collectToDataframes``` of batch components or through ```BatchOperator.execute()```; for streaming jobs, start the job with ```StreamOperator.execute()```.\n\n### More usage: \n------\n - [Interchange between DataFrame and Operator](docs/pyalink/pyalink-dataframe.md)\n - [StreamOperator data preview](docs/pyalink/pyalink-stream-operator-preview.md)\n - [UDF/UDTF/SQL usage](docs/pyalink/pyalink-udf.md)\n - [Use with PyFlink](docs/pyalink/pyalink-pyflink.md)\n - [PyAlink Q\u0026A](docs/pyalink/pyalink-qa.md)\n\n## Java API Manual\n\n### KMeans Example\n```java\nString URL = \"https://alink-release.oss-cn-beijing.aliyuncs.com/data-files/iris.csv\";\nString SCHEMA_STR = \"sepal_length double, sepal_width double, petal_length double, petal_width double, category string\";\n\nBatchOperator data = new CsvSourceBatchOp()\n        .setFilePath(URL)\n        .setSchemaStr(SCHEMA_STR);\n\nVectorAssembler va = new VectorAssembler()\n        .setSelectedCols(new String[]{\"sepal_length\", \"sepal_width\", \"petal_length\", \"petal_width\"})\n        
.setOutputCol(\"features\");\n\nKMeans kMeans = new KMeans().setVectorCol(\"features\").setK(3)\n        .setPredictionCol(\"prediction_result\")\n        .setPredictionDetailCol(\"prediction_detail\")\n        .setReservedCols(\"category\")\n        .setMaxIter(100);\n\nPipeline pipeline = new Pipeline().add(va).add(kMeans);\npipeline.fit(data).transform(data).print();\n```\n\n### With Flink-1.13\n```xml\n\u003cdependency\u003e\n    \u003cgroupId\u003ecom.alibaba.alink\u003c/groupId\u003e\n    \u003cartifactId\u003ealink_core_flink-1.13_2.11\u003c/artifactId\u003e\n    \u003cversion\u003e1.6.2\u003c/version\u003e\n\u003c/dependency\u003e\n\u003cdependency\u003e\n    \u003cgroupId\u003eorg.apache.flink\u003c/groupId\u003e\n    \u003cartifactId\u003eflink-streaming-scala_2.11\u003c/artifactId\u003e\n    \u003cversion\u003e1.13.0\u003c/version\u003e\n\u003c/dependency\u003e\n\u003cdependency\u003e\n    \u003cgroupId\u003eorg.apache.flink\u003c/groupId\u003e\n    \u003cartifactId\u003eflink-table-planner_2.11\u003c/artifactId\u003e\n    \u003cversion\u003e1.13.0\u003c/version\u003e\n\u003c/dependency\u003e\n\u003cdependency\u003e\n    \u003cgroupId\u003eorg.apache.flink\u003c/groupId\u003e\n    \u003cartifactId\u003eflink-clients_2.11\u003c/artifactId\u003e\n    \u003cversion\u003e1.13.0\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\n### With Flink-1.12\n```xml\n\u003cdependency\u003e\n    \u003cgroupId\u003ecom.alibaba.alink\u003c/groupId\u003e\n    \u003cartifactId\u003ealink_core_flink-1.12_2.11\u003c/artifactId\u003e\n    \u003cversion\u003e1.6.2\u003c/version\u003e\n\u003c/dependency\u003e\n\u003cdependency\u003e\n    \u003cgroupId\u003eorg.apache.flink\u003c/groupId\u003e\n    \u003cartifactId\u003eflink-streaming-scala_2.11\u003c/artifactId\u003e\n    \u003cversion\u003e1.12.1\u003c/version\u003e\n\u003c/dependency\u003e\n\u003cdependency\u003e\n    \u003cgroupId\u003eorg.apache.flink\u003c/groupId\u003e\n    
\u003cartifactId\u003eflink-table-planner_2.11\u003c/artifactId\u003e\n    \u003cversion\u003e1.12.1\u003c/version\u003e\n\u003c/dependency\u003e\n\u003cdependency\u003e\n    \u003cgroupId\u003eorg.apache.flink\u003c/groupId\u003e\n    \u003cartifactId\u003eflink-clients_2.11\u003c/artifactId\u003e\n    \u003cversion\u003e1.12.1\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\n### With Flink-1.11\n```xml\n\u003cdependency\u003e\n    \u003cgroupId\u003ecom.alibaba.alink\u003c/groupId\u003e\n    \u003cartifactId\u003ealink_core_flink-1.11_2.11\u003c/artifactId\u003e\n    \u003cversion\u003e1.6.2\u003c/version\u003e\n\u003c/dependency\u003e\n\u003cdependency\u003e\n    \u003cgroupId\u003eorg.apache.flink\u003c/groupId\u003e\n    \u003cartifactId\u003eflink-streaming-scala_2.11\u003c/artifactId\u003e\n    \u003cversion\u003e1.11.0\u003c/version\u003e\n\u003c/dependency\u003e\n\u003cdependency\u003e\n    \u003cgroupId\u003eorg.apache.flink\u003c/groupId\u003e\n    \u003cartifactId\u003eflink-table-planner_2.11\u003c/artifactId\u003e\n    \u003cversion\u003e1.11.0\u003c/version\u003e\n\u003c/dependency\u003e\n\u003cdependency\u003e\n    \u003cgroupId\u003eorg.apache.flink\u003c/groupId\u003e\n    \u003cartifactId\u003eflink-clients_2.11\u003c/artifactId\u003e\n    \u003cversion\u003e1.11.0\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\n### With Flink-1.10\n```xml\n\u003cdependency\u003e\n    \u003cgroupId\u003ecom.alibaba.alink\u003c/groupId\u003e\n    \u003cartifactId\u003ealink_core_flink-1.10_2.11\u003c/artifactId\u003e\n    \u003cversion\u003e1.6.2\u003c/version\u003e\n\u003c/dependency\u003e\n\u003cdependency\u003e\n    \u003cgroupId\u003eorg.apache.flink\u003c/groupId\u003e\n    \u003cartifactId\u003eflink-streaming-scala_2.11\u003c/artifactId\u003e\n    \u003cversion\u003e1.10.0\u003c/version\u003e\n\u003c/dependency\u003e\n\u003cdependency\u003e\n    \u003cgroupId\u003eorg.apache.flink\u003c/groupId\u003e\n    
\u003cartifactId\u003eflink-table-planner_2.11\u003c/artifactId\u003e\n    \u003cversion\u003e1.10.0\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\n### With Flink-1.9\n\n```xml\n\u003cdependency\u003e\n    \u003cgroupId\u003ecom.alibaba.alink\u003c/groupId\u003e\n    \u003cartifactId\u003ealink_core_flink-1.9_2.11\u003c/artifactId\u003e\n    \u003cversion\u003e1.6.2\u003c/version\u003e\n\u003c/dependency\u003e\n\u003cdependency\u003e\n    \u003cgroupId\u003eorg.apache.flink\u003c/groupId\u003e\n    \u003cartifactId\u003eflink-streaming-scala_2.11\u003c/artifactId\u003e\n    \u003cversion\u003e1.9.0\u003c/version\u003e\n\u003c/dependency\u003e\n\u003cdependency\u003e\n    \u003cgroupId\u003eorg.apache.flink\u003c/groupId\u003e\n    \u003cartifactId\u003eflink-table-planner_2.11\u003c/artifactId\u003e\n    \u003cversion\u003e1.9.0\u003c/version\u003e\n\u003c/dependency\u003e\n```\n\n\nGet started to run Alink Algorithm with a Flink Cluster\n--------\n\n1. Prepare a Flink Cluster:\n```shell\n  wget https://archive.apache.org/dist/flink/flink-1.13.0/flink-1.13.0-bin-scala_2.11.tgz\n  tar -xf flink-1.13.0-bin-scala_2.11.tgz \u0026\u0026 cd flink-1.13.0\n  ./bin/start-cluster.sh\n```\n\n2. Build Alink jar from the source:\n```shell\n  git clone https://github.com/alibaba/Alink.git\n  # add \u003cscope\u003eprovided\u003c/scope\u003e in pom.xml of alink_examples.\n  cd Alink \u0026\u0026 mvn -Dmaven.test.skip=true clean package shade:shade\n```\n\n3. 
Run Java examples:\n```shell\n  ./bin/flink run -p 1 -c com.alibaba.alink.ALSExample [path_to_Alink]/examples/target/alink_examples-1.5-SNAPSHOT.jar\n  # ./bin/flink run -p 1 -c com.alibaba.alink.GBDTExample [path_to_Alink]/examples/target/alink_examples-1.5-SNAPSHOT.jar\n  # ./bin/flink run -p 1 -c com.alibaba.alink.KMeansExample [path_to_Alink]/examples/target/alink_examples-1.5-SNAPSHOT.jar\n```\n\nDeployment\n---------\n\n[Cluster](docs/deploy/cluster-deploy.en-US.md)\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falibaba%2Falink","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Falibaba%2Falink","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Falibaba%2Falink/lists"}