{"id":13468127,"url":"https://github.com/mars-project/mars","last_synced_at":"2025-04-25T14:45:30.522Z","repository":{"id":37416520,"uuid":"160543708","full_name":"mars-project/mars","owner":"mars-project","description":"Mars is a tensor-based unified framework for large-scale data computation which scales numpy, pandas, scikit-learn and Python functions.","archived":false,"fork":false,"pushed_at":"2024-01-02T10:00:14.000Z","size":38755,"stargazers_count":2722,"open_issues_count":215,"forks_count":327,"subscribers_count":91,"default_branch":"master","last_synced_at":"2025-04-24T08:36:10.444Z","etag":null,"topics":["dask","dataframe","joblib","lightgbm","machine-learning","numpy","pandas","python","pytorch","ray","scikit-learn","statsmodels","tensor","tensorflow","xgboost"],"latest_commit_sha":null,"homepage":"https://mars-project.readthedocs.io","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/mars-project.png","metadata":{"files":{"readme":"README.rst","changelog":null,"contributing":"CONTRIBUTING.rst","funding":null,"license":"LICENSE","code_of_conduct":"CODE_OF_CONDUCT.md","threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":null,"support":"SUPPORT.md","governance":null,"roadmap":null,"authors":null}},"created_at":"2018-12-05T16:04:03.000Z","updated_at":"2025-04-18T10:03:23.000Z","dependencies_parsed_at":"2022-08-08T20:15:41.651Z","dependency_job_id":"f379606d-2835-4fa3-8c4a-01686fe12cc7","html_url":"https://github.com/mars-project/mars","commit_stats":{"total_commits":1296,"total_committers":56,"mean_commits":"23.142857142857142","dds":0.7237654320987654,"last_synced_commit":"bcc000554c8bd9ebd5cafe7c61b1f0090ab9d53b"},"previous_names":[],"tags_count":117,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mars-project%2Fmars","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mars-project%2Fmars/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mars-project%2Fmars/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/mars-project%2Fmars/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/mars-project","download_url":"https://codeload.github.com/mars-project/mars/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250836611,"owners_count":21495404,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["dask","dataframe","joblib","lightgbm","machine-learning","numpy","pandas","python","pytorch","ray","scikit-learn","statsmodels","tensor","tensorflow","xgboost"],"created_at":"2024-07-31T15:01:05.960Z","updated_at":"2025-04-25T14:45:30.504Z","avatar_url":"https://github.com/mars-project.png","language":"Python","funding_links":[],"categories":["Python","资源列表","Linear Algebra / Statistics Toolkit","其他_机器学习与深度学习","Distributed Computing Libraries","Uncategorized"],"sub_categories":["分布式计算","General Purpose Tensor Library","Columnar Based Storage","General-Purpose Machine Learning","Uncategorized"],"readme":".. image:: https://raw.githubusercontent.com/mars-project/mars/master/docs/source/images/mars-logo-title.png\n\n|PyPI version| |Docs| |Build| |Coverage| |Quality| |License|\n\nMars is a tensor-based unified framework for large-scale data computation\nwhich scales numpy, pandas, scikit-learn and many other libraries.\n\n`Documentation`_, `中文文档`_\n\nInstallation\n------------\n\nMars is easy to install by\n\n.. code-block:: bash\n\n    pip install pymars\n\n\nInstallation for Developers\n```````````````````````````\n\nWhen you want to contribute code to Mars, you can follow the instructions below to install Mars\nfor development:\n\n.. code-block:: bash\n\n    git clone https://github.com/mars-project/mars.git\n    cd mars\n    pip install -e \".[dev]\"\n\nMore details about installing Mars can be found at\n`installation \u003chttps://mars-project.readthedocs.io/en/latest/installation/index.html\u003e`_ section in\nMars document.\n\n\nArchitecture Overview\n---------------------\n\n.. image:: https://raw.githubusercontent.com/mars-project/mars/master/docs/source/images/architecture.png\n\n\nGetting Started\n---------------\n\nStarting a new runtime locally via:\n\n.. code-block:: python\n\n    \u003e\u003e\u003e import mars\n    \u003e\u003e\u003e mars.new_session()\n\nOr connecting to a Mars cluster which is already initialized.\n\n.. code-block:: python\n\n    \u003e\u003e\u003e import mars\n    \u003e\u003e\u003e mars.new_session('http://\u003cweb_ip\u003e:\u003cui_port\u003e')\n\n\nMars Tensor\n-----------\n\nMars tensor provides a familiar interface like Numpy.\n\n+-----------------------------------------------+-----------------------------------------------+\n| **Numpy**                                     | **Mars tensor**                               |\n+-----------------------------------------------+-----------------------------------------------+\n|.. code-block:: python                         |.. code-block:: python                         |\n|                                               |                                               |\n|    import numpy as np                         |    import mars.tensor as mt                   |\n|    N = 200_000_000                            |    N = 200_000_000                            |\n|    a = np.random.uniform(-1, 1, size=(N, 2))  |    a = mt.random.uniform(-1, 1, size=(N, 2))  |\n|    print((np.linalg.norm(a, axis=1) \u003c 1)      |    print(((mt.linalg.norm(a, axis=1) \u003c 1)     |\n|          .sum() * 4 / N)                      |            .sum() * 4 / N).execute())         |\n|                                               |                                               |\n+-----------------------------------------------+-----------------------------------------------+\n|.. code-block::                                |.. code-block::                                |\n|                                               |                                               |\n|    3.14174502                                 |     3.14161908                                |\n|    CPU times: user 11.6 s, sys: 8.22 s,       |     CPU times: user 966 ms, sys: 544 ms,      |\n|               total: 19.9 s                   |                total: 1.51 s                  |\n|    Wall time: 22.5 s                          |     Wall time: 3.77 s                         |\n|                                               |                                               |\n+-----------------------------------------------+-----------------------------------------------+\n\nMars can leverage multiple cores, even on a laptop, and could be even faster for a distributed setting.\n\n\nMars DataFrame\n--------------\n\nMars DataFrame provides a familiar interface like pandas.\n\n+-----------------------------------------+-----------------------------------------+\n| **Pandas**                              | **Mars DataFrame**                      |\n+-----------------------------------------+-----------------------------------------+\n|.. code-block:: python                   |.. code-block:: python                   |\n|                                         |                                         |\n|    import numpy as np                   |    import mars.tensor as mt             |\n|    import pandas as pd                  |    import mars.dataframe as md          |\n|    df = pd.DataFrame(                   |    df = md.DataFrame(                   |\n|        np.random.rand(100000000, 4),    |        mt.random.rand(100000000, 4),    |\n|        columns=list('abcd'))            |        columns=list('abcd'))            |\n|    print(df.sum())                      |    print(df.sum().execute())            |\n|                                         |                                         |\n+-----------------------------------------+-----------------------------------------+\n|.. code-block::                          |.. code-block::                          |\n|                                         |                                         |\n|    CPU times: user 10.9 s, sys: 2.69 s, |    CPU times: user 1.21 s, sys: 212 ms, |\n|               total: 13.6 s             |               total: 1.42 s             |\n|    Wall time: 11 s                      |    Wall time: 2.75 s                    |\n+-----------------------------------------+-----------------------------------------+\n\n\nMars Learn\n----------\n\nMars learn provides a familiar interface like scikit-learn.\n\n+---------------------------------------------+----------------------------------------------------+\n| **Scikit-learn**                            | **Mars learn**                                     |\n+---------------------------------------------+----------------------------------------------------+\n|.. code-block:: python                       |.. code-block:: python                              |\n|                                             |                                                    |\n|    from sklearn.datasets import make_blobs  |    from mars.learn.datasets import make_blobs      |\n|    from sklearn.decomposition import PCA    |    from mars.learn.decomposition import PCA        |\n|    X, y = make_blobs(                       |    X, y = make_blobs(                              |\n|        n_samples=100000000, n_features=3,   |        n_samples=100000000, n_features=3,          |\n|        centers=[[3, 3, 3], [0, 0, 0],       |        centers=[[3, 3, 3], [0, 0, 0],              |\n|                 [1, 1, 1], [2, 2, 2]],      |                  [1, 1, 1], [2, 2, 2]],            |\n|        cluster_std=[0.2, 0.1, 0.2, 0.2],    |        cluster_std=[0.2, 0.1, 0.2, 0.2],           |\n|        random_state=9)                      |        random_state=9)                             |\n|    pca = PCA(n_components=3)                |    pca = PCA(n_components=3)                       |\n|    pca.fit(X)                               |    pca.fit(X)                                      |\n|    print(pca.explained_variance_ratio_)     |    print(pca.explained_variance_ratio_)            |\n|    print(pca.explained_variance_)           |    print(pca.explained_variance_)                  |\n|                                             |                                                    |\n+---------------------------------------------+----------------------------------------------------+\n\nMars learn also integrates with many libraries:\n\n- `TensorFlow \u003chttps://mars-project.readthedocs.io//en/latest/user_guide/learn/tensorflow.html\u003e`_\n- `PyTorch \u003chttps://mars-project.readthedocs.io/en/latest/user_guide/learn/pytorch.html\u003e`_\n- `XGBoost \u003chttps://mars-project.readthedocs.io/en/latest/user_guide/learn/xgboost.html\u003e`_\n- `LightGBM \u003chttps://mars-project.readthedocs.io/en/latest/user_guide/learn/lightgbm.html\u003e`_\n- `Joblib \u003chttps://mars-project.readthedocs.io/en/latest/user_guide/learn/joblib.html\u003e`_\n- `Statsmodels \u003chttps://mars-project.readthedocs.io/en/latest/user_guide/learn/statsmodels.html\u003e`_\n\nMars remote\n-----------\n\nMars remote allows users to execute functions in parallel.\n\n+-------------------------------------------+--------------------------------------------+\n| **Vanilla function calls**                | **Mars remote**                            |\n+-------------------------------------------+--------------------------------------------+\n|.. code-block:: python                     |.. code-block:: python                      |\n|                                           |                                            |\n|    import numpy as np                     |    import numpy as np                      |\n|                                           |    import mars.remote as mr                |\n|                                           |                                            |\n|    def calc_chunk(n, i):                  |    def calc_chunk(n, i):                   |\n|        rs = np.random.RandomState(i)      |        rs = np.random.RandomState(i)       |\n|        a = rs.uniform(-1, 1, size=(n, 2)) |        a = rs.uniform(-1, 1, size=(n, 2))  |\n|        d = np.linalg.norm(a, axis=1)      |        d = np.linalg.norm(a, axis=1)       |\n|        return (d \u003c 1).sum()               |        return (d \u003c 1).sum()                |\n|                                           |                                            |\n|    def calc_pi(fs, N):                    |    def calc_pi(fs, N):                     |\n|        return sum(fs) * 4 / N             |        return sum(fs) * 4 / N              |\n|                                           |                                            |\n|    N = 200_000_000                        |    N = 200_000_000                         |\n|    n = 10_000_000                         |    n = 10_000_000                          |\n|                                           |                                            |\n|    fs = [calc_chunk(n, i)                 |    fs = [mr.spawn(calc_chunk, args=(n, i)) |\n|          for i in range(N // n)]          |          for i in range(N // n)]           |\n|    pi = calc_pi(fs, N)                    |    pi = mr.spawn(calc_pi, args=(fs, N))    |\n|    print(pi)                              |    print(pi.execute().fetch())             |\n|                                           |                                            |\n+-------------------------------------------+--------------------------------------------+\n|.. code-block::                            |.. code-block::                             |\n|                                           |                                            |\n|    3.1416312                              |    3.1416312                               |\n|    CPU times: user 32.2 s, sys: 4.86 s,   |    CPU times: user 616 ms, sys: 307 ms,    |\n|               total: 37.1 s               |               total: 923 ms                |\n|    Wall time: 12.4 s                      |    Wall time: 3.99 s                       |\n|                                           |                                            |\n+-------------------------------------------+--------------------------------------------+\n\nDASK on Mars\n------------\n\nRefer to `DASK on Mars`_ for more information.\n\nEager Mode\n```````````\n\nMars supports eager mode which makes it friendly for developing and easy to debug.\n\nUsers can enable the eager mode by options, set options at the beginning of the program or console session.\n\n.. code-block:: python\n\n    \u003e\u003e\u003e from mars.config import options\n    \u003e\u003e\u003e options.eager_mode = True\n\nOr use a context.\n\n.. code-block:: python\n\n    \u003e\u003e\u003e from mars.config import option_context\n    \u003e\u003e\u003e with option_context() as options:\n    \u003e\u003e\u003e     options.eager_mode = True\n    \u003e\u003e\u003e     # the eager mode is on only for the with statement\n    \u003e\u003e\u003e     ...\n\nIf eager mode is on, tensor, DataFrame etc will be executed immediately\nby default session once it is created.\n\n.. code-block:: python\n\n    \u003e\u003e\u003e import mars.tensor as mt\n    \u003e\u003e\u003e import mars.dataframe as md\n    \u003e\u003e\u003e from mars.config import options\n    \u003e\u003e\u003e options.eager_mode = True\n    \u003e\u003e\u003e t = mt.arange(6).reshape((2, 3))\n    \u003e\u003e\u003e t\n    array([[0, 1, 2],\n           [3, 4, 5]])\n    \u003e\u003e\u003e df = md.DataFrame(t)\n    \u003e\u003e\u003e df.sum()\n    0    3\n    1    5\n    2    7\n    dtype: int64\n\n\nMars on Ray\n------------\nMars also has deep integration with Ray and can run on `Ray \u003chttps://docs.ray.io/en/latest/\u003e`_ efficiently and\ninteract with the large ecosystem of machine learning and distributed systems built on top of the core Ray.\n\nStarting a new Mars on Ray runtime locally via:\n\n.. code-block:: python\n\n    import mars\n    mars.new_session(backend='ray')\n    # Perform computation\n\nInteract with Ray Dataset:\n\n.. code-block:: python\n\n    import mars.tensor as mt\n    import mars.dataframe as md\n    df = md.DataFrame(\n        mt.random.rand(1000_0000, 4),\n        columns=list('abcd'))\n    # Convert mars dataframe to ray dataset\n    ds = md.to_ray_dataset(df)\n    print(ds.schema(), ds.count())\n    ds.filter(lambda row: row[\"a\"] \u003e 0.5).show(5)\n    # Convert ray dataset to mars dataframe\n    df2 = md.read_ray_dataset(ds)\n    print(df2.head(5).execute())\n\nRefer to `Mars on Ray`_ for more information.\n\n\nEasy to scale in and scale out\n------------------------------\n\nMars can scale in to a single machine, and scale out to a cluster with thousands of machines.\nIt's fairly simple to migrate from a single machine to a cluster to\nprocess more data or gain a better performance.\n\n\nBare Metal Deployment\n`````````````````````\n\nMars is easy to scale out to a cluster by starting different components of\nmars distributed runtime on different machines in the cluster.\n\nA node can be selected as supervisor which integrated a web service,\nleaving other nodes as workers.  The supervisor can be started with the following command:\n\n.. code-block:: bash\n\n    mars-supervisor -h \u003chost_name\u003e -p \u003csupervisor_port\u003e -w \u003cweb_port\u003e\n\nWorkers can be started with the following command:\n\n.. code-block:: bash\n\n    mars-worker -h \u003chost_name\u003e -p \u003cworker_port\u003e -s \u003csupervisor_endpoint\u003e\n\nAfter all mars processes are started, users can run\n\n.. code-block:: python\n\n    \u003e\u003e\u003e sess = new_session('http://\u003cweb_ip\u003e:\u003cui_port\u003e')\n    \u003e\u003e\u003e # perform computation\n\n\nKubernetes Deployment\n`````````````````````\n\nRefer to `Run on Kubernetes`_ for more information.\n\n\nYarn Deployment\n```````````````\n\nRefer to `Run on Yarn`_ for more information.\n\n\nGetting involved\n----------------\n\n- Read `development guide \u003chttps://mars-project.readthedocs.io/en/latest/development/index.html\u003e`_.\n- Join our Slack workgroup: `Slack \u003chttps://join.slack.com/t/mars-computing/shared_invite/zt-17pw2cfua-NRb2H4vrg77pr9T4g3nQOQ\u003e`_.\n- Join the mailing list: send an email to `mars-dev@googlegroups.com`_.\n- Please report bugs by submitting a `GitHub issue`_.\n- Submit contributions using `pull requests`_.\n\nThank you in advance for your contributions!\n\n\n.. |Build| image:: https://github.com/mars-project/mars/workflows/Mars%20CI%20Core/badge.svg\n   :target: https://github.com/mars-project/mars/actions\n.. |Coverage| image:: https://codecov.io/gh/mars-project/mars/branch/master/graph/badge.svg\n   :target: https://codecov.io/gh/mars-project/mars\n.. |Quality| image:: https://img.shields.io/codacy/grade/6a80bb4659ed410eb33795f580c8615e.svg\n   :target: https://app.codacy.com/project/mars-project/mars/dashboard\n.. |PyPI version| image:: https://img.shields.io/pypi/v/pymars.svg\n   :target: https://pypi.python.org/pypi/pymars\n.. |Docs| image:: https://img.shields.io/badge/docs-latest-brightgreen.svg\n   :target: `Documentation`_\n.. |License| image:: https://img.shields.io/pypi/l/pymars.svg\n   :target: https://github.com/mars-project/mars/blob/master/LICENSE\n.. _`mars-dev@googlegroups.com`: https://groups.google.com/forum/#!forum/mars-dev\n.. _`GitHub issue`: https://github.com/mars-project/mars/issues\n.. _`pull requests`: https://github.com/mars-project/mars/pulls\n.. _`Documentation`: https://mars-project.readthedocs.io\n.. _`中文文档`: https://mars-project.readthedocs.io/zh_CN/latest/\n.. _`Mars on Ray`: https://mars-project.readthedocs.io/en/latest/installation/ray.html\n.. _`Run on Kubernetes`: https://mars-project.readthedocs.io/en/latest/installation/kubernetes.html\n.. _`Run on Yarn`: https://mars-project.readthedocs.io/en/latest/installation/yarn.html\n.. _`DASK on Mars`: https://mars-project.readthedocs.io/en/latest/user_guide/contrib/dask.html\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmars-project%2Fmars","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fmars-project%2Fmars","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fmars-project%2Fmars/lists"}