{"id":18554241,"url":"https://github.com/oracle/accelerated-data-science","last_synced_at":"2025-04-08T12:03:12.857Z","repository":{"id":37581417,"uuid":"456699558","full_name":"oracle/accelerated-data-science","owner":"oracle","description":"ADS is the Oracle Data Science Cloud Service's python SDK supporting, model ops (train/eval/deploy), along with running workloads on Jobs and Pipeline resources.","archived":false,"fork":false,"pushed_at":"2025-04-07T05:27:29.000Z","size":61564,"stargazers_count":99,"open_issues_count":16,"forks_count":46,"subscribers_count":20,"default_branch":"main","last_synced_at":"2025-04-07T06:27:41.748Z","etag":null,"topics":["cloud","machine-learning","oci","oracle","python3"],"latest_commit_sha":null,"homepage":"https://accelerated-data-science.readthedocs.io/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"upl-1.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/oracle.png","metadata":{"files":{"readme":"README-development.md","changelog":null,"contributing":"CONTRIBUTING.md","funding":null,"license":"LICENSE.txt","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":"CODEOWNERS","security":"SECURITY.md","support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-02-07T22:30:21.000Z","updated_at":"2025-04-07T05:25:21.000Z","dependencies_parsed_at":"2023-12-26T04:30:52.405Z","dependency_job_id":"9ed30847-5acd-41bd-a070-78dc938ea055","html_url":"https://github.com/oracle/accelerated-data-science","commit_stats":{"total_commits":2148,"total_committers":32,"mean_commits":67.125,"dds":0.8049348230912476,"last_synced_commit":"3d8f148f6bff1e6dcd7df8e25cdc2b2bc85e2e74"},"previous_names":[],"tags_count":86,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oracle%2Facce
lerated-data-science","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oracle%2Faccelerated-data-science/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oracle%2Faccelerated-data-science/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oracle%2Faccelerated-data-science/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/oracle","download_url":"https://codeload.github.com/oracle/accelerated-data-science/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":247838421,"owners_count":21004578,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["cloud","machine-learning","oci","oracle","python3"],"created_at":"2024-11-06T21:20:30.346Z","updated_at":"2025-04-08T12:03:12.850Z","avatar_url":"https://github.com/oracle.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\u003c!-- TOC --\u003e\n# Summary\n\nThe Oracle Accelerated Data Science (ADS) SDK is used by data scientists and analysts for\ndata exploration and experimental machine learning. It aims to democratize machine learning and\nanalytics by providing easy-to-use, performant, and user-friendly tools that\nbring together the best of data science practices.\n\nThe ADS SDK helps you connect to different data sources, perform exploratory data analysis,\ndata visualization, feature engineering, model training, model evaluation, and\nmodel interpretation. 
ADS also allows you to connect to the model catalog to save models to, and load models from, the catalog.\n\n- [Summary](#summary)\n  - [Documentation](#documentation)\n  - [Get Support](#get-support)\n  - [Getting started](#getting-started)\n    - [Step 1: Create a conda environment](#step-1-create-a-conda-environment)\n    - [Step 2: Activate your environment](#step-2-activate-your-environment)\n    - [Step 3: Clone ADS and install dependencies](#step-3-clone-ads-and-install-dependencies)\n    - [Step 4: Set up configuration files](#step-4-set-up-configuration-files)\n    - [Step 5: Versioning and generating the wheel](#step-5-versioning-and-generating-the-wheel)\n  - [Running tests](#running-tests)\n    - [Running default setup tests](#running-default-setup-tests)\n    - [Running all unit tests](#running-all-unit-tests)\n    - [Running integration tests](#running-integration-tests)\n    - [Running opctl integration tests](#running-opctl-integration-tests)\n  - [Local Setup of AQUA API JupyterLab Server](#local-setup-of-aqua-api-jupyterlab-server)\n    - [Step 1: Requirements](#step-1-requirements)\n    - [Step 2: Create local .env files](#step-2-create-local-env-files)\n    - [Step 3: Add the run\\_ads.sh script in the ADS Repository](#step-3-add-the-run_adssh-script-in-the-ads-repository)\n    - [Step 4: Run the JupyterLab Server](#step-4-run-the-jupyterlab-server)\n    - [Step 5: Run the unit tests for the AQUA API](#step-5-run-the-unit-tests-for-the-aqua-api)\n  - [Security](#security)\n  - [License](#license)\n\n\n## Documentation\n\n - [ads-documentation](https://docs.oracle.com/en-us/iaas/tools/ads-sdk/latest/index.html)\n - [oci-data-science-ai-samples](https://github.com/oracle/oci-data-science-ai-samples)\n\n## Get Support\n\n- Open a [GitHub issue](https://github.com/oracle/accelerated-data-science/issues) for bug reports, questions, or requests for enhancements.\n- Report a security vulnerability according to the [Reporting Vulnerabilities guide](https://www.oracle.com/corporate/security-practices/assurance/vulnerability/reporting.html).\n\n## Getting started\n\nThese are the minimum steps required to install and set up the ADS SDK on your local machine\nfor development and testing purposes.\n\n### Step 1: Create a conda environment\n\nInstall Anaconda or Miniconda for your operating system. Miniconda installers are available at `https://repo.continuum.io/miniconda/`.\n\nIn a terminal, run the following command, where \u003cyourenvname\u003e is the name you want to give your environment,\nand set the Python version you want to use. The ADS SDK requires Python \u003e=3.8.\n\n```bash\n    conda create -n \u003cyourenvname\u003e python=3.8 anaconda\n```\n\nThis installs the specified Python version and all the associated Anaconda libraries at `path_to_your_anaconda_location/anaconda/envs/\u003cyourenvname\u003e`.\n\n### Step 2: Activate your environment\n\nTo activate or switch to your conda environment, run:\n\n```bash\n    conda activate \u003cyourenvname\u003e\n```\n\nTo list all your environments, use the `conda env list` command.\n\n### Step 3: Clone ADS and install dependencies\n\nGo to the destination folder where you want to clone the ADS library, clone the repository, and install ADS in editable mode together with its dependencies:\n\n```bash\n    cd \u003cdestination_folder\u003e\n    git clone git@github.com:oracle/accelerated-data-science.git\n    cd accelerated-data-science\n    python3 -m pip install -e .\n```\n\nTo view which packages were installed and their version numbers, run:\n\n```bash\n    python3 -m pip freeze\n```\n\n### Step 4: Set up configuration files\n\nYou should also set up the OCI configuration files; see [SDK and CLI Configuration File](https://docs.cloud.oracle.com/Content/API/Concepts/sdkconfig.htm).\n\n### Step 5: Versioning and generating the wheel\n\nBump the version in `pyproject.toml`. The ADS SDK uses [build](https://pypa-build.readthedocs.io/en/stable/index.html) as its build frontend. 
To generate the sdist and wheel, run:\n\n```bash\n    pip install build\n    python3 -m build\n```\n\nThe resulting wheel, written to the `dist/` directory by default, can then be installed with `pip`.\n\n## Running tests\n\nThe SDK uses pytest as its test framework.\n\n### Running default setup tests\n\nThe default setup tests exercise the ADS SDK without the extra dependencies specified under `[project.optional-dependencies]` in `pyproject.toml`.\n\n```bash\n  # Update your environment with test dependencies\n  pip install -r test-requirements.txt\n  # Run default setup tests\n  python3 -m pytest tests/unitary/default_setup\n```\n\n### Running all unit tests\n\nTo run all unit tests, first install the extra dependencies needed to test all ADS modules:\n\n```bash\n  # Update your environment with test dependencies\n  pip install -r test-requirements.txt\n  pip install -e \".[testsuite]\"\n  # Run all unit tests\n  python3 -m pytest tests/unitary\n```\n\n### Running integration tests\n\nThe ADS opctl integration tests can't be run together with the other integration tests because they require a special setup.\nTo run all integration tests except the opctl ones, run:\n\n```bash\n  # Update your environment with test dependencies\n  pip install -r test-requirements.txt\n  pip install -e \".[testsuite]\"\n  # Run integration tests\n  python3 -m pytest tests/integration --ignore=tests/integration/opctl\n```\n\n### Running opctl integration tests\n\nThe ADS opctl integration tests use the CPU and GPU jobs images and require the `dataexpl_p37_cpu_v2` and `pyspark30_p37_cpu_v3` Data Science conda environments to be installed; see [About Conda Environments](https://docs.oracle.com/en-us/iaas/data-science/using/conda_understand_environments.htm).\nTo build the development container, see [Build Development Container Image](https://accelerated-data-science.readthedocs.io/en/latest/user_guide/cli/opctl/localdev/jobs_container_image.html).\n\n```bash\n  # Update your environment with test dependencies\n  pip install -r test-requirements.txt\n  pip install -e \".[opctl]\"\n  pip install oci oci-cli\n  # Build CPU and GPU jobs images\n  ads opctl build-image -d job-local\n  ads opctl build-image -g -d job-local\n  # Run opctl integration tests\n  python3 -m pytest tests/integration/opctl\n```\n\n## Local Setup of AQUA API JupyterLab Server\n\nThese are the steps to run the AQUA (AI Quick Actions) API server for development and testing purposes. The source code for the AQUA API server is in the [`ads/aqua`](https://github.com/oracle/accelerated-data-science/tree/21ba00b95aef8581991fee6c7d558e2f2b1680ac/ads/aqua) directory of this repository.\n\n### Step 1: Requirements\n+ Complete the [Getting started](#getting-started) section above and create a conda environment with Python \u003e3.9 or 3.10.\n+ Install a REST API client, such as Thunder Client for [VS Code](https://marketplace.visualstudio.com/items?itemName=rangav.vscode-thunder-client) or Postman.\n+ Activate the conda environment from the Getting started section and run:\n\n```bash\npip install -r test-requirements.txt\n```\n\n### Step 2: Create local .env files\nRunning the local JupyterLab server requires setting OCI authentication, proxy, and OCI namespace parameters. 
Adapt this .env file with your specific OCI profile and OCIDs to set these variables.\n\n```\nCONDA_BUCKET_NS=\"your_conda_bucket\"\nhttp_proxy=\"\"\nhttps_proxy=\"\"\nHTTP_PROXY=\"\"\nHTTPS_PROXY=\"\"\nOCI_ODSC_SERVICE_ENDPOINT=\"your_service_endpoint\"\nAQUA_SERVICE_MODELS_BUCKET=\"service-managed-models\"\nAQUA_TELEMETRY_BUCKET_NS=\"\"\nPROJECT_COMPARTMENT_OCID=\"ocid1.compartment.oc1.\u003cyour_ocid\u003e\"\nOCI_CONFIG_PROFILE=\"your_oci_profile_name\"\nOCI_IAM_TYPE=\"security_token\" # no modification needed if using token-based auth\nTENANCY_OCID=\"ocid1.tenancy.oc1.\u003cyour_ocid\u003e\"\nAQUA_JOB_SUBNET_ID=\"ocid1.subnet.oc1.\u003cyour_ocid\u003e\"\nODSC_MODEL_COMPARTMENT_OCID=\"ocid1.compartment.oc1.\u003cyour_ocid\u003e\"\nPROJECT_OCID=\"ocid1.datascienceproject.oc1.\u003cyour_ocid\u003e\"\n```\n\n### Step 3: Add the run_ads.sh script in the ADS Repository\n+ Add the shell script below, saved as `run_ads.sh`, and the .env file from Step 2 to the directory of your cloned ADS repository.\n+ Run `chmod +x run_ads.sh` after you create the script to make it executable.\n\n```bash\n#!/bin/bash\n\n#### Check if a CLI command is provided\nif [ \"$#\" -lt 1 ]; then\n  echo \"Usage: $0 \u003ccli command\u003e\"\n  exit 1\nfi\n\n#### Load environment variables from the .env file\nif [ -f .env ]; then\n  # set -a exports every variable that source defines; sourcing lets the shell parse quotes and inline comments\n  set -a\n  source .env\n  set +a\nelse\n  echo \"Error: .env file not found!\"\n  exit 1\nfi\n\n# Execute the CLI command\n\"$@\"\n```\n\n### Step 4: Run the JupyterLab Server\nStart the JupyterLab server with the following command:\n\n```bash\n./run_ads.sh jupyter lab --no-browser --ServerApp.disable_check_xsrf=True\n```\n+ Run `pkill jupyter-lab` to stop the JupyterLab server, then restart it to pick up local changes to the AQUA API.\n+ To test the setup from the CLI, run the following in a terminal:\n\n```bash\n./run_ads.sh ads aqua model list\n```\n\nTo call the API, send requests to `http://localhost:8888/aqua/\u003chandler\u003e` with a REST API client such as Thunder Client or Postman.\n\nExamples of handlers:\n\n```\nGET http://localhost:8888/aqua/model # handled by model_handler.py\n\nGET http://localhost:8888/aqua/deployments # handled by deployment_handler.py\n```\n\nThe handlers can be found [here](https://github.com/oracle/accelerated-data-science/tree/21ba00b95aef8581991fee6c7d558e2f2b1680ac/ads/aqua/extension).\n\n### Step 5: Run the unit tests for the AQUA API\nAll the unit tests can be found [here](https://github.com/oracle/accelerated-data-science/tree/main/tests/unitary/with_extras/aqua).\nThe following commands show how to run them:\n\n```bash\n# Run all tests in the AQUA project\npython -m pytest -q tests/unitary/with_extras/aqua/\n\n# Run all tests in a single module of the AQUA project (e.g., test_deployment.py, test_model.py)\npython -m pytest -q tests/unitary/with_extras/aqua/test_deployment.py\n\n# Run a specific test method within the module (replace test_get_deployment_default_params with the target test method)\npython -m pytest tests/unitary/with_extras/aqua/test_deployment.py -k \"test_get_deployment_default_params\"\n```\n\n## Security\n\nConsult the [security guide](./SECURITY.md) for our responsible security\nvulnerability disclosure process.\n\n## License\n\nCopyright (c) 2020, 2022 Oracle, Inc. All rights reserved.\nLicensed under the Universal Permissive License v 1.0 as shown at https://oss.oracle.com/licenses/upl.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foracle%2Faccelerated-data-science","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Foracle%2Faccelerated-data-science","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foracle%2Faccelerated-data-science/lists"}