{"id":22323299,"url":"https://github.com/skamensky/dt_wrapper","last_synced_at":"2025-03-26T05:22:06.174Z","repository":{"id":106019778,"uuid":"205281747","full_name":"skamensky/dt_wrapper","owner":"skamensky","description":"A python wrapper around the Demand Tools command line interface","archived":false,"fork":false,"pushed_at":"2019-08-30T01:43:45.000Z","size":8,"stargazers_count":0,"open_issues_count":0,"forks_count":0,"subscribers_count":2,"default_branch":"master","last_synced_at":"2025-03-18T02:54:39.914Z","etag":null,"topics":["command-line-interface","demandtools","python-wrapper"],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/skamensky.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2019-08-30T01:34:49.000Z","updated_at":"2019-09-03T19:32:38.000Z","dependencies_parsed_at":null,"dependency_job_id":"1cc69705-4fd3-48c6-bf85-e469ade241ea","html_url":"https://github.com/skamensky/dt_wrapper","commit_stats":{"total_commits":1,"total_committers":1,"mean_commits":1.0,"dds":0.0,"last_synced_commit":"f6086b7fe2eb1db0c7f80793c6f49c1eca9b9e3b"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/skamensky%2Fdt_wrapper","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/skamensky%2Fdt_wrapper/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/skamensky%2Fdt_wrapper/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/skamensky%2Fdt_wrapper/manifests","owner_url":"
https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/skamensky","download_url":"https://codeload.github.com/skamensky/dt_wrapper/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":245592683,"owners_count":20640883,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["command-line-interface","demandtools","python-wrapper"],"created_at":"2024-12-04T01:09:50.335Z","updated_at":"2025-03-26T05:22:06.152Z","avatar_url":"https://github.com/skamensky.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# demandtoolswrapper\nA python wrapper around the Demand Tools command line interface.\n[Demand Tools](https://www.validity.com/demandtools/) is a deduplication and ETL tool for Salesforce.\n\nThis package arose from the need to integrate the product deeper into my automation pipeline than the [JobBuilder](https://helpconsole.validity.com/DemandToolsJobBuilder/)\nallowed.\n\nAs of now only scenario types that I've used are supported but pull requests are welcome if anyone would like to add more.\n\nThis implementation supports multiprocessing by default but since DemandTools doesn't return all errors to standard out, there may be some runs that appear to have succeeded but haven't actually.\nUse the multiprocessing functionality at your own risk.\n\nThanks to [black](https://github.com/psf/black) for formatting my code for me :).\n### Setup\n1. Add the folder containing the 'demandtools.exe' file to the `PATH` environment variable\n3. 
Create an environment variable named `DEMANDTOOLSLOGDIRECTORY` and set the value to the folder containing the DemandTools logs folder\n2. Log into DemandTools via the GUI and make sure the option to stay logged in is checked\n\n### Examples\n#### Example 1\nChain `DemandToolsCommand`s together, watch the DemandTools logs, and run functions after a scenario completes.\n```\nimport os\nimport dt_wrapper as dt\nfrom queue import Queue\n\nLOG_QUEUE = Queue()\n\n\ndef main():\n    \"\"\"\n    this function takes many different single dedupe scenarios, runs them on the lead table,\n    exports the duplicates without merging, and populates a custom field on the lead called 'Duplicate_Of__c'\n    so duplicates can be tracked in Salesforce\n    \"\"\"\n\n    # LogWatcher is a context manager that watches the DemandTools log\n    # and prints the results stream as DemandTools writes to the file\n    # the manager optionally takes a threading queue with which to populate the logs\n    with dt.LogWatcher(log_queue=LOG_QUEUE):\n        mark_dedupe_scenario_dir = \"MassEffect Scenarios\"\n        mark_dedupe_scenario_path = \"auto_mark_dupes.MExml\"\n\n        # the demandtoolswrapper keeps track of the extension of various scenario types\n        dedupe_ext = dt.DemandToolsCommand._demand_tool_extension[\"dedupe\"]\n\n        lead_dedupe_dir = \"dedupe_dir\"\n        scenario_dir = \"confident_scenarios\"\n        # only get specific scenario types\n        dedupe_scenarios = [\n            os.path.join(scenario_dir, p)\n            for p in os.listdir(scenario_dir)\n            if p.endswith(dedupe_ext)\n        ]\n\n        output_dir = \"output_dir\"\n        for scenario in dedupe_scenarios:\n            output_file = os.path.join(\n                output_dir, os.path.basename(scenario).replace(dedupe_ext, \".csv\")\n            )\n            file_prep_output = os.path.join(\n                output_dir,\n                \"{scenario_name}_master_nonmaster.csv\".format(\n            
        scenario_name=os.path.basename(scenario).replace(dedupe_ext, \"\")\n                ),\n            )\n\n            find_dupe_scenario = dt.DemandToolsCommand(\n                scenario_path=scenario, output_file=output_file, log_queue=LOG_QUEUE\n            )\n\n            # you can pass a function that should run once the scenario is finished running. Can be chained with multiple scenario runs\n            find_dupe_scenario.post_run_func = lambda: None\n            # DemandToolsCommand can also be passed a queue which will be populated with all print statements it produces\n            mark_as_dupe = dt.DemandToolsCommand(\n                scenario_path=mark_dedupe_scenario_path,\n                input_file=file_prep_output,\n                log_queue=LOG_QUEUE,\n            )\n            find_dupe_scenario.run()\n            try:\n                mark_as_dupe.run()\n            except dt.DemandToolsInputFileDoesNotExist as e:\n                print(\n                    '\"{file}\" not found. 
This usually happens when no dupes were found.'.format(\n                        file=mark_as_dupe.input_file_nice_name\n                    )\n                )\n\n\nif __name__ == \"__main__\":\n    main()\n```\n#### Example 2\nUse DemandTools as an ETL tool to export all tables, then upload the CSVs to a Postgres database.\n```\nimport dt_wrapper as dt\n\n\ndef sync_local_files_with_postgres(cursor, conn):\n    \"\"\"\n    sync_local_files_with_postgres is out of scope for this example, but it simply\n    copies the output CSVs to a Postgres DB\n    \"\"\"\n\n\ndef download_sf_data_as_csvs(cursor, conn):\n    bulk_backup_scenario = dt.DemandToolsCommand(\n        # raw string so the backslashes in the Windows path are not treated as escapes\n        scenario_path=r\"Scenarios\Misc\backupAllTables.BBxml\",\n        output_file=\"output_directory\",\n        # this is passed as an encoding argument to DemandTools\n        extra_dt_args=\"utf8\",\n        post_run_func=sync_local_files_with_postgres,\n        # you can feed args or kwargs to the post_run_func\n        post_run_func_args=(cursor, conn),\n    )\n\n    bulk_backup_scenario.run()\n```\n\n#### Example 3\nRun 20+ DemandTools processes concurrently, expect DemandTools exceptions, and retry a few times until success or permanent failure.\nThis turns out to be faster than running them sequentially, even though a sequential run avoids the exceptions.\nConcurrency is safe here because nothing is uploaded back into Salesforce until all of the scenarios have run to completion\n(i.e. 
the output of one scenario has no impact on the input of the next scenario).\n\n```\nimport os\nfrom queue import Queue\nfrom dt_wrapper import (\n    DemandToolsCommand,\n    DemandToolsMultiProcessDBWriteConflictException,\n    DemandToolsObjectReferenceException,\n    LogWatcher,\n)\n\nLOG_QUEUE = Queue()\n\n\ndef run_demand_tools_scenarios():\n    # find all scenarios in a directory by examining the extensions\n    scenarios_paths = DemandToolsCommand.get_scenarios_in_path(\n        path=\"account_dedupe_scenarios_loose\"\n    )\n    scenarios_paths.extend(\n        DemandToolsCommand.get_scenarios_in_path(\n            path=\"account_dedupe_scenarios_loose_confident\"\n        )\n    )\n    dt_processes = []\n    for scenario_path in scenarios_paths:\n        output_file_base_name = (\n            os.path.splitext(os.path.basename(scenario_path))[0] + \".csv\"\n        )\n        output_file = os.path.join(\"output_dir\", output_file_base_name)\n        command = DemandToolsCommand(\n            scenario_path=scenario_path,\n            output_file=output_file,\n            # can retry on predefined exceptions, otherwise fail on any exception\n            exceptions_to_retry_on=[\n                DemandToolsMultiProcessDBWriteConflictException,\n                DemandToolsObjectReferenceException,\n            ],\n            retry_count=3,\n            log_queue=LOG_QUEUE,\n        )\n        dt_processes.append(command)\n    # if you have multiple organizations on your machine, you can specify an organization_id for the relevant log file\n    with LogWatcher(log_queue=LOG_QUEUE, organization_id=\"00Di0000000gwFuEAI\"):\n        # run with multiprocessing enabled: start every process, then wait for all of them\n        for dt_command in dt_processes:\n            dt_command.start()\n        for dt_command in dt_processes:\n            dt_command.join()\n```\n\n### Future plans\n\nRestrict arguments to conform to the documentation, which accounts for all possible valid scenarios.\nSyntax could be taken from DemandTools' [Scenario 
Syntax Documentation](https://helpconsole.validity.com/DemandToolsJobBuilder/#pageid=demandtools_job___scenario_syntax) and DemandTools' [JobBuilder documentation](https://helpconsole.validity.com/DemandTools/default.aspx#pageid=scheduled_processes)\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fskamensky%2Fdt_wrapper","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fskamensky%2Fdt_wrapper","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fskamensky%2Fdt_wrapper/lists"}