{"id":30421331,"url":"https://github.com/kyleprotho/analysistoolbox","last_synced_at":"2025-08-22T09:03:42.538Z","repository":{"id":38687799,"uuid":"287944083","full_name":"KyleProtho/AnalysisToolBox","owner":"KyleProtho","description":"Analysis Tool Box (i.e. \"analysistoolbox\") is a collection of tools in Python for data collection and processing, statisitics, analytics, and intelligence analysis.","archived":false,"fork":false,"pushed_at":"2025-07-25T13:04:49.000Z","size":2416,"stargazers_count":1,"open_issues_count":3,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-08-10T11:44:07.022Z","etag":null,"topics":["analytics","data-analysis","open-source-intelligence","python3","r","research","snippets","statistics"],"latest_commit_sha":null,"homepage":"https://kyleprotho.github.io/AnalysisToolBox/","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"gpl-3.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/KyleProtho.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE.md","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null}},"created_at":"2020-08-16T12:57:44.000Z","updated_at":"2025-08-08T12:55:19.000Z","dependencies_parsed_at":"2024-01-15T18:45:50.807Z","dependency_job_id":"cae778b1-a431-4345-ac0e-3c8a85682194","html_url":"https://github.com/KyleProtho/AnalysisToolBox","commit_stats":{"total_commits":494,"total_committers":2,"mean_commits":247.0,"dds":0.3238866396761133,"last_synced_commit":"fef0873f5228b02cf1359c7e75c95cebc18fbb1f"},"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/KyleProtho/AnalysisToolBox","repository_url":"https://repos.
ecosyste.ms/api/v1/hosts/GitHub/repositories/KyleProtho%2FAnalysisToolBox","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KyleProtho%2FAnalysisToolBox/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KyleProtho%2FAnalysisToolBox/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KyleProtho%2FAnalysisToolBox/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/KyleProtho","download_url":"https://codeload.github.com/KyleProtho/AnalysisToolBox/tar.gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/KyleProtho%2FAnalysisToolBox/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":271522992,"owners_count":24774749,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-08-21T02:00:08.990Z","response_time":74,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["analytics","data-analysis","open-source-intelligence","python3","r","research","snippets","statistics"],"created_at":"2025-08-22T09:01:59.116Z","updated_at":"2025-08-22T09:03:42.507Z","avatar_url":"https://github.com/KyleProtho.png","language":"Python","readme":"# Analysis Tool Box\n\n\u003cp align=\"center\"\u003e\n  \u003cimg src=\"Square logo - White background.png\" width=\"40%\"\u003e\n\u003c/p\u003e\n\n## Description\n\nAnalysis Tool Box (i.e. 
\"analysistoolbox\") is a collection of tools in Python for data collection and processing, statistics, analytics, and intelligence analysis.\n\n## Getting Started\n\nTo install the package, run the following command in the root directory of the project:\n\n```bash\npip install analysistoolbox\n```\n\nVisualizations are created using the matplotlib and seaborn libraries. While you can select whichever seaborn style you'd like, the following Seaborn style tends to get the best looking plots:\n\n```python\nsns.set(\n    style=\"white\",\n    font=\"Arial\",\n    context=\"paper\"\n)\n```\n\n## Table of Contents / Usage\n\nThere are many modules in the analysistoolbox package, each with their own functions. The following is a list of the modules:\n\n- [Calculus](#calculus)\n  - [FindDerivative](#findderivative)\n  - [FindLimitOfFunction](#findlimitoffunction)\n  - [FindMinimumSquareLoss](#findminimumsquareloss)\n  - [PlotFunction](#plotfunction)\n- [Data Collection](#data-collection)\n  - [ExtractTextFromPDF](#extracttextfrompdf)\n  - [FetchPDFFromURL](#fetchpdffromurl)\n  - [FetchUSShapefile](#fetchusshapefile)\n  - [FetchWebsiteText](#fetchwebsitetext)\n  - [GetCompanyFilings](#getcompanyfilings)\n  - [GetGoogleSearchResults](#getgooglesearchresults)\n  - [GetZipFile](#getzipfile)\n- [Data Processing](#data-processing)\n  - [AddDateNumberColumns](#adddatenumbercolumns)\n  - [AddLeadingZeros](#addleadingzeros)\n  - [AddRowCountColumn](#addrowcountcolumn)\n  - [AddTPeriodColumn](#addtperiodcolumn)\n  - [AddTukeyOutlierColumn](#addtukeyoutliercolumn)\n  - [CleanTextColumns](#cleantextcolumns)\n  - [ConductAnomalyDetection](#conductanomalydetection)\n  - [ConductEntityMatching](#conductentitymatching)\n  - [ConvertOddsToProbability](#convertoddstoprobability)\n  - [CountMissingDataByGroup](#countmissingdatabygroup)\n  - [CreateBinnedColumn](#createbinnedcolumn)\n  - [CreateDataOverview](#createdataoverview)\n  - [CreateRandomSampleGroups](#createrandomsamplegroups)\n  - 
[CreateRareCategoryColumn](#createrarecategorycolumn)\n  - [CreateStratifiedRandomSampleGroups](#createstratifiedrandomsamplegroups)\n  - [ImputeMissingValuesUsingNearestNeighbors](#imputemissingvaluesusingnearestneighbors)\n  - [VerifyGranularity](#verifygranularity)\n- [Descriptive Analytics](#descriptive-analytics)\n  - [ConductManifoldLearning](#conductmanifoldlearning)\n  - [ConductPrincipalComponentAnalysis](#conductprincipalcomponentanalysis)\n  - [ConductPropensityScoreMatching](#conductpropensityscorematching)\n  - [CreateAssociationRules](#createassociationrules)\n  - [CreateGaussianMixtureClusters](#creategaussianmixtureclusters)\n  - [CreateHierarchicalClusters](#createhierarchicalclusters)\n  - [CreateKMeansClusters](#createkmeansclusters)\n  - [GenerateEDAWithLIDA](#generateedawithlida)\n- [File Management](#file-management)\n  - [ImportDataFromFolder](#importdatafromfolder)\n  - [CreateFileTree](#createfiletree)\n  - [CreateCopyOfPDF](#createcopyofpdf)\n  - [ConvertWordDocsToPDF](#convertworddocstopdf)\n- [Hypothesis Testing](#hypothesis-testing)\n  - [ChiSquareTestOfIndependence](#chisquaretestofindependence)\n  - [ChiSquareTestOfIndependenceFromTable](#chisquaretestofindependencefromtable)\n  - [ConductCoxProportionalHazardRegression](#conductcoxproportionalhazardregression)\n  - [ConductLinearRegressionAnalysis](#conductlinearregressionanalysis)\n  - [ConductLogisticRegressionAnalysis](#conductlogisticregressionanalysis)\n  - [OneSampleTTest](#onesamplettest)\n  - [OneWayANOVA](#onewayanova)\n  - [TTestOfMeanFromStats](#ttestofmeanfromstats)\n  - [TTestOfProportionFromStats](#ttestofproportionfromstats)\n  - [TTestOfTwoMeansFromStats](#ttestoftwomeansfromstats)\n  - [TwoSampleTTestOfIndependence](#twosamplettestofindependence)\n  - [TwoSampleTTestPaired](#twosamplettestpaired)\n- [Linear Algebra](#linear-algebra)\n  - [CalculateEigenvalues](#calculateeigenvalues)\n  - [ConvertMatrixToRowEchelonForm](#convertmatrixtorowechelonform)\n  - 
[ConvertSystemOfEquationsToMatrix](#convertsystemofequationstomatrix)\n  - [PlotVectors](#plotvectors)\n  - [SolveSystemOfEquations](#solvesystemofequations)\n  - [VisualizeMatrixAsLinearTransformation](#visualizematrixaslineartransformation)\n- [LLM](#llm)\n  - [SendPromptToAnthropic](#sendprompttoanthropic)\n  - [SendPromptToChatGPT](#sendprompttochatgpt)\n- [Predictive Analytics](#predictive-analytics)\n  - [CreateARIMAModel](#createarimamodel)\n  - [CreateBoostedTreeModel](#createboostedtreemodel)\n  - [CreateDecisionTreeModel](#createdecisiontreemodel)\n  - [CreateLinearRegressionModel](#createlinearregressionmodel)\n  - [CreateLogisticRegressionModel](#createlogisticregressionmodel)\n  - [CreateNeuralNetwork_SingleOutcome](#createneuralnetwork_singleoutcome)\n- [Prescriptive Analytics](#prescriptive-analytics)\n  - [ConductLinearOptimization](#conductlinearoptimization)\n  - [CreateContentBasedRecommender](#createcontentbasedrecommender)\n- [Probability](#probability)\n  - [ProbabilityOfAtLeastOne](#probabilityofatleastone)\n- [Simulations](#simulations)\n  - [CreateMetalogDistribution](#createmetalogdistribution)\n  - [CreateMetalogDistributionFromPercentiles](#createmetalogdistributionfrompercentiles)\n  - [CreateSIPDataframe](#createsipdataframe)\n  - [CreateSLURPDistribution](#createslurpdistribution)\n  - [SimulateCountOfSuccesses](#simulatecountofsuccesses)\n  - [SimulateCountOutcome](#simulatecountoutcome)\n  - [SimulateCountUntilFirstSuccess](#simulatecountuntilfirstsuccess)\n  - [SimulateNormallyDistributedOutcome](#simulatenormallydistributedoutcome)\n  - [SimulateTDistributedOutcome](#simulatetdistributedoutcome)\n  - [SimulateTimeBetweenEvents](#simulatetimebetweenevents)\n  - [SimulateTimeUntilNEvents](#simulatetimeuntilnevents)\n- [Statistics](#statistics)\n  - [CalculateConfidenceIntervalOfMean](#calculateconfidenceintervalofmean)\n  - [CalculateConfidenceIntervalOfProportion](#calculateconfidenceintervalofproportion)\n- 
[Visualizations](#visualizations)\n  - [Plot100PercentStackedBarChart](#plot100percentstackedbarchart)\n  - [PlotBarChart](#plotbarchart)\n  - [PlotBoxWhiskerByGroup](#plotboxwhiskerbygroup)\n  - [PlotBulletChart](#plotbulletchart)\n  - [PlotCard](#plotcard)\n  - [PlotClusteredBarChart](#plotclusteredbarchart)\n  - [PlotContingencyHeatmap](#plotcontingencyheatmap)\n  - [PlotCorrelationMatrix](#plotcorrelationmatrix)\n  - [PlotDensityByGroup](#plotdensitybygroup)\n  - [PlotDotPlot](#plotdotplot)\n  - [PlotHeatmap](#plotheatmap)\n  - [PlotOverlappingAreaChart](#plotoverlappingareachart)\n  - [PlotRiskTolerance](#plotrisktolerance)\n  - [PlotScatterplot](#plotscatterplot)\n  - [PlotSingleVariableCountPlot](#plotsinglevariablecountplot)\n  - [PlotSingleVariableHistogram](#plotsinglevariablehistogram)\n  - [PlotTimeSeries](#plottimeseries)\n  - [RenderTableOne](#rendertableone)\n\n\n### Calculus\n\n#### FindDerivative\n\nThe **FindDerivative** function calculates the derivative of a given function. It uses the sympy library, a Python library for symbolic mathematics, to perform the differentiation. The function can also print the original function and its derivative, return the derivative function, and plot both the original function and its derivative.\n\n```python\n# Load the FindDerivative function from the Calculus submodule\nfrom analysistoolbox.calculus import FindDerivative\nimport sympy\n\n# Define a symbolic variable\nx = sympy.symbols('x')\n\n# Define a function\nf_of_x = x**3 + 2*x**2 + 3*x + 4\n\n# Use the FindDerivative function\nFindDerivative(\n    f_of_x, \n    print_functions=True, \n    return_derivative_function=True, \n    plot_functions=True\n)\n```\n\n#### FindLimitOfFunction\n\nThe **FindLimitOfFunction** function finds the limit of a function at a specific point and optionally plots the function and its tangent line at that point. 
The script uses the matplotlib and numpy libraries for plotting and numerical operations respectively.\n\n```python\n# Import the necessary libraries\nfrom analysistoolbox.calculus import FindLimitOfFunction\nimport numpy as np\nimport sympy\n\n# Define a symbolic variable\nx = sympy.symbols('x')\n\n# Define a function\nf_of_x = np.sin(x) / x\n\n# Use the FindLimitOfFunction function\nFindLimitOfFunction(\n    f_of_x, \n    point=0, \n    step=0.01, \n    plot_function=True, \n    x_minimum=-10, \n    x_maximum=10, \n    n=1000, \n    tangent_line_window=1\n)\n```\n\n#### FindMinimumSquareLoss\n\nThe **FindMinimumSquareLoss** function calculates the minimum square loss between observed and predicted values. This function is often used in machine learning and statistics to measure the average squared difference between the actual and predicted outcomes.\n\n```python\n# Import the necessary libraries\nfrom analysistoolbox.calculus import FindMinimumSquareLoss\n\n# Define observed and predicted values\nobserved_values = [1, 2, 3, 4, 5]\npredicted_values = [1.1, 1.9, 3.2, 3.7, 5.1]\n\n# Use the FindMinimumSquareLoss function\nminimum_square_loss = FindMinimumSquareLoss(\n    observed_values, \n    predicted_values, \n    show_plot=True\n)\n\n# Print the minimum square loss\nprint(f\"The minimum square loss is: {minimum_square_loss}\")\n```\n\n#### PlotFunction\n\nThe **PlotFunction** function plots a mathematical function of x. 
It takes a lambda function as input and allows for customization of the plot.\n\n```python\n# Import the necessary libraries\nfrom analysistoolbox.calculus import PlotFunction\nimport sympy\n\n# Set x as a symbolic variable\nx = sympy.symbols('x')\n\n# Define the function to plot\nf_of_x = lambda x: x**2\n\n# Plot the function with default settings\nPlotFunction(f_of_x)\n```\n\n### Data Collection\n\n#### ExtractTextFromPDF\n\nThe **ExtractTextFromPDF** function extracts text from a PDF file, cleans it, then saves it to a text file.\n\n```python\n# Import the function\nfrom analysistoolbox.data_collection import ExtractTextFromPDF\n\n# Call the function\nExtractTextFromPDF(\n    filepath_to_pdf=\"/path/to/your/input.pdf\", \n    filepath_for_exported_text=\"/path/to/your/output.txt\", \n    start_page=1, \n    end_page=None\n)\n```\n\n#### FetchPDFFromURL\n\nThe **FetchPDFFromURL** function downloads a PDF file from a URL and saves it to a specified location.\n\n```python\n# Import the function\nfrom analysistoolbox.data_collection import FetchPDFFromURL\n\n# Call the function to download the PDF\nFetchPDFFromURL(\n    url=\"https://example.com/sample.pdf\", \n    filename=\"C:/folder/sample.pdf\"\n)\n```\n\n#### FetchUSShapefile\n\nThe **FetchUSShapefile** function fetches a geographical shapefile from the TIGER database of the U.S. Census Bureau. 
\n\n```python\n# Import the function\nfrom analysistoolbox.data_collection import FetchUSShapefile\n\n# Fetch the shapefile for the census tracts in Allegheny County, Pennsylvania, for the 2021 census year\nshapefile = FetchUSShapefile(\n    state='PA', \n    county='Allegheny', \n    geography='tract', \n    census_year=2021\n)\n\n# Print the first few rows of the shapefile\nprint(shapefile.head())\n```\n\n#### FetchWebsiteText\n\nThe **FetchWebsiteText** function fetches the text from a website and saves it to a text file.\n\n```python\n# Import the function\nfrom analysistoolbox.data_collection import FetchWebsiteText\n\n# Call the function\ntext = FetchWebsiteText(\n    url=\"https://www.example.com\", \n    browserless_api_key=\"your_browserless_api_key\"\n)\n\n# Print the fetched text\nprint(text)\n```\n\n#### GetCompanyFilings\n\nThe **GetCompanyFilings** function fetches company filings from the SEC EDGAR database. It returns the filings that match the given search keywords, date range, and filing type.\n\n```python\n# Import the function\nfrom analysistoolbox.data_collection import GetCompanyFilings\n\n# Call the function to get company filings for 'Online Dating' companies in 2024\nresults = GetCompanyFilings(\n    search_keywords=\"Online Dating\",\n    start_date=\"2024-01-01\",\n    end_date=\"2024-12-31\",\n    filing_type=\"all\",\n)\n\n# Print the results\nprint(results)\n```\n\n#### GetGoogleSearchResults\n\nThe **GetGoogleSearchResults** function fetches Google search results for a given query using the Serper API.\n\n```python\n# Import the function\nfrom analysistoolbox.data_collection import GetGoogleSearchResults\n\n# Call the function with the query\n# Make sure to replace 'your_serper_api_key' with your actual Serper API key\nresults = GetGoogleSearchResults(\n    query=\"Python programming\", \n    serper_api_key='your_serper_api_key', \n    number_of_results=5, \n    apply_autocorrect=True, \n    display_results=True\n)\n\n# 
Print the results\nprint(results)\n```\n\n#### GetZipFile\n\nThe **GetZipFile** function downloads a zip file from a url and saves it to a specified folder. It can also unzip the file and print the contents of the zip file.\n\n```python\n# Import the function\nfrom analysistoolbox.data_collection import GetZipFile\n\n# Call the function\nGetZipFile(\n    url=\"http://example.com/file.zip\", \n    path_to_save_folder=\"/path/to/save/folder\"\n)\n```\n\n### Data Processing\n\n#### AddDateNumberColumns\n\nThe **AddDateNumberColumns** function adds columns for the year, month, quarter, week, day of the month, and day of the week to a dataframe.\n\n```python\n# Import necessary packages\nfrom analysistoolbox.data_processing import AddDateNumberColumns\nfrom datetime import datetime\nimport pandas as pd\n\n# Create a sample dataframe\ndata = {'Date': [datetime(2020, 1, 1), datetime(2020, 2, 1), datetime(2020, 3, 1), datetime(2020, 4, 1)]}\ndf = pd.DataFrame(data)\n\n# Use the function on the sample dataframe\ndf = AddDateNumberColumns(\n    dataframe=df, \n    date_column_name='Date'\n)\n\n# Print the updated dataframe\nprint(df)\n```\n\n#### AddLeadingZeros\n\nThe **AddLeadingZeros** function adds leading zeros to a column. If fixed_length is not specified, the longest string in the column is used as the fixed length. If add_as_new_column is set to True, the new column is added to the dataframe. 
Otherwise, the original column is updated.\n\n```python\n# Import necessary packages\nfrom analysistoolbox.data_processing import AddLeadingZeros\nimport pandas as pd\n\n# Create a sample dataframe\ndata = {'ID': [1, 23, 456, 7890]}\ndf = pd.DataFrame(data)\n\n# Use the AddLeadingZeros function\ndf = AddLeadingZeros(\n    dataframe=df, \n    column_name='ID', \n    add_as_new_column=True\n)\n\n# Print updated dataframe\nprint(df)\n```\n\n#### AddRowCountColumn\n\nThe **AddRowCountColumn** function adds a column to a dataframe that contains the row number for each row, based on a group (or groups) of columns. The function can also sort the dataframe by a column or columns before adding the row count column.\n\n```python\n# Import necessary packages\nfrom analysistoolbox.data_processing import AddRowCountColumn\nimport pandas as pd\n\n# Create a sample dataframe\ndata = {\n    'Payment Method': ['Check', 'Credit Card', 'Check', 'Credit Card', 'Check', 'Credit Card', 'Check', 'Credit Card'],\n    'Transaction Value': [100, 200, 300, 400, 500, 600, 700, 800],\n    'Transaction Order': [1, 2, 3, 4, 5, 6, 7, 8]\n}\ndf = pd.DataFrame(data)\n\n# Call the function\ndf_updated = AddRowCountColumn(\n    dataframe=df, \n    list_of_grouping_variables=['Payment Method'], \n    list_of_order_columns=['Transaction Order'], \n    list_of_ascending_order_args=[True]\n)\n\n# Print the updated dataframe\nprint(df_updated)\n```\n\n#### AddTPeriodColumn\n\nThe **AddTPeriodColumn** function adds a T-period column to a dataframe. 
The T-period column is the number of intervals (e.g., days or weeks) since the earliest date in the dataframe.\n\n```python\n# Import necessary libraries\nfrom analysistoolbox.data_processing import AddTPeriodColumn\nfrom datetime import datetime\nimport pandas as pd\n\n# Create a sample dataframe\ndata = {\n    'date': pd.date_range(start='1/1/2020', end='1/10/2020'),\n    'value': range(1, 11)\n}\ndf = pd.DataFrame(data)\n\n# Use the function\ndf_updated = AddTPeriodColumn(\n    dataframe=df, \n    date_column_name='date', \n    t_period_interval='days'\n)\n\n# Print the updated dataframe\nprint(df_updated)\n```\n\n#### AddTukeyOutlierColumn\n\nThe **AddTukeyOutlierColumn** function adds a column to a dataframe that indicates whether a value is an outlier. The function uses the Tukey method to identify outliers.\n\n```python\n# Import necessary libraries\nfrom analysistoolbox.data_processing import AddTukeyOutlierColumn\nimport pandas as pd\n\n# Create a sample dataframe\ndata = pd.DataFrame({'values': [1, 2, 3, 4, 5, 6, 7, 8, 9, 20]})\n\n# Use the function\ndf_updated = AddTukeyOutlierColumn(\n    dataframe=data, \n    value_column_name='values', \n    tukey_boundary_multiplier=1.5, \n    plot_tukey_outliers=True\n)\n\n# Print the updated dataframe\nprint(df_updated)\n```\n\n#### CleanTextColumns\n\nThe **CleanTextColumns** function cleans string-type columns in a pandas DataFrame by removing all leading and trailing spaces.\n\n```python\n# Import necessary libraries\nfrom analysistoolbox.data_processing import CleanTextColumns\nimport pandas as pd\n\n# Create a sample dataframe\ndf = pd.DataFrame({\n    'A': [' hello', 'world ', ' python '],\n    'B': [1, 2, 3],\n})\n\n# Clean the dataframe\ndf_clean = CleanTextColumns(df)\n```\n\n#### ConductAnomalyDetection\n\nThe **ConductAnomalyDetection** function performs anomaly detection on a given dataset using the z-score method.\n\n```python\n# Import necessary libraries\nfrom analysistoolbox.data_processing import 
ConductAnomalyDetection\nimport pandas as pd\n\n# Create a sample dataframe\ndf = pd.DataFrame({\n    'A': [1, 2, 3, 1000],\n    'B': [4, 5, 6, 2000],\n})\n\n# Conduct anomaly detection\ndf_anomaly_detected = ConductAnomalyDetection(\n    dataframe=df, \n    list_of_columns_to_analyze=['A', 'B']\n)\n\n# Print the updated dataframe\nprint(df_anomaly_detected)\n```\n\n#### ConductEntityMatching\n\nThe **ConductEntityMatching** function performs entity matching between two dataframes using various fuzzy matching algorithms.\n\n```python\nfrom analysistoolbox.data_processing import ConductEntityMatching\nimport pandas as pd\n\n# Create two dataframes\ndataframe_1 = pd.DataFrame({\n    'ID': ['1', '2', '3'],\n    'Name': ['John Doe', 'Jane Smith', 'Bob Johnson'],\n    'City': ['New York', 'Los Angeles', 'Chicago']\n})\n\ndataframe_2 = pd.DataFrame({\n    'ID': ['a', 'b', 'c'],\n    'Name': ['Jon Doe', 'Jane Smyth', 'Robert Johnson'],\n    'City': ['NYC', 'LA', 'Chicago']\n})\n\n# Conduct entity matching\nmatched_entities = ConductEntityMatching(\n    dataframe_1=dataframe_1,\n    dataframe_1_primary_key='ID',\n    dataframe_2=dataframe_2,\n    dataframe_2_primary_key='ID',\n    levenshtein_distance_filter=3,\n    match_score_threshold=80,\n    columns_to_compare=['Name', 'City'],\n    match_methods=['Partial Token Set Ratio', 'Weighted Ratio']\n)\n``` \n\n#### ConvertOddsToProbability\n\nThe **ConvertOddsToProbability** function converts odds to probability in a new column.\n\n```python\n# Import necessary packages\nfrom analysistoolbox.data_processing import ConvertOddsToProbability\nimport numpy as np\nimport pandas as pd\n\n# Create a sample dataframe\ndata = {\n    'Team': ['Team1', 'Team2', 'Team3', 'Team4'],\n    'Odds': [2.5, 1.5, 3.0, np.nan]\n}\ndf = pd.DataFrame(data)\n\n# Print the original dataframe\nprint(\"Original DataFrame:\")\nprint(df)\n\n# Use the function to convert odds to probability\ndf = ConvertOddsToProbability(\n    dataframe=df, \n    
odds_column='Odds'\n)\n```\n\n#### CountMissingDataByGroup\n\nThe **CountMissingDataByGroup** function counts the number of records with missing data in a Pandas dataframe, grouped by specified columns.\n\n```python\n# Import necessary packages\nfrom analysistoolbox.data_processing import CountMissingDataByGroup\nimport pandas as pd\nimport numpy as np\n\n# Create a sample dataframe with some missing values\ndata = {\n    'Group': ['A', 'B', 'A', 'B', 'A', 'B'],\n    'Value1': [1, 2, np.nan, 4, 5, np.nan],\n    'Value2': [np.nan, 8, 9, 10, np.nan, 12]\n}\ndf = pd.DataFrame(data)\n\n# Use the function to count missing data by group\nCountMissingDataByGroup(\n    dataframe=df, \n    list_of_grouping_columns=['Group']\n)\n```\n\n#### CreateBinnedColumn\n\nThe **CreateBinnedColumn** function creates a new column in a Pandas dataframe based on a numeric variable. Binning is a process of transforming continuous numerical variables into discrete categorical 'bins'.\n\n```python\n# Import necessary packages\nfrom analysistoolbox.data_processing import CreateBinnedColumn\nimport pandas as pd\nimport numpy as np\n\n# Create a sample dataframe\ndata = {\n    'Group': ['A', 'B', 'A', 'B', 'A', 'B'],\n    'Value1': [1, 2, 3, 4, 5, 6],\n    'Value2': [7, 8, 9, 10, 11, 12]\n}\ndf = pd.DataFrame(data)\n\n# Use the function to create a binned column\ndf_binned = CreateBinnedColumn(\n    dataframe=df, \n    numeric_column_name='Value1', \n    number_of_bins=3, \n    binning_strategy='uniform'\n)\n```\n\n#### CreateDataOverview\n\nThe **CreateDataOverview** function creates an overview of a Pandas dataframe, including the data type, missing count, missing percentage, and summary statistics for each variable in the DataFrame.\n\n```python\n# Import necessary packages\nfrom analysistoolbox.data_processing import CreateDataOverview\nimport pandas as pd\nimport numpy as np\n\n# Create a sample dataframe\ndata = {\n    'Column1': [1, 2, 3, np.nan, 5, 6],\n    'Column2': ['a', 'b', 'c', 
'd', np.nan, 'f'],\n    'Column3': [7.1, 8.2, 9.3, 10.4, np.nan, 12.5]\n}\ndf = pd.DataFrame(data)\n\n# Use the function to create an overview of the dataframe\nCreateDataOverview(\n    dataframe=df, \n    plot_missingness=True\n)\n```\n\n#### CreateRandomSampleGroups\n\nThe **CreateRandomSampleGroups** function takes a pandas DataFrame, shuffles its rows, assigns each row to one of n groups, and then returns the updated DataFrame with an additional column indicating the group number.\n\n```python\n# Import necessary packages\nfrom analysistoolbox.data_processing import CreateRandomSampleGroups \nimport pandas as pd\n\n# Create a sample DataFrame\ndata = {\n    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve'],\n    'Age': [25, 31, 35, 19, 45],\n    'Score': [85, 95, 78, 81, 92]\n}\ndf = pd.DataFrame(data)\n\n# Use the function\ngrouped_df = CreateRandomSampleGroups(\n    dataframe=df, \n    number_of_groups=2, \n    random_seed=123\n)\n```\n\n#### CreateRareCategoryColumn\n\nThe **CreateRareCategoryColumn** function creates a new column in a Pandas dataframe that indicates whether a categorical variable value is rare. A rare category is a category that occurs less than a specified percentage of the time.\n\n```python\n# Import necessary packages\nfrom analysistoolbox.data_processing import CreateRareCategoryColumn \nimport pandas as pd\n\n# Create a sample DataFrame\ndata = {\n    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Alice', 'Bob', 'Alice'],\n    'Age': [25, 31, 35, 19, 45, 23, 30, 24],\n    'Score': [85, 95, 78, 81, 92, 88, 90, 86]\n}\ndf = pd.DataFrame(data)\n\n# Use the function\nupdated_df = CreateRareCategoryColumn(\n    dataframe=df, \n    categorical_column_name='Name', \n    rare_category_label='Rare', \n    rare_category_threshold=0.05,\n    new_column_suffix='(rare category)'\n)\n```\n\n#### CreateStratifiedRandomSampleGroups\n\nThe **CreateStratifiedRandomSampleGroups** function performs stratified random sampling on a pandas DataFrame. 
Stratified random sampling is a method of sampling that involves the division of a population into smaller groups known as strata. In stratified random sampling, the strata are formed based on members' shared attributes or characteristics.\n\n```python\n# Import necessary packages\nfrom analysistoolbox.data_processing import CreateStratifiedRandomSampleGroups\nimport numpy as np\nimport pandas as pd\n\n# Create a sample DataFrame\ndata = {\n    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Alice', 'Bob', 'Alice'],\n    'Age': [25, 31, 35, 19, 45, 23, 30, 24],\n    'Score': [85, 95, 78, 81, 92, 88, 90, 86]\n}\ndf = pd.DataFrame(data)\n\n# Use the function\nstratified_df = CreateStratifiedRandomSampleGroups(\n    dataframe=df, \n    number_of_groups=2, \n    list_categorical_column_names=['Name'], \n    random_seed=42\n)\n```\n\n#### ImputeMissingValuesUsingNearestNeighbors\n\nThe **ImputeMissingValuesUsingNearestNeighbors** function imputes missing values in a dataframe using the nearest neighbors method. For each sample with missing values, it finds the n_neighbors nearest neighbors in the training set and imputes the missing values using the mean value of these neighbors.\n\n```python\n# Import necessary packages\nfrom analysistoolbox.data_processing import ImputeMissingValuesUsingNearestNeighbors\nimport pandas as pd\nimport numpy as np\n\n# Create a sample DataFrame with missing values\ndata = {\n    'A': [1, 2, np.nan, 4, 5],\n    'B': [np.nan, 2, 3, 4, 5],\n    'C': [1, 2, 3, np.nan, 5],\n    'D': [1, 2, 3, 4, np.nan]\n}\ndf = pd.DataFrame(data)\n\n# Use the function\nimputed_df = ImputeMissingValuesUsingNearestNeighbors(\n    dataframe=df, \n    list_of_numeric_columns_to_impute=['A', 'B', 'C', 'D'], \n    number_of_neighbors=2, \n    averaging_method='uniform'\n)\n```\n\n#### VerifyGranularity\n\nThe **VerifyGranularity** function checks the granularity of a given dataframe based on a list of key columns. 
Granularity in this context refers to the level of detail or summarization in a set of data.\n\n```python\n# Import necessary packages\nfrom analysistoolbox.data_processing import VerifyGranularity\nimport pandas as pd\n\n# Create a sample DataFrame\ndata = {\n    'Name': ['Alice', 'Bob', 'Charlie', 'David', 'Eve', 'Alice', 'Bob', 'Alice'],\n    'Age': [25, 31, 35, 19, 45, 23, 30, 24],\n    'Score': [85, 95, 78, 81, 92, 88, 90, 86]\n}\ndf = pd.DataFrame(data)\n\n# Use the function\nVerifyGranularity(\n    dataframe=df, \n    list_of_key_columns=['Name', 'Age'], \n    set_key_as_index=True, \n    print_as_markdown=False\n)\n```\n\n### Descriptive Analytics\n\n#### ConductManifoldLearning\n\nThe **ConductManifoldLearning** function performs manifold learning on a given dataframe and returns a new dataframe with the original columns and the new manifold learning components. Manifold learning is a type of unsupervised learning that is used to reduce the dimensionality of the data.\n\n```python\n# Import necessary packages\nfrom analysistoolbox.descriptive_analytics import ConductManifoldLearning\nimport pandas as pd\nfrom sklearn.datasets import load_iris\n\n# Load the iris dataset\niris = load_iris()\niris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)\n\n# Use the function\nnew_df = ConductManifoldLearning(\n    dataframe=iris_df, \n    list_of_numeric_columns=['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)'], \n    number_of_components=2, \n    random_seed=42, \n    show_component_summary_plots=True, \n    sns_color_palette='Set2',\n    summary_plot_size=(10, 10)\n)\n```\n\n#### ConductPrincipalComponentAnalysis\n\nThe **ConductPrincipalComponentAnalysis** function performs Principal Component Analysis (PCA) on a given dataframe. 
PCA is a technique used in machine learning to reduce the dimensionality of data while retaining as much information as possible.\n\n```python\n# Import necessary packages\nfrom analysistoolbox.descriptive_analytics import ConductPrincipalComponentAnalysis\nimport pandas as pd\nfrom sklearn.datasets import load_iris\n\n# Load the iris dataset\niris = load_iris()\niris_df = pd.DataFrame(data=iris.data, columns=iris.feature_names)\n\n# Call the function\nresult = ConductPrincipalComponentAnalysis(\n    dataframe=iris_df,\n    list_of_numeric_columns=['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)'],\n    number_of_components=2\n)\n```\n\n#### ConductPropensityScoreMatching\n\nThe **ConductPropensityScoreMatching** function conducts propensity score matching to create balanced treatment and control groups for causal inference analysis.\n\n```python\nfrom analysistoolbox.descriptive_analytics import ConductPropensityScoreMatching\nimport pandas as pd\n\n# Assuming you have a dataframe 'df' with 'employee_id', 'age', 'education', 'years_experience', and 'received_training' columns\n# Create matched groups based on age, education, and experience\nmatched_df = ConductPropensityScoreMatching(\n    dataframe=df,\n    subject_id_column_name='employee_id',\n    list_of_column_names_to_base_matching=['age', 'education', 'years_experience'],\n    grouping_column_name='received_training',\n    control_group_name='No',\n    max_matches_per_subject=1,\n    balance_groups=True,\n    propensity_score_column_name=\"PS_Score\",\n    matched_id_column_name=\"Matched_Employee_ID\",\n    random_seed=412\n)\n```\n\n#### CreateAssociationRules\n\nThe **CreateAssociationRules** function creates association rules from a given dataframe. 
Association rules are widely used in market basket analysis, where the goal is to find associations and/or correlations among a set of items.\n\n```python\n# Import necessary packages\nfrom analysistoolbox.descriptive_analytics import CreateAssociationRules\nimport pandas as pd\n\n# Assuming you have a dataframe 'df' with 'TransactionID' and 'Item' columns\nresult = CreateAssociationRules(\n    dataframe=df,\n    transaction_id_column='TransactionID',\n    items_column='Item',\n    support_threshold=0.01,\n    confidence_threshold=0.2,\n    plot_lift=True,\n    plot_title='Association Rules',\n    plot_size=(10, 7)\n)\n```\n\n#### CreateGaussianMixtureClusters\n\nThe **CreateGaussianMixtureClusters** function creates Gaussian mixture clusters from a given dataframe. Gaussian mixture models are a type of unsupervised learning used to find clusters in data. The function adds the resulting clusters as a new column in the dataframe and also calculates the probability of each data point belonging to each cluster.\n\n```python\n# Import necessary packages\nfrom analysistoolbox.descriptive_analytics import CreateGaussianMixtureClusters\nimport numpy as np\nimport pandas as pd\nfrom sklearn import datasets\n\n# Load the iris dataset\niris = datasets.load_iris()\n\n# Convert the iris dataset to a pandas dataframe\ndf = pd.DataFrame(data=np.c_[iris['data'], iris['target']],\n                  columns=iris['feature_names'] + ['target'])\n\n# Call the CreateGaussianMixtureClusters function\ndf_clustered = CreateGaussianMixtureClusters(\n    dataframe=df,\n    list_of_numeric_columns_for_clustering=['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)'],\n    number_of_clusters=3,\n    column_name_for_clusters='Gaussian Mixture Cluster',\n    scale_predictor_variables=True,\n    show_cluster_summary_plots=True,\n    sns_color_palette='Set2',\n    summary_plot_size=(15, 15),\n    random_seed=123,\n    maximum_iterations=200\n)\n```\n\n#### 
CreateHierarchicalClusters\n\nThe **CreateHierarchicalClusters** function creates hierarchical clusters from a given dataframe. Hierarchical clustering is a type of unsupervised learning that is used to find clusters in data. It adds the resulting clusters as a new column in the dataframe.\n\n```python\n# Import necessary packages\nfrom analysistoolbox.descriptive_analytics import CreateHierarchicalClusters\nimport pandas as pd\nfrom sklearn import datasets\n\n# Load the iris dataset\niris = datasets.load_iris()\ndf = pd.DataFrame(data=iris.data, columns=iris.feature_names)\n\n# Call the CreateHierarchicalClusters function\ndf_clustered = CreateHierarchicalClusters(\n    dataframe=df,\n    list_of_value_columns_for_clustering=['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)'],\n    number_of_clusters=3,\n    column_name_for_clusters='Hierarchical Cluster',\n    scale_predictor_variables=True,\n    show_cluster_summary_plots=True,\n    color_palette='Set2',\n    summary_plot_size=(6, 4),\n    random_seed=412,\n    maximum_iterations=300\n)\n```\n\n#### CreateKMeansClusters\n\nThe **CreateKMeansClusters** function performs K-Means clustering on a given dataset and returns the dataset with an additional column indicating the cluster each record belongs to.\n\n```python\n# Import necessary packages\nfrom analysistoolbox.descriptive_analytics import CreateKMeansClusters\nimport pandas as pd\nfrom sklearn import datasets\n\n# Load the iris dataset\niris = datasets.load_iris()\ndf = pd.DataFrame(data=iris.data, columns=iris.feature_names)\n\n# Call the CreateKMeansClusters function\ndf_clustered = CreateKMeansClusters(\n    dataframe=df,\n    list_of_value_columns_for_clustering=['sepal length (cm)', 'sepal width (cm)', 'petal length (cm)', 'petal width (cm)'],\n    number_of_clusters=3,\n    column_name_for_clusters='K-Means Cluster',\n    scale_predictor_variables=True,\n    show_cluster_summary_plots=True,\n    color_palette='Set2',\n    
summary_plot_size=(6, 4),\n    random_seed=412,\n    maximum_iterations=300\n)\n```\n\n#### GenerateEDAWithLIDA\n\nThe **GenerateEDAWithLIDA** function uses the LIDA package from Microsoft to generate exploratory data analysis (EDA) goals. \n\n```python\n# Import necessary packages\nfrom analysistoolbox.descriptive_analytics import GenerateEDAWithLIDA\nimport pandas as pd\nfrom sklearn import datasets\n\n# Load the iris dataset\niris = datasets.load_iris()\ndf = pd.DataFrame(data=iris.data, columns=iris.feature_names)\n\n# Call the GenerateEDAWithLIDA function\ndf_summary = GenerateEDAWithLIDA(\n    dataframe=df,\n    llm_api_key=\"your_llm_api_key_here\",\n    llm_provider=\"openai\",\n    llm_model=\"gpt-3.5-turbo\",\n    visualization_library=\"seaborn\",\n    goal_temperature=0.50,\n    code_generation_temperature=0.05,\n    data_summary_method=\"llm\",\n    number_of_samples_to_show_in_summary=5,\n    return_data_fields_summary=True,\n    number_of_goals_to_generate=5,\n    plot_recommended_visualization=True,\n    show_code_for_recommended_visualization=True\n)\n```\n\n### File Management\n\n#### ImportDataFromFolder\n\nThe **ImportDataFromFolder** function imports all CSV and Excel files from a specified folder and combines them into a single DataFrame. 
It ensures that column names match across all files if specified.\n\n```python\n# Import necessary packages\nfrom analysistoolbox.file_management import ImportDataFromFolder\n\n# Specify the folder path\nfolder_path = \"path/to/your/folder\"\n\n# Call the ImportDataFromFolder function\ncombined_df = ImportDataFromFolder(\n    folder_path=folder_path,\n    force_column_names_to_match=True\n)\n```\n\n#### CreateFileTree\n\nThe **CreateFileTree** function recursively walks a directory tree and prints a diagram of all the subdirectories and files.\n\n```python\n# Import necessary packages\nfrom analysistoolbox.file_management import CreateFileTree\n\n# Specify the directory path\ndirectory_path = \"path/to/your/directory\"\n\n# Call the CreateFileTree function\nCreateFileTree(\n    path=directory_path,\n    indent_spaces=2\n)\n```\n\n#### CreateCopyOfPDF\n\nThe **CreateCopyOfPDF** function creates a copy of a PDF file, with options to specify the start and end pages.\n\n```python\n# Import necessary packages\nfrom analysistoolbox.file_management import CreateCopyOfPDF\n\n# Specify the input and output file paths\ninput_pdf = \"path/to/input.pdf\"\noutput_pdf = \"path/to/output.pdf\"\n\n# Call the CreateCopyOfPDF function\nCreateCopyOfPDF(\n    input_file=input_pdf,\n    output_file=output_pdf,\n    start_page=1,\n    end_page=5\n)\n```\n\n#### ConvertWordDocsToPDF\n\nThe **ConvertWordDocsToPDF** function converts all Word documents in a specified folder to PDF format.\n\n```python\n# Import necessary packages\nfrom analysistoolbox.file_management import ConvertWordDocsToPDF\n\n# Specify the folder paths\nword_folder = \"path/to/word/documents\"\npdf_folder = \"path/to/save/pdf/documents\"\n\n# Call the ConvertWordDocsToPDF function\nConvertWordDocsToPDF(\n    word_folder_path=word_folder,\n    pdf_folder_path=pdf_folder,\n    open_each_doc=False\n)\n```\n\n### Hypothesis Testing\n\n#### ChiSquareTestOfIndependence\n\nThe **ChiSquareTestOfIndependence** function 
performs a chi-square test of independence to determine if there is a significant relationship between two categorical variables.\n\n```python\nfrom analysistoolbox.hypothesis_testing import ChiSquareTestOfIndependence\nimport pandas as pd\n\n# Create sample data\ndata = {\n    'Education': ['High School', 'College', 'High School', 'Graduate', 'College'],\n    'Employment': ['Employed', 'Unemployed', 'Employed', 'Employed', 'Unemployed']\n}\ndf = pd.DataFrame(data)\n\n# Conduct chi-square test\nChiSquareTestOfIndependence(\n    dataframe=df,\n    first_categorical_column='Education',\n    second_categorical_column='Employment',\n    plot_contingency_table=True\n)\n```\n\n#### ChiSquareTestOfIndependenceFromTable\n\nThe **ChiSquareTestOfIndependenceFromTable** function performs a chi-square test using a pre-computed contingency table.\n\n```python\nfrom analysistoolbox.hypothesis_testing import ChiSquareTestOfIndependenceFromTable\nimport pandas as pd\n\n# Create contingency table\ncontingency_table = pd.DataFrame({\n    'Online': [100, 150],\n    'In-Store': [200, 175]\n}, index=['Male', 'Female'])\n\n# Conduct chi-square test\nChiSquareTestOfIndependenceFromTable(\n    contingency_table=contingency_table,\n    plot_contingency_table=True\n)\n```\n\n#### ConductCoxProportionalHazardRegression\n\nThe **ConductCoxProportionalHazardRegression** function performs survival analysis using Cox Proportional Hazard regression.\n\n```python\nfrom analysistoolbox.hypothesis_testing import ConductCoxProportionalHazardRegression\n\n# Conduct Cox regression\nmodel = ConductCoxProportionalHazardRegression(\n    dataframe=df,\n    outcome_column='event',\n    duration_column='time',\n    list_of_predictor_columns=['age', 'sex', 'treatment'],\n    plot_survival_curve=True\n)\n```\n\n#### ConductLinearRegressionAnalysis\n\nThe **ConductLinearRegressionAnalysis** function performs linear regression analysis with optional plotting.\n\n```python\nfrom analysistoolbox.hypothesis_testing import 
ConductLinearRegressionAnalysis\n\n# Conduct linear regression\nresults = ConductLinearRegressionAnalysis(\n    dataframe=df,\n    outcome_column='sales',\n    list_of_predictor_columns=['advertising', 'price'],\n    plot_regression_diagnostic=True\n)\n```\n\n#### ConductLogisticRegressionAnalysis\n\nThe **ConductLogisticRegressionAnalysis** function performs logistic regression for binary outcomes.\n\n```python\nfrom analysistoolbox.hypothesis_testing import ConductLogisticRegressionAnalysis\n\n# Conduct logistic regression\nresults = ConductLogisticRegressionAnalysis(\n    dataframe=df,\n    outcome_column='purchased',\n    list_of_predictor_columns=['age', 'income'],\n    plot_regression_diagnostic=True\n)\n```\n\n#### OneSampleTTest\n\nThe **OneSampleTTest** function performs a one-sample t-test to compare a sample mean to a hypothesized population mean.\n\n```python\nfrom analysistoolbox.hypothesis_testing import OneSampleTTest\n\n# Conduct one-sample t-test\nOneSampleTTest(\n    dataframe=df,\n    outcome_column='score',\n    hypothesized_mean=70,\n    alternative_hypothesis='two-sided',\n    confidence_interval=0.95\n)\n```\n\n#### OneWayANOVA\n\nThe **OneWayANOVA** function performs a one-way analysis of variance to compare means across multiple groups.\n\n```python\nfrom analysistoolbox.hypothesis_testing import OneWayANOVA\n\n# Conduct one-way ANOVA\nOneWayANOVA(\n    dataframe=df,\n    outcome_column='performance',\n    grouping_column='treatment_group',\n    plot_sample_distributions=True\n)\n```\n\n#### TTestOfMeanFromStats\n\nThe **TTestOfMeanFromStats** function performs a t-test using summary statistics rather than raw data.\n\n```python\nfrom analysistoolbox.hypothesis_testing import TTestOfMeanFromStats\n\n# Conduct t-test from statistics\nTTestOfMeanFromStats(\n    sample_mean=75,\n    sample_size=30,\n    sample_standard_deviation=10,\n    hypothesized_mean=70,\n    alternative_hypothesis='greater'\n)\n```\n\n#### 
TTestOfProportionFromStats\n\nThe **TTestOfProportionFromStats** function tests a sample proportion against a hypothesized value.\n\n```python\nfrom analysistoolbox.hypothesis_testing import TTestOfProportionFromStats\n\n# Test proportion from statistics\nTTestOfProportionFromStats(\n    sample_proportion=0.65,  # 65% proportion\n    sample_size=200,         # 200 survey responses\n    hypothesized_proportion=0.50,\n    alternative_hypothesis='two-sided'\n)\n```\n\n#### TTestOfTwoMeansFromStats\n\nThe **TTestOfTwoMeansFromStats** function compares two means using summary statistics.\n\n```python\nfrom analysistoolbox.hypothesis_testing import TTestOfTwoMeansFromStats\n\n# Compare two means from statistics\nTTestOfTwoMeansFromStats(\n    first_sample_mean=75,\n    first_sample_size=30,\n    first_sample_standard_deviation=10,\n    second_sample_mean=70,\n    second_sample_size=30,\n    second_sample_standard_deviation=12\n)\n```\n\n#### TwoSampleTTestOfIndependence\n\nThe **TwoSampleTTestOfIndependence** function performs an independent samples t-test to compare means between two groups.\n\n```python\nfrom analysistoolbox.hypothesis_testing import TwoSampleTTestOfIndependence\n\n# Conduct independent samples t-test\nTwoSampleTTestOfIndependence(\n    dataframe=df,\n    outcome_column='score',\n    grouping_column='group',\n    alternative_hypothesis='two-sided',\n    homogeneity_of_variance=True\n)\n```\n\n#### TwoSampleTTestPaired\n\nThe **TwoSampleTTestPaired** function performs a paired samples t-test for before-after comparisons.\n\n```python\nfrom analysistoolbox.hypothesis_testing import TwoSampleTTestPaired\n\n# Conduct paired samples t-test\nTwoSampleTTestPaired(\n    dataframe=df,\n    first_outcome_column='pre_score',\n    second_outcome_column='post_score',\n    alternative_hypothesis='greater'\n)\n```\n\n### Linear Algebra\n\n#### CalculateEigenvalues\n\nThe **CalculateEigenvalues** function calculates and visualizes the eigenvalues and eigenvectors of a 
matrix.\n\n```python\nfrom analysistoolbox.linear_algebra import CalculateEigenvalues\nimport numpy as np\n\n# Create a 2x2 matrix\nmatrix = np.array([\n    [4, -2],\n    [1, 1]\n])\n\n# Calculate eigenvalues and eigenvectors\nCalculateEigenvalues(\n    matrix=matrix,\n    plot_eigenvectors=True,\n    plot_transformation=True\n)\n```\n\n#### ConvertMatrixToRowEchelonForm\n\nThe **ConvertMatrixToRowEchelonForm** function converts a matrix to row echelon form using Gaussian elimination.\n\n```python\nfrom analysistoolbox.linear_algebra import ConvertMatrixToRowEchelonForm\nimport numpy as np\n\n# Create a matrix\nmatrix = np.array([\n    [1, 2, 3],\n    [4, 5, 6],\n    [7, 8, 9]\n])\n\n# Convert to row echelon form\nrow_echelon = ConvertMatrixToRowEchelonForm(\n    matrix=matrix,\n    show_pivot_columns=True\n)\n```\n\n#### ConvertSystemOfEquationsToMatrix\n\nThe **ConvertSystemOfEquationsToMatrix** function converts a system of linear equations to matrix form.\n\n```python\nfrom analysistoolbox.linear_algebra import ConvertSystemOfEquationsToMatrix\nimport numpy as np\n\n# Define system of equations: \n# 2x + 3y = 8\n# 4x - y = 1\ncoefficients = np.array([\n    [2, 3],\n    [4, -1]\n])\nconstants = np.array([8, 1])\n\n# Convert to matrix form\nmatrix = ConvertSystemOfEquationsToMatrix(\n    coefficients=coefficients,\n    constants=constants,\n    show_determinant=True\n)\n```\n\n#### PlotVectors\n\nThe **PlotVectors** function visualizes vectors in 2D or 3D space.\n\n```python\nfrom analysistoolbox.linear_algebra import PlotVectors\nimport numpy as np\n\n# Define vectors\nvectors = [\n    [3, 2],    # First vector\n    [-1, 4],   # Second vector\n    [2, -3]    # Third vector\n]\n\n# Plot vectors\nPlotVectors(\n    list_of_vectors=vectors,\n    origin=[0, 0],\n    plot_sum=True,\n    grid=True\n)\n```\n\n#### SolveSystemOfEquations\n\nThe **SolveSystemOfEquations** function solves a system of linear equations and optionally visualizes the 
solution.\n\n```python\nfrom analysistoolbox.linear_algebra import SolveSystemOfEquations\nimport numpy as np\n\n# Define system of equations:\n# 2x + y = 5\n# x - 3y = -1\ncoefficients = np.array([\n    [2, 1],\n    [1, -3]\n])\nconstants = np.array([5, -1])\n\n# Solve the system\nsolution = SolveSystemOfEquations(\n    coefficients=coefficients,\n    constants=constants,\n    show_plot=True,\n    plot_boundary=10\n)\n```\n\n#### VisualizeMatrixAsLinearTransformation\n\nThe **VisualizeMatrixAsLinearTransformation** function visualizes how a matrix transforms space as a linear transformation.\n\n```python\nfrom analysistoolbox.linear_algebra import VisualizeMatrixAsLinearTransformation\nimport numpy as np\n\n# Define transformation matrix\ntransformation_matrix = np.array([\n    [2, -1],\n    [1, 1]\n])\n\n# Visualize the transformation\nVisualizeMatrixAsLinearTransformation(\n    transformation_matrix=transformation_matrix,\n    plot_grid=True,\n    plot_unit_vectors=True,\n    animation_frames=30\n)\n```\n\n### LLM\n\n#### SendPromptToAnthropic\n\nThe **SendPromptToAnthropic** function sends a prompt to Anthropic's Claude API using LangChain. 
It supports template-based prompting and requires an Anthropic API key.\n\n```python\nfrom analysistoolbox.llm import SendPromptToAnthropic\n\n# Define your prompt template with variables in curly braces\nprompt_template = \"Given the text: {text}\\nSummarize the main points in bullet form.\"\n\n# Create a dictionary with your input variables\nuser_input = {\n    \"text\": \"Your text to analyze goes here...\"\n}\n\n# Send the prompt to Claude\nresponse = SendPromptToAnthropic(\n    prompt_template=prompt_template,\n    user_input=user_input,\n    system_message=\"You are a helpful assistant.\",\n    anthropic_api_key=\"your-api-key-here\",\n    temperature=0.0,\n    chat_model_name=\"claude-3-opus-20240229\",\n    maximum_tokens=1000\n)\n\nprint(response)\n```\n\n#### SendPromptToChatGPT\n\nThe **SendPromptToChatGPT** function sends a prompt to OpenAI's ChatGPT API using LangChain. It supports template-based prompting and requires an OpenAI API key.\n\n```python\nfrom analysistoolbox.llm import SendPromptToChatGPT\n\n# Define your prompt template with variables in curly braces\nprompt_template = \"Analyze the following data: {data}\\nProvide key insights.\"\n\n# Create a dictionary with your input variables\nuser_input = {\n    \"data\": \"Your data to analyze goes here...\"\n}\n\n# Send the prompt to ChatGPT\nresponse = SendPromptToChatGPT(\n    prompt_template=prompt_template,\n    user_input=user_input,\n    system_message=\"You are a helpful assistant.\",\n    openai_api_key=\"your-api-key-here\",\n    temperature=0.0,\n    chat_model_name=\"gpt-4o-mini\",\n    maximum_tokens=1000\n)\n\nprint(response)\n```\n\n### Predictive Analytics\n\n#### CreateARIMAModel\n\nBuilds an ARIMA (Autoregressive Integrated Moving Average) model for time series forecasting.\n\n```python\nfrom analysistoolbox.predictive_analytics import CreateARIMAModel\nimport pandas as pd\n\n# Create time series forecast\nforecast = CreateARIMAModel(\n    dataframe=df,\n    time_column='date',\n 
   value_column='sales',\n    forecast_periods=12\n)\n```\n\n#### CreateBoostedTreeModel\n\nCreates a gradient boosted tree model for classification or regression tasks, offering high performance and feature importance analysis.\n\n```python\nfrom analysistoolbox.predictive_analytics import CreateBoostedTreeModel\n\n# Train a boosted tree classifier\nmodel = CreateBoostedTreeModel(\n    dataframe=df,\n    outcome_variable='churn',\n    list_of_predictor_variables=['usage', 'tenure', 'satisfaction'],\n    is_outcome_categorical=True,\n    plot_model_test_performance=True\n)\n```\n\n#### CreateDecisionTreeModel\n\nBuilds an interpretable decision tree for classification or regression, with visualization options.\n\n```python\nfrom analysistoolbox.predictive_analytics import CreateDecisionTreeModel\n\n# Create a decision tree for predicting house prices\nmodel = CreateDecisionTreeModel(\n    dataframe=df,\n    outcome_variable='price',\n    list_of_predictor_variables=['sqft', 'bedrooms', 'location'],\n    is_outcome_categorical=False,\n    maximum_depth=5\n)\n```\n\n#### CreateLinearRegressionModel\n\nFits a linear regression model with optional scaling and comprehensive performance visualization.\n\n```python\nfrom analysistoolbox.predictive_analytics import CreateLinearRegressionModel\n\n# Predict sales based on advertising spend\nmodel = CreateLinearRegressionModel(\n    dataframe=df,\n    outcome_variable='sales',\n    list_of_predictor_variables=['tv_ads', 'radio_ads', 'newspaper_ads'],\n    scale_variables=True,\n    plot_model_test_performance=True\n)\n```\n\n#### CreateLogisticRegressionModel\n\nImplements logistic regression for binary classification tasks with regularization options.\n\n```python\nfrom analysistoolbox.predictive_analytics import CreateLogisticRegressionModel\n\n# Predict customer churn probability\nmodel = CreateLogisticRegressionModel(\n    dataframe=df,\n    outcome_variable='churn',\n    list_of_predictor_variables=['usage', 
'complaints', 'satisfaction'],\n    scale_predictor_variables=True,\n    show_classification_plot=True\n)\n```\n\n#### CreateNeuralNetwork_SingleOutcome\n\nBuilds and trains a neural network for single-outcome prediction tasks, with customizable architecture.\n\n```python\nfrom analysistoolbox.predictive_analytics import CreateNeuralNetwork_SingleOutcome\n\n# Create a neural network for image classification\nmodel = CreateNeuralNetwork_SingleOutcome(\n    dataframe=df,\n    outcome_variable='label',\n    list_of_predictor_variables=feature_columns,\n    number_of_hidden_layers=3,\n    is_outcome_categorical=True,\n    plot_loss=True\n)\n```\n\n### Prescriptive Analytics\n\nThe prescriptive analytics module provides tools for making data-driven recommendations and decisions:\n\n#### ConductLinearOptimization\n\nConducts linear optimization to find the optimal input values for a given output variable, with optional constraints.\n\n```python\nimport pandas as pd\nfrom analysistoolbox.prescriptive_analytics.ConductLinearOptimization import ConductLinearOptimization\n\n# Sample data\ndata = pd.DataFrame({\n    'input1': [1, 2, 3, 4, 5],\n    'input2': [2, 4, 6, 8, 10],\n    'output': [10, 20, 30, 40, 50]\n})\n\n# Define constraints (optional)\nconstraints = {\n    'input1': (0, 10),  # input1 between 0 and 10\n    'input2': (None, 15)  # input2 maximum 15, no minimum\n}\n\n# Run optimization\nresults = ConductLinearOptimization(\n    dataframe=data,\n    output_variable='output',\n    list_of_input_variables=['input1', 'input2'],\n    optimization_type='maximize',\n    input_constraints=constraints\n)\n```\n\n#### CreateContentBasedRecommender\n\nBuilds a content-based recommendation system using neural networks to learn user and item embeddings.\n\n```python\nfrom analysistoolbox.prescriptive_analytics import CreateContentBasedRecommender\nimport pandas as pd\n\n# Create a movie recommendation system\nrecommender = CreateContentBasedRecommender(\n    
dataframe=movie_ratings_df,\n    outcome_variable='rating',\n    user_list_of_predictor_variables=['age', 'gender', 'occupation'],\n    item_list_of_predictor_variables=['genre', 'year', 'director', 'budget'],\n    user_number_of_hidden_layers=2,\n    item_number_of_hidden_layers=2,\n    number_of_recommendations=5,\n    scale_variables=True,\n    plot_loss=True\n)\n```\n\n### Probability\n\nThe probability module provides tools for working with probability distributions and statistical models:\n\n#### ProbabilityOfAtLeastOne\n\nCalculates and visualizes the probability of at least one event occurring in a series of independent trials.\n\n```python\nfrom analysistoolbox.probability import ProbabilityOfAtLeastOne\n\n# Calculate probability of at least one defect in 10 products\n# given a 5% defect rate per product\nprob = ProbabilityOfAtLeastOne(\n    probability_of_event=0.05,\n    number_of_events=10,\n    format_as_percent=True,\n    show_plot=True,\n    risk_tolerance=0.20  # Highlight 20% risk threshold\n)\n\n# Calculate probability of at least one successful sale\n# given 30 customer interactions with 15% success rate\nprob = ProbabilityOfAtLeastOne(\n    probability_of_event=0.15,\n    number_of_events=30,\n    format_as_percent=True,\n    show_plot=True,\n    title_for_plot=\"Sales Success Probability\",\n    subtitle_for_plot=\"Probability of at least one sale in 30 customer interactions\"\n)\n```\n\n### Simulations\n\nThe simulations module provides a comprehensive set of tools for statistical simulations and probability distributions:\n\n#### CreateMetalogDistribution\n\nCreates a flexible metalog distribution from data, useful for modeling complex probability distributions.\n\n```python\nfrom analysistoolbox.simulations import CreateMetalogDistribution\n\n# Create a metalog distribution from historical data\ndistribution = CreateMetalogDistribution(\n    dataframe=df,\n    variable='sales',\n    lower_bound=0,\n    number_of_samples=10000,\n    
plot_metalog_distribution=True\n)\n```\n\n#### CreateMetalogDistributionFromPercentiles\n\nBuilds a metalog distribution from known percentile values.\n\n```python\nfrom analysistoolbox.simulations import CreateMetalogDistributionFromPercentiles\n\n# Create distribution from percentiles\ndistribution = CreateMetalogDistributionFromPercentiles(\n    list_of_values=[10, 20, 30, 50],\n    list_of_percentiles=[0.1, 0.25, 0.75, 0.9],\n    lower_bound=0,\n    show_distribution_plot=True\n)\n```\n\n#### CreateSIPDataframe\n\nGenerates Stochastically Indexed Percentiles (SIP) for uncertainty analysis.\n\n```python\nfrom analysistoolbox.simulations import CreateSIPDataframe\n\n# Create SIP dataframe for risk analysis\nsip_df = CreateSIPDataframe(\n    number_of_percentiles=10,\n    number_of_trials=1000\n)\n```\n\n#### CreateSLURPDistribution\nCreates a SIP with relationships preserved (SLURP) based on a linear regression model's prediction interval.\n\n```python\nfrom analysistoolbox.simulations import CreateSLURPDistribution\n\n# Create a SLURP distribution from a linear regression model\nslurp_dist = CreateSLURPDistribution(\n    linear_regression_model=model,  # statsmodels regression model\n    list_of_prediction_values=[x1, x2, ...],  # values for predictors\n    number_of_trials=10000,  # number of samples to generate\n    prediction_interval=0.95,  # confidence level for prediction interval\n    lower_bound=None,  # optional lower bound constraint\n    upper_bound=None  # optional upper bound constraint\n)\n```\n\n#### SimulateCountOfSuccesses\n\nSimulates binomial outcomes (number of successes in fixed trials).\n\n```python\nfrom analysistoolbox.simulations import SimulateCountOfSuccesses\n\n# Simulate customer conversion rates\nresults = SimulateCountOfSuccesses(\n    probability_of_success=0.15,\n    sample_size_per_trial=100,\n    number_of_trials=10000,\n    plot_simulation_results=True\n)\n```\n\n#### SimulateCountOutcome\n\nSimulates Poisson-distributed count 
data.\n\n```python\nfrom analysistoolbox.simulations import SimulateCountOutcome\n\n# Simulate daily customer arrivals\narrivals = SimulateCountOutcome(\n    expected_count=25,\n    number_of_trials=10000,\n    plot_simulation_results=True\n)\n```\n\n#### SimulateCountUntilFirstSuccess\n\nSimulates geometric distributions (trials until first success).\n\n```python\nfrom analysistoolbox.simulations import SimulateCountUntilFirstSuccess\n\n# Simulate number of attempts until success\nattempts = SimulateCountUntilFirstSuccess(\n    probability_of_success=0.2,\n    number_of_trials=10000,\n    plot_simulation_results=True\n)\n```\n\n#### SimulateNormallyDistributedOutcome\nGenerates normally distributed random variables.\n\n```python\nfrom analysistoolbox.simulations import SimulateNormallyDistributedOutcome\n\n# Simulate product weights\nweights = SimulateNormallyDistributedOutcome(\n    mean=100,\n    standard_deviation=5,\n    number_of_trials=10000,\n    plot_simulation_results=True\n)\n```\n\n#### SimulateTDistributedOutcome\nGenerates Student's t-distributed random variables.\n\n```python\nfrom analysistoolbox.simulations import SimulateTDistributedOutcome\n\n# Simulate with heavy-tailed distribution\nvalues = SimulateTDistributedOutcome(\n    degrees_of_freedom=5,\n    number_of_trials=10000,\n    plot_simulation_results=True\n)\n```\n\n#### SimulateTimeBetweenEvents\n\nSimulates exponentially distributed inter-arrival times.\n\n```python\nfrom analysistoolbox.simulations import SimulateTimeBetweenEvents\n\n# Simulate time between customer arrivals\ntimes = SimulateTimeBetweenEvents(\n    average_time_between_events=30,\n    number_of_trials=10000,\n    plot_simulation_results=True\n)\n```\n\n#### SimulateTimeUntilNEvents\nSimulates Erlang-distributed waiting times.\n\n```python\nfrom analysistoolbox.simulations import SimulateTimeUntilNEvents\n\n# Simulate time until 5 events occur\nwait_time = SimulateTimeUntilNEvents(\n    average_time_between_events=10,\n    
number_of_events=5,\n    number_of_trials=10000,\n    plot_simulation_results=True\n)\n```\n\n### Statistics\n\nThe statistics module provides essential tools for statistical inference and estimation:\n\n#### CalculateConfidenceIntervalOfMean\n\nCalculates confidence intervals for population means, automatically handling both large (z-distribution) and small (t-distribution) sample sizes.\n\n```python\nfrom analysistoolbox.statistics import CalculateConfidenceIntervalOfMean\n\n# Calculate 95% confidence interval for average customer spending\nci_results = CalculateConfidenceIntervalOfMean(\n    sample_mean=45.2,\n    sample_standard_deviation=12.5,\n    sample_size=100,\n    confidence_interval=0.95,\n    plot_sample_distribution=True,\n    value_name=\"Average Spending ($)\"\n)\n```\n\n#### CalculateConfidenceIntervalOfProportion\n\nCalculates confidence intervals for population proportions, with automatic selection of the appropriate distribution based on sample size.\n\n```python\nfrom analysistoolbox.statistics import CalculateConfidenceIntervalOfProportion\n\n# Calculate 95% confidence interval for customer satisfaction rate\nci_results = CalculateConfidenceIntervalOfProportion(\n    sample_proportion=0.78,  # 78% satisfaction rate\n    sample_size=200,         # 200 survey responses\n    confidence_interval=0.95,\n    plot_sample_distribution=True,\n    value_name=\"Satisfaction Rate\"\n)\n```\n\n### Visualizations\n\nThe visualizations module provides a comprehensive set of tools for creating publication-quality statistical plots and charts:\n\n#### Plot100PercentStackedBarChart\nCreates a 100% stacked bar chart for comparing proportional compositions across categories.\n\n```python\nfrom analysistoolbox.visualizations import Plot100PercentStackedBarChart\n\n# Create a stacked bar chart showing customer segments by region\nchart = Plot100PercentStackedBarChart(\n    dataframe=df,\n    categorical_column_name='Region',\n    value_column_name='Customers',\n    
grouping_column_name='Segment'\n)\n```\n\n#### PlotBarChart\n\nCreates a customizable bar chart with options for highlighting specific categories.\n\n```python\nfrom analysistoolbox.visualizations import PlotBarChart\n\n# Create a bar chart of sales by product\nchart = PlotBarChart(\n    dataframe=df,\n    categorical_column_name='Product',\n    value_column_name='Sales',\n    top_n_to_highlight=3,\n    highlight_color=\"#b0170c\"\n)\n```\n\n#### PlotBoxWhiskerByGroup\n\nCreates box-and-whisker plots for comparing distributions across groups.\n\n```python\nfrom analysistoolbox.visualizations import PlotBoxWhiskerByGroup\n\n# Compare salary distributions across departments\nplot = PlotBoxWhiskerByGroup(\n    dataframe=df,\n    value_column_name='Salary',\n    grouping_column_name='Department'\n)\n```\n\n#### PlotBulletChart\n\nCreates bullet charts for comparing actual values against targets with optional range bands.\n\n```python\nfrom analysistoolbox.visualizations import PlotBulletChart\n\n# Create bullet chart comparing actual vs target sales\nchart = PlotBulletChart(\n    dataframe=df,\n    value_column_name='Actual_Sales',\n    grouping_column_name='Region',\n    target_value_column_name='Target_Sales',\n    list_of_limit_columns=['Min_Sales', 'Max_Sales']\n)\n```\n\n#### PlotCard\n\nCreates a simple card-style visualization with a value and an optional value label.\n\n```python\nfrom analysistoolbox.visualizations import PlotCard\n\n# Create a simple KPI card\ncard = PlotCard(\n    value=125000,  # main value to display\n    value_label=\"Monthly Revenue\",  # optional label\n    value_font_size=30,  # size of the main value\n    value_label_font_size=14,  # size of the label\n    figure_size=(3, 2)  # dimensions of the card\n)\n```\n\n#### PlotClusteredBarChart\n\nCreates grouped bar charts for comparing multiple categories across groups.\n\n```python\nfrom analysistoolbox.visualizations import PlotClusteredBarChart\n\n# Create clustered bar chart of sales 
by product and region\nchart = PlotClusteredBarChart(\n    dataframe=df,\n    categorical_column_name='Product',\n    value_column_name='Sales',\n    grouping_column_name='Region'\n)\n```\n\n#### PlotContingencyHeatmap\n\nCreates a heatmap visualization of contingency tables.\n\n```python\nfrom analysistoolbox.visualizations import PlotContingencyHeatmap\n\n# Create heatmap of customer segments vs purchase categories\nheatmap = PlotContingencyHeatmap(\n    dataframe=df,\n    categorical_column_name_1='Customer_Segment',\n    categorical_column_name_2='Purchase_Category',\n    normalize_by=\"columns\"\n)\n```\n\n#### PlotCorrelationMatrix\n\nCreates correlation matrix visualizations with optional scatter plots.\n\n```python\nfrom analysistoolbox.visualizations import PlotCorrelationMatrix\n\n# Create correlation matrix of numeric variables\nmatrix = PlotCorrelationMatrix(\n    dataframe=df,\n    list_of_value_column_names=['Age', 'Income', 'Spending'],\n    show_as_pairplot=True\n)\n```\n\n#### PlotDensityByGroup\n\nCreates density plots for comparing distributions across groups.\n\n```python\nfrom analysistoolbox.visualizations import PlotDensityByGroup\n\n# Compare age distributions across customer segments\nplot = PlotDensityByGroup(\n    dataframe=df,\n    value_column_name='Age',\n    grouping_column_name='Customer_Segment'\n)\n```\n\n#### PlotDotPlot\n\nCreates dot plots with optional connecting lines between groups.\n\n```python\nfrom analysistoolbox.visualizations import PlotDotPlot\n\n# Compare before/after measurements\nplot = PlotDotPlot(\n    dataframe=df,\n    categorical_column_name='Metric',\n    value_column_name='Value',\n    group_column_name='Time_Period',\n    connect_dots=True\n)\n```\n\n#### PlotHeatmap\n\nCreates customizable heatmaps for visualizing two-dimensional data.\n\n```python\nfrom analysistoolbox.visualizations import PlotHeatmap\n\n# Create heatmap of customer activity by hour and day\nheatmap = PlotHeatmap(\n    dataframe=df,\n    
x_axis_column_name='Hour',\n    y_axis_column_name='Day',\n    value_column_name='Activity',\n    color_palette=\"RdYlGn\"\n)\n```\n\n#### PlotOverlappingAreaChart\n\nCreates stacked or overlapping area charts for time series data.\n\n```python\nfrom analysistoolbox.visualizations import PlotOverlappingAreaChart\n\n# Show product sales trends over time\nchart = PlotOverlappingAreaChart(\n    dataframe=df,\n    time_column_name='Date',\n    value_column_name='Sales',\n    variable_column_name='Product'\n)\n```\n\n#### PlotRiskTolerance\n\nCreates specialized plots for risk analysis and tolerance visualization.\n\n```python\nfrom analysistoolbox.visualizations import PlotRiskTolerance\n\n# Visualize risk tolerance levels\nplot = PlotRiskTolerance(\n    dataframe=df,\n    value_column_name='Risk_Score',\n    tolerance_level_column_name='Tolerance'\n)\n```\n\n#### PlotScatterplot\n\nCreates scatter plots with optional trend lines and grouping.\n\n```python\nfrom analysistoolbox.visualizations import PlotScatterplot\n\n# Create scatter plot of age vs income\nplot = PlotScatterplot(\n    dataframe=df,\n    x_axis_column_name='Age',\n    y_axis_column_name='Income',\n    color_by_column_name='Education'\n)\n```\n\n#### PlotSingleVariableCountPlot\n\nCreates count plots for categorical variables.\n\n```python\nfrom analysistoolbox.visualizations import PlotSingleVariableCountPlot\n\n# Show distribution of customer types\nplot = PlotSingleVariableCountPlot(\n    dataframe=df,\n    categorical_column_name='Customer_Type',\n    top_n_to_highlight=2\n)\n```\n\n#### PlotSingleVariableHistogram\n\nCreates histograms for continuous variables.\n\n```python\nfrom analysistoolbox.visualizations import PlotSingleVariableHistogram\n\n# Create histogram of transaction amounts\nplot = PlotSingleVariableHistogram(\n    dataframe=df,\n    value_column_name='Transaction_Amount',\n    show_mean=True,\n    show_median=True\n)\n```\n\n#### PlotTimeSeries\n\nCreates time series plots with 
optional grouping and marker sizes.\n\n```python\nfrom analysistoolbox.visualizations import PlotTimeSeries\n\n# Plot monthly sales with grouping\nplot = PlotTimeSeries(\n    dataframe=df,\n    time_column_name='Date',\n    value_column_name='Sales',\n    grouping_column_name='Region',  # optional grouping\n    marker_size_column_name='Volume',  # optional marker sizes\n    line_color='#3269a8',\n    figure_size=(8, 5)\n)\n```\n\n#### RenderTableOne\n\nCreates publication-ready summary statistics tables comparing variables across groups.\n\n```python\nfrom analysistoolbox.visualizations import RenderTableOne\n\n# Create summary statistics table comparing age, education by department\ntable = RenderTableOne(\n    dataframe=df,\n    value_column_name='Age',  # outcome variable\n    grouping_column_name='Department',  # grouping variable\n    list_of_row_variables=['Education', 'Experience'],  # variables to compare\n    table_format='html',  # output format\n    show_p_value=True  # include statistical tests\n)\n```\n\n## Contributions\n\nContributions to the analysistoolbox package are welcome! Please submit a pull request with your changes.\n\n## License\n\nThe analysistoolbox package is licensed under the GNU General Public License v3.0 (GPL-3.0). Read the full license text at [https://www.gnu.org/licenses/gpl-3.0.html](https://www.gnu.org/licenses/gpl-3.0.html).\n","funding_links":[],"categories":[],"sub_categories":[],"project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkyleprotho%2Fanalysistoolbox","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fkyleprotho%2Fanalysistoolbox","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fkyleprotho%2Fanalysistoolbox/lists"}