{"id":13288284,"url":"https://github.com/BrunoArsioli/astrocat","last_synced_at":"2025-03-10T06:33:22.196Z","repository":{"id":168622122,"uuid":"644339617","full_name":"BrunoArsioli/astrocat","owner":"BrunoArsioli","description":"Astrocat, an open-source project aimed at supporting researchers and data scientists with crossmatching of catalogs. In astrophysics, this task is typically done via TopCat software. When moving the crossmatching tasks to a Python + Astropy framework, I believe you will experience efficiency \u0026 time gains in your daily workflow. More functions soon.","archived":false,"fork":false,"pushed_at":"2024-03-26T16:58:56.000Z","size":58,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-03-27T15:27:08.864Z","etag":null,"topics":["astrophysics","astropy","crossmatch"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"mit","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/BrunoArsioli.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2023-05-23T10:09:35.000Z","updated_at":"2023-05-24T22:53:34.000Z","dependencies_parsed_at":"2024-03-26T15:27:07.508Z","dependency_job_id":"2c73136d-06fd-489d-bf55-6133aeb0e542","html_url":"https://github.com/BrunoArsioli/astrocat","commit_stats":null,"previous_names":["brunoarsioli/astrocat"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BrunoArsioli%2Fastrocat","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/r
epositories/BrunoArsioli%2Fastrocat/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BrunoArsioli%2Fastrocat/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/BrunoArsioli%2Fastrocat/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/BrunoArsioli","download_url":"https://codeload.github.com/BrunoArsioli/astrocat/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":242805420,"owners_count":20187995,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["astrophysics","astropy","crossmatch"],"created_at":"2024-07-29T16:56:17.494Z","updated_at":"2025-03-10T06:33:21.776Z","avatar_url":"https://github.com/BrunoArsioli.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# astrocat\n\nCrossmatching Astrophysical Catalogs\n\nAstrocat is an open-source project aimed at supporting researchers and data scientists with crossmatching of catalogs. In astrophysics, this task is typically done via TopCat software. When moving the crossmatching tasks to a Python + Astropy framework, I believe you will experience efficiency \u0026 time gains in your daily workflow. More functions will be included in the future.\n\nCurrent functions available:\n\n* crossmatch_astrocat() : match your main catalog with an external catalog, extract info from the external one, and add it to your main catalog.\n\n* crossmatch_radius()   : have a bird's eye view on how your sky crossmatch radius impacts true vs. 
spurious matches  (still to push into astrocat)\n\n* fits_to_parquet()     : convert .fits to .parquet and save disk space (up to 60% reduction in file size)\n\n* csv_to_parquet()      : convert .csv  to .parquet and save disk space (up to 60% reduction in file size)\n\n\n## How to Install \n\nInstall the astrocat package (the leading ! runs the command from a notebook cell): \n\n```\n!pip install git+https://github.com/BrunoArsioli/astrocat.git\n```\n\nThen import crossmatch and cross_panstarrs from astrocat:\n\n```\nfrom astrocat import crossmatch\nfrom astrocat import cross_panstarrs\n```\n\n## Functions\n\n### --\u003e crossmatch_astrocat() function:\n\nThis function takes two DataFrames (main and external), the column names for the R.A. and Dec. in each DataFrame, the list of column names to be added to the main DataFrame, and the maximum distance for a match between catalogs. It adds the new columns to the main DataFrame, fills in values for true matches (within max_distance), and returns the updated main DataFrame.\n\nThe crossmatch_astrocat() signature and its input variables: \n\n```\ncrossmatch_astrocat(df_main, df_ext, ra_main, dec_main, ra_ext, dec_ext, col_list, max_distance)\n```\n\ncrossmatch_astrocat() crossmatches two astronomical catalogs.\n\nThis function finds matches between two catalogs of astronomical objects, given a maximum distance for a match.\nFor each object in the main catalog, the closest object in the external catalog within max_distance is identified.\nIf such a match is found, specified information from the external catalog is added to the main catalog.\n\n#### crossmatch_astrocat() parameters:\n\n* df_main : DataFrame\n    The main catalog DataFrame. Each row represents an astronomical object.\n\n* df_ext : DataFrame\n    The external catalog DataFrame. 
Each row represents an astronomical object.\n\n* ra_main, dec_main : str\n    Column names for the right ascension and declination in df_main.\n\n* ra_ext, dec_ext : str\n    Column names for the right ascension and declination in df_ext.\n\n* col_list : list of str\n    Column names in df_ext that will be added to df_main for matching objects.\n\n* max_distance : float\n    Maximum distance in arcseconds to consider for a match.\n\n#### Returns\n\n* DataFrame\n    The updated main catalog DataFrame, with new columns added from df_ext for matching objects.\n\n#### Examples\n\nHere is how to use this function:\n\n  **i)** If you have CatWISE2020 as the external catalog and want to write W1 and W2 info to your main DataFrame\n\n    col_list = ['W1mag', 'w1snr', 'W2mag', 'w2snr']\n    df_main_updated = crossmatch_astrocat(df_main, df_ext, 'RA1', 'DEC1', 'RA2', 'DEC2', col_list, 2.0)\n\n\n  **ii)** If you want to track the external_source from where the information is read, add a source_id or source_name to col_list:\n\n    col_list = ['WISEname', 'W1mag', 'w1snr', 'W2mag', 'w2snr']\n    df_main_updated = crossmatch_astrocat(df_main, df_ext, 'RA1', 'DEC1', 'RA2', 'DEC2', col_list, 2.0)\n\n### --\u003e cross_panstarrs() function: \n\nThe cross_panstarrs function performs parallel cross-matching with a specified Pan-STARRS catalog using a specified radius. 
It utilizes ThreadPoolExecutor for efficient processing of HTTP requests in batches.\n\n#### Example:\n\n```python\nimport pandas as pd\nfrom astrocat import cross_panstarrs\n\n# Sample DataFrame with source positions\ndata = {'ra': [123.456, 20.0, 21.1, 22.2], 'dec': [78.901, 19.1, 20.2, 21.3]}\ndf = pd.DataFrame(data)\n\n# Perform cross-match with dr2/stack catalog and filter specific columns\nresult_df = cross_panstarrs(df, radius=0.001, relevant_columns=['_ra_', '_dec_', 'gPSFMag', 'rPSFMag'], catalog=\"dr2/stack\")\nprint(result_df)\n\n# Perform cross-match with dr2/stack catalog and have all columns from Pan-STARRS\nresult_df = cross_panstarrs(df, radius=0.001, relevant_columns=None, catalog=\"dr2/stack\")\nprint(result_df)\n\n# Perform cross-match with dr2/stack catalog and use the default list of relevant columns\nresult_df = cross_panstarrs(df, radius=0.001, relevant_columns=\"Default\", catalog=\"dr2/stack\")\nprint(result_df)\n```\n\n#### Parameters:\n\n* df (pandas.DataFrame): Input DataFrame containing source positions: R.A., Dec. (J2000), in degrees.\n    * The columns must be named 'ra' and 'dec' (i.e. df[['ra', 'dec']]).\n* radius (float): Search radius in degrees for the cross-match.\n* num_workers (int, optional): Number of worker threads for parallel execution of HTTP requests.\n    Defaults to 30. Recommendation: do not go above 40, otherwise the Pan-STARRS server can block your requests.\n* relevant_columns (list, optional): List of specific columns to retrieve from Pan-STARRS. If \"Default\",\n    a default list of relevant columns is used. Defaults to \"Default\".\n* batch size (int, optional): The number of sources that go into each HTTP request (each worker).\n* catalog (str, optional): The Pan-STARRS catalog to query. 
Available options include:\n    - \"dr1/mean\" (DR1 Mean)\n    - \"dr1/stack\" (DR1 Stack)\n    - \"dr2/mean\" (DR2 Mean)  (Default)\n    - \"dr2/stack\" (DR2 Stack)\n    - \"dr2/detection\" (DR2 Detection)\n    - \"dr2/forced_mean\" (DR2 Forced Mean)\n    - check for the latest available catalogs at: https://catalogs.mast.stsci.edu/docs/panstarrs.html \n\n#### Return Value:\n\nA pandas DataFrame containing the cross-matched results with relevant information from Pan-STARRS.\n\n#### Additional Notes:\n\n* This function uses ThreadPoolExecutor for parallel processing.\n* In case of errors, the function currently prints error messages to the console.\n* For further details on the Pan-STARRS catalogs and API, refer to the documentation: https://catalogs.mast.stsci.edu/docs/panstarrs.html\n\n\n\n\n### --\u003e crossmatch_radius() function:\n(still to share)\n\nThis function will help visualise the best crossmatch radius to use when combining multi-mission archives. \n\nIt will also make it possible to estimate the level of contamination from the trends in number counts associated with real and spurious associations. \n\n\n### --\u003e fits_to_parquet() function: \nThis function converts .fits and .fit files to .parquet files using the astropy and pandas libraries. 
The resulting .parquet files are compressed and can be read faster than uncompressed .fits files.\n\n\nUsage examples:\nCall the fits_to_parquet function and pass in the path to the .fits file:\n\n```python\n# import library\nimport astrocat\nfrom astrocat.fits_to_parquet import fits_to_parquet\n```\n\n```python\n# convert a .fits file to a .parquet file\nfits_to_parquet('path/to/fits/file.fits')\n```\n\n```python\n# convert multiple .fits files to .parquet files\nfits_list = ['path/to/fits/file1.fits', 'path/to/fits/file2.fits', 'path/to/fits/file3.fits']\nfor fits_file in fits_list:\n    fits_to_parquet(fits_file)\n```\n\nThe resulting .parquet file will be saved in the same directory as the input .fits file.\nIf the resulting file_name.parquet already exists, a warning message will be shown.\n\n\n\n### --\u003e csv_to_parquet() function: \nThis function converts .csv files to .parquet files using the astropy and pandas libraries. The resulting .parquet files are compressed and can be read faster than uncompressed .csv files.\n\nUsage examples:\nCall the csv_to_parquet function and pass in the path to the .csv file:\n\n```python\n# import library\nimport astrocat\nfrom astrocat.csv_to_parquet import csv_to_parquet\n```\n\n```python\n# convert a .csv file to a .parquet file\ncsv_to_parquet('path/to/csv/file.csv')\n```\n\n```python\n# convert multiple .csv files to .parquet files\ncsv_list = ['path/to/csv/file1.csv', 'path/to/csv/file2.csv', 'path/to/csv/file3.csv']\nfor csv_file in csv_list:\n    csv_to_parquet(csv_file)\n```\n\nThe resulting .parquet file will be saved in the same directory as the input .csv file.\nIf the resulting file_name.parquet already exists, a warning message will be shown.\n\n\n## Contributing\n\nContributions are welcome. 
To contribute, please follow these steps:\n\n1. Fork the repository.\n\n2. Create a new branch.\n\n3. Make your changes and commit them.\n\n4. Push changes to GitHub.\n\n5. Submit a pull request.\n\n\n\n## License\nThis project is licensed under the MIT License - see the [LICENSE](LICENSE) file for details.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FBrunoArsioli%2Fastrocat","html_url":"https://awesome.ecosyste.ms/projects/github.com%2FBrunoArsioli%2Fastrocat","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2FBrunoArsioli%2Fastrocat/lists"}