{"id":19746347,"url":"https://github.com/astrolabsoftware/fink-science-perf","last_synced_at":"2026-06-13T15:34:35.851Z","repository":{"id":248820804,"uuid":"829839381","full_name":"astrolabsoftware/fink-science-perf","owner":"astrolabsoftware","description":"Profiling and performance checks for fink-science modules","archived":false,"fork":false,"pushed_at":"2026-03-09T11:55:31.000Z","size":505,"stargazers_count":2,"open_issues_count":5,"forks_count":0,"subscribers_count":3,"default_branch":"main","last_synced_at":"2026-03-09T14:55:09.058Z","etag":null,"topics":[],"latest_commit_sha":null,"homepage":null,"language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":"apache-2.0","status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/astrolabsoftware.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":"LICENSE","code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null,"zenodo":null,"notice":null,"maintainers":null,"copyright":null,"agents":null,"dco":null,"cla":null}},"created_at":"2024-07-17T05:23:38.000Z","updated_at":"2026-03-09T11:55:35.000Z","dependencies_parsed_at":"2025-01-30T16:21:05.603Z","dependency_job_id":"e8b7011d-4b97-42a1-84c8-f698185e3f50","html_url":"https://github.com/astrolabsoftware/fink-science-perf","commit_stats":null,"previous_names":["astrolabsoftware/fink-science-perf"],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/astrolabsoftware/fink-science-perf","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/astrolabsoftware%2Ffink-science-perf","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/astrolabsoftware%2Ffink-science-perf/tags","releases_url":"https://repos.ecosyste.ms/
api/v1/hosts/GitHub/repositories/astrolabsoftware%2Ffink-science-perf/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/astrolabsoftware%2Ffink-science-perf/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/astrolabsoftware","download_url":"https://codeload.github.com/astrolabsoftware/fink-science-perf/tar.gz/refs/heads/main","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/astrolabsoftware%2Ffink-science-perf/sbom","scorecard":null,"host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":286080680,"owners_count":34290346,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2026-05-26T15:22:16.424Z","status":"online","status_checked_at":"2026-06-13T02:00:06.617Z","response_time":62,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":[],"created_at":"2024-11-12T02:14:20.769Z","updated_at":"2026-06-13T15:34:35.845Z","avatar_url":"https://github.com/astrolabsoftware.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Profiling \u0026 performance for fink-science\n\nThis repository contains scripts to perform the profiling and performance checks of [fink-science](https://github.com/astrolabsoftware/fink-science) modules.\n\n## Manual profiling\n\nFire a docker container with all Fink dependencies installed (replace `ztf` with `rubin` if you need to test on LSST):\n\n```bash\n# 3GB compressed\ndocker pull 
gitlab-registry.in2p3.fr/astrolabsoftware/fink/fink-deps-sentinel-ztf:latest\n\n# Launch container from the fink-science-perf directory.\n# Use -s to mount a local fink-science checkout (overrides the pip-installed version).\n# Use -m to mount additional packages (colon-separated list of paths).\n./run_container.sh \\\n  -s $HOME/src/github.com/astrolabsoftware/fink-science \\\n  -m $HOME/src/github.com/emilleishida/fink_sn_activelearning\n```\n\nThe script mounts `fink-science-perf` automatically, sets `PYTHONPATH`, and pip-installs\nmounted packages in editable mode. Once inside the container, navigate to the workspace:\n\n```bash\ncd /workspace/fink-science-perf\n```\n\n### Data\n\nUse the [Data Transfer](https://fink-portal.org/download) service to get tailored data for your test. Make sure you have an account to use the [fink-client](https://github.com/astrolabsoftware/fink-client). Install it\nand register your credentials on the container:\n\n```bash\n# Install the client\npip install fink-client\n\n# register using your credentials\nfink_client_register -survey lsst ... 
\n```\n\nTrigger a job on the Data Transfer service and download data in your container:\n\n```bash\n# Change accordingly\nTOPIC=ftransfer_ztf_2024-07-16_682277\n\nmkdir -p /data/$TOPIC\nfink_datatransfer \\\n            -topic $TOPIC \\\n            -outdir /data/$TOPIC \\\n            -partitionby finkclass \\\n            --verbose\n```\n\n### Profiling a new PR in fink-science\n\nIn case a user opens a new PR in fink-science and you want to profile the new code, you first need to\nremove the fink-science dependency in the container:\n\n```bash\npip uninstall fink-science\n```\n\nand clone the targeted version:\n\n```bash\ngit clone https://github.com/astrolabsoftware/fink-science.git\ncd fink-science\nexport PYTHONPATH=$PYTHONPATH:$PWD\n\n# in case you need a specific fork\ngit checkout -b \u003cbranch_name\u003e master\ngit pull https://github.com/erusseil/fink-science.git master\n```\n\nIn case the code is not instrumented, add necessary decorators:\n\n```python\nfrom line_profiler import profile\n\n@profile\ndef the_function_that_needs_to_be_profiled(...)\n```\n\nand install the code:\n\n```bash\n# in fink-science\npip install .\n```\n\nand finally update the list of science modules in [ztf/science_modules.py](ztf/science_modules.py):\n\n```diff\n@@ -98,13 +96,21 @@ def load_ztf_modules(module_name=\"\") -\u003e dict:\n             'cols': ['cjd', 'cfid', 'cmagpsf', 'csigmapsf', 'cdsxmatch', F.col('candidate.ndethist')],\n             'type': 'ml',\n             'colname': 'rf_snia_vs_nonia'\n+        },\n+        {\n+            'My New module': {\n+                'processor': name_of_the_function_in_fink_science,\n+                'cols': ['list', 'of', 'required', 'columns'],\n+                'type': 'xmatch or ml or feature',\n+                'colname': 'the name of the new column'\n+            }\n         }\n     }\n```\n\nand profile your code with:\n\n```bash\n# Change arguments accordingly\n./bench_module.sh -s ztf -n 'My New module' -d 
/data/$TOPIC\n```\n\nDepending on how many decorators you put in the code,\nyou will see more or less detail in the output. For example:\n\n```python\nFile: /home/libs/fink-science/fink_science/hostless_detection/run_pipeline.py\nFunction: process_candidate_fink at line 31\n\nLine #      Hits         Time  Per Hit   % Time  Line Contents\n==============================================================\n    31                                               @profile\n    32                                               def process_candidate_fink(self, science_stamp: Dict,\n    33                                                                          template_stamp: Dict, objectId: str):\n    34                                                   \"\"\"\n    35                                                   Processes each candidate\n    36                                           \n    37                                                   Parameters\n    38                                                   ----------\n    39                                                   science_stamp\n    40                                                      science stamp data\n    41                                                   template_stamp\n    42                                                      template stamp data\n    43                                                   \"\"\"\n    44      1000    1217571.0   1217.6      1.8          science_stamp = read_bytes_image(science_stamp)\n    45      1000    1026124.8   1026.1      1.5          template_stamp = read_bytes_image(template_stamp)\n    46      1000       1110.8      1.1      0.0          if not ((science_stamp.shape == (63, 63)) and (template_stamp.shape == (63, 63))):\n    47        15        288.4     19.2      0.0              print(objectId, science_stamp.shape, template_stamp.shape)\n    48        15          5.1      0.3      0.0              return -99\n    49       985        571.2      0.6      
0.0          science_stamp_clipped, template_stamp_clipped = (\n    50       985    4539642.4   4608.8      6.8              self._run_sigma_clipping(science_stamp, template_stamp))\n    51      1970    2558995.9   1299.0      3.8          is_hostless_candidate = run_hostless_detection_with_clipped_data(\n    52       985        208.7      0.2      0.0              science_stamp_clipped, template_stamp_clipped,\n    53       985        541.1      0.5      0.0              self.configs, self._image_shape)\n    54       985        418.2      0.4      0.0          if is_hostless_candidate:\n    55        30   57607331.3    2e+06     86.0              power_spectrum_results = run_powerspectrum_analysis(\n    56        15          3.8      0.3      0.0                  science_stamp, template_stamp,\n    57        15         80.6      5.4      0.0                  science_stamp_clipped.mask.astype(int),\n    58        15         55.5      3.7      0.0                  template_stamp_clipped.mask.astype(int), self._image_shape)\n    59        15          8.4      0.6      0.0              return power_spectrum_results[\"kstest_SCIENCE_15_statistic\"]\n    60       970        449.2      0.5      0.0          return -99\n```\n\nWhat matters first is the column `% Time`, which indicates the percentage of the function's\ntotal time spent on each line. In the example above, 86% is spent calling `run_powerspectrum_analysis`,\nwhich would be the target to optimize if we want to improve performance.\n\nAnother important column is `Hits`, that is, the number of times a line has been executed.\nIn this example, we started with 1,000 alerts, and the first lines were called 1,000 times.\nBut then we have a branch (`if is_hostless_candidate:`), and the costly function is\nactually only called for 15 of them (line 55 records 30 hits, but the `return` on line 59 is reached 15 times).\n\n### SSOFT\n\nThe case of the SSOFT is different from the rest, as it does not use raw alerts as input. 
Instead, profile it using the test data in `fink-science` (`fink-science/fink_science/data/alerts/sso_aggregated_2024.09_test_sample.parquet`).\n\nNote that the SSOFT is not profiled with other modules when you profile all science modules. It can only be tested individually.\n\n## Performance checks\n\n### Timing science modules\n\nTo launch the performance test, you can use the `ztf/perf_science_modules.py` script. It will launch a series of Spark jobs to time each science module.\n\n```bash\npython perf_science_modules.py -h\nusage: perf_science_modules.py [-h] [-night NIGHT] [-total_memory TOTAL_MEMORY]\n                               [-gb_per_executor GB_PER_EXECUTOR]\n                               [-core_per_executor CORE_PER_EXECUTOR]\n                               [-nloops NLOOPS]\n\nScience modules performance using Apache Spark for ZTF\n\noptional arguments:\n  -h, --help            show this help message and exit\n  -night NIGHT          Night in the form YYYYMMDD. Default is 20240716 (200k\n                        alerts)\n  -total_memory TOTAL_MEMORY\n                        Total RAM for the job in GB. Default is 16GB.\n  -gb_per_executor GB_PER_EXECUTOR\n                        Total RAM per executor. Default is 2GB.\n  -core_per_executor CORE_PER_EXECUTOR\n                        Number of core per executor. Default is 1.\n  -nloops NLOOPS        Number of times to run the performance test. Default is 2.\n```\n\nNote that it makes little sense to make performance tests from within a single container, and this script assumes you are on the Fink Apache Spark cluster at VirtualData. Edit `perf_science_modules.py` and `utils.load_spark_session` with your correct master URI, path to the data and mesos configuration. 
By default the script will run the performance test using 8 cores with 2GB RAM each on the ZTF night 20240416 (212,039 alerts).\n\nHere is the result for `fink-science==5.9.0` for ZTF alert data:\n\n![perf_ztf](ztf/static/perfs_5.9.0.png)\n\n\nNote that if we profile the functions directly without Spark (see `ztf/co2_science_modules.py`), we obtain the same behaviour, but with about a 10x speed-up. We need to investigate.\n\n### Inferring CO2eq emissions\n\nBased on [codecarbon](https://github.com/mlco2/codecarbon), we try to gain some insights regarding the CO2eq emissions linked to Fink operations. There are many caveats here, and one should be cautious with the results. In all results below, we use the codecarbon converter to transform energy measurements into kgCO2eq emissions and the corresponding rate. All operations are done on the spark-master @ VirtualData:\n\n```\n[codecarbon INFO @ 16:00:45]   Platform system: Linux-3.10.0-1160.36.2.el7.x86_64-x86_64-with-glibc2.17\n[codecarbon INFO @ 16:00:45]   Python version: 3.9.13\n[codecarbon INFO @ 16:00:45]   CodeCarbon version: 2.5.0\n[codecarbon INFO @ 16:00:45]   Available RAM : 35.196 GB\n[codecarbon INFO @ 16:00:45]   CPU count: 18\n[codecarbon INFO @ 16:00:45]   CPU model: AMD EPYC 7702 64-Core Processor\n[codecarbon INFO @ 16:00:45]   GPU count: None\n[codecarbon INFO @ 16:00:45]   GPU model: None\n```\n\nand we execute the tests using:\n\n```bash\ntaskset --cpu-list 0 python ztf/co2_science_modules.py\n```\n\n#### Default parameters\n\nFirst we assume a PUE of 1.25 for the data center, and leave the rest of the parameters as default. The program recognises that we are in France, and it applies the energy mix accordingly. However, we could not access the CPU tracking mode, and a constant consumption mode was used instead (AMD EPYC 7702 64-Core Processor). 
\n\nWe do two measurements: (A) the measure of the energy consumed during the execution of a science module (on 200k alerts), and (B) another measurement lasting the same amount of time with nothing particular running from our side. The two measurements are done one after the other, and we assume that the state of the server remains the same (no external jobs launched).\n\n```bash\n[profiling  INFO @ 03:59:56] Early SN Ia\n# (A)\n[codecarbon INFO @ 15:59:51] Energy consumed for RAM : 0.000001 kWh. RAM Power : 0.9447441101074219 W\n[codecarbon INFO @ 15:59:51] Energy consumed for all CPUs : 0.000090 kWh. Total CPU Power : 100.0 W\n[codecarbon INFO @ 15:59:51] 0.000091 kWh of electricity used since the beginning.\n...\n# (B)\n[codecarbon INFO @ 15:59:54] Energy consumed for RAM : 0.000001 kWh. RAM Power : 0.9447441101074219 W\n[codecarbon INFO @ 15:59:54] Energy consumed for all CPUs : 0.000091 kWh. Total CPU Power : 100.0 W\n[codecarbon INFO @ 15:59:54] 0.000092 kWh of electricity used since the beginning.\n...\n[profiling  INFO @ 04:00:23] RAW: 31.27 # (A) in kgCO2eq/year\n[profiling  INFO @ 04:00:23] BAS: 31.27 # (B) in kgCO2eq/year\n[profiling  INFO @ 04:00:23] DIF: -0.00 # difference\n```\n\nOn average, the emission rate (B) is around 30 kgCO2eq/year (again, one should not take this at face value, as it depends on all the assumptions entered in the measurement), and for all science modules, the energy consumed in (A) is identical to the one in (B) (to the precision of the measurement). So either our science modules do not consume much compared to the electricity required to just keep the machine alive, or we are doing something wrong (e.g. the run is not long enough, some assumptions are wrong, etc.).\n\n#### What is consuming the most: cpu, RAM, or just the machine itself?  
\n\nTODO: make measurements on cpu-bound and IO\n\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fastrolabsoftware%2Ffink-science-perf","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fastrolabsoftware%2Ffink-science-perf","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fastrolabsoftware%2Ffink-science-perf/lists"}