{"id":27638339,"url":"https://github.com/iffranciscome/msc-datascience-thesis","last_synced_at":"2025-04-23T21:38:29.202Z","repository":{"id":193298160,"uuid":"310755240","full_name":"IFFranciscoME/Msc-DataScience-Thesis","owner":"IFFranciscoME","description":"Genetic Programming and Neural Networks for Financial Predictive Modeling.","archived":false,"fork":false,"pushed_at":"2021-06-01T03:47:30.000Z","size":93470,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"main","last_synced_at":"2024-05-12T05:48:12.170Z","etag":null,"topics":["financial-machine-learning","genetic-algorithms","genetic-programming","sklearn","tensorflow","time-series"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/IFFranciscoME.png","metadata":{"files":{"readme":"README.rst","changelog":"changelog.txt","contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null}},"created_at":"2020-11-07T02:59:53.000Z","updated_at":"2024-03-30T17:23:44.000Z","dependencies_parsed_at":"2023-09-07T15:53:26.447Z","dependency_job_id":null,"html_url":"https://github.com/IFFranciscoME/Msc-DataScience-Thesis","commit_stats":null,"previous_names":["iffranciscome/msc-datascience-thesis"],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IFFranciscoME%2FMsc-DataScience-Thesis","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IFFranciscoME%2FMsc-DataScience-Thesis/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/IFFranciscoME%2FMsc-DataScience-Thesis/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/ho
sts/GitHub/repositories/IFFranciscoME%2FMsc-DataScience-Thesis/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/IFFranciscoME","download_url":"https://codeload.github.com/IFFranciscoME/Msc-DataScience-Thesis/tar.gz/refs/heads/main","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":250521528,"owners_count":21444482,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["financial-machine-learning","genetic-algorithms","genetic-programming","sklearn","tensorflow","time-series"],"created_at":"2025-04-23T21:38:24.736Z","updated_at":"2025-04-23T21:38:29.196Z","avatar_url":"https://github.com/IFFranciscoME.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"\n-------------------------------------------------------------------------\nGenetic Programming and Neural Networks for Financial Predictive Modeling\n-------------------------------------------------------------------------\n\n**This is currently a private repository. It contains the whole project, data and execution code to produce the results that will be used for the Master in Science in Data Science graduate program. It is only made\npublic in order to apply to the** `Turing-LIDA DSG 2021`_\n\n.. 
_`Turing-LIDA DSG 2021`:\n   https://lida.leeds.ac.uk/partnerships/lida-partners/the-alan-turing-institute/turing-data-study-group-at-lida/call-for-researcher-participants/?_cldee=ZnJhbmNpc2NvbWVAaXRlc28ubXg%3d\u0026recipientid=contact-7598eb94f040eb11a8130022483ed0bb-ec75aee3b0304fbca8aaab4ed46f6997\u0026esid=1c8c41ef-eebe-eb11-bacc-000d3ad6a2e7\\\n\n--------\nAbstract\n--------\n\nFinancial time series forecasting is a predictive modeling process with many challenges, mainly because of the temporal structure of the data. Genetic programming, as a particular variation of genetic algorithms, can be used as a feature engineering, importance and selection process all at once. It can provide highly interpretable symbolic features that have low collinearity among them and yet high correlation with a target variable. We present the use of this method for generating symbolic features from endogenous linear and autoregressive variables, along with a Multi-Layer Perceptron, to construct a binary predictor for the price of the Ethereum cryptocurrency.\n\n--------------------\nMain Functionalities\n--------------------\n\n- Autoregressive Feature Generation (**autoregressive_features**)\n- Hadamard Product for Feature Generation (**hadamard_features**)\n- Genetic Programming for Symbolic Operations for Feature Generation (**symbolic_features**)\n- Timeseries Block Folds without filtration (**t_folds**)\n- Classifier model: Logistic Regression with Elastic Net Regularization (**logistic_net**)\n- Classifier model: L1 Support Vector Machines (**l1_svm**)\n- Classifier model: Artificial Neural Net Multilayer Perceptron (**ann_mlp**)\n- Genetic Algorithms Optimization (**genetic_algo_optimization**)\n- Machine Learning Models Performance Metrics (**model_evaluation**)\n- Plotly visualizations of results (**visualizations.py**)\n\n\n------------------\nRunning on cluster\n------------------\n\nBecause of the extensive explorations, this project was executed on a 
computing cluster at ITESO university.\n\n|\n\n.. image:: figures/cluster_execution.png\n   :align: center\n\n|\n\nFurther progress has been made on a paper version of this work, with the aim of submitting some of the findings to a journal.\n\n|\n\n.. |logo1| image:: figures/paper_progress_1.png\n   :scale: 50%\n   :align: middle\n.. |logo2| image:: figures/paper_progress_2.png\n   :scale: 50%\n   :align: middle\n.. |logo3| image:: figures/paper_progress_3.png\n   :scale: 50%\n   :align: middle\n.. |logo4| image:: figures/paper_progress_4.png\n   :scale: 50%\n   :align: middle\n\n+---------+---------+\n| |logo1| | |logo2| |\n+---------+---------+\n| |logo3| | |logo4| |\n+---------+---------+\n\n\nLicense\n-------\n\nPrivate. This content is intended only for the application to the `Turing-LIDA DSG 20212`_.\n\n.. _`Turing-LIDA DSG 20212`:\n   https://lida.leeds.ac.uk/partnerships/lida-partners/the-alan-turing-institute/turing-data-study-group-at-lida/call-for-researcher-participants/?_cldee=ZnJhbmNpc2NvbWVAaXRlc28ubXg%3d\u0026recipientid=contact-7598eb94f040eb11a8130022483ed0bb-ec75aee3b0304fbca8aaab4ed46f6997\u0026esid=1c8c41ef-eebe-eb11-bacc-000d3ad6a2e7\\\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiffranciscome%2Fmsc-datascience-thesis","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fiffranciscome%2Fmsc-datascience-thesis","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fiffranciscome%2Fmsc-datascience-thesis/lists"}