{"id":24484501,"url":"https://github.com/dariodip/rfd-discovery","last_synced_at":"2025-09-08T03:38:11.262Z","repository":{"id":53732205,"uuid":"76447539","full_name":"dariodip/rfd-discovery","owner":"dariodip","description":"This project, written in Python and Cython, deals with Discovery of Relaxed Functional Dependencies(RFDs) using a bottom-up approach.","archived":false,"fork":false,"pushed_at":"2021-03-17T11:55:09.000Z","size":3588,"stargazers_count":8,"open_issues_count":1,"forks_count":5,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-04-13T18:09:09.857Z","etag":null,"topics":["artificial-intelligence","cython","data-science","python","python-3","university-project"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/dariodip.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null}},"created_at":"2016-12-14T10:02:29.000Z","updated_at":"2023-08-02T10:15:10.000Z","dependencies_parsed_at":"2022-09-12T21:41:01.278Z","dependency_job_id":null,"html_url":"https://github.com/dariodip/rfd-discovery","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dariodip%2Frfd-discovery","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dariodip%2Frfd-discovery/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dariodip%2Frfd-discovery/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/dariodip%2Frfd-discovery/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/own
ers/dariodip","download_url":"https://codeload.github.com/dariodip/rfd-discovery/tar.gz/refs/heads/master","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":248758421,"owners_count":21156957,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["artificial-intelligence","cython","data-science","python","python-3","university-project"],"created_at":"2025-01-21T13:14:57.769Z","updated_at":"2025-04-13T18:09:16.118Z","avatar_url":"https://github.com/dariodip.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# **rfd-discovery**\n\n[![Build Status](https://travis-ci.org/dariodip/rfd-discovery.svg?branch=master)](https://travis-ci.org/dariodip/rfd-discovery)\n\n\n###### By\n - [Altamura Antonio](https://www.linkedin.com/in/antonio-altamura-26ab85136/en)\n - [Tomeo Mattia](https://www.linkedin.com/in/mattia-tomeo-b71aa6130/en)\n - [Di Pasquale Dario](https://it.linkedin.com/in/dario-di-pasquale)\n\n## Description\nThis project, written in Python and Cython, deals with the discovery of Relaxed Functional Dependencies (RFDs)\n[[1](http://hdl.handle.net/11386/4658456)] using a bottom-up approach:\ninstead of taking a fixed threshold as input and then finding all the RFDs, this method infers distances from the different RHS\n attributes by itself and then discovers the RFDs for those.\n \n rfd-discovery takes a dataset, representing a table of a relational database, in CSV format as input and prints the set\n of the discovered RFDs. 
\n \n The CSV file can contain columns of the following types:\n  - int; \u003cbr\u003e\n  - int32; \u003cbr\u003e\n  - int64; \u003cbr\u003e\n  - float; \u003cbr\u003e\n  - float64; \u003cbr\u003e\n  - string; \u003cbr\u003e\n  - datetime64*. \n  \n  *for dates, you can use any of the formats recognized by [pandas](http://pandas.pydata.org/pandas-docs/stable/timeseries.html)\n   \u003cbr\u003e\n  \n\n***Index:***\n - [Requirements](#requirements)\n - [Setup rfd-discovery](#setup)\n - [Build](#build)\n - [Usage](#usage)\n \n## Requirements\nrfd-discovery is developed using **[Python 3.5](http://www.python.it/)**, a C compiler ([gcc](https://gcc.gnu.org/) or [Visual Studio C++](https://www.visualstudio.com/vs/cplusplus/)) and [Cython 0.25.2](http://cython.org/);\n the latter is used to reduce time and memory consumption in CPU-bound operations. \n \n To run rfd-discovery correctly, you have to install **Python 3.5** and **Cython 0.25**.\n To install all the requirements correctly, you need **pip 9.0** (or higher).\n \n rfd-discovery uses the following Python libraries:\u003cbr\u003e\n    *[matplotlib✛](http://matplotlib.org/)*\u003cbr\u003e\n    *[numpy✛](http://www.numpy.org/)* \u003cbr\u003e\n    *[pandas✛](http://pandas.pydata.org/)* \u003cbr\u003e\n    *[tornado](http://www.tornadoweb.org/en/stable/)* \u003cbr\u003e\n    *[Cython](http://cython.org/)* \u003cbr\u003e\n    *[nltk](http://www.nltk.org/)* \u003cbr\u003e\n    *[flask](http://flask.pocoo.org/)* \u003cbr\u003e\n    \n   You can install these by following the [Setup Section](#setup).\n\n✛these libraries are part of the [SciPy stack](https://www.scipy.org/index.html) \n## Setup\nIn order to install rfd-discovery and all its requirements, you have to create a virtual environment using [virtualenv](https://virtualenv.pypa.io/en/stable/) on Python 3.5.\nTo install *virtualenv*, run the following:\n\n`[sudo] pip3 install virtualenv` on Linux/macOS\nor\n`pip install virtualenv` from a command prompt run as administrator on 
Windows.\n\nTo create a virtual environment, in the main directory of the project run:\n\n`virtualenv venv`.\n\nTo activate the virtual environment, in the main directory of the project run:\n\n`source venv/bin/activate` on Linux/macOS\nor\n`venv\\Scripts\\activate` on Windows.\n\nYou can check whether the virtual environment is active by verifying that the command prompt has the prefix `(venv)`.\n\nTo install all the requirements, run the following:\n\n`pip install -r requirements.txt`\n\nThis should install, using [pip](https://pypi.python.org/pypi/pip), all the [requirements](#requirements). \n\nTo install WordNet, run:\n\n`python setup.py install`.\n\n## Build\n\nPart of rfd-discovery is written using *Cython*, a superset of the Python programming language, designed to give C-like \nperformance with code which is mostly written in Python. This is because the operations that take place in the code are mostly\nCPU-bound and would otherwise waste computation and memory resources. \u003cbr\u003e You can compile the Cython code by running the following:\n\n`python build.py build_ext --inplace`\n\nThis will generate C code from the Cython code and try to compile it. \u003cbr\u003e\n\n**Note that you'll need gcc or another C compiler**\n\nIf the build phase ends without errors, you should have some *.c* and *.pyd* (or *.so*, depending on your OS) files. Don't\n worry about dealing with these; Python handles them automatically **:)**.\n\n\n## Usage\n\nUsing rfd-discovery is easy enough. Just run the following command:\n\n`python3 main.py -c \u003ccsv-file\u003e [options]`\n\n - *`-c \u003cyour-csv\u003e`*: the path of the dataset on which you want to discover RFDs;\n\n\nOptions:\n - *`-v`* : display the version number;\n - *`-s \u003csep\u003e`*: the separator character used in your CSV file. If you don't provide this, rfd-discovery tries to infer\n it for you;\n - *`-h`*: indicates that the CSV file has a header row. 
If you don't provide this, rfd-discovery tries to infer it for you;\n - *`-r \u003crhs_index\u003e`*: the column number of the RHS attribute. It must be a valid integer. You can omit it only if you don't specify LHS attributes (in that case, it will find RFDs using each attribute as RHS and the remaining ones as LHS);\n - *`-l \u003clhs_index_1, lhs_index_2, ...,lhs_index_k\u003e`*: column indexes of the LHS attributes, separated by commas \n (e.g. *1,2,3*). You can omit them: \u003cbr\u003e \n  if you don't specify the RHS attribute index, it will find RFDs using each attribute as RHS and the remaining ones as LHS; \u003cbr\u003e\n  if you specify a valid RHS index, it will use the remaining attributes as LHS;\n - *`-i \u003cindex_col\u003e`*: the column which contains the primary key of the dataset. If you specify it, the program will not \n calculate distances on it. **NOTE: the index column should contain unique values**;\n - *`-d \u003cdatetime columns\u003e`*: a list of columns, separated by commas, whose values are in datetime format.\n  If you specify this, rfd-discovery can express the distance between two dates in time units (e.g. 
ms, sec, min);\n - *`--semantic`*: use a WordNet-based semantic distance for strings;\n for more info, see [here](http://www.cs.toronto.edu/pub/gh/Budanitsky+Hirst-2001.pdf);\n - *`--human`*: print the RFDs to the standard output in a human-readable form;\n - *`--help`*: show help.\n \n \n ##### Valid Examples:\n ###### Check on each combination of attributes:\n  `python main.py -c resources/dataset.csv`\n  ###### Infer LHS attributes given a fixed RHS attribute index:\n  `python main.py -c resources/dataset.csv -r 0`\n ###### RHS and LHS fixed, separator and header line specified: \n `python main.py -c resources/dataset.csv -r 0 -l 1,2,3 -s , -h 0`\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdariodip%2Frfd-discovery","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Fdariodip%2Frfd-discovery","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Fdariodip%2Frfd-discovery/lists"}