https://github.com/captainirs/transfinitte
- Host: GitHub
- URL: https://github.com/captainirs/transfinitte
- Owner: CaptainIRS
- Created: 2022-10-14T22:44:02.000Z (over 3 years ago)
- Default Branch: main
- Last Pushed: 2022-11-09T20:11:31.000Z (over 3 years ago)
- Last Synced: 2025-03-01T08:12:49.797Z (over 1 year ago)
- Language: Jupyter Notebook
- Homepage: https://family-tree.captainirs.dev/
- Size: 1.28 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Family tree builder for TransfiNITTe'22
Submission for TransfiNITTe'22 hackathon under the problem statement "Family Tree" by BharatX
## Problem Statement
Create a family tree of all families in India (pan-India) using the electoral roll PDFs available at nvsp.in. You are free to use any technology or any hack to crack the problem statement. See the [elaborated problem statement](https://quartz-artichoke-67d.notion.site/Hackathon-Problem-Statement-7f6ebf8bbc694cd18c355eb9433d1197).
## Solution
We have developed an approach to dynamically generate the family tree:
* Query the electoral search website with the user's details.
* Get the user's polling booth from the details.
* Download the electoral roll PDF for that booth from the website of that state's Chief Electoral Officer.
* Since the PDFs contain the electoral roll as image data, we use Optical Character Recognition (OCR) to extract the text from the PDF.
* This text is parsed to generate a JSON file with all the people's details, which are cached in the local filesystem for faster access.
* This JSON file is then used to generate the family tree of a person or a constituency.
* The family trees are generated by finding ancestors and spouses from this data for people belonging to the same house.
* The trees can be visualised using our web application.
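The linking step above can be sketched as follows. This is a minimal illustration, not the project's actual code: the record fields (`name`, `relative_name`, `relation`, `house_number`) and the function name `build_family_links` are assumptions, and matching is done by exact name within a house, which real OCR output would likely need to relax.

```python
from collections import defaultdict


def build_family_links(people):
    """Link people living in the same house by matching relative_name to name.

    Each record is assumed (hypothetically) to look like:
    {"name": ..., "relative_name": ..., "relation": "father" | "husband", "house_number": ...}
    Returns (source, relationship, target) tuples, e.g. (parent, "CHILD", child).
    """
    # Group records by house so that names are only matched within one household.
    by_house = defaultdict(list)
    for person in people:
        by_house[person["house_number"]].append(person)

    links = []
    for members in by_house.values():
        names = {m["name"] for m in members}
        for m in members:
            relative = m["relative_name"]
            # Skip relatives who are not listed in the same house.
            if relative not in names or relative == m["name"]:
                continue
            if m["relation"] == "husband":
                links.append((relative, "SPOUSE", m["name"]))
            elif m["relation"] == "father":
                links.append((relative, "CHILD", m["name"]))
    return links


# Made-up sample records for illustration only.
sample = [
    {"name": "Rajan", "relative_name": "Mani", "relation": "father", "house_number": "1"},
    {"name": "Mani", "relative_name": "Kumar", "relation": "father", "house_number": "1"},
    {"name": "Lakshmi", "relative_name": "Rajan", "relation": "husband", "house_number": "1"},
    {"name": "Devi", "relative_name": "Mani", "relation": "father", "house_number": "2"},
]
links = build_family_links(sample)
```

Restricting the match to one house keeps unrelated people with the same name in other houses from being linked, at the cost of missing relatives who are registered at a different address.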
## ETL pipelines
ETL (extract, transform, load) is a data integration process that combines data from multiple sources into a single, consistent data store, such as a data warehouse or other target system.
For our project, we have used the following ETL pipeline:
* Extract: We extract the data from the electoral roll PDFs. This is done for the state of Tamil Nadu. The code is in the `extract_tn.ipynb` notebook in the `spark` folder.
* Transform: We transform the extracted data using the `transform.ipynb` notebook in the `spark` folder. This notebook generates the JSON file for the constituency.
* Load: We load the JSON file into the local filesystem for the APIs to use.
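The load step, with its filesystem cache, might look roughly like the sketch below. The directory layout (`state/ac/part_no.json`), the function name `load_cached_roll`, and the `build` callback standing in for the download, OCR and parse stages are all assumptions made for illustration.

```python
import json
from pathlib import Path


def load_cached_roll(cache_dir, state, ac, part_no, build):
    """Return the parsed roll for one polling booth, building it only on a cache miss.

    `build` is a hypothetical zero-argument callable that performs the expensive
    extract and transform work and returns a JSON-serialisable list of records.
    """
    path = Path(cache_dir) / str(state) / str(ac) / f"{part_no}.json"
    if path.exists():
        # Cache hit: skip download and OCR entirely.
        return json.loads(path.read_text(encoding="utf-8"))

    records = build()
    path.parent.mkdir(parents=True, exist_ok=True)
    # ensure_ascii=False keeps names in Indian scripts readable in the cached file.
    path.write_text(json.dumps(records, ensure_ascii=False), encoding="utf-8")
    return records
```

A second call with the same state, constituency and part number reads the JSON file directly and never invokes `build`.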
## NEO4J
Features:
* Visualise the family tree of a person or a constituency.
* Query the database to find the relationships between people.
#### Example Queries
* Find the spouse of a person
```
MATCH (n:Person {name: "S. S. Rajendran"})-[:SPOUSE]->(spouse) RETURN spouse
```
* Find the children of a person
```
MATCH (n:Person {name: "S. S. Rajendran"})-[:CHILD]->(child) RETURN child
```
* Find the person in house number 1
```
MATCH (n:Person {house_number: "1"}) RETURN n
```
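The queries above hard-code the person's name. When running them from Python, the name can be passed as a query parameter instead. The sketch below assumes a `session` object that behaves like a session from the official `neo4j` Python driver (`session.run(query, **parameters)`); the helper name `find_related` and the allow-list are hypothetical and not part of this repository.

```python
ALLOWED_RELATIONSHIPS = {"SPOUSE", "CHILD"}  # types used in the example queries above


def find_related(session, name, relationship):
    """Return the nodes reached from the named Person via one relationship type."""
    # Cypher cannot take a relationship type as a parameter, so the type is
    # checked against a fixed allow-list before being placed in the query text.
    if relationship not in ALLOWED_RELATIONSHIPS:
        raise ValueError(f"unsupported relationship: {relationship!r}")
    query = (
        "MATCH (n:Person {name: $name})-[:" + relationship + "]->(related) "
        "RETURN related"
    )
    # The name is sent as a parameter rather than formatted into the string.
    result = session.run(query, name=name)
    return [record["related"] for record in result]
```

With the official driver, a session could be obtained from `GraphDatabase.driver(uri, auth=(user, password)).session()`, where the URI and credentials depend on your own Neo4j deployment.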
## Tech Stack

## Deployment
The web application is hosted [here](https://family-tree.captainirs.dev).
The backend API is hosted [here](https://api.family-tree.captainirs.dev).
## API Information
[Postman collection link](https://documenter.getpostman.com/view/5489887/2s84Dmy4Vq)
[Swagger documentation link](https://api.family-tree.captainirs.dev/docs)
* GET /state_list - Returns a list of states and their state codes
* GET /district_list - Returns a list of districts in a state and their district numbers
  Query Parameters:
  * state_no - State code for which the districts are to be fetched
* GET /assembly_list - Returns a list of assembly constituencies in a district and their assembly numbers
  Query Parameters:
  * state_no - State code of the state to which the district belongs
  * dist_no - District number for which the assembly constituencies are to be fetched
* POST /tree - Returns a family tree for a person
  Query Parameters:
  * name - Name of the person
  * relative_name - Name of the relative
  * dob - Date of birth of the person in YYYY-MM-DD format
  * state - State code of the state in which the person resides
  * gender (optional) - Gender of the person: M - Male, F - Female, O - Other
  * district (optional) - District number of the district in which the person resides
  * ac (optional) - Assembly constituency number of the constituency in which the person resides
* POST /trees - Returns all family trees for a polling booth
  Query Parameters:
  * state - State code of the state in which the constituency is located
  * district - District number of the district in which the constituency is located
  * ac - Assembly constituency number of the constituency
  * part_no - Part number of the polling booth
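As a rough illustration of calling `POST /tree`, the sketch below builds the request with the Python standard library. The helper name `build_tree_request` and the sample values are made up; the response format is not described here, so consult the Swagger documentation before relying on it.

```python
from urllib.parse import urlencode
from urllib.request import Request

API_BASE = "https://api.family-tree.captainirs.dev"
OPTIONAL_PARAMS = {"gender", "district", "ac"}


def build_tree_request(name, relative_name, dob, state, **optional):
    """Build (but do not send) a POST /tree request with query parameters."""
    unknown = set(optional) - OPTIONAL_PARAMS
    if unknown:
        raise ValueError(f"unknown parameters: {sorted(unknown)}")
    params = {"name": name, "relative_name": relative_name, "dob": dob, "state": state}
    # Only include the optional parameters that were actually supplied.
    params.update({k: v for k, v in optional.items() if v is not None})
    return Request(f"{API_BASE}/tree?{urlencode(params)}", method="POST")


# Placeholder values; real state codes come from GET /state_list.
request = build_tree_request("A B", "C", "1990-01-01", "S22", gender="M")
```

The request can then be sent with `urllib.request.urlopen(request)`. Building it separately makes the final URL easy to inspect before any network call is made.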
## Local Docker deployment instructions
### Backend
* Go to `backend` directory
* Copy the `.env.example` file to `.env` and fill in the required values
* Run `git submodule update --init --recursive`
* Go to `./indic-trans`
* Run `pip install -r requirements.txt`. This is required for the next command.
* Run `python setup.py sdist`
### Frontend
* Set the `BACKEND_URL` constant in `frontend/src/config.js` to your backend URL
### Spark
* Go to `spark` directory
* Run `build.sh` to build the Docker images
Run `docker-compose up -d` in the root directory to start the application.
> **_NOTE:_** Some states rate-limit downloads of their electoral roll PDFs or have blacklisted the IP address of the cloud server. In such cases the PDFs cannot be downloaded, but the application still works for other states.
## Screenshots
#### Family tree search for a single person


#### Family tree search for a polling booth


#### NEO4J

#### Swagger

#### Spark


