https://github.com/captainirs/transfinitte


Host: GitHub
URL: https://github.com/captainirs/transfinitte
Owner: CaptainIRS
Created: 2022-10-14T22:44:02.000Z (over 3 years ago)
Default Branch: main
Last Pushed: 2022-11-09T20:11:31.000Z (over 3 years ago)
Last Synced: 2025-03-01T08:12:49.797Z (over 1 year ago)
Language: Jupyter Notebook
Homepage: https://family-tree.captainirs.dev/
Size: 1.28 MB
Stars: 0
Watchers: 1
Forks: 0
Open Issues: 0
Metadata Files:
- Readme: README.md


# Family tree builder for TransfiNITTe'22

Submission for the TransfiNITTe'22 hackathon under the problem statement "Family Tree" by BharatX.

## Problem Statement

Create a family tree of all families in India (pan-India) using the electoral roll PDFs available at nvsp.in. You are free to use any technology or any hack to crack the problem statement. [Link](https://quartz-artichoke-67d.notion.site/Hackathon-Problem-Statement-7f6ebf8bbc694cd18c355eb9433d1197) to the detailed problem statement.

## Solution

We have developed an approach to dynamically generate the family tree:

* Query the electoral search website with the user's details.
* Get the user's polling booth from the search result.
* Download the electoral roll PDF for that booth from the website of that state's Chief Electoral Officer.
* Since the PDFs contain the electoral roll as image data, we use Optical Character Recognition (OCR) to extract the text from the PDF.
* This text is parsed into a JSON file containing every person's details, which is cached in the local filesystem for faster access.
* This JSON file is then used to generate the family tree of a person or a constituency.
* The family trees are generated by finding ancestors and spouses from this data for people belonging to the same house.
* The trees can be visualised using our web application.
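The linking step described above can be sketched as follows. This is a minimal illustration, not the project's implementation: the `Person` record, the `relation` values (`"father"`, `"mother"`, `"husband"`, `"wife"`) and the hypothetical `link_households` helper are assumptions, and relatives are matched only by exact name within the same house number.

```python
from collections import defaultdict
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass(eq=False)  # identity comparison avoids recursing through spouse/children links
class Person:
    # Hypothetical shape of one parsed electoral-roll entry.
    id: str
    name: str
    relative_name: str
    relation: str  # assumed values: "father", "mother", "husband", "wife"
    house_number: str
    children: List["Person"] = field(default_factory=list, repr=False)
    spouse: Optional["Person"] = field(default=None, repr=False)


def link_households(people):
    """Link parents, children and spouses among people sharing a house number."""
    by_house = defaultdict(list)
    for person in people:
        by_house[person.house_number].append(person)

    for members in by_house.values():
        # If two members share a name, the later entry wins (a known ambiguity).
        by_name = {member.name: member for member in members}
        for member in members:
            relative = by_name.get(member.relative_name)
            if relative is None:
                continue  # relative not registered in this house, or an OCR mismatch
            if member.relation in ("father", "mother"):
                relative.children.append(member)
            elif member.relation in ("husband", "wife"):
                member.spouse = relative
                relative.spouse = member
    return by_house
```

Restricting the search to one house number keeps matching cheap and reduces false links between unrelated people with the same name, at the cost of missing relatives registered at a different address.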

## ETL pipelines
ETL, which stands for extract, transform and load, is a data integration process that combines data from multiple data sources into a single, consistent data store that is loaded into a data warehouse or other target system.

For our project, we have used the following ETL pipeline:
* Extract: We extract the data from the electoral roll PDFs. This is done for the state of Tamil Nadu, in the `extract_tn.ipynb` notebook in the `spark` folder.
* Transform: We transform the extracted data using the `transform.ipynb` notebook in the `spark` folder. This notebook generates the JSON file for the constituency.
* Load: We load the JSON file into the local filesystem for the APIs to use.
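As a rough illustration of the transform step, the sketch below turns OCR'd voter entries into JSON records. The field labels (`Name :`, `Father's Name :`, `House Number :`, `Age : … Gender : …`) are an assumed layout; real rolls differ by state and OCR quality, and `parse_voter_block` and `transform` are hypothetical names rather than functions from `transform.ipynb`.

```python
import json
import re

# Assumed layout of one OCR'd voter entry; adjust the patterns to the real rolls.
NAME_RE = re.compile(r"^Name\s*:\s*(?P<name>.+?)\s*$", re.MULTILINE)
RELATIVE_RE = re.compile(
    r"^(?P<relation>Father|Mother|Husband|Wife)(?:'s)?\s+Name\s*:\s*(?P<relative_name>.+?)\s*$",
    re.MULTILINE,
)
HOUSE_RE = re.compile(r"^House\s+Number\s*:\s*(?P<house_number>\S+)", re.MULTILINE)
AGE_GENDER_RE = re.compile(
    r"Age\s*:\s*(?P<age>\d+)\s+Gender\s*:\s*(?P<gender>Male|Female|Other)", re.IGNORECASE
)


def parse_voter_block(block):
    """Return a dict for one voter entry, or None if any field is missing."""
    record = {}
    for pattern in (NAME_RE, RELATIVE_RE, HOUSE_RE, AGE_GENDER_RE):
        match = pattern.search(block)
        if match is None:
            return None  # incomplete or garbled OCR output; skip this entry
        record.update(match.groupdict())
    record["relation"] = record["relation"].lower()
    record["age"] = int(record["age"])
    return record


def transform(ocr_blocks):
    """Parse every OCR block and serialise the valid records as a JSON array."""
    records = [r for r in map(parse_voter_block, ocr_blocks) if r is not None]
    return json.dumps(records, ensure_ascii=False, indent=2)
```

Skipping entries that fail to parse, instead of guessing missing fields, keeps bad OCR output from creating spurious family links later in the pipeline.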

## Neo4j

Features:
* Visualise the family tree of a person or a constituency.
* Query the database to find the relationships between people.

#### Example Queries
* Find the spouse of a person
```
MATCH (n:Person {name: "S. S. Rajendran"})-[:SPOUSE]->(spouse) RETURN spouse
```
* Find the children of a person
```
MATCH (n:Person {name: "S. S. Rajendran"})-[:CHILD]->(child) RETURN child
```
* Find all people in house number 1
```
MATCH (n:Person {house_number: "1"}) RETURN n
```
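The queries above assume `Person` nodes connected by `SPOUSE` and `CHILD` relationships. One way to load the generated JSON into that shape is sketched below; the `id` property, the edge-list input and the hypothetical `build_load_statements` helper are assumptions, and `CHILD` is taken to point from parent to child, as the example query suggests.

```python
ALLOWED_RELATIONSHIPS = {"SPOUSE", "CHILD"}


def build_load_statements(people, edges):
    """Return (query, parameters) pairs that MERGE people and their relationships.

    people: dicts with hypothetical "id", "name" and "house_number" keys.
    edges:  (source_id, relationship_type, target_id) tuples, e.g. a parent
            pointing at a child with "CHILD".
    """
    statements = []
    for person in people:
        statements.append((
            "MERGE (n:Person {id: $id}) "
            "SET n.name = $name, n.house_number = $house_number",
            {"id": person["id"], "name": person["name"], "house_number": person["house_number"]},
        ))
    for source_id, rel_type, target_id in edges:
        if rel_type not in ALLOWED_RELATIONSHIPS:
            raise ValueError(f"unsupported relationship type: {rel_type}")
        # Relationship types cannot be Cypher parameters, so the whitelisted
        # type is interpolated into the query text; node ids stay parameterised.
        statements.append((
            f"MATCH (a:Person {{id: $src}}), (b:Person {{id: $dst}}) MERGE (a)-[:{rel_type}]->(b)",
            {"src": source_id, "dst": target_id},
        ))
    return statements
```

Each pair can then be passed to `session.run(query, parameters)` from the official `neo4j` Python driver. Using `MERGE` instead of `CREATE` keeps reloads of the same constituency from duplicating nodes.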

## Tech Stack

![tech-stack-image](https://i.imgur.com/IrmcFe8.png)

## Deployment

The web application is hosted [here](https://family-tree.captainirs.dev).
The backend API is hosted [here](https://api.family-tree.captainirs.dev).

## API Information

[Postman collection link](https://documenter.getpostman.com/view/5489887/2s84Dmy4Vq)
[Swagger documentation link](https://api.family-tree.captainirs.dev/docs)

* GET /state_list - Returns the list of states with their state codes

* GET /district_list - Returns the list of districts in a state with their district numbers

Query Parameters:
* state_no - State code for which the districts are to be fetched

* GET /assembly_list - Returns the list of assembly constituencies in a district with their assembly constituency numbers

Query Parameters:
* state_no - State code of the state containing the district
* dist_no - District number for which the assembly constituencies are to be fetched

* POST /tree - Returns a family tree for a person

Query Parameters:
* name - Name of the person
* relative_name - Name of the relative
* dob - Date of birth of the person in YYYY-MM-DD
* state - State code of the state in which the person resides
* gender (Optional param) - Gender of the person: M - Male, F - Female, O - Other
* district (Optional param) - District number of the district in which the person resides
* ac (Optional param) - Assembly constituency number of the constituency in which the person resides

* POST /trees - Returns all family trees for a polling booth

Query Parameters:
* state - State code of the state in which the constituency is located
* district - District number of the district in which the constituency is located
* ac - Assembly constituency number of the constituency
* part_no - Part number of the polling booth
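A small client for these endpoints could build its requests as below. This sketch assumes every parameter, including those of the POST routes, is sent in the query string as listed above; the `api_request` helper and the sample values in the usage note are hypothetical, so check the Swagger documentation before relying on it.

```python
from urllib.parse import urlencode
from urllib.request import Request

BASE_URL = "https://api.family-tree.captainirs.dev"


def api_request(path, params, method="GET"):
    """Build a request for one endpoint, dropping optional parameters left as None."""
    query = urlencode({key: value for key, value in params.items() if value is not None})
    # POST routes take their inputs as query parameters, so the body is empty.
    data = b"" if method == "POST" else None
    return Request(f"{BASE_URL}{path}?{query}", data=data, method=method)
```

For example, `api_request("/tree", {"name": ..., "relative_name": ..., "dob": "YYYY-MM-DD", "state": ..., "gender": None}, method="POST")` omits the unset optional `gender`, and the resulting request can be sent with `urllib.request.urlopen`.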

## Local Docker deployment instructions

### Backend

* Go to `backend` directory
* Copy the `.env.example` file to `.env` and fill in the required values
* Run `git submodule update --init --recursive`
* Go to `./indic-trans`
* Run `pip install -r requirements.txt` (required by the next step)
* Run `python setup.py sdist`

### Frontend

* Set the `BACKEND_URL` constant in `frontend/src/config.js` to your backend URL

### Spark

* Go to `spark` directory
* Run `build.sh` to build the Docker images

Run `docker-compose up -d` in the root directory to start the application.

> **_NOTE:_** Some states rate-limit downloads of their electoral roll PDFs or have blacklisted the IP address of the cloud server. In such cases, the PDFs cannot be downloaded. The application will still work for other states.
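For rate limiting (though not for a blacklisted IP address), retrying with exponential backoff can help. The sketch below is an illustration rather than the project's download code: `download_with_backoff` is hypothetical, and treating HTTP 429 and 503 as retryable is an assumption about how the state websites respond.

```python
import time
import urllib.error
import urllib.request


def download_with_backoff(url, max_attempts=4, base_delay=2.0,
                          opener=urllib.request.urlopen, sleep=time.sleep):
    """Fetch a PDF, retrying rate-limited or transient failures with exponential backoff."""
    for attempt in range(max_attempts):
        try:
            with opener(url) as response:
                return response.read()
        except urllib.error.HTTPError as err:
            # Give up on non-retryable errors, or after the final attempt.
            if err.code not in (429, 503) or attempt == max_attempts - 1:
                raise
        sleep(base_delay * 2 ** attempt)  # waits 2 s, 4 s, 8 s, ... by default
    raise ValueError("max_attempts must be at least 1")
```

`opener` and `sleep` are injectable so the retry policy can be tested without network access or real delays.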

## Screenshots

#### Family tree search for a single person

![single_person](./screenshots/single_person.png)
![family_tree](./screenshots/family_tree.png)

#### Family tree search for a polling booth

![polling_booth](./screenshots/polling_booth.png)
![family_trees](./screenshots/family_trees.png)

#### Neo4j

![neo4j](./screenshots/neo4j.png)

#### Swagger
![swagger](./screenshots/swagger.jpg)

#### Spark
![spark](./screenshots/spark.jpg)
![master](./screenshots/master.jpg)
![worker](./screenshots/workers.jpg)
