{"id":25670117,"url":"https://github.com/oya163/corteva","last_synced_at":"2025-07-25T07:34:16.518Z","repository":{"id":87687550,"uuid":"582210838","full_name":"oya163/corteva","owner":"oya163","description":"Corteva Data Ingestion Pipeline","archived":false,"fork":false,"pushed_at":"2023-01-30T04:05:30.000Z","size":9816,"stargazers_count":1,"open_issues_count":0,"forks_count":0,"subscribers_count":1,"default_branch":"master","last_synced_at":"2025-02-24T11:37:09.536Z","etag":null,"topics":["corteva","data","engineering","etl"],"latest_commit_sha":null,"homepage":"","language":"Python","has_issues":true,"has_wiki":null,"has_pages":null,"mirror_url":null,"source_name":null,"license":null,"status":null,"scm":"git","pull_requests_enabled":true,"icon_url":"https://github.com/oya163.png","metadata":{"files":{"readme":"README.md","changelog":null,"contributing":null,"funding":null,"license":null,"code_of_conduct":null,"threat_model":null,"audit":null,"citation":null,"codeowners":null,"security":null,"support":null,"governance":null,"roadmap":null,"authors":null,"dei":null,"publiccode":null,"codemeta":null}},"created_at":"2022-12-26T05:07:20.000Z","updated_at":"2023-02-02T17:52:33.000Z","dependencies_parsed_at":null,"dependency_job_id":"f2a2c0ab-8ad5-4282-ae34-581e634a622e","html_url":"https://github.com/oya163/corteva","commit_stats":null,"previous_names":[],"tags_count":0,"template":false,"template_full_name":null,"purl":"pkg:github/oya163/corteva","repository_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oya163%2Fcorteva","tags_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oya163%2Fcorteva/tags","releases_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oya163%2Fcorteva/releases","manifests_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oya163%2Fcorteva/manifests","owner_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners/oya163","download_url":"https://codeload.github.com/oya163/corteva/tar.
gz/refs/heads/master","sbom_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories/oya163%2Fcorteva/sbom","host":{"name":"GitHub","url":"https://github.com","kind":"github","repositories_count":266973463,"owners_count":24014699,"icon_url":"https://github.com/github.png","version":null,"created_at":"2022-05-30T11:31:42.601Z","updated_at":"2022-07-04T15:15:14.044Z","status":"online","status_checked_at":"2025-07-25T02:00:09.625Z","response_time":70,"last_error":null,"robots_txt_status":"success","robots_txt_updated_at":"2025-07-24T06:49:26.215Z","robots_txt_url":"https://github.com/robots.txt","online":true,"can_crawl_api":true,"host_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub","repositories_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repositories","repository_names_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/repository_names","owners_url":"https://repos.ecosyste.ms/api/v1/hosts/GitHub/owners"}},"keywords":["corteva","data","engineering","etl"],"created_at":"2025-02-24T11:29:36.207Z","updated_at":"2025-07-25T07:34:16.503Z","avatar_url":"https://github.com/oya163.png","language":"Python","funding_links":[],"categories":[],"sub_categories":[],"readme":"# Corteva Coding Assignment\n\nThis is a simple ETL pipeline project\n\n## Tech Stack\n```\nBackend - Django/Django Rest Framework\nDatabase - PostgreSQL\n```\n\n## Installation\n\n### Install postgresql\n```\n$ sudo apt install postgresql postgresql-contrib\n$ sudo -u postgres createuser \u003cusername\u003e\n$ sudo -u postgres createdb corteva\n$ sudo su - postgres\n$ psql\n$ ALTER USER \u003cusername\u003e WITH ENCRYPTED PASSWORD '\u003cpassword\u003e';\n$ GRANT ALL PRIVILEGES ON DATABASE corteva TO \u003cusername\u003e;\n```\n\n### Install pgadmin (Optional)\n\nDownload pgadmin installer from [here](https://www.pgadmin.org/download/pgadmin-4-windows/)\n\n\n### Install requirements and clone raw data\n```\n$ python3 -m venv corteva\n$ cd corteva\n$ source ./bin/activate\n$ git clone 
https://github.com/oya163/corteva.git\n$ cd corteva\n$ pip install -r requirements.txt\n$ git clone https://github.com/corteva/code-challenge-template.git\n```\n\n\n## Folder Structure\n```\n├── code-challenge-template -\u003e contains the raw data files\n│   ├── wx_data\n│   └── yld_data\n├── manage.py\n├── README.md\n├── requirements.txt\n├── scripts -\u003e contains scripts to load the database and perform analysis, plus log files\n├── weather -\u003e contains the Django project settings\n└── weatherapp -\u003e contains the Django app\n```\n\n## How to run\n\n### Data ingestion scripts\n - Weather data ingestion\n   - The **weather_data_ingestion** script loads the weather data from the CSV files and ingests it into the **WeatherData** table. \n   - Performs basic data cleaning before insertion, such as converting -9999 values to NULL, which simplifies calculations in later phases. \n   - Bulk-inserts the data from each file for faster ingestion. \n   - Produces logs, which are recorded in the **logs/log_ingestion.log** file.\n\n\n```\npython manage.py runscript weather_data_ingestion\n```\n\n\n - Yield data ingestion\n   - The **yield_data_ingestion** script loads the yield data from the CSV file and ingests it into the **YieldData** table. 
\n   - Produces logs, which are recorded in the **logs/log_ingestion.log** file.\n    \n```\npython manage.py runscript yield_data_ingestion\n```\n\n\n - Perform ETL\n   - The **perform_etl** script loads the weather data from the **WeatherData** table into a pandas DataFrame.\n   - Performs basic calculations, such as converting temperatures from tenths of a degree Celsius to degrees Celsius and precipitation from tenths of a millimeter to centimeters, and inserts the transformed records into the **Analytics** table for consumption by the REST API.\n   - Produces logs, which are recorded in the **logs/log_etl.log** file.\n\n```\npython manage.py runscript perform_etl\n```\n\n### Django standalone server\n\nThe server runs at http://127.0.0.1:8000/ by default.\n\n    python manage.py runserver\n\n## REST API\n\nDjango REST Framework's **ListAPIView** and **DjangoFilterBackend** are used throughout to process GET requests and filter the results according to the query parameters.\nThree main APIs are exposed, as described below:\n - /api/weather\n```\nThis API lists the weather data from the WeatherData table.\n\nQuery Params (optional):-\n    - id(int): record id\n    - date(date): record date [format: %Y-%m-%d]\n    - station_id(char): weather station id\n    - page(int): page number for pagination purpose\n\nUsage:\n    - http://127.0.0.1:8000/api/weather?id=5518560\n    - http://127.0.0.1:8000/api/weather?page=1\n    - http://127.0.0.1:8000/api/weather?date=2014-01-01\n    - http://127.0.0.1:8000/api/weather?station_id=USC00110072\n    - http://127.0.0.1:8000/api/weather?date=2014-01-01\u0026station_id=USC00110072\n``` \n\n - /api/yield\n```\nThis API lists the yield data from the YieldData table.\n\nQuery Params (optional):-\n    - id(int): record id\n    - date(date): record date [format: %Y-%m-%d]\n    - page(int): page number for pagination purpose\n\nUsage:\n    - http://127.0.0.1:8000/api/yield?id=1\n    - http://127.0.0.1:8000/api/yield?page=1\n  
  - http://127.0.0.1:8000/api/yield?date=2014-01-01\n``` \n\n - /api/weather/stats\n```\nThis API lists the transformed data from the Analytics table.\n\nQuery Params (optional):-\n    - id(int): record id\n    - date(date): record date [format: %Y-%m-%d]\n    - station_id(char): weather station id\n    - page(int): page number for pagination purpose\n\nUsage:\n    - http://127.0.0.1:8000/api/weather/stats?id=1\n    - http://127.0.0.1:8000/api/weather/stats?page=1\n    - http://127.0.0.1:8000/api/weather/stats?date=2014-01-01\n    - http://127.0.0.1:8000/api/weather/stats?station_id=USC00110072\n    - http://127.0.0.1:8000/api/weather/stats?date=2014-01-01\u0026station_id=USC00110072\n``` \n\n## Testing\n\nDjango's built-in test library is used to test the responses of all the exposed APIs and to check that the max temperature is always greater than the min temperature in the Analytics table.\n\n## CI pipeline\n\nA simple Continuous Integration (CI) workflow runs in GitHub Actions, so that **linting** and **testing** are performed on every `push` and `pull request` to the master branch.\n","project_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foya163%2Fcorteva","html_url":"https://awesome.ecosyste.ms/projects/github.com%2Foya163%2Fcorteva","lists_url":"https://awesome.ecosyste.ms/api/v1/projects/github.com%2Foya163%2Fcorteva/lists"}