Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/girish119628/data-tagging
Data Tagging, Analysis and Insights Generation using Python [NLP, Tokenization]
- Host: GitHub
- URL: https://github.com/girish119628/data-tagging
- Owner: girish119628
- Created: 2024-12-11T11:41:58.000Z
- Default Branch: main
- Last Pushed: 2024-12-11T11:46:22.000Z
- Last Synced: 2024-12-11T12:33:06.647Z
- Topics: nlp-keywords-extraction, tagging, tokenization
- Language: Jupyter Notebook
- Homepage:
- Size: 0 Bytes
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
README
# Data-Tagging
Data Tagging, Analysis and Insights Generation using Python. This project covers tagging each of the given fields (Root Cause, Symptom_Condition, Symptom_Component, Fix_Condition and Fix_Component).
# 1. Column-Wise Analysis:
○ Perform a column-wise analysis of the provided dataset.
○ Describe each column in terms of its data type, unique values, distribution, and overall significance for stakeholders.
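A column profile like the one described above can be sketched with the standard library alone. This is a minimal illustration, not the repository's notebook code: it assumes the dataset is loaded as a list of dicts (for example via `csv.DictReader`), treats `None` and empty strings as missing, and uses made-up rows with a hypothetical `ID` column. In a pandas-based notebook, `DataFrame.dtypes`, `DataFrame.nunique()`, `DataFrame.isna().sum()` and `Series.value_counts()` cover the same ground.

```python
from collections import Counter

MISSING = (None, "")  # assumption: these two markers mean "no value"

def infer_type(values):
    """Guess a column type ('int', 'float', 'str' or 'empty') from its non-missing values."""
    present = [v for v in values if v not in MISSING]
    if not present:
        return "empty"
    for name, cast in (("int", int), ("float", float)):
        try:
            for v in present:
                cast(v)
        except (TypeError, ValueError):
            continue  # at least one value does not parse; try the next type
        return name
    return "str"

def profile_columns(rows, top_n=3):
    """Per-column summary of a table stored as a list of dicts (one dict per row).

    Column names are taken from the first row.
    """
    columns = list(rows[0]) if rows else []
    report = {}
    for col in columns:
        values = [row.get(col) for row in rows]
        present = [v for v in values if v not in MISSING]
        report[col] = {
            "dtype": infer_type(values),
            "missing": len(values) - len(present),
            "unique": len(set(present)),
            "top_values": Counter(present).most_common(top_n),
        }
    return report

# Hypothetical rows; "ID" is an assumed key column, the other names come from the task.
sample_rows = [
    {"ID": "1", "Symptom_Component": "Pump", "Fix_Condition": "replaced"},
    {"ID": "2", "Symptom_Component": "valve", "Fix_Condition": ""},
    {"ID": "3", "Symptom_Component": "Pump", "Fix_Condition": "Replaced"},
]

profile = profile_columns(sample_rows)
for column, stats in profile.items():
    print(column, stats)
```

Note that `Fix_Condition` reports two unique values for what is really one fix ("replaced" vs "Replaced"), which is exactly the kind of inconsistency the cleaning step is meant to catch.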
# 2. Data Cleaning:
○ Handle missing or invalid values using appropriate methods (e.g., imputation, deletion).
○ Address inconsistencies in categorical columns (e.g., typos, inconsistent capitalization).
○ Ensure numerical columns are in the correct format and free from outliers, where applicable.
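The cleaning steps can be sketched as three small helpers: normalizing categorical text, mode imputation for missing categories, and Tukey's IQR fences for numeric outliers. The variant map `CANONICAL` and all sample values (including the numeric list, imagined here as repair costs) are hypothetical; a real map would be built by inspecting each column's unique values, and whether to impute, drop or keep a flagged row remains a judgment call per column.

```python
import statistics
from collections import Counter

# Hypothetical variant/typo map; build the real one from each column's unique values.
CANONICAL = {"replaced": "replace", "replce": "replace"}

def normalize_category(value, canonical=CANONICAL):
    """Trim, collapse inner whitespace, lower-case, then map known variants.

    Empty strings become None so they are treated as missing downstream.
    """
    if value is None:
        return None
    cleaned = " ".join(str(value).split()).lower()
    if not cleaned:
        return None
    return canonical.get(cleaned, cleaned)

def impute_mode(values):
    """Fill None entries with the most frequent observed value (ties: first seen)."""
    observed = [v for v in values if v is not None]
    if not observed:
        return list(values)
    mode = Counter(observed).most_common(1)[0][0]
    return [mode if v is None else v for v in values]

def iqr_outlier_flags(numbers, k=1.5):
    """Flag values outside Tukey's fences [Q1 - k*IQR, Q3 + k*IQR]; needs 2+ values."""
    q1, _, q3 = statistics.quantiles(numbers, n=4, method="inclusive")
    iqr = q3 - q1
    low, high = q1 - k * iqr, q3 + k * iqr
    return [not (low <= x <= high) for x in numbers]

cleaned = [normalize_category(v) for v in ["  Replaced ", "REPLCE", "", "inspect"]]
filled = impute_mode(cleaned)  # the None slot takes the column's mode, "replace"
flags = iqr_outlier_flags([100, 110, 105, 98, 102, 950])  # hypothetical numeric column
print(cleaned, filled, flags, sep="\n")
```

Normalizing before imputing matters: otherwise "Replaced" and "replaced" would be counted as different categories and could distort the mode.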
# 3. Identifying Critical Columns:
○ Select the five most critical columns, i.e. those likely to be most insightful for stakeholders based on your understanding of the data.
○ Provide reasoning for your selection.
○ Generate at least three visualizations (e.g., bar plots) in Python that represent these insights effectively.
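For the visualizations, `matplotlib.pyplot.barh` over each column's value counts is the usual notebook choice. The sketch below is a dependency-free stand-in that renders the same frequency counts as text bars, which can be handy for a quick check before plotting. The `components` list is hypothetical sample data for the `Symptom_Component` field.

```python
from collections import Counter

def text_bar_chart(values, width=30, top_n=5):
    """Render the most frequent non-missing values as horizontal '#' bars."""
    counts = Counter(v for v in values if v not in (None, ""))
    if not counts:
        return ""
    top = counts.most_common(top_n)
    peak = top[0][1]  # the largest count gets the full bar width
    label_width = max(len(str(label)) for label, _ in top)
    lines = []
    for label, count in top:
        # Scale each bar relative to the peak; keep at least one '#' per shown value.
        bar = "#" * max(1, round(width * count / peak))
        lines.append(f"{str(label):<{label_width}} | {bar} {count}")
    return "\n".join(lines)

# Hypothetical Symptom_Component values after cleaning.
components = ["pump", "valve", "pump", "hose", "pump", "valve"]
print(text_bar_chart(components))
```

The counts produced by `Counter.most_common` are the same data a matplotlib bar plot would take, so this doubles as a check that the plotted numbers are right.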
# 4. Generating Tags/Features from Available Free Text
○ Generate meaningful tags from the free-text fields to summarize their information, e.g., failure conditions and components.
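One simple way to derive tags from free text is tokenization plus lexicon lookup, sketched below. The stop-word list and the `CONDITION_TERMS` / `COMPONENT_TERMS` vocabularies are hypothetical placeholders; in practice they would be seeded from frequent tokens in the dataset itself (for instance via `top_keywords`) and refined by hand. This is a rule-based approach chosen for illustration, not necessarily the one used in the repository's notebook.

```python
import re
from collections import Counter

# Hypothetical vocabularies; seed real ones from the dataset's frequent tokens.
CONDITION_TERMS = {"leak", "leaking", "crack", "cracked", "noise", "overheating", "loose"}
COMPONENT_TERMS = {"pump", "valve", "hose", "gasket", "bearing", "sensor"}
STOPWORDS = {"a", "an", "the", "is", "was", "at", "on", "in", "and", "from", "of", "near", "found"}

TOKEN_RE = re.compile(r"[a-z0-9]+")

def tokenize(text):
    """Lower-case the text and split it into alphanumeric tokens."""
    return TOKEN_RE.findall(text.lower()) if text else []

def generate_tags(text):
    """Return sorted condition and component tags found in one free-text note."""
    tokens = [t for t in tokenize(text) if t not in STOPWORDS]
    return {
        "conditions": sorted({t for t in tokens if t in CONDITION_TERMS}),
        "components": sorted({t for t in tokens if t in COMPONENT_TERMS}),
    }

def top_keywords(texts, n=5):
    """Most frequent non-stop-word tokens across a column of free text."""
    counts = Counter(t for text in texts for t in tokenize(text) if t not in STOPWORDS)
    return counts.most_common(n)

note = "Coolant leaking from the pump; gasket cracked near the valve."
print(generate_tags(note))
print(top_keywords(["pump leak", "pump noise", "valve leak"], n=2))
```

Inflected forms such as "leak" and "leaking" stay separate here; adding a stemmer or lemmatizer (e.g., NLTK's `PorterStemmer` or spaCy's lemmatizer) would merge them into one tag.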
# 5. Summary and Insights (food for thought; carries bonus marks)
○ Write a summary of the generated tags, including potential insights derived from the dataset.
○ Provide actionable recommendations for stakeholders based on your analysis.
○ Highlight discrepancies in the dataset (e.g., null values, missing primary keys) and explain how you addressed them.
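Discrepancies such as null values and missing or duplicate primary keys can be summarized with a short report like the one below. It is a sketch under assumptions: rows are dicts, `None` and empty strings count as null, and the key column is passed in by name (`"ID"` here is hypothetical, since the task does not name the dataset's primary key). The sample rows are made up.

```python
from collections import Counter

MISSING = (None, "")  # assumption: these two markers mean "null"

def discrepancy_report(rows, key):
    """Summarize null counts per column plus missing and duplicate primary keys."""
    columns = sorted({col for row in rows for col in row})
    nulls = {col: sum(1 for row in rows if row.get(col) in MISSING) for col in columns}
    keys = [row.get(key) for row in rows]
    missing_keys = sum(1 for k in keys if k in MISSING)
    key_counts = Counter(k for k in keys if k not in MISSING)
    duplicate_keys = sorted(k for k, c in key_counts.items() if c > 1)
    return {"nulls": nulls, "missing_keys": missing_keys, "duplicate_keys": duplicate_keys}

# Hypothetical rows; "ID" stands in for whatever the real primary key is.
rows = [
    {"ID": "101", "Root Cause": "wear", "Fix_Component": "pump"},
    {"ID": "102", "Root Cause": "", "Fix_Component": "valve"},
    {"ID": "102", "Root Cause": "corrosion", "Fix_Component": None},
    {"ID": "", "Root Cause": "wear", "Fix_Component": "hose"},
]

report = discrepancy_report(rows, key="ID")
print(report)
```

A report like this gives the discrepancy section concrete numbers to cite, and each entry can then be paired with the handling decision (imputed, dropped, or kept and flagged).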