Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/zahidhasann88/product-manufacturer-mapping
This project is designed to process and analyze product data from multiple sources, map related manufacturers, and provide insights into manufacturer relationships.
csv data-analytics data-normalization data-processing nodejs sqlite typescript
Last synced: 18 days ago
- Host: GitHub
- URL: https://github.com/zahidhasann88/product-manufacturer-mapping
- Owner: zahidhasann88
- Created: 2024-08-20T08:41:21.000Z (5 months ago)
- Default Branch: main
- Last Pushed: 2024-08-20T09:08:09.000Z (5 months ago)
- Last Synced: 2024-11-06T01:50:07.589Z (2 months ago)
- Topics: csv, data-analytics, data-normalization, data-processing, nodejs, sqlite, typescript
- Language: TypeScript
- Homepage:
- Size: 41.4 MB
- Stars: 0
- Watchers: 1
- Forks: 0
- Open Issues: 0
Metadata Files:
- Readme: README.md
# Product Manufacturer Mapping
## Overview
This project is designed to process and analyze product data from multiple sources, map related manufacturers, and provide insights into manufacturer relationships.
## Table of Contents
1. [Setup](#setup)
2. [Usage](#usage)
3. [Main Components](#main-components)
4. [Data Flow](#data-flow)
5. [Output](#output)
6. [Performance Improvements](#performance-improvements)
7. [Validation Enhancements](#validation-enhancements)
8. [Logging](#logging)

## Setup
### Prerequisites
- Node.js (v14 or later recommended)
- npm (usually comes with Node.js)

### Installation
1. Clone the repository:
```bash
git clone https://github.com/zahidhasann88/product-manufacturer-mapping.git
cd product-manufacturer-mapping
```
2. Install dependencies:
```bash
npm install
```
3. Place your CSV data files in the `./data` directory.
## Usage
To compile the TypeScript code and run the application, use:
```bash
npm run build-and-start
```

## Main Components
### 1. CSV Reader (`csvReader.ts`)
Reads and parses CSV files containing product data and matches. It processes files in parallel, which improves performance when there are multiple CSV files.
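
As a rough illustration of reading several files at once, the sketch below starts all reads together with `Promise.all`. It is not the project's actual `csvReader.ts`: the names `parseCsv`, `readCsvFiles`, and `CsvRow` are hypothetical, the parser is deliberately naive, and the project may use a CSV library or a different parallelism mechanism such as worker threads.

```typescript
import { promises as fs } from "node:fs";

// Hypothetical row shape: one object per CSV line, keyed by header name.
type CsvRow = Record<string, string>;

// Naive parser for illustration only: it does not handle quoted fields
// or embedded commas, which a real CSV library would.
function parseCsv(text: string): CsvRow[] {
  const lines = text.split(/\r?\n/).filter((line) => line.trim() !== "");
  if (lines.length === 0) return [];
  const headers = lines[0].split(",").map((h) => h.trim());
  return lines.slice(1).map((line) => {
    const cells = line.split(",");
    const row: CsvRow = {};
    headers.forEach((header, i) => {
      row[header] = (cells[i] ?? "").trim();
    });
    return row;
  });
}

// Function that loads a file's text; injectable so the sketch can be
// exercised without touching the disk.
type FileReader = (path: string) => Promise<string>;

// Starts every read at once and waits for all of them, so slow files
// overlap instead of being read one after another.
async function readCsvFiles(
  paths: string[],
  readFile: FileReader = (p) => fs.readFile(p, "utf8"),
): Promise<CsvRow[][]> {
  return Promise.all(paths.map(async (p) => parseCsv(await readFile(p))));
}
```

The result keeps the same order as `paths`, so each parsed file can be matched back to its source. Note that `Promise.all` overlaps I/O waits rather than running parsing on multiple CPU cores.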
### 2. Manufacturer Mapper (`manufacturerMapper.ts`)
Maps related manufacturers based on product data and matches. It also determines the relationship type (parent/child/sibling) between manufacturers.
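
The README does not say how the relationship type is decided, so the following is only one possible heuristic, shown to make the parent/child/sibling idea concrete. Every name here (`ManufacturerRelation`, `inferRelation`, the token-prefix rule) is an assumption and not taken from `manufacturerMapper.ts`.

```typescript
type RelationType = "parent" | "child" | "sibling";

// Hypothetical record shape; relationType describes the role of
// `manufacturer` relative to `relatedManufacturer`.
interface ManufacturerRelation {
  manufacturer: string;
  relatedManufacturer: string;
  relationType: RelationType;
}

// Lower-cased words of a name, used for token-level comparison.
function tokens(name: string): string[] {
  return name.toLowerCase().split(/\s+/).filter(Boolean);
}

// True when `shorter` is a strict leading run of the tokens in `longer`.
function isTokenPrefix(shorter: string[], longer: string[]): boolean {
  return (
    shorter.length > 0 &&
    shorter.length < longer.length &&
    shorter.every((t, i) => t === longer[i])
  );
}

// Assumed heuristic, not the project's actual rule:
//  - "Bayer" vs "Bayer Consumer Health": the shorter name is the parent
//  - "Bayer Vital" vs "Bayer Consumer": shared first token, so siblings
//  - anything else: no relation inferred
function inferRelation(a: string, b: string): ManufacturerRelation | undefined {
  const ta = tokens(a);
  const tb = tokens(b);
  let relationType: RelationType | undefined;
  if (isTokenPrefix(ta, tb)) relationType = "parent";
  else if (isTokenPrefix(tb, ta)) relationType = "child";
  else if (ta.length > 0 && ta[0] === tb[0] && ta.join(" ") !== tb.join(" ")) {
    relationType = "sibling";
  }
  return relationType
    ? { manufacturer: a, relatedManufacturer: b, relationType }
    : undefined;
}
```

A name-based rule like this is cheap but easy to fool, which is one reason a separate validation step is useful.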
### 3. Brand Assigner (`brandAssigner.ts`)
Assigns a brand to a given product title based on the manufacturer relations. The brand is determined using a case-insensitive match against the product title.
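
A case-insensitive match of this kind might look like the sketch below. The function name `assignBrand`, the plain substring comparison, and the longest-name-first ordering are assumptions for illustration; the actual `brandAssigner.ts` may match on word boundaries or use other rules.

```typescript
// Returns the first known manufacturer name found in the title,
// compared case-insensitively, or undefined when none matches.
function assignBrand(title: string, manufacturers: string[]): string | undefined {
  const lowerTitle = title.toLowerCase();
  // Try longer names first so "Bayer Consumer" wins over "Bayer".
  const candidates = manufacturers
    .map((name) => name.trim())
    .filter((name) => name !== "")
    .sort((a, b) => b.length - a.length);
  return candidates.find((name) => lowerTitle.includes(name.toLowerCase()));
}
```

The returned value keeps the casing of the manufacturer list rather than the title, so downstream records stay consistent.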
### 4. Database Manager (`databaseManager.ts`)
Handles all database operations, including initializing the database, saving manufacturer relations, and retrieving data. It preserves transactional integrity during complex operations.
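
One common way to get transactional integrity with SQLite is to wrap related statements in `BEGIN` / `COMMIT` and issue `ROLLBACK` on failure. The sketch below shows that pattern against a minimal, hypothetical `SqlExecutor` interface; it does not reflect the SQLite client or the method names that `databaseManager.ts` actually uses.

```typescript
// Minimal interface this sketch needs from a SQLite client (assumed).
// Real clients expose similar methods under their own names.
interface SqlExecutor {
  exec(sql: string): Promise<void>;
}

// Runs `work` between BEGIN and COMMIT and rolls back if it throws,
// so either every statement takes effect or none does.
async function withTransaction<T>(
  db: SqlExecutor,
  work: () => Promise<T>,
): Promise<T> {
  await db.exec("BEGIN");
  try {
    const result = await work();
    await db.exec("COMMIT");
    return result;
  } catch (error) {
    await db.exec("ROLLBACK");
    throw error; // surface the original failure to the caller
  }
}
```

Saving a batch of manufacturer relations inside one such call would mean a failure partway through leaves the database unchanged.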
### 5. Logger (`logger.ts`)
Configures and manages application logging. Logs are written to both console and file, with configurable log levels and directory paths.
### 6. Validation Algorithm (`validationAlgorithm.ts`)
Implements heuristics to flag potentially faulty manufacturer matches, improving data quality by detecting anomalies.
## Data Flow
1. **CSV Reading**: CSV files are read in parallel from the `data/` directory using the CSV Reader.
2. **Manufacturer Mapping**: Product data and matches are processed by the Manufacturer Mapper, which determines the relationship type (parent/child/sibling) between manufacturers.
3. **Brand Assignment**: Product titles are assigned a brand using the Brand Assigner.
4. **Database Operations**: Manufacturer relations are generated and saved to the SQLite database via the Database Manager.
5. **Validation**: The Validation Algorithm flags potentially problematic manufacturer matches.
6. **Logging**: All operations and results are logged to both the console and log files.

## Output
The program produces the following outputs:
1. **Console Logs**: Detailed process steps and results are output to the console.
2. **Log Files**: Logs are stored in the `logs/` directory for comprehensive debugging and auditing.
3. **SQLite Database**: A database file (`manufacturer_relations.db`) containing the manufacturer relations.
4. **Flagged Manufacturers**: A list of manufacturers that may require manual review, based on the enhanced validation algorithm.

## Performance Improvements
CSV files are read in parallel, which improves performance when there are multiple data files.
## Validation Enhancements
The validation algorithm includes additional heuristics such as string similarity checks and detection of number-only manufacturers, reducing the likelihood of false matches.
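
The two heuristics named above can be sketched as follows. This is an illustration under assumptions, not the contents of `validationAlgorithm.ts`: the README does not say which similarity measure is used, so Levenshtein edit distance is swapped in here, and the function names and the `0.3` threshold are placeholders.

```typescript
// Classic dynamic-programming edit distance between two strings.
function levenshtein(a: string, b: string): number {
  let prev = Array.from({ length: b.length + 1 }, (_, j) => j);
  for (let i = 1; i <= a.length; i++) {
    const curr = [i];
    for (let j = 1; j <= b.length; j++) {
      const cost = a[i - 1] === b[j - 1] ? 0 : 1;
      curr[j] = Math.min(prev[j] + 1, curr[j - 1] + 1, prev[j - 1] + cost);
    }
    prev = curr;
  }
  return prev[b.length];
}

// Similarity in [0, 1]: 1 for identical names (ignoring case and
// surrounding whitespace), 0 when every character differs.
function similarity(a: string, b: string): number {
  const x = a.trim().toLowerCase();
  const y = b.trim().toLowerCase();
  const longest = Math.max(x.length, y.length);
  return longest === 0 ? 1 : 1 - levenshtein(x, y) / longest;
}

// A name made only of digits is unlikely to be a real manufacturer.
function isNumberOnly(name: string): boolean {
  return /^\d+$/.test(name.trim());
}

// Returns the reasons a match looks suspicious; an empty array means
// no flag. The default threshold is an arbitrary placeholder.
function flagMatch(a: string, b: string, minSimilarity = 0.3): string[] {
  const reasons: string[] = [];
  if (isNumberOnly(a) || isNumberOnly(b)) reasons.push("number-only manufacturer");
  if (similarity(a, b) < minSimilarity) reasons.push("low string similarity");
  return reasons;
}
```

Returning the reasons rather than a boolean makes the flagged list easier to review by hand, since each entry says why it was flagged.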
## Logging
### Configuration
Logging is configured via environment variables:
- **`LOG_DIR`**: Specifies the directory where log files are stored (default: `./logs`).
- **`LOG_LEVEL`**: Specifies the log level (`info`, `error`, etc.), allowing fine-grained control over logging verbosity.

### Log Outputs
- **Error Logs**: Stored in `logs/error.log` for capturing error-level messages.
- **Combined Logs**: Stored in `logs/combined.log`, containing all log messages at or above the configured log level.
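
To show how these settings could fit together, here is a minimal sketch that resolves `LOG_DIR` and `LOG_LEVEL` into a config object and decides whether a message passes the level filter. The names `resolveLogConfig` and `shouldLog`, the set of levels, and the `info` fallback for `LOG_LEVEL` are assumptions; the README only documents the `./logs` default and the two file names.

```typescript
// Log levels this sketch accepts, most to least severe. The project's
// logger may support a different set (assumption).
const LEVELS = ["error", "warn", "info", "debug"] as const;
type LogLevel = (typeof LEVELS)[number];

interface LogConfig {
  dir: string;
  level: LogLevel;
  errorFile: string;
  combinedFile: string;
}

function resolveLogConfig(env: Record<string, string | undefined>): LogConfig {
  // Documented default directory is ./logs.
  const dir = env.LOG_DIR?.trim() || "./logs";
  const requested = env.LOG_LEVEL?.trim().toLowerCase();
  // Fall back to "info" (an assumed default) when unset or unrecognized.
  const level = LEVELS.find((l) => l === requested) ?? "info";
  const base = dir.replace(/\/+$/, ""); // drop trailing slashes
  return {
    dir,
    level,
    errorFile: `${base}/error.log`,
    combinedFile: `${base}/combined.log`,
  };
}

// A message is written when it is at least as severe as the configured level.
function shouldLog(message: LogLevel, configured: LogLevel): boolean {
  return LEVELS.indexOf(message) <= LEVELS.indexOf(configured);
}
```

In a Node.js process this would typically be called as `resolveLogConfig(process.env)`. Passing the environment in as an argument keeps the function easy to test.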