Ecosyste.ms: Awesome
An open API service indexing awesome lists of open source software.
https://github.com/arturogonzalezm/missing_stock_prices_python
A time series of a stock's highest price during a trading day at the New York Stock Exchange is provided to you.
https://github.com/arturogonzalezm/missing_stock_prices_python
machine-learning machine-learning-algorithms python python3 python310 statistical-analysis
Last synced: about 17 hours ago
JSON representation
A time series of a stock's highest price during a trading day at the New York Stock Exchange is provided to you.
- Host: GitHub
- URL: https://github.com/arturogonzalezm/missing_stock_prices_python
- Owner: arturogonzalezm
- License: mit
- Created: 2024-04-11T23:46:25.000Z (9 months ago)
- Default Branch: master
- Last Pushed: 2024-04-12T06:40:35.000Z (9 months ago)
- Last Synced: 2024-04-12T09:14:18.042Z (9 months ago)
- Topics: machine-learning, machine-learning-algorithms, python, python3, python310, statistical-analysis
- Language: Jupyter Notebook
- Homepage:
- Size: 2.53 MB
- Stars: 0
- Watchers: 2
- Forks: 0
- Open Issues: 0
-
Metadata Files:
- Readme: README.md
- License: LICENSE
Awesome Lists containing this project
README
# Missing Stock Prices - HackerRank Challenge
A time series of a stock's highest price during a trading day at the New York Stock Exchange is provided to you.
Each test case has day's highest prices missing for certain days.
Analyse the data to identify the missing price for those particular days.## Input Format
- The first line contains an integer `N`, which represents the number of rows of data to follow.
- Each of the next `N` rows contains data with the following columns:
- A **time-stamp** in the first column.
- The **day's highest price** for the stock in the second column.Columns are delimited by a tab. Each input file exactly contains twenty rows where the day's highest price is missing, marked as `"Missing_1"`, `"Missing_2"`, ..., `"Missing_20"`. These missing records are randomly dispersed among the rows.
## Output Format
The output should contain exactly twenty rows, each containing your predicted value for each of the missing values (`Missing_1`, `Missing_2`, ..., `Missing_20`) in that order.
## Scoring
The scoring will be based on the mean of the magnitude of the percentage difference by comparing your expected answers with the actual stock price high for each missing record across all test cases. The difference is calculated as follows:
```text
d = Summation of abs((expected price - computed price)/expected price) x 100
```## Scoring Criteria
The expression: $$50 \times \max(2 - d, 0)$$ Is a mathematical formula used in various contexts, often for scoring or evaluating performance based on a deviation measure 'd'. Here's a breakdown of each part of this formula:
- **50**: This is a constant multiplier used to scale the result of the function to a desired range or magnitude.
- **\(\max(2 - d, 0)\)**: This is the key function in the expression. It's the maximum function which compares two values and returns the larger of the two. In this case, it compares \(2 - d\) and \(0\).
- **\(2 - d\)**: Here, \(d\) represents a deviation or error from a target or expected value. \(2 - d\) calculates a value that decreases as \(d\) increases. In many scoring systems, \(d\) could represent an error rate or a percentage difference, and the number 2 is a threshold value.
- **0**: This ensures that the expression does not return a negative number. If \(d\) exceeds 2, making \(2 - d\) negative, the \(\max\) function will return 0.**Putting it together:**
- When \(d\) (the deviation) is less than or equal to 2, \(2 - d\) will be positive and therefore \(50 \times \max(2 - d, 0)\) will yield a scaled positive score based on how much smaller \(d\) is than 2.
- When \(d\) is greater than 2, \(2 - d\) becomes negative, and \(\max(2 - d, 0)\) results in 0. Consequently, \(50 \times 0\) is 0, reflecting a failure to stay within the acceptable deviation range.
**Note**, the above is used to calculate a score that decreases with an increase in the deviation \(d\) beyond a set point (2 in this case), down to a minimum score of 0. It’s commonly used in contexts where there's a penalty for exceeding a certain error threshold, ensuring scores are non-negative.
## Diagram
```mermaid
sequenceDiagram
participant Main
participant ParseData
participant ArimaInterpolation
participant GradientBoostingInterpolation
participant RandomForestInterpolation
participant SplineInterpolation
participant LinearInterpolationMain->>+ParseData: Parse data from file
ParseData-->>-Main: Return DataFrame & missing indicesMain->>+ArimaInterpolation: Apply ARIMA interpolation
ArimaInterpolation-->>-Main: Return interpolated DataFrameMain->>+GradientBoostingInterpolation: Apply Gradient Boosting interpolation
GradientBoostingInterpolation-->>-Main: Return interpolated DataFrameMain->>+RandomForestInterpolation: Apply Random Forest interpolation
RandomForestInterpolation-->>-Main: Return interpolated DataFrameMain->>+SplineInterpolation: Apply Spline interpolation
SplineInterpolation-->>-Main: Return interpolated DataFrameMain->>+LinearInterpolation: Apply Linear interpolation
LinearInterpolation-->>-Main: Return interpolated DataFrameMain->>Main: Combine results and output
```