[MGT6705] Time Series Data Analysis and Forecasting Mid-term Project
Project Overview
In this project, I analyzed which physicochemical properties most significantly affect the quality of red wines, using data from the UCI Machine Learning Repository. Motivated by Korea's high import taxes on wine and Korean consumers' sensitivity to wine prices, the analysis aimed to help consumers identify underrated high-quality wines using accessible numerical indicators, and to help producers reduce errors that come from relying on experience alone when making wine.
Methodology
- Dataset: 1599 red wine samples (Portuguese "Vinho Verde")
- Model: Ordered probit regression (quality is an ordinal score, so this model shows which factors affect it and by how much)
- Feature Selection: Stepwise variable selection + Type II deviance tests
- Final Model Variables: volatile acidity, total sulfur dioxide, sulphates, and alcohol
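To make the model and the selection test more concrete, here is a minimal pure-Python sketch of how an ordered probit assigns probabilities to ordered quality categories. It also shows how a deviance (likelihood-ratio) test compares two nested fits. In a Type II test, each variable is dropped from the model that contains all the other variables, and the two fits are compared. This sketch is an illustration under assumptions and is not the project's code. The function names are hypothetical. The sketch does not fit the model: the log-likelihoods would come from fitting each model with a statistical package. All numbers are made up.

```python
import math
from typing import Sequence


def std_normal_cdf(z: float) -> float:
    """Standard normal CDF Phi(z), computed with math.erf (handles +/-inf)."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probit_probs(x: Sequence[float], beta: Sequence[float],
                         cutpoints: Sequence[float]) -> list[float]:
    """P(y = k | x) for k = 0..len(cutpoints).

    Latent score: y* = x . beta + e, e ~ N(0, 1). Category k is observed when
    cutpoints[k-1] < y* <= cutpoints[k], with -inf / +inf at the ends.
    There is no intercept in beta: the cutpoints play that role.
    """
    xb = sum(xi * bi for xi, bi in zip(x, beta))
    bounds = [-math.inf, *cutpoints, math.inf]
    return [std_normal_cdf(hi - xb) - std_normal_cdf(lo - xb)
            for lo, hi in zip(bounds[:-1], bounds[1:])]


def ordered_probit_loglik(X: Sequence[Sequence[float]], y: Sequence[int],
                          beta: Sequence[float],
                          cutpoints: Sequence[float]) -> float:
    """Log-likelihood; y holds 0-based category indices (e.g. quality 3..8 -> 0..5)."""
    return sum(math.log(ordered_probit_probs(x, beta, cutpoints)[k])
               for x, k in zip(X, y))


def lr_test_df1(loglik_full: float, loglik_reduced: float) -> tuple[float, float]:
    """Deviance (likelihood-ratio) test for dropping ONE coefficient.

    Deviance difference D = 2 * (ll_full - ll_reduced) ~ chi-square(1) under H0,
    and for 1 df the upper-tail p-value is erfc(sqrt(D / 2)).
    """
    stat = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    return stat, math.erfc(math.sqrt(stat / 2.0))


if __name__ == "__main__":
    # Hypothetical single-predictor model with made-up coefficients (not fitted values).
    beta, cuts = [0.5], [-1.0, 0.0, 1.0]
    for x in ([-1.0], [0.0], [1.0]):
        print(x, [round(p, 3) for p in ordered_probit_probs(x, beta, cuts)])
    # Made-up log-likelihoods, only to show the test mechanics.
    print(lr_test_df1(loglik_full=-1000.0, loglik_reduced=-1004.0))
```

A positive coefficient shifts probability mass toward higher quality categories, and a negative one shifts it toward lower categories. In this sense the sign and size of each coefficient describe how a variable relates to quality. The closed-form 1-df p-value fits here because every final variable is a single continuous term.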
Key Findings
- Volatile acidity negatively affects quality (higher → worse).
- Alcohol and sulphates are positively associated with higher quality.
- Total sulfur dioxide has a weak negative effect but interacts subtly with sulphates.
- Final model accuracy: 0.61, AUC: 0.748
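The two reported metrics can be read as follows. This is a hedged sketch, because the README does not say how accuracy or AUC was computed for a multi-category outcome. It assumes that accuracy is the share of wines whose most probable predicted category matches the observed score. It also assumes that AUC is computed in binary form: quality is split into "high" vs. "not high" at a threshold of your choosing, and each wine is scored by the model's predicted probability of the high group. Multi-class AUC variants, such as averaged one-vs-rest AUC, also exist. The function names below are hypothetical.

```python
from typing import Sequence


def accuracy(y_true: Sequence[int], y_pred: Sequence[int]) -> float:
    """Share of samples whose predicted category equals the observed one."""
    if len(y_true) != len(y_pred) or not y_true:
        raise ValueError("inputs must be non-empty and of equal length")
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def binary_auc(labels: Sequence[int], scores: Sequence[float]) -> float:
    """ROC AUC as the probability that a random positive outscores a random
    negative, counting ties as 1/2 (Mann-Whitney form; O(n_pos * n_neg))."""
    pos = [s for lab, s in zip(labels, scores) if lab == 1]
    neg = [s for lab, s in zip(labels, scores) if lab == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative sample")
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy inputs, not project data.
    print(accuracy([5, 6, 7], [5, 6, 6]))
    print(binary_auc([0, 0, 1, 1], [0.1, 0.4, 0.35, 0.8]))
```

An AUC of 0.5 means the scores rank wines no better than chance, and 1.0 means perfect ranking. The pairwise loop is adequate at this dataset's size, while rank-based formulas scale better for larger data.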
These findings suggest that even without expert critic ratings, consumers and importers can leverage simple physicochemical indicators to select high-quality red wines.