[MGT6705] Time Series Data Analysis and Forecasting Mid-term Project

Project Overview

In this project, I analyzed which physicochemical properties most strongly affect the quality of red wines, using data from the UCI Machine Learning Repository. The project was motivated by Korea's high import taxes on wine and by Korean consumers' sensitivity to wine prices. The analysis aimed to help consumers identify underrated high-quality wines from accessible numerical indicators, and to help producers rely less on trial and error when making wine.

🔬 Methodology

  • Dataset: 1599 red wine samples (Portuguese “Vinho Verde”)
  • Model: Ordered Probit Regression (quality is an ordinal score, so this model shows which factors affect it, in which direction, and by how much)
  • Feature Selection: Stepwise variable selection + Type II deviance tests
  • Final Model Variables: volatile acidity, total sulfur dioxide, sulphates, and alcohol
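As a rough illustration of the model above, the Python sketch below evaluates the ordered probit probability formula. It does not fit the model; the report's own analysis was done in R. With linear predictor η = x·β and ordered cutpoints θ₁ < θ₂ < … < θ_{K−1}, the probability of category k is Φ(θ_k − η) − Φ(θ_{k−1} − η), where Φ is the standard normal CDF. The sketch also includes the 1-degree-of-freedom deviance (likelihood-ratio) test. For a model with only main effects, a Type II test of a single numeric term reduces to this test. The function names, coefficient, and cutpoints are hypothetical placeholders, not estimates from this project.

```python
import math


def norm_cdf(z: float) -> float:
    """Standard normal CDF, written with math.erf."""
    return 0.5 * (1.0 + math.erf(z / math.sqrt(2.0)))


def ordered_probit_probs(x, beta, cutpoints):
    """Hypothetical helper: P(Y = k | x) for each ordered category k.

    Latent model: y* = x . beta + e with e ~ N(0, 1); Y falls in category k
    when cutpoints[k-1] < y* <= cutpoints[k] (outer bounds are -inf, +inf).
    """
    eta = sum(xi * bi for xi, bi in zip(x, beta))
    bounds = [-math.inf] + list(cutpoints) + [math.inf]
    probs = []
    for lower, upper in zip(bounds[:-1], bounds[1:]):
        cdf_upper = 1.0 if upper == math.inf else norm_cdf(upper - eta)
        cdf_lower = 0.0 if lower == -math.inf else norm_cdf(lower - eta)
        probs.append(cdf_upper - cdf_lower)
    return probs


def lr_test_df1(loglik_full, loglik_reduced):
    """Hypothetical helper: deviance (likelihood-ratio) test for dropping one
    numeric term. Returns (deviance, p_value) from a chi-square with 1 df."""
    deviance = max(0.0, 2.0 * (loglik_full - loglik_reduced))
    # For 1 df, P(chi2 > d) = P(|Z| > sqrt(d)) = erfc(sqrt(d / 2)).
    p_value = math.erfc(math.sqrt(deviance / 2.0))
    return deviance, p_value


# Made-up single-predictor example: a negative coefficient (the sign the
# report finds for volatile acidity) shifts probability toward low scores.
beta = [-1.0]                  # hypothetical coefficient
cutpoints = [-1.0, 0.0, 1.0]   # hypothetical cutpoints -> 4 ordered categories
low_va = ordered_probit_probs([0.3], beta, cutpoints)
high_va = ordered_probit_probs([1.2], beta, cutpoints)
print([round(p, 3) for p in low_va])
print([round(p, 3) for p in high_va])
```

Raising the predictor from 0.3 to 1.2 moves probability mass into the lowest category and out of the highest one. This is what "higher → worse" means in an ordered probit model.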

📈 Key Findings

  • Volatile acidity negatively affects quality (higher → worse).
  • Alcohol and sulphates are positively associated with higher quality.
  • Total sulfur dioxide has a weak negative effect but interacts subtly with sulphates.
  • Final model accuracy: 0.61, AUC: 0.748
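The two metrics above can be sketched as follows. The sketch assumes that each wine is assigned its most probable quality score, and it reads AUC as the chance that a randomly chosen positive case receives a higher score than a randomly chosen negative one. The report does not state which multiclass AUC variant produced 0.748. The binary version here, using a hypothetical "quality ≥ 7 vs. the rest" split, is therefore an assumption. The data are toy values, and the function names are hypothetical.

```python
def accuracy(y_true, prob_rows, categories):
    """Hypothetical helper: share of samples whose most probable category
    (argmax of the predicted probabilities) equals the observed score."""
    correct = 0
    for y, probs in zip(y_true, prob_rows):
        best = max(range(len(probs)), key=lambda k: probs[k])
        if categories[best] == y:
            correct += 1
    return correct / len(y_true)


def binary_auc(labels, scores):
    """Hypothetical helper: AUC as P(score of a random positive > score of a
    random negative), counting ties as 1/2 (Mann-Whitney formulation)."""
    pos = [s for lab, s in zip(labels, scores) if lab]
    neg = [s for lab, s in zip(labels, scores) if not lab]
    wins = 0.0
    for sp in pos:
        for sn in neg:
            if sp > sn:
                wins += 1.0
            elif sp == sn:
                wins += 0.5
    return wins / (len(pos) * len(neg))


# Toy inputs only; these are not the project's predictions.
categories = [5, 6, 7]
y_true = [5, 6, 7]
prob_rows = [[0.6, 0.3, 0.1], [0.2, 0.5, 0.3], [0.5, 0.3, 0.2]]
acc = accuracy(y_true, prob_rows, categories)

# Hypothetical binary split: 1 = "quality >= 7", scored by predicted P(quality >= 7).
labels = [1, 0, 1, 0]
scores = [0.9, 0.4, 0.35, 0.1]
auc = binary_auc(labels, scores)
print(acc, auc)
```

Accuracy only credits exact score matches, whereas AUC measures how well the model ranks wines. The two numbers can therefore tell different stories about the same model.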

These findings suggest that even without expert critic ratings, consumers and importers can leverage simple physicochemical indicators to select high-quality red wines.

📎 Download Full Report

📄 Download PDF Report

🔍 View R Markdown Code Report (HTML)