Predicting Australian Vehicle Price

Introduction

The Australian Vehicle Price Prediction project builds a machine learning pipeline that predicts vehicle prices from attributes such as kilometres driven, engine size, fuel type, and brand. It covers exploratory analysis of the factors that influence pricing, data preprocessing, model training and evaluation, and deployment as a web application.

Objectives

  • Understand the relationship between vehicle attributes (e.g., brand, mileage, age) and price.
  • Preprocess and clean the data to ensure it is ready for machine learning.
  • Build and evaluate a machine learning model capable of accurately predicting vehicle prices.
  • Deploy the model as a web application for real-time predictions.

Dataset Overview

  • Kilometres: Distance traveled by the vehicle.
  • Seats_count: Number of seats in the vehicle.
  • Engine_in_litre: Engine size in liters.
  • Cylinders_in_engine: Number of engine cylinders.
  • FuelConsumption_Per100km: Fuel consumption per 100 km.
  • Drive Type: 4WD, AWD, Front, Other, Rear.
  • Transmission: Automatic, Manual.
  • Vehicle History: Demo, New, Used.
  • Fuel Type: Diesel, Electric, Hybrid, LPG, Leaded, Other, Premium, Unleaded.
  • Body Type: Commercial, Convertible, Coupe, Hatchback, Other, People Mover, SUV, Sedan, Ute/Tray, Wagon.
  • Brand: Vehicle Brand.

Pipeline Workflow

1. Exploratory Data Analysis (EDA)

  • Visualize distributions and correlations using libraries like Matplotlib and Seaborn.
  • Correlation heatmap to identify relationships between features and target variable.
  • Bar chart to showcase average prices by vehicle brand.
  • Scatter plot showing the relationship between kilometers driven and vehicle price.
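
The three plots listed above could be produced along the following lines. This is a minimal sketch rather than the project's actual notebook: the data is synthetic, the target column name `Price` and the output file name `eda_overview.png` are assumptions, and it uses Matplotlib alone (a Seaborn heatmap would be a natural substitute for the first panel).

```python
import matplotlib
matplotlib.use("Agg")  # render off-screen so the script also runs without a display
import matplotlib.pyplot as plt
import numpy as np
import pandas as pd

# Synthetic stand-in data; the real project would load the vehicle dataset instead.
rng = np.random.default_rng(0)
n = 200
df = pd.DataFrame({
    "Brand": rng.choice(["BMW", "Toyota", "Mazda"], size=n),
    "Kilometres": rng.uniform(0, 250_000, size=n),
    "Engine_in_litre": rng.choice([1.5, 2.0, 3.0], size=n),
})
# "Price" is an assumed target name; in this toy data it falls as kilometres rise.
df["Price"] = (
    60_000
    - 0.15 * df["Kilometres"]
    + 5_000 * df["Engine_in_litre"]
    + rng.normal(0, 2_000, size=n)
)

# Pairwise Pearson correlations between the numeric columns (features and target).
corr = df.select_dtypes(include="number").corr()
# Average price per brand, highest first.
avg_price = df.groupby("Brand")["Price"].mean().sort_values(ascending=False)

fig, axes = plt.subplots(1, 3, figsize=(16, 4))

# Panel 1: correlation heatmap.
image = axes[0].imshow(corr.to_numpy(), vmin=-1, vmax=1, cmap="coolwarm")
axes[0].set_xticks(range(len(corr.columns)))
axes[0].set_xticklabels(corr.columns, rotation=45, ha="right")
axes[0].set_yticks(range(len(corr.columns)))
axes[0].set_yticklabels(corr.columns)
axes[0].set_title("Correlation heatmap")
fig.colorbar(image, ax=axes[0])

# Panel 2: average price by brand.
axes[1].bar(avg_price.index, avg_price.to_numpy())
axes[1].set_title("Average price by brand")
axes[1].set_ylabel("Price")

# Panel 3: kilometres driven against price.
axes[2].scatter(df["Kilometres"], df["Price"], s=8, alpha=0.6)
axes[2].set_title("Kilometres vs. price")
axes[2].set_xlabel("Kilometres")
axes[2].set_ylabel("Price")

fig.tight_layout()
fig.savefig("eda_overview.png")
```

Computing `corr` and `avg_price` as separate objects keeps the numbers behind each chart available for inspection, which helps when a plot looks surprising.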

2. Data Preprocessing

  • Handle missing values and outliers.
  • Perform one-hot encoding for categorical variables such as transmission, fuel type, and drive type.
  • Scale numerical features using StandardScaler.
  • Split the dataset into training and testing sets.
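
One way to wire these steps together with scikit-learn is sketched below. The rows are synthetic, the exact column names and the `Price` target are assumptions, and median imputation is just one possible choice for missing values. Outlier handling is left out because the project description does not say which rule is applied.

```python
import numpy as np
import pandas as pd
from sklearn.compose import ColumnTransformer
from sklearn.impute import SimpleImputer
from sklearn.model_selection import train_test_split
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import OneHotEncoder, StandardScaler

# Synthetic rows with a few missing values; not taken from the real dataset.
df = pd.DataFrame({
    "Kilometres": [12_000, 85_000, np.nan, 140_000, 30_000, 5_000, 99_000, 61_000],
    "Engine_in_litre": [2.0, 1.5, 3.0, 2.0, np.nan, 1.5, 3.0, 2.0],
    "Transmission": ["Automatic", "Manual", "Automatic", "Automatic",
                     "Manual", "Automatic", "Manual", "Automatic"],
    "Fuel Type": ["Unleaded", "Diesel", "Premium", "Diesel",
                  "Unleaded", "Hybrid", "Diesel", "Unleaded"],
    "Price": [42_000, 18_500, 55_000, 15_000, 27_000, 39_000, 21_000, 24_500],
})

numeric_cols = ["Kilometres", "Engine_in_litre"]
categorical_cols = ["Transmission", "Fuel Type"]

# Numeric columns: fill gaps with the median, then standardise.
numeric_steps = Pipeline([
    ("impute", SimpleImputer(strategy="median")),
    ("scale", StandardScaler()),
])

preprocessor = ColumnTransformer(
    [
        ("num", numeric_steps, numeric_cols),
        # Categories unseen during fitting are encoded as all zeros.
        ("cat", OneHotEncoder(handle_unknown="ignore"), categorical_cols),
    ],
    sparse_threshold=0,  # always return a dense array
)

# Split first so the test rows do not influence the imputer or the scaler.
X_train, X_test, y_train, y_test = train_test_split(
    df[numeric_cols + categorical_cols], df["Price"], test_size=0.25, random_state=42
)

X_train_prepared = preprocessor.fit_transform(X_train)
X_test_prepared = preprocessor.transform(X_test)
```

Fitting the transformer on the training split only, and reusing it unchanged on the test split, avoids leaking test-set statistics into the scaler.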

3. Model Selection

  • Train multiple regression models (e.g., Linear Regression, Random Forest, Gradient Boosting).
  • Evaluate performance using metrics like RMSE, MAE, and R-squared.
  • Select the best-performing model for deployment.
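
The selection loop might look roughly like this. It is a sketch on synthetic data with two numeric features; the hyperparameters are library defaults, and choosing the winner by lowest RMSE is an assumption, since the project does not state its selection criterion.

```python
import numpy as np
from sklearn.ensemble import GradientBoostingRegressor, RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.metrics import mean_absolute_error, mean_squared_error, r2_score
from sklearn.model_selection import train_test_split

# Synthetic data: price falls with kilometres and rises with engine size.
rng = np.random.default_rng(42)
n = 400
kilometres = rng.uniform(0, 250_000, size=n)
engine = rng.choice([1.5, 2.0, 3.0], size=n)
X = np.column_stack([kilometres, engine])
y = 60_000 - 0.15 * kilometres + 5_000 * engine + rng.normal(0, 2_000, size=n)

X_train, X_test, y_train, y_test = train_test_split(
    X, y, test_size=0.2, random_state=42
)

models = {
    "Linear Regression": LinearRegression(),
    "Random Forest": RandomForestRegressor(n_estimators=100, random_state=42),
    "Gradient Boosting": GradientBoostingRegressor(random_state=42),
}

results = {}
for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_test)
    results[name] = {
        "RMSE": float(np.sqrt(mean_squared_error(y_test, pred))),
        "MAE": float(mean_absolute_error(y_test, pred)),
        "R2": float(r2_score(y_test, pred)),
    }

# Assumed selection rule: the model with the lowest test RMSE wins.
best_name = min(results, key=lambda name: results[name]["RMSE"])
best_model = models[best_name]
```

RMSE penalises large errors more heavily than MAE, so a model that occasionally misprices a vehicle badly can rank differently under the two metrics; reporting both, as the project does, makes that visible.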

4. Model Evaluation

  • Generate a table comparing model performance.
  • Plot residuals and predicted vs. actual values.

5. Deployment

Web Application

  • The project is deployed as a Streamlit web application.
  • Users can input vehicle details (e.g., kilometers, engine size, fuel type, transmission).
  • Clicking "Predict" returns the estimated vehicle price.
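
A stripped-down version of such an app is sketched below. It is not the project's actual app.py: the widget labels, default values, the helper `build_input_row`, and the input column names are all hypothetical, and it assumes the pickled object is a full pipeline that accepts a raw one-row DataFrame.

```python
import pickle
from pathlib import Path

import pandas as pd

MODEL_PATH = Path("vehicle_price_model.pkl")  # file name from the project

FUEL_TYPES = ["Diesel", "Electric", "Hybrid", "LPG", "Leaded", "Other", "Premium", "Unleaded"]
TRANSMISSIONS = ["Automatic", "Manual"]


def build_input_row(
    kilometres: float, engine_litres: float, fuel_type: str, transmission: str
) -> pd.DataFrame:
    """Pack the form values into a one-row frame (column names are assumptions)."""
    return pd.DataFrame([{
        "Kilometres": kilometres,
        "Engine_in_litre": engine_litres,
        "Fuel Type": fuel_type,
        "Transmission": transmission,
    }])


def main() -> None:
    # Imported here so the helper above can be used without Streamlit installed.
    import streamlit as st

    st.title("Australian Vehicle Price Prediction")
    kilometres = st.number_input("Kilometres", min_value=0.0, value=50_000.0, step=1_000.0)
    engine_litres = st.number_input("Engine size (litres)", min_value=0.0, value=2.0, step=0.1)
    fuel_type = st.selectbox("Fuel type", FUEL_TYPES)
    transmission = st.selectbox("Transmission", TRANSMISSIONS)

    if st.button("Predict"):
        with MODEL_PATH.open("rb") as fh:
            model = pickle.load(fh)  # only unpickle files from a trusted source
        row = build_input_row(kilometres, engine_litres, fuel_type, transmission)
        price = float(model.predict(row)[0])
        st.success(f"Estimated price: ${price:,.0f}")


# Streamlit executes the script as __main__; the file check keeps this sketch
# importable on a machine that has no trained model yet.
if __name__ == "__main__" and MODEL_PATH.exists():
    main()
```

Saved as app.py, a script like this would be started with `streamlit run app.py`. Keeping the row-building logic in a plain function makes it testable without launching the UI.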

Technical Details

  • The trained model is saved as vehicle_price_model.pkl.
  • The Streamlit app (app.py) provides a simple, interactive interface for predictions.
  • The app is hosted on Hugging Face Spaces for public access: View Deployed App.
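
Saving and reloading a model file like this can be sketched as a simple round trip. The snippet assumes the standard-library `pickle` module (the .pkl extension suggests it, but joblib would work similarly), uses a toy model on made-up numbers, and writes to a temporary directory so nothing is left behind.

```python
import pickle
import tempfile
from pathlib import Path

import numpy as np
from sklearn.linear_model import LinearRegression

# Toy model on made-up numbers; stands in for the project's trained model.
X = np.array([[10_000.0], [50_000.0], [120_000.0], [200_000.0]])
y = np.array([45_000.0, 38_000.0, 25_000.0, 12_000.0])
model = LinearRegression().fit(X, y)

with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = Path(tmp_dir) / "vehicle_price_model.pkl"

    # Training side: serialise the fitted model to disk.
    with model_path.open("wb") as fh:
        pickle.dump(model, fh)

    # App side: load it back (only unpickle files from a trusted source).
    with model_path.open("rb") as fh:
        restored = pickle.load(fh)

# The restored model should reproduce the original predictions.
same_predictions = bool(np.allclose(model.predict(X), restored.predict(X)))
```

Pickled scikit-learn models are generally only safe to load with the same library version that wrote them, so pinning the scikit-learn version in the deployment's requirements file is a sensible precaution.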

If you want to explore the deployed model, you can try it out on my Streamlit app.

Take a look at my other projects to see how I address real-world problems using data.
