
End-to-end Project

Price estimator for used cars

Web Application | GitHub Repository

1. Objective

Buying or selling a car is a significant financial decision, so being able to estimate a fair selling price matters.

The objective of this project is to build a web application in which a user enters a car's features and receives an estimated selling price, produced by a machine learning model trained on real data collected from the web.

2. Overview

# Preprocessing

More than 50,000 ads for second-hand cars were collected from a popular French site. Complementary data was also extracted to better encode brands.

The preprocessing was done in two steps:

  • First, an Exploratory Data Analysis was conducted on the training set to determine how best to clean the data
  • Then, a few preprocessing steps (dropping unnecessary features, fixing data errors) were applied to the entire dataset, while most of the preprocessing was applied only to the raw training set
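The split-first workflow described above can be sketched as follows. This is a minimal illustration in plain Python, not the project's actual code: the field names (`mileage_km`, `price`), the 80/20 split ratio, and the choice of median imputation are all assumptions made for the example. The point it shows is that the statistics used for cleaning are computed from the training rows only and then reused unchanged on the held-out rows.

```python
import random
import statistics

def split_rows(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of the rows and split them into (train, test)."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = int(len(shuffled) * test_ratio)
    return shuffled[n_test:], shuffled[:n_test]

def fit_median(rows, field):
    """Learn the imputation value from the rows where the field is present."""
    return statistics.median(r[field] for r in rows if r[field] is not None)

def impute(rows, field, value):
    """Return new rows with missing values of `field` replaced by `value`."""
    return [{**r, field: value if r[field] is None else r[field]} for r in rows]

# Hypothetical ads: every fifth one is missing its mileage.
ads = [
    {"mileage_km": None if i % 5 == 0 else 10_000 * i, "price": 20_000 - 500 * i}
    for i in range(1, 21)
]

train, test = split_rows(ads)
median_km = fit_median(train, "mileage_km")          # learned on the training set only
train_clean = impute(train, "mileage_km", median_km)
test_clean = impute(test, "mileage_km", median_km)   # same value reused on the test set
```

Keeping the fitted value (`median_km`) separate from the function that applies it is what later allows the same cleaning to be replayed on the test set without looking at it.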

# Modeling

The model training included four steps:

  • Choosing the metrics to measure performance: RMSE, MAE & R²
  • A first round of modeling, as a benchmark, using 7 different algorithms (LinearRegression, KNeighbors, AdaBoost, ...)
  • Selecting the 2 best models to be fine-tuned using k-fold cross-validation
  • After comparison, exporting the best model as a pickle file to be applied later to the test set
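To make the benchmark-then-cross-validate idea concrete, here is a small sketch of k-fold cross-validation in plain Python. It is not the project's code: a mean-price baseline and a one-feature least-squares line stand in for the real candidate algorithms, the data is synthetic, and the function names are hypothetical. Shuffling before folding is omitted for brevity, although it is usually advisable on real data.

```python
import math
import statistics

def k_fold_indices(n, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds over range(n)."""
    fold_sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        yield train, val
        start += size

def fit_mean(xs, ys):
    """Baseline: always predict the mean training price."""
    mean = statistics.fmean(ys)
    return lambda x: mean

def fit_line(xs, ys):
    """Ordinary least squares on a single feature."""
    mx, my = statistics.fmean(xs), statistics.fmean(ys)
    slope = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
             / sum((x - mx) ** 2 for x in xs))
    return lambda x: my + slope * (x - mx)

def cv_rmse(fit, xs, ys, k=5):
    """Average validation RMSE of `fit` across k folds."""
    scores = []
    for train, val in k_fold_indices(len(xs), k):
        predict = fit([xs[i] for i in train], [ys[i] for i in train])
        errors = [(predict(xs[i]) - ys[i]) ** 2 for i in val]
        scores.append(math.sqrt(statistics.fmean(errors)))
    return statistics.fmean(scores)

# Synthetic data: price falls with age, plus a small deterministic wobble.
ages = list(range(1, 21))
prices = [30_000 - 1_200 * a + (300 if a % 2 else -300) for a in ages]

candidates = {"mean": fit_mean, "line": fit_line}
results = {name: cv_rmse(fit, ages, prices) for name, fit in candidates.items()}
best = min(results, key=results.get)   # candidate with the lowest cross-validated RMSE
```

Each candidate is scored only on folds it was not fitted on, so the comparison reflects how the models generalize rather than how well they memorize the training rows.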

Following the steps of the preprocessing, a complete pipeline was constructed to clean the test set and apply the fine-tuned model.
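A stripped-down version of that export-and-replay pattern might look like the sketch below. Everything in it is assumed for illustration: a plain dict of fitted parameters stands in for the real estimator object, the file name `model.pkl` is hypothetical, and the cleaning and prediction rules are deliberately trivial.

```python
import pickle
import tempfile
from pathlib import Path

# Hypothetical fitted parameters standing in for the real estimator object.
fitted = {"intercept": 25_000.0, "per_year": -1_500.0, "median_age": 6}

def clean(row, params):
    """Cleaning step: fill a missing age with the value learned during training."""
    age = row.get("age")
    return {**row, "age": params["median_age"] if age is None else age}

def predict(row, params):
    """Prediction step: apply the stored linear rule to a cleaned row."""
    return params["intercept"] + params["per_year"] * row["age"]

with tempfile.TemporaryDirectory() as tmp:
    path = Path(tmp) / "model.pkl"                  # assumed file name
    path.write_bytes(pickle.dumps(fitted))          # export after training
    restored = pickle.loads(path.read_bytes())      # reload at prediction time

# The test rows go through the same cleaning step before prediction.
test_rows = [{"age": 2}, {"age": None}]
predictions = [predict(clean(row, restored), restored) for row in test_rows]
```

Since unpickling can execute arbitrary code, pickle files should only be loaded from sources you trust.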

The final performance (measured after applying the final model on the test set) is:

| Model | RMSE | MAE | R² |
|---|---|---|---|
| XGBRegressor | 8883 | 3803 | 0.871 |
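For readers less familiar with these three metrics, the sketch below computes them from scratch using their standard definitions. The sample prices are made up for the example, and the helper name `regression_metrics` is hypothetical; it is not taken from the project.

```python
import math

def regression_metrics(y_true, y_pred):
    """Return (RMSE, MAE, R²) for paired actual and predicted values."""
    n = len(y_true)
    residuals = [t - p for t, p in zip(y_true, y_pred)]
    ss_res = sum(r * r for r in residuals)              # residual sum of squares
    mean_true = sum(y_true) / n
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)  # total sum of squares
    rmse = math.sqrt(ss_res / n)
    mae = sum(abs(r) for r in residuals) / n
    r2 = 1 - ss_res / ss_tot
    return rmse, mae, r2

# Made-up prices, each prediction off by exactly 1,000.
actual = [10_000, 15_000, 20_000, 25_000]
predicted = [11_000, 14_000, 21_000, 24_000]
rmse, mae, r2 = regression_metrics(actual, predicted)
```

RMSE squares the errors before averaging, so it is never smaller than MAE and grows faster when a few predictions are far off. An RMSE more than twice the MAE, as in the table above, is consistent with a small number of large errors.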

# Deployment

The deployment of the model was achieved in three steps:

  • The code used for the previous steps was translated from Jupyter notebooks (research environment) to Python scripts (production environment).
  • The scripts were designed as a local package, with init files, and configuration was handled using a .yml file.
  • Finally, a web application was built using Flask, HTML and CSS code, and deployed on Heroku.
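On the application side, form values arrive as strings and need to be validated and converted before they reach the model. The sketch below shows that step on its own, independent of the web framework. The field names (`brand`, `year`, `mileage_km`), the accepted ranges, and the function name are hypothetical and not taken from the project.

```python
def parse_car_form(form):
    """Convert raw form strings into typed features, collecting errors per field."""
    features = {}
    errors = {}

    brand = form.get("brand", "").strip()
    if brand:
        features["brand"] = brand.lower()
    else:
        errors["brand"] = "Brand is required."

    # (field name, lowest accepted value, highest accepted value) -- assumed limits
    for field, lower, upper in (("year", 1950, 2100), ("mileage_km", 0, 1_000_000)):
        raw = form.get(field, "").strip()
        try:
            value = int(raw)
        except ValueError:
            errors[field] = f"{field} must be a whole number."
            continue
        if lower <= value <= upper:
            features[field] = value
        else:
            errors[field] = f"{field} must be between {lower} and {upper}."

    return features, errors

ok_features, ok_errors = parse_car_form(
    {"brand": " Renault ", "year": "2015", "mileage_km": "82000"}
)
bad_features, bad_errors = parse_car_form(
    {"brand": "", "year": "20x5", "mileage_km": "-3"}
)
```

In a Flask view, a function like this could be called with `request.form`, which offers the same `.get` interface as a dict. The view would then either re-render the form with the error messages or pass the features on to the model.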

Some images from the project

  • Target distribution
  • Log transformation of the target to improve spread
  • Exploring missing values
  • Adding and testing the 'Luxury' brand feature
  • Adding and testing the 'Origin country' brand feature
  • Visualizing performance through predicted vs. actual values
  • Feature importance of the final model on test data
  • Example prediction from the web application
