Price Estimator for Used Cars

End-to-end Project

1. Objective

Buying or selling a car is an important financial decision, so being able to estimate a car's selling price is valuable.

The objective of this project is to build a web application that takes user input and estimates the selling price of a car from a number of features, using a machine learning estimator trained on real data collected from the web.

2. Overview

# Preprocessing

More than 50,000 ads for second-hand cars were collected from a popular French site. Complementary data was also extracted to better encode brands.

The preprocessing was done in two steps:

First, an Exploratory Data Analysis was conducted on the training set to determine how best to clean the data
Then, a few basic steps (dropping unnecessary features, fixing data errors) were applied to the entire dataset, while most of the preprocessing was applied to the raw training set only
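
The two-step split described above can be sketched in plain Python. This is only an illustration under assumptions: the field names (`ad_id`, `mileage_km`), the sign-error fix, and the median imputation are hypothetical examples, not the project's actual schema or cleaning rules. The point it shows is that anything learned from the data (here, a median) comes from the training rows only and is then reused on the test rows.

```python
import random
import statistics

# Hypothetical ad records; field names are illustrative, not the project's schema.
ads = [
    {"ad_id": 1, "brand": "Peugeot", "mileage_km": 120000, "price": 6500},
    {"ad_id": 2, "brand": "Renault", "mileage_km": None, "price": 8000},
    {"ad_id": 3, "brand": "Citroen", "mileage_km": -90000, "price": 7200},
    {"ad_id": 4, "brand": "Peugeot", "mileage_km": 45000, "price": 14500},
    {"ad_id": 5, "brand": "Renault", "mileage_km": 160000, "price": 4200},
    {"ad_id": 6, "brand": "Citroen", "mileage_km": 70000, "price": 11000},
]

def clean_all(rows):
    """Steps that are safe on the whole dataset: drop unused fields, fix obvious errors."""
    cleaned = []
    for row in rows:
        row = {k: v for k, v in row.items() if k != "ad_id"}  # drop an unnecessary feature
        if row["mileage_km"] is not None and row["mileage_km"] < 0:
            row["mileage_km"] = abs(row["mileage_km"])  # fix a sign error
        cleaned.append(row)
    return cleaned

def split(rows, test_ratio=0.2, seed=0):
    """Shuffle a copy of the rows and hold out a test set."""
    rows = rows[:]
    random.Random(seed).shuffle(rows)
    n_test = max(1, int(len(rows) * test_ratio))
    return rows[n_test:], rows[:n_test]

def fit_imputer(train_rows):
    """Learn the fill value from the training rows only."""
    known = [r["mileage_km"] for r in train_rows if r["mileage_km"] is not None]
    return statistics.median(known)

def apply_imputer(rows, median_mileage):
    """Fill missing mileage with the value learned on the training set."""
    return [
        dict(r, mileage_km=median_mileage if r["mileage_km"] is None else r["mileage_km"])
        for r in rows
    ]

cleaned = clean_all(ads)
train, test = split(cleaned)
median_mileage = fit_imputer(train)          # learned from training rows only
train_ready = apply_imputer(train, median_mileage)
test_ready = apply_imputer(test, median_mileage)
```

Keeping the learned statistics away from the test rows is what makes the later test-set score an honest estimate of performance on unseen ads.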

# Modeling

The model training included four steps:

Choosing the metrics to measure performance: RMSE, MAE & R²
A first round of modeling, as a benchmark, using 7 different algorithms (LinearRegression, KNeighbors, AdaBoost, ...)
Selecting the 2 best models to be fine-tuned using k-fold cross-validation
After comparison, exporting the best model as a pickle file to be later applied on the test set
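
To make the metrics and the k-fold procedure concrete, here is a from-scratch sketch using only the standard library. It is not the project's code: the project most likely relied on a library such as scikit-learn for this, and the mean-price baseline and toy prices below are assumptions chosen purely for illustration.

```python
import math

def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large misses more heavily."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))

def mae(y_true, y_pred):
    """Mean absolute error: the average miss, in the unit of the target."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def r2(y_true, y_pred):
    """Coefficient of determination: 1 is perfect, 0 matches predicting the mean."""
    mean_true = sum(y_true) / len(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - mean_true) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot

def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs; each sample is validated exactly once.
    No shuffling is done here, so shuffle the data beforehand if it is ordered."""
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        val_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, val_idx
        start += size

def cross_val_rmse(fit, predict, X, y, k=5):
    """Average validation RMSE over k folds for any fit/predict pair."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(y), k):
        model = fit([X[i] for i in train_idx], [y[i] for i in train_idx])
        preds = [predict(model, X[i]) for i in val_idx]
        scores.append(rmse([y[i] for i in val_idx], preds))
    return sum(scores) / len(scores)

# Hypothetical baseline: always predict the mean training price.
def fit_mean(X_train, y_train):
    return sum(y_train) / len(y_train)

def predict_mean(model, x):
    return model

prices = [4000, 6500, 8000, 12000, 15000, 21000]   # toy values, not project data
features = [[i] for i in range(len(prices))]
baseline_cv_rmse = cross_val_rmse(fit_mean, predict_mean, features, prices, k=3)
```

Any candidate model that cannot beat such a baseline under the same folds is not worth fine-tuning.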

Following the steps of the preprocessing, a complete pipeline was constructed to clean the test set and apply the fine-tuned model.
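
As a rough sketch of the export-then-apply flow, the example below fits a stand-in model on the log of the price (the project applied a log transformation to the target), saves it with `pickle`, reloads it, and predicts on held-out values. The one-feature least-squares line, the parameter dictionary, the file name `model.pkl`, and the toy numbers are all assumptions for illustration; the project's exported model was a fine-tuned estimator, not this.

```python
import math
import pickle
import tempfile
from pathlib import Path

def fit_log_linear(mileages, prices):
    """Least-squares line through (mileage, log(price)); a stand-in for the tuned model."""
    log_prices = [math.log(p) for p in prices]
    n = len(mileages)
    mean_x = sum(mileages) / n
    mean_y = sum(log_prices) / n
    sxx = sum((x - mean_x) ** 2 for x in mileages)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(mileages, log_prices))
    slope = sxy / sxx
    return {"slope": slope, "intercept": mean_y - slope * mean_x}

def predict_price(model, mileage):
    """Predict in log space, then map back to a price with exp."""
    return math.exp(model["intercept"] + model["slope"] * mileage)

# Toy training data, not taken from the project.
train_mileage = [20000, 60000, 90000, 120000, 180000]
train_price = [21000, 15000, 11000, 8000, 4500]
model = fit_log_linear(train_mileage, train_price)

with tempfile.TemporaryDirectory() as tmp:
    model_path = Path(tmp) / "model.pkl"   # hypothetical file name
    with open(model_path, "wb") as f:
        pickle.dump(model, f)              # export after training
    with open(model_path, "rb") as f:
        loaded = pickle.load(f)            # reload at prediction time

test_mileage = [40000, 150000]
test_predictions = [predict_price(loaded, m) for m in test_mileage]
```

Because the model is trained on log prices, predictions are always positive once mapped back with `exp`. Pickle files should only be loaded from trusted sources, since unpickling can execute arbitrary code.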

The final performance (measured after applying the final model on the test set) is:

Models	RMSE score	MAE score	R² score
XGBRegressor	8883	3803	0.871

# Deployment

The deployment of the model was achieved in three steps:

The code used for the previous steps was translated from Jupyter notebooks (research environment) to Python scripts (production environment).
The scripts were designed as a local package, with init files, and configuration was handled using a .yml file.
Finally, a web application was built using Flask, HTML and CSS, and deployed on Heroku.

Some images from the project (8 figures), including:

Log transformation of the target to improve its spread
Predicted vs. actual values, used to visualize performance
Feature importance of the final model on test data