Buying or selling a car can be an important financial decision, so being able to estimate a car's selling price is valuable.
The objective of this project is to create a web application that estimates the selling price of a car from features entered by the user,
using a machine learning estimator trained on real data collected from the web.
2. Overview
# Preprocessing
More than 50,000 ads for second-hand cars were collected from a popular French site. Complementary data was also extracted to better encode brands.
The preprocessing was done in two steps:
- First, an exploratory data analysis (EDA) was conducted on the training set to determine how best to clean the data.
- Then, the cleaning itself was applied: a few operations (dropping unnecessary features, fixing data errors) were applied to the entire dataset, while most of the preprocessing was applied only to the raw training set.
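
The separation between dataset-wide cleaning and training-set-only preprocessing can be sketched as below. This is a minimal illustration under stated assumptions, not the project's code: the column names (`ad_id`, `brand`, `mileage`, `price`), the helper functions, the 80/20 split, and the median imputation rule are all hypothetical.

```python
import random
from statistics import median

# Hypothetical raw ads; column names are illustrative, not taken from the project.
raw_ads = [
    {"ad_id": 1, "brand": "Renault", "mileage": 120000, "price": 6500},
    {"ad_id": 2, "brand": "Peugeot", "mileage": None, "price": 9800},
    {"ad_id": 3, "brand": "Citroen", "mileage": -5, "price": 4200},
    {"ad_id": 4, "brand": "Renault", "mileage": 45000, "price": 13900},
    {"ad_id": 5, "brand": "BMW", "mileage": 80000, "price": 21000},
    {"ad_id": 6, "brand": "Peugeot", "mileage": 60000, "price": 11500},
]


def clean_whole_dataset(rows):
    """Dataset-wide cleaning: drop an unneeded column, blank out impossible values."""
    cleaned = []
    for row in rows:
        row = {k: v for k, v in row.items() if k != "ad_id"}
        if row["mileage"] is not None and row["mileage"] < 0:
            row["mileage"] = None  # treat an impossible value as missing
        cleaned.append(row)
    return cleaned


def split_train_test(rows, test_ratio=0.2, seed=42):
    """Shuffle a copy of the rows and hold out a fraction as the test set."""
    shuffled = rows[:]
    random.Random(seed).shuffle(shuffled)
    n_test = max(1, int(len(shuffled) * test_ratio))
    return shuffled[n_test:], shuffled[:n_test]


def fit_imputer(train_rows):
    """Learn the fill value from the training set only, never from the test set."""
    known = [r["mileage"] for r in train_rows if r["mileage"] is not None]
    return median(known)


def apply_imputer(rows, fill_value):
    """Fill missing mileage with the value learned on the training set."""
    return [
        {**r, "mileage": fill_value if r["mileage"] is None else r["mileage"]}
        for r in rows
    ]


cleaned = clean_whole_dataset(raw_ads)        # applied to the entire dataset
train, test = split_train_test(cleaned)
mileage_fill = fit_imputer(train)             # statistics come from train only
train_ready = apply_imputer(train, mileage_fill)
test_ready = apply_imputer(test, mileage_fill)
```

Learning the fill value on the training set and reusing it on the test set keeps information from the held-out ads out of the preprocessing decisions.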
# Modeling
The model training included four steps:
- Choosing the metrics to measure performance: RMSE, MAE and R².
- Running a first round of modeling as a benchmark, using 7 different algorithms (LinearRegression, KNeighbors, AdaBoost, ...).
- Selecting the 2 best models and fine-tuning them using k-fold cross-validation.
- After comparison, exporting the best model as a pickle file to be applied later to the test set.
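
The steps above can be illustrated with a small, dependency-free sketch: the three metrics written out, a k-fold comparison of candidates, and a pickle export of the winner. Everything model-related here is an assumption made for illustration: the two "models" are toy constant predictors (mean and median price) standing in for the real algorithms, the prices are made up, and the fitted state is stored as a plain dict rather than as an estimator object.

```python
import math
import os
import pickle
import tempfile
from statistics import mean, median


def rmse(y_true, y_pred):
    """Root mean squared error: penalizes large errors more heavily."""
    return math.sqrt(mean((t - p) ** 2 for t, p in zip(y_true, y_pred)))


def mae(y_true, y_pred):
    """Mean absolute error: average size of the error, in price units."""
    return mean(abs(t - p) for t, p in zip(y_true, y_pred))


def r2(y_true, y_pred):
    """Coefficient of determination: 1 minus residual over total sum of squares."""
    y_bar = mean(y_true)
    ss_res = sum((t - p) ** 2 for t, p in zip(y_true, y_pred))
    ss_tot = sum((t - y_bar) ** 2 for t in y_true)
    return 1 - ss_res / ss_tot


def k_fold_indices(n_samples, k):
    """Yield (train_idx, val_idx) pairs for k contiguous folds."""
    sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in sizes:
        val_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, val_idx
        start += size


def cross_val_rmse(fit, prices, k=3):
    """Average validation RMSE of a constant predictor over k folds."""
    scores = []
    for train_idx, val_idx in k_fold_indices(len(prices), k):
        constant = fit([prices[i] for i in train_idx])
        y_val = [prices[i] for i in val_idx]
        scores.append(rmse(y_val, [constant] * len(y_val)))
    return mean(scores)


# Made-up prices and toy candidates (assumptions, not the project's models).
prices = [6500, 9800, 4200, 13900, 21000, 11500, 7300, 8900, 15500]
candidates = {"mean_price": mean, "median_price": median}

cv_scores = {name: cross_val_rmse(fit, prices) for name, fit in candidates.items()}
best_name = min(cv_scores, key=cv_scores.get)  # lower RMSE is better

# Refit the winner on all training data and export its state with pickle.
exported = {"name": best_name, "constant": candidates[best_name](prices)}
with tempfile.TemporaryDirectory() as tmp_dir:
    model_path = os.path.join(tmp_dir, "best_model.pkl")
    with open(model_path, "wb") as f:
        pickle.dump(exported, f)
    with open(model_path, "rb") as f:
        restored = pickle.load(f)
```

With a real estimator, the same pattern applies: score each candidate on the validation folds, keep the one with the best average score, refit it on the full training set, and pickle the fitted object.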
Following the steps of the preprocessing, a complete pipeline was constructed to clean the test set and apply the fine-tuned model.
The final performance (measured after applying the final model to the test set) is:

| Model | RMSE | MAE | R² |
|---|---|---|---|
| XGBRegressor | 8883 | 3803 | 0.871 |
# Deployment
The deployment of the model was achieved in three steps:
- The code used for the previous steps was translated from Jupyter notebooks (research environment) to Python scripts (production environment).
- The scripts were organized as a local package, with `__init__.py` files, and configuration was handled using a .yml file.
- Finally, a web application was built using Flask, HTML and CSS, and deployed on Heroku.
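
As a rough idea of what the Flask layer might look like, here is a minimal sketch of a prediction endpoint. It is not the deployed application: the `/predict` route, the JSON payload, the feature names, and the `predict_price` placeholder (which stands in for the unpickled model) are all assumptions, and the real application uses HTML forms rather than JSON.

```python
from flask import Flask, jsonify, request

app = Flask(__name__)

# Hypothetical feature names; the application's real inputs are not listed here.
REQUIRED_FEATURES = ("brand", "mileage", "year")


def predict_price(features):
    """Placeholder for the model call (assumption).

    A real deployment would load the pickled estimator once at startup and
    call its predict method on the preprocessed features. The formula below
    is a dummy so that the sketch runs without a trained model.
    """
    return 20000.0 - 0.05 * float(features["mileage"])


@app.route("/predict", methods=["POST"])
def predict():
    # silent=True returns None instead of raising on a missing or invalid body.
    payload = request.get_json(silent=True) or {}
    missing = [name for name in REQUIRED_FEATURES if name not in payload]
    if missing:
        return jsonify({"error": f"missing features: {missing}"}), 400
    return jsonify({"estimated_price": predict_price(payload)})
```

Calling `app.run()` serves the endpoint locally for manual testing; Flask's built-in test client (`app.test_client()`) can exercise the route without starting a server.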