1 min read

Multiouput Text Classification

Classifying Text Messages

GitHub Repository

1. Objective

During disasters, communication is vital, but it is at such times that disaster response organizations are least able to filter and extract the most important messages.

The objective of this project is to classify messages into several categories to indicate to which organizations they should be referred.

For this, I use real anonymized messages sent during disaster events provided by Udacity, in collaboration with Appen (formally Figure 8), as a part of their Data Scientist Nanodegree Program.

2. Overview

  • Merged and cleaned data from files into a SQLite database (ETL process)
  • Applied a Machine Learning pipeline containg NLTK features, a custome tokenizer to handle text, and a classification model able to handle multioutput tasks
  • Introduced grid search cross validation on hyperparameters to the ML pipeline to be able to not only fine-tune the ML model but also the preprocessing steps to handle text data
  • Built a web application using Flask that lets users input a message to be classified by the model (runs locally) and display Plotly figures created with data from the SQLite database

Some images from the project

1 / 4
Web Application to get input data (new messages to classify)
2 / 4
Viz-1 in Web App: % of messages in each category
3 / 4
Viz-2 in Web App: Spearman correlation between categories
4 / 4
Classification example

Copyright © All rights reserved | This template is made with by Colorlib --- Colorlib