Term Deposit Prediction

A comprehensive end-to-end machine learning project that predicts bank term deposit subscriptions using the well-known “Bank Marketing” dataset, with production-grade deployment techniques.

Overview

This project tackles the challenge of predicting bank deposit subscriptions with a complete machine learning pipeline. It uses the popular “Bank Marketing” dataset to train and deploy a model. Because the dataset is imbalanced, the model is tuned for F1-score along with other important performance metrics.
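To illustrate why F1-score is a useful target on an imbalanced dataset, here is a minimal sketch in plain Python. The labels are made up for illustration (they are not taken from the “Bank Marketing” dataset), and `f1_score` and `accuracy` are hypothetical helpers, not the project's code.

```python
def f1_score(y_true, y_pred):
    """F1 for the positive class (label 1): harmonic mean of precision and recall."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    if tp == 0:
        # No true positives: precision and recall are both zero (or undefined)
        return 0.0
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    return 2 * precision * recall / (precision + recall)


def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    return sum(1 for t, p in zip(y_true, y_pred) if t == p) / len(y_true)


# Illustrative labels (assumption): 10 subscribers among 100 clients
y_true = [1] * 10 + [0] * 90

# Baseline that always predicts "no subscription"
always_no = [0] * 100

# Hypothetical model: finds 6 of the 10 subscribers and raises 4 false alarms
model_pred = [1] * 6 + [0] * 4 + [1] * 4 + [0] * 86

baseline_acc = accuracy(y_true, always_no)
baseline_f1 = f1_score(y_true, always_no)
model_acc = accuracy(y_true, model_pred)
model_f1 = f1_score(y_true, model_pred)

print(f"baseline: accuracy={baseline_acc:.2f}, F1={baseline_f1:.2f}")
print(f"model:    accuracy={model_acc:.2f}, F1={model_f1:.2f}")
```

In this toy setup the two classifiers have nearly the same accuracy, while F1 separates them clearly, because the always-negative baseline never identifies a subscriber.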

The project demonstrates a complete workflow: data ingestion with CockroachDB, data exploration and feature engineering, model selection and hyperparameter tuning tracked with MLflow, and deployment to the Microsoft Azure cloud using Docker containers and GitHub Actions for CI/CD. Additionally, a web application built with PyWebIO and Flask lets users interact with the prediction model.

Tech Stack Used

Data Storage & Retrieval

  • CockroachDB (SQL Database): Efficiently stored and retrieved the “Bank Marketing” dataset for data ingestion.

Data Preprocessing & Analysis

  • Python Libraries (Pandas, NumPy, Seaborn, etc.): Handled data manipulation, cleaning, exploration, and visualization, providing insights for feature engineering.

Model Selection & Model Training

  • Machine Learning Algorithms (CatBoost, Random Forests, SVM, etc.): Built, trained, and evaluated various machine learning models to predict bank deposit subscriptions.
  • MLflow: Tracked model experiments, hyperparameter tuning results, and other metrics (AUC-ROC, Precision, Recall) for optimal model selection.
  • DagsHub: Remote server for comparing different experiments tracked by MLflow.
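
As a rough illustration of how tracked metrics can drive model selection, the sketch below computes AUC-ROC in plain Python and picks the run with the highest value. It does not use the MLflow API. The run names and scores are invented for illustration, and `roc_auc` and `best_run` are hypothetical helpers rather than the project's code.

```python
def roc_auc(y_true, scores):
    """AUC-ROC as the probability that a random positive outscores a random negative.

    Ties between a positive and a negative count as half a win.
    """
    pos = [s for t, s in zip(y_true, scores) if t == 1]
    neg = [s for t, s in zip(y_true, scores) if t == 0]
    if not pos or not neg:
        raise ValueError("AUC-ROC needs at least one positive and one negative")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5
    return wins / (len(pos) * len(neg))


def best_run(runs, metric):
    """Return the name of the run with the highest value for the given metric."""
    return max(runs, key=lambda name: runs[name][metric])


# Illustrative labels and predicted scores (assumptions, not real results)
y_true = [1, 1, 0, 0, 0, 0]
candidate_scores = {
    "run_a": [0.9, 0.4, 0.5, 0.3, 0.2, 0.1],
    "run_b": [0.8, 0.7, 0.6, 0.3, 0.2, 0.1],
}

# One metrics record per run, similar in spirit to what a tracker would store
runs = {
    name: {"auc_roc": roc_auc(y_true, scores)}
    for name, scores in candidate_scores.items()
}
selected = best_run(runs, "auc_roc")

for name, metrics in runs.items():
    print(f"{name}: AUC-ROC={metrics['auc_roc']:.3f}")
print("selected:", selected)
```

The pairwise loop is quadratic in the number of samples, which is fine for a small illustration; a library implementation would be the practical choice on the full dataset.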

Web Application Development

  • Docker: Containerized the prediction pipeline, ensuring a consistent runtime environment across deployment stages.
  • PyWebIO & Flask: Developed an interactive web interface for users to leverage the trained model’s predictions.

Model Deployment

  • Azure Container Registry (ACR): Securely stored the Docker image for deployment.
  • Microsoft Azure Web App Service: Hosted the web application built for user interaction with the prediction model.

CI/CD (Continuous Integration/Continuous Delivery)

  • GitHub Actions: Automated the build, testing, and deployment pipeline for efficient and reliable updates.