Donor Retention
In recent years, UNICEF has seen a decline in the number of monthly donors. In addition, donor behaviours have shifted, and traditional recruitment channels are losing effectiveness. Research from other sectors shows that retaining existing customers is more cost-effective and efficient than acquiring new ones. Given these factors, maintaining current donor relationships is more crucial than ever. This project aims to enhance donor retention using artificial intelligence. It will focus on developing a deeper understanding of donor behaviour and creating predictive models to identify when donors are likely to leave, enabling targeted interventions to retain them.
The goal is to leverage AI to predict and enhance retention rates among monthly donors, identifying the key factors that influence their commitment. By understanding the characteristics that contribute to long-term commitment, UNICEF can tailor engagement strategies to improve overall donor satisfaction and loyalty.
2024-07-23 08:36
Having found that our initial models did not perform well on the provided data, we redirected our efforts in weeks 6 and 7. Our primary objective remains to support UNICEF on their technological journey and provide them with an AI tool that, even if it doesn't directly identify donors at risk of churning, enables data understanding and facilitates more informed decision-making.

During week 6, we started building an AI assistant that lets UNICEF upload their data and query insights through natural language. The assistant works with two LLMs: the first translates natural-language queries into SQL, allowing the backend to extract the relevant data from the database; the second translates that data into a natural-language response, which is displayed in a user-friendly chat interface. This natural-language-to-SQL pipeline will enable non-technical users to query their data and give UNICEF a tool to explore and gather insights (a minimal sketch follows at the end of this post).

In the same week, we received new data from UNICEF containing the payment history of their donor base. Time series data is important for our task, but the new dataset was limited in scope, containing only a handful of features. Aware of these limitations, towards the end of week 6 we decided to split our efforts between the AI assistant and training new models on the newly available data.

In week 7, we worked on pre-processing the new data and on feature engineering. Towards the end of the week, we started training models, concentrating on XGBoost, GBM, and LSTM, aiming for improved performance (an LSTM sketch also follows below). During this week, we also improved the AI assistant: apart from refining its functionality and appearance, we decided to introduce two additional modes, one for understanding customer segments and another for data visualization. These features are still under development.

Feel free to reach out if you have any questions, need more information, or have any advice for us!
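To make the two-LLM design concrete, here is a minimal sketch in Python. This is not our actual implementation: the table schema, the prompts, the model name, and the OpenAI-style client are illustrative assumptions, and a production version would need guardrails such as SQL validation and a read-only database connection.

```python
# Minimal sketch of a two-LLM natural-language-to-SQL assistant.
# Schema, prompts, and model choice are illustrative assumptions.
import sqlite3
from openai import OpenAI

client = OpenAI()  # assumes OPENAI_API_KEY is set; any chat-capable LLM would do

# Hypothetical donor table; the real schema is UNICEF's.
SCHEMA = "donors(donor_id, signup_date, monthly_amount, last_payment_date, churned)"

def ask(prompt: str) -> str:
    """Send a single-turn prompt to the LLM and return its text reply."""
    response = client.chat.completions.create(
        model="gpt-4o-mini",
        messages=[{"role": "user", "content": prompt}],
    )
    return response.choices[0].message.content

def answer_question(question: str, db_path: str = "donors.db") -> str:
    # LLM 1: translate the natural-language question into SQL.
    sql = ask(
        f"Given the SQLite table {SCHEMA}, write one read-only query that "
        f"answers: {question}\nReturn only the SQL, no explanation."
    )
    # Backend: run the generated query against the database.
    with sqlite3.connect(db_path) as conn:
        rows = conn.execute(sql).fetchall()
    # LLM 2: turn the raw rows back into a natural-language answer.
    return ask(
        f"Question: {question}\nSQL used: {sql}\nResult rows: {rows}\n"
        "Summarise the answer in plain language for a non-technical user."
    )

print(answer_question("How many monthly donors churned in June 2024?"))
```

In the real tool the chat interface sits on top of this loop, and only the two `ask` calls depend on the LLM provider.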
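For the time-series direction, here is a similarly hedged sketch of an LSTM churn classifier on monthly payment sequences (Keras). The sequence length and the random toy data are stand-ins for illustration, not the actual payment history:

```python
# Toy LSTM churn classifier on per-donor payment sequences.
# Shapes and data are placeholders for illustration only.
import numpy as np
from tensorflow.keras.models import Sequential
from tensorflow.keras.layers import LSTM, Dense

# 1000 donors x 24 months x 1 feature (monthly payment amount).
X = np.random.rand(1000, 24, 1).astype("float32")
y = np.random.randint(0, 2, size=1000)  # 1 = churned

model = Sequential([
    LSTM(32, input_shape=(24, 1)),   # summarise the payment history
    Dense(1, activation="sigmoid"),  # churn probability
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=32, validation_split=0.2)
```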
2024-07-23 07:57
In the fourth week, we began with a cleaned and processed dataset, allowing us to move into the next phase of the project (see the previous post for the detailed steps). This phase involved exploring various models, including Random Forest, CatBoost, KNN, SVC, GBM, and XGBoost. For each model, we evaluated accuracy, precision, recall, and F1 score, including the per-class F1 for non-churners (F1-false) and churners (F1-true). Our initial goal was to establish a fair comparison to narrow down the models for further testing and hyperparameter tuning (a sketch of this comparison follows at the end of this post). In these tests, CatBoost, GBM, and XGBoost emerged as the best-performing models. We also explored a simple Decision Tree, given UNICEF's interest in model explainability and their primary objective of understanding why donors churn.

Despite exhaustive hyperparameter tuning, we found that while our models performed well on the validation data (approximately 85% accuracy on the June data), the results on the July data were unsatisfactory: the models detected only a small fraction of actual churners and misidentified many non-churners as churners. Further investigation revealed high volatility, indicating that the models struggled to classify even donors already seen in the training set.

In addition to the modelling efforts, we brainstormed with UNICEF about our final solution and the project's conclusion. We considered various tools, such as Tableau, Power BI, Google Looker, and Qlik. Ultimately, we decided to implement our own AI solution, hosted on an AWS instance provided by UNICEF. We began developing our own website and built the tech stack around it; the chosen architecture is depicted in the accompanying image.

Looking ahead, we aim to provide UNICEF with insights into what type of data they should collect to train better models. We will also work on model explainability to understand which features our models considered most important, so we can report these to UNICEF.

Feel free to reach out if you have any questions, need more information, or have any advice for us!
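For illustration, here is a minimal sketch of the kind of comparison we ran. The file name, feature columns, and `churned` label are placeholders rather than UNICEF's actual schema, and the real runs also included cross-validation and hyperparameter tuning:

```python
# Sketch of the week-4 model comparison on a shared train/validation split.
import pandas as pd
from sklearn.model_selection import train_test_split
from sklearn.ensemble import RandomForestClassifier, GradientBoostingClassifier
from sklearn.tree import DecisionTreeClassifier
from sklearn.neighbors import KNeighborsClassifier
from sklearn.svm import SVC
from sklearn.metrics import accuracy_score, precision_score, recall_score, f1_score
from xgboost import XGBClassifier
from catboost import CatBoostClassifier

df = pd.read_csv("donors_processed.csv")  # hypothetical cleaned dataset
X, y = df.drop(columns=["churned"]), df["churned"]
X_train, X_val, y_train, y_val = train_test_split(
    X, y, test_size=0.2, stratify=y, random_state=42
)

models = {
    "RandomForest": RandomForestClassifier(random_state=42),
    "GBM": GradientBoostingClassifier(random_state=42),
    "XGBoost": XGBClassifier(random_state=42),
    "CatBoost": CatBoostClassifier(verbose=0, random_state=42),
    "KNN": KNeighborsClassifier(),
    "SVC": SVC(random_state=42),
    "DecisionTree": DecisionTreeClassifier(max_depth=4, random_state=42),
}

for name, model in models.items():
    model.fit(X_train, y_train)
    pred = model.predict(X_val)
    print(
        f"{name:12s} "
        f"acc={accuracy_score(y_val, pred):.3f} "
        f"prec={precision_score(y_val, pred):.3f} "
        f"rec={recall_score(y_val, pred):.3f} "
        f"f1_false={f1_score(y_val, pred, pos_label=0):.3f} "  # non-churners
        f"f1_true={f1_score(y_val, pred, pos_label=1):.3f}"    # churners
    )
```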
We started the project by getting familiar with the dataset provided by UNICEF, which includes donor data from the past 5 years and email interactions from the past 6 months. Unfortunately, we only have the first and last interaction events for each donor, not the intermediate ones. Despite this limitation, we will use the available data to build and assess our models. Our main goal is to evaluate both the quality of the predictions and the interpretability of the models, to understand which factors lead to donor churn.

Next, we dived into literature and market research on predicting non-profit donor churn. We arranged a few meetings with experts, which greatly helped us and guided us in the right direction. This research helped us identify key features and methodologies used in donor retention models. We also assessed different platforms for deploying our solution. While we haven't finalised this yet, we're getting closer: we know our models will be deployed on AWS, and a solution with a direct Salesforce integration for a user-friendly interface won't be possible, so we are currently looking into alternatives with our client. These include Qlik and building our own platform.

Regarding the prediction models, our research pointed us towards more traditional approaches: decision trees, random forests, K-Nearest Neighbors, Gradient Boosting Machines, CatBoost, Support Vector Machines, and Extreme Gradient Boosting. We will start by implementing these models. One challenge with them is the need for manual feature engineering.

In week 2, after thorough data cleaning (removing duplicates, handling missing values, and standardizing formats), we tackled this task. We combined donor data with their email interactions, creating features like interaction frequency, time since the last donation, and response rates. We also included an estimated salary per donor based on their postal code, which we scraped from the web. Other steps we considered and implemented: outlier removal using exploratory approaches and the Interquartile Range method, and feature selection using Pearson and Kendall Tau correlations for numerical features and chi-squared tests for categorical features. We spent quite some time on this, as the quality of the input data directly affects the quality of our model predictions.

Lastly, an interesting issue we addressed is the imbalanced nature of our data, where the number of non-churners is significantly higher than the number of churners. To balance the classes, we generated synthetic data using SMOTE and ADASYN, and we will experiment with training models both on this balanced dataset and on the original dataset with a weighted loss function (a sketch of the IQR filter and SMOTE step follows below).

Feel free to reach out if you have any questions, need more information, or have any advice for us!
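To illustrate two of the steps above, here is a small sketch of the Interquartile Range filter and SMOTE oversampling (via imbalanced-learn). The file and column names are illustrative assumptions, not our actual feature set:

```python
# Sketch: IQR-based outlier removal, then SMOTE class balancing.
import pandas as pd
from imblearn.over_sampling import SMOTE  # ADASYN lives in the same module

df = pd.read_csv("donors_features.csv")  # hypothetical engineered dataset

# Interquartile Range filter on one numeric feature.
q1, q3 = df["monthly_amount"].quantile([0.25, 0.75])
iqr = q3 - q1
df = df[df["monthly_amount"].between(q1 - 1.5 * iqr, q3 + 1.5 * iqr)]

# Oversample the minority class (churners) with synthetic examples.
X, y = df.drop(columns=["churned"]), df["churned"]
X_bal, y_bal = SMOTE(random_state=42).fit_resample(X, y)
print(y.value_counts(), y_bal.value_counts(), sep="\n")
```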