How to Maximize the Gains of the Coupon Offers to Customers?

Chao Lin
6 min read · Feb 13, 2021

The Final Project for the Udacity Data Scientist Nanodegree

Photo by Starbucks

In this Starbucks Capstone Challenge for the Udacity Data Scientist Nanodegree, I conducted an exploratory analysis of the simulated data provided by Starbucks to find out how people make purchasing decisions based on the discount or coupon offers they receive. I also built a prediction model with the FunkSVD algorithm to predict how responsive users are to the offers.

Business Understanding

To complete this project, I set out to answer the questions below:

1. Which offer should be sent to a particular customer to maximize the offer’s sales gain?

2. Which demographic groups respond best to which offer type?

Before answering these questions, we first need to wrangle and clean the data so that it can be analyzed.

Dataset Description

The data is contained in three files:

  • portfolio.json — containing offer ids and metadata about each offer (duration, type, etc.)
  • profile.json — demographic data for each customer
  • transcript.json — records for “transactions”, “offers received”, “offers viewed”, and “offers completed”

Here is the schema and explanation of each variable in the files:

portfolio.json

  • id (string) — offer id
  • offer_type (string) — the type of offer, i.e. BOGO, discount, or informational
  • difficulty (int) — the minimum amount a customer must spend to complete an offer
  • reward (int) — the reward given for completing an offer
  • duration (int) — time for the offer to be open, in days
  • channels (list of strings)

profile.json

  • age (int) — age of the customer
  • became_member_on (int) — the date when the customer created an app account
  • gender (str) — gender of the customer (note some entries contain ‘O’ for other rather than M or F)
  • id (str) — customer-id
  • income (float) — customer’s income

transcript.json

  • event (str) — record description (i.e. transaction, offer received, offer viewed, etc.)
  • person (str) — customer-id
  • time (int) — time in hours since the start of the test. The data begins at time t=0
  • value (dict of strings) — either an offer id or a transaction amount, depending on the record

Data Exploration and Cleaning

For profile.json, only 12.8% of the rows contain NaN values, so I first deleted those rows. The graphs below give a brief picture of the demographic distributions.

Distribution of Age in Profile
Distribution of income in Profile
Distribution of Gender

I also renamed the ‘id’ column to ‘user_id’ and created a member_days column holding the number of days since the user became a member.

Cleaned Profile Data
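The profile cleaning steps above can be sketched roughly as follows. This is a minimal illustration on a toy table, not the project's actual code. It assumes pandas, that became_member_on is an integer in YYYYMMDD form, and that membership length is measured against a fixed reference date; the reference date used here is hypothetical.

```python
import pandas as pd

# Toy stand-in for profile.json (same columns as the schema above).
profile = pd.DataFrame({
    "id": ["u1", "u2", "u3", "u4"],
    "gender": ["F", None, "M", "O"],
    "age": [55, 40, 31, 62],
    "income": [72000.0, None, 48000.0, 91000.0],
    "became_member_on": [20170715, 20180102, 20160420, 20180726],
})

# Drop rows with any missing value, then rename the id column.
clean = profile.dropna().rename(columns={"id": "user_id"})

# Assumption: became_member_on is an integer formatted as YYYYMMDD.
joined = pd.to_datetime(clean["became_member_on"].astype(str), format="%Y%m%d")

# Assumption: member_days is counted up to a fixed (hypothetical) reference date.
reference_date = pd.Timestamp("2018-07-26")
clean = clean.assign(member_days=(reference_date - joined).dt.days)

print(clean[["user_id", "member_days"]])
```

Using a fixed reference date rather than today's date keeps member_days reproducible between runs.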

For portfolio.json, I converted the duration from days to hours, applied one-hot encoding to the channels column, and renamed the ‘id’ column to ‘offer_id’.

Cleaned Portfolio Data
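A rough sketch of the portfolio cleaning, again on toy rows and assuming pandas. The offer ids and values below are made up for illustration; only the column names come from the schema above.

```python
import pandas as pd

# Toy stand-in for portfolio.json.
portfolio = pd.DataFrame({
    "id": ["o1", "o2"],
    "offer_type": ["bogo", "discount"],
    "difficulty": [10, 20],
    "reward": [10, 5],
    "duration": [7, 10],  # in days
    "channels": [["email", "mobile", "social"], ["web", "email"]],
})

# One-hot encode the list-valued channels column:
# join each list into "a|b|c", then split it into indicator columns.
channel_dummies = portfolio["channels"].str.join("|").str.get_dummies()

cleaned = (
    portfolio.drop(columns="channels")
    .join(channel_dummies)
    .rename(columns={"id": "offer_id"})
)

# Convert days to hours so duration uses the same unit as transcript time.
cleaned["duration"] = cleaned["duration"] * 24

print(cleaned)
```

Converting duration to hours makes it directly comparable to the time column in transcript.json, which is also measured in hours.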

For transcript.json, I kept only the offer-related rows, extracted the offer id from the value column, renamed the ‘person’ column to ‘user_id’, and sorted the rows in ascending order by ‘user_id’ and ‘time’.

Cleaned Transcript Data
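The transcript cleaning could look something like the sketch below. It assumes pandas, and it assumes the offer id is stored in the value dict under a key spelled either "offer id" or "offer_id"; the key names are an assumption about the raw data, not something stated in the schema above.

```python
import pandas as pd

# Toy stand-in for transcript.json.
transcript = pd.DataFrame({
    "person": ["u2", "u1", "u1", "u1", "u1"],
    "event": ["offer received", "offer received", "transaction",
              "offer viewed", "offer completed"],
    "time": [0, 0, 6, 12, 30],
    "value": [{"offer id": "o1"}, {"offer id": "o1"}, {"amount": 9.5},
              {"offer id": "o1"}, {"offer_id": "o1", "reward": 5}],
})

def extract_offer_id(value):
    # Assumption: the key is "offer id" or "offer_id" depending on the event.
    return value.get("offer id", value.get("offer_id"))

# Keep only offer-related events (drops plain transactions).
offers = transcript[transcript["event"].str.startswith("offer")].copy()
offers["offer_id"] = offers["value"].apply(extract_offer_id)

offers = (
    offers.drop(columns="value")
    .rename(columns={"person": "user_id"})
    .sort_values(["user_id", "time"])
    .reset_index(drop=True)
)

print(offers)
```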

Modeling

To implement the FunkSVD model, I need a user-offer matrix. Note that not every completed offer can be counted as ‘successful’, because some offers were completed without ever being viewed by the customer. I defined a ‘successful’ offer as one that was first viewed and then completed, and added a column marking whether each offer succeeded.

Then I created a user-offer matrix whose rows are user IDs, whose columns are offer IDs, and whose values indicate whether the offer was successful.

Part of the user-offer-matrix
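One way to mark successful offers and build the matrix is sketched below. It simplifies the problem by looking only at the first view and first completion of each (user, offer) pair, which is an assumption; a customer who receives the same offer several times would need more careful handling. Pairs where the user never received the offer are left as NaN.

```python
import pandas as pd

# Toy cleaned transcript: u1 views then completes, u2 completes without
# viewing, u3 views but never completes.
offers = pd.DataFrame({
    "user_id":  ["u1", "u1", "u1", "u2", "u2", "u3", "u3"],
    "offer_id": ["o1", "o1", "o1", "o1", "o1", "o2", "o2"],
    "event": ["offer received", "offer viewed", "offer completed",
              "offer received", "offer completed",
              "offer received", "offer viewed"],
    "time": [0, 6, 30, 0, 24, 0, 12],
})

# First time each event happened for each (user, offer) pair.
first_times = (
    offers.groupby(["user_id", "offer_id", "event"])["time"]
    .min()
    .unstack("event")
    .reindex(columns=["offer received", "offer viewed", "offer completed"])
)

# Successful = viewed, completed, and the view came no later than the completion.
success = (
    first_times["offer viewed"].notna()
    & first_times["offer completed"].notna()
    & (first_times["offer viewed"] <= first_times["offer completed"])
)

# Rows are users, columns are offers, values are 1.0 / 0.0 (NaN if never received).
user_offer = success.astype(float).unstack("offer_id")

print(user_offer)
```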

After that, I split the data into a training set and a test set with a test_size of 0.3. Then I ran the FunkSVD algorithm with latent_features = 15, learning_rate = 0.005, and iters = 250. The fitted user matrix and offer matrix reached a mean squared error of 0.015.
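For readers unfamiliar with FunkSVD, here is a minimal NumPy sketch of the idea: factor the user-offer matrix into a user matrix and an offer matrix by stochastic gradient descent over the observed (non-NaN) entries only. The function name, the random initialization, and the toy matrix are illustrative assumptions; the default hyperparameters mirror the ones quoted above.

```python
import numpy as np

def funk_svd(ratings, latent_features=15, learning_rate=0.005, iters=250, seed=0):
    """Factor `ratings` (NaN = unobserved) into user and offer matrices."""
    rng = np.random.default_rng(seed)
    n_users, n_offers = ratings.shape
    user_mat = rng.random((n_users, latent_features))
    offer_mat = rng.random((latent_features, n_offers))

    # Only observed entries contribute to the error and the updates.
    observed = np.argwhere(~np.isnan(ratings))
    mse_history = []

    for _ in range(iters):
        sse = 0.0
        for i, j in observed:
            diff = ratings[i, j] - user_mat[i, :] @ offer_mat[:, j]
            sse += diff ** 2
            # Gradient steps for the squared error of this single entry.
            user_step = 2 * learning_rate * diff * offer_mat[:, j]
            offer_step = 2 * learning_rate * diff * user_mat[i, :]
            user_mat[i, :] += user_step
            offer_mat[:, j] += offer_step
        mse_history.append(sse / len(observed))

    return user_mat, offer_mat, mse_history

# Toy user-offer matrix: 1.0 = successful, 0.0 = not, NaN = never received.
ratings = np.array([
    [1.0, 0.0, np.nan],
    [1.0, np.nan, 0.0],
    [np.nan, 0.0, 1.0],
    [0.0, 1.0, np.nan],
])

user_mat, offer_mat, mse_history = funk_svd(ratings)
print(f"first-epoch MSE: {mse_history[0]:.4f}, last-epoch MSE: {mse_history[-1]:.4f}")
```

The product user_mat @ offer_mat then gives a predicted score for every user-offer pair, including the pairs that were NaN in the input.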

Evaluation

Applying the trained FunkSVD model to the test set gave a mean squared error of 0.1327.
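As an illustration of how a test-set error like this can be computed, the sketch below compares the predictions from the two factor matrices against the held-out entries only. The helper name and the tiny hand-made matrices are hypothetical and are not the project's data.

```python
import numpy as np

def held_out_mse(user_mat, offer_mat, test_ratings):
    """Mean squared error over the observed (non-NaN) test entries."""
    predictions = user_mat @ offer_mat
    mask = ~np.isnan(test_ratings)
    return float(np.mean((test_ratings[mask] - predictions[mask]) ** 2))

# Hand-made factor matrices: predictions are [[1.0, 0.0], [0.5, 1.0]].
user_mat = np.array([[1.0, 0.0], [0.0, 1.0]])
offer_mat = np.array([[1.0, 0.0], [0.5, 1.0]])

# Held-out outcomes; NaN marks pairs that are not part of the test set.
test_ratings = np.array([[1.0, np.nan], [0.0, 1.0]])

mse = held_out_mse(user_mat, offer_mat, test_ratings)
print(round(mse, 4))
```

A gap between the training error and the test error, as reported here, is the usual sign that the model fits the pairs it has seen better than new ones.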

Recommendation Strategy

Now we can use the trained FunkSVD model to choose offers for our customers. If the customer is new, we give them the top offers, that is, the offers that gain the most reactions. The top offers are shown below:

Sales Gains of Offers
List of Top Offers to be Recommended to a New Customer
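The strategy can be sketched as a small function: known customers get the offers with the highest predicted scores, and new customers fall back to a precomputed list of top offers. The function name, its arguments, and the toy matrices are assumptions made for illustration.

```python
import numpy as np

def recommend_offers(user_id, user_ids, offer_ids, user_mat, offer_mat,
                     top_offers, n=3):
    """Return up to n offer ids for a user, with a fallback for new users."""
    if user_id not in user_ids:
        # New customer: no latent factors exist, so use the popular offers.
        return list(top_offers[:n])
    row = user_ids.index(user_id)
    scores = user_mat[row] @ offer_mat
    ranked = np.argsort(scores)[::-1][:n]  # highest predicted score first
    return [offer_ids[k] for k in ranked]

user_ids = ["u1", "u2"]
offer_ids = ["o1", "o2", "o3"]
user_mat = np.array([[1.0, 0.0], [0.0, 1.0]])
offer_mat = np.array([[0.2, 0.9, 0.5], [0.8, 0.1, 0.3]])
top_offers = ["o3", "o1", "o2"]  # hypothetical ranking for new customers

known = recommend_offers("u1", user_ids, offer_ids, user_mat, offer_mat, top_offers, n=2)
new = recommend_offers("u9", user_ids, offer_ids, user_mat, offer_mat, top_offers, n=2)
print(known, new)
```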

Further Exploratory Analysis of the Reactions to the Offers

To gain further insight into how demographics influence responsiveness to the offers, I conducted some additional exploratory analysis.

First let’s see the response distribution by gender.

It seems males tend to respond to discount offers rather than BOGO offers, while females tend to respond to offers more than males do. The group with gender ‘other’ tends to respond to the offers the most.

Next, let’s check how the gender groups respond to the various channels.

It seems people tend to respond better to the social channel. It also shows the same trend in responsiveness: other gender > females > males.
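A per-channel response rate like the one discussed above could be computed along these lines. This is a sketch under assumptions: each row of the toy outcomes table is one offer sent to one user with a 0/1 success flag, and an offer counts toward every channel it was sent through.

```python
import pandas as pd

# Toy offer outcomes (one row per user-offer pair).
outcomes = pd.DataFrame({
    "user_id": ["u1", "u2", "u3", "u4"],
    "gender": ["F", "M", "F", "M"],
    "offer_id": ["o1", "o1", "o2", "o2"],
    "success": [1, 0, 1, 1],
})

# Toy cleaned portfolio with one-hot channel columns.
portfolio = pd.DataFrame({
    "offer_id": ["o1", "o2"],
    "email": [1, 1],
    "social": [0, 1],
    "web": [1, 0],
})
channels = ["email", "social", "web"]

# Attach channel flags, then reshape to one row per (outcome, channel).
merged = outcomes.merge(portfolio, on="offer_id")
long = merged.melt(
    id_vars=["gender", "success"], value_vars=channels,
    var_name="channel", value_name="used",
)

# Share of successful offers per gender and channel, over offers using that channel.
rate = (
    long[long["used"] == 1]
    .groupby(["gender", "channel"])["success"]
    .mean()
    .unstack("channel")
)

print(rate)
```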

Since income often influences how people respond to offers, we would like to check the responsiveness to different offers by income level. We divide the users into 4 groups by income: $0–40,000, $40,000–60,000, $60,000–80,000, and $80,000+.

It seems people with higher income tend to respond better than those with lower income. All income levels except the $80,000+ group tend to respond to discount offers better than to BOGO offers.
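The income grouping can be sketched with pandas' cut function, as below. The toy rows are invented, and treating each bin as including its lower edge and excluding its upper edge is an assumption, since the text does not say which group a boundary value such as $40,000 belongs to.

```python
import pandas as pd

# Toy offer outcomes with the customer's income attached.
responses = pd.DataFrame({
    "income": [35000, 52000, 58000, 75000, 95000, 88000],
    "offer_type": ["bogo", "discount", "bogo", "discount", "bogo", "bogo"],
    "success": [0, 1, 0, 1, 1, 1],
})

bins = [0, 40000, 60000, 80000, float("inf")]
labels = ["$0-40,000", "$40,000-60,000", "$60,000-80,000", "$80,000+"]

# Assumption: bins are closed on the left, e.g. exactly 40000 falls in the second group.
responses["income_group"] = pd.cut(
    responses["income"], bins=bins, labels=labels, right=False
)

# Success rate per income group and offer type.
rate = (
    responses.groupby(["income_group", "offer_type"], observed=True)["success"]
    .mean()
    .unstack("offer_type")
)

print(rate)
```

The same pattern applies to the age groups by swapping in age bin edges.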

Considering that people of different ages might also react differently to offer channels, we would like to check the responsiveness to different offers by age. We divide the users into 4 age groups: 0–30, 30–50, 50–70, and 70+.

It seems younger people tend to respond less to offers than older people.

Improvement

Because of the cold start problem, the FunkSVD recommendation engine does not work well for new customers. I currently recommend the top-selling offers to new customers, but this could be improved by using the customers’ demographic data in a user-based collaborative recommendation scheme.

Conclusion

In this project, I built a FunkSVD-based model to predict whether an offer will succeed with a particular customer. The mean squared error is 0.0015 for the training set and 0.1327 for the test set. Further exploratory analysis also yielded the insights below:

  • Women respond better than men.
  • People respond better to the social channel than to other channels.
  • People with higher income respond better to offers.
  • Older people respond to offers better than younger people.

To see more about this analysis, see the link to my GitHub available here.
