User Engagement

Introduction

We are provided with two datasets, activity.csv and recommendations.csv. Activity captures the click stream and engagement of users on a platform where user sessions are run by moderators. Recommendations captures the creative recommendations moderators give to users to boost engagement and satisfaction. We also have a moderators dataset, covering the moderators responsible for managing user sessions such as 1:1s and offline feedback; however, that dataset is already aggregated.

Datasets

User Activity Data:

  • Captures active user participation via metrics such as average session length, messages sent, and resources clicked per session.
  • Timestamp data reveals preferred session times.
  • Feedback ratings indicate the user's satisfaction level.

Moderator Performance Data:

  • Captures moderator effectiveness by tracking key metrics such as the number of sessions moderated, average response time, and user satisfaction scores.

Recommendation Data:

  • Provides a recommendation id for each recommendation made to a user, along with the recommendation type suggested during the session.
  • Feedback score and click-through rate (CTR) help identify the user's interest in different recommendation types.

Data Model
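
The data model diagram is not reproduced here. As a stand-in, below is a minimal SQL sketch of the schema inferred from the columns referenced in the queries in this README; column types and the moderator_id column are assumptions:

--- inferred schema (a sketch; types and moderator_id are assumed)
create table user_activity (
    user_id           int,       -- unique per row in this dataset
    session_id        int,
    timestamp         timestamp, -- spans roughly 01/2024 to 01/2025
    session_length    int,       -- 5 to 141
    messages_sent     int,       -- 1 to 85
    feedback_rating   int,       -- 1 to 6
    resources_clicked int        -- 0 to 6
);

create table recommendations (
    recommendation_id   int,     -- unique per recommendation
    user_id             int,     -- many recommendations per user
    recommendation_type text,    -- Podcast, Blog, Video
    feedback_score      numeric, -- 1 to 5
    click_through_rate  numeric  -- 0.1 to 1
);

create table moderator_performance (
    moderator_id            int,     -- assumed; no foreign key links this table to the others
    chat_sessions_moderated int,     -- 17 to 493
    avg_response_time       numeric, -- 2 to 30.71 (assumed seconds)
    user_satisfaction_score numeric  -- 1 to 5
);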

Observations and Patterns

The provided datasets simulate business use cases that require analytical attention to achieve favourable results. During the initial stage of exploration, I made the following observations.

User Activity Data:

  • user_id column is unique in user_activity_data.csv.
  • We have a one-to-many mapping for a user_id in user_activity_data.csv <-> recommendations.csv.
  • The timestamp column in user activity data spans around 1 year (01/2024 to 01/2025).
  • Feedback Rating ranges from 1 to 6 and Resources Clicked from 0 to 6.
  • Messages Sent ranges from 1 to 85 and Session Length from 5 to 141.
  • Longer sessions with higher feedback ratings imply higher user satisfaction, whereas longer sessions with lower feedback ratings imply dissatisfaction.
  • The number of messages sent during a session implies the user's active engagement.

Moderator Performance Data:

  • Avg Response Time ranges from 2 to 30.71 (assumed to be seconds).
  • Chat Sessions Moderated ranges from 17 to 493.
  • Satisfaction Score falls in the range 1 to 5.
  • A higher average response time could be correlated with a lower user satisfaction score.
  • The datasets do not associate a moderator with any user or recommendation session, as there is no foreign key.

Recommendation Data:

  • Feedback score ranges from 1 to 5 and click-through rate from 0.1 to 1.
  • Podcast, Blog, and Video are the distinct recommendation types.
  • A higher number of resources clicked per session implies higher user engagement.

Analysis

User Engagement

  • Aggregate the recommendations dataset by user_id to gather avg_feedback_score, an array of recommendation_types, and avg_click_through_rate.
  • The result is then joined with the user_activity dataset on the user_id column.
  • We perform a left join, as a recommendation may or may not exist for a given user activity / session.
  • click_through_rate in the recommendations dataset is ambiguous in terms of how it is calculated, hence I define a new custom metric, PER_SESSION_CTR.
  • Further, we define another metric, USER_ENGAGEMENT_SCORE, as a weighted sum of all contributing features from the joined dataset.

PER_SESSION_CTR

Defined as (resources_clicked / recommendations_given). After joining the two datasets, we’ll have two cases:

  • Sessions with recommendations: resources_clicked can be divided by the number of recommendations given.
  • Sessions without recommendations: the recommendations_given denominator does not exist.

We need to normalize engagement across both scenarios. When recommendations are absent, resources_clicked alone isn't a perfect metric because it doesn't account for whether users had the opportunity to interact with recommendations. In this case, we scale the resources_clicked metric relative to the session length (or a better alternative feature) to create a normalized measure of engagement.
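
Matching the CASE logic in the query below, the metric is effectively piecewise:

    per_session_ctr = resources_clicked / recommendations_given    (session has recommendations)
    per_session_ctr = resources_clicked / session_length           (session has no recommendations)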

USER_ENGAGEMENT_SCORE

Finally, we compute the weighted sum of all the contributing features as user_engagement_score.
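
With equal 0.2 weights on each feature (as in the query below), this is:

    user_engagement_score = 0.2 * (session_length + messages_sent + feedback_rating + resources_clicked + per_session_ctr)

Note that because the features are unscaled, session_length (5 to 141) and messages_sent (1 to 85) dominate the score relative to the features with 0-to-6 ranges.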

--- User engagement
with recommendations_cte as (
    select user_id,
           array_agg(recommendation_type) as recommendation_types,
           round(avg(click_through_rate)::numeric, 2) as avg_recommendation_ctr,
           round(avg(feedback_score)::numeric, 2) as avg_feedback_score,
           count(recommendation_type) as recommendation_counts
    from recommendations
    group by user_id
),

user_ranking as (
    select ua.user_id,
           cast(ua.timestamp as date) as date,
           ua.session_id,
           ua.session_length,
           ua.messages_sent,
           ua.feedback_rating,
           ua.resources_clicked,
           rc.recommendation_types,
           avg_recommendation_ctr,
           -- Calculate per-session CTR, handling sessions with and without recommendations
           CASE WHEN rc.recommendation_types is null or array_length(rc.recommendation_types, 1) = 0
                    THEN ua.resources_clicked::numeric / ua.session_length -- no recommendations: normalize clicks by session length
                WHEN ua.resources_clicked = 0 THEN 0  -- no clicks, CTR = 0
                ELSE ua.resources_clicked::numeric / array_length(rc.recommendation_types, 1) -- clicks per recommendation given
           END AS per_session_ctr
    from user_activity ua
    left join recommendations_cte rc
        on ua.user_id = rc.user_id
    order by ua.user_id
)

select *,
      round(
       (session_length::numeric * 0.2) +
       (messages_sent::numeric * 0.2) +
       (feedback_rating::numeric * 0.2) +
       (resources_clicked::numeric * 0.2) +
       (per_session_ctr::numeric * 0.2), 2
      ) as user_engagement_score
from user_ranking;

Moderator Performance Score

To showcase top moderators, I define moderator_performance_score as a weighted sum of chat_sessions_moderated, avg_response_time, and user_satisfaction_score; a normalized variant is sketched after the query.

--- top performing moderators
select *
from (select *,
             round(
                     (chat_sessions_moderated::numeric * 0.33) +
                     (avg_response_time::numeric * 0.33) +
                     (user_satisfaction_score::numeric * 0.33), 2
             ) as moderator_performance_score
      from moderator_performance
      )t
order by moderator_performance_score desc;
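
Two caveats with this formula: chat_sessions_moderated (up to 493) dwarfs the other two metrics in scale, and a higher avg_response_time arguably indicates worse performance (see the observations above), yet it contributes positively. A min-max normalized variant that inverts response time is sketched below; this is an alternative, not the method used above:

--- sketch: min-max normalized moderator score (assumes Postgres; inverts response time)
select *,
       round((
           0.33 * (chat_sessions_moderated - min(chat_sessions_moderated) over ())::numeric
                / nullif(max(chat_sessions_moderated) over () - min(chat_sessions_moderated) over (), 0)
         + 0.33 * (1 - (avg_response_time - min(avg_response_time) over ())::numeric
                / nullif(max(avg_response_time) over () - min(avg_response_time) over (), 0))
         + 0.33 * (user_satisfaction_score - min(user_satisfaction_score) over ())::numeric
                / nullif(max(user_satisfaction_score) over () - min(user_satisfaction_score) over (), 0)
       )::numeric, 2) as moderator_performance_score
from moderator_performance
order by moderator_performance_score desc;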

Recommendations

I reviewed aggregated insights per recommendation_type. Combined with the user engagement results, these can be used to assess each recommendation type's contribution to engagement; a sketch of that join follows the query below.

--- recommendation aggregation
select recommendation_type,
       count(recommendation_type) as total_recommended,
       round(avg(feedback_score::numeric), 2) as avg_feedback_score,
       round(avg(click_through_rate)::numeric, 2) as avg_ctr
from recommendations
group by recommendation_type;
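
For the per-recommendation-type engagement view shown in the visualizations below, the engagement output can be joined back to recommendations. A minimal sketch, assuming the user engagement query above has been materialized as a table or view named user_engagement (a hypothetical name):

--- sketch: average engagement per recommendation_type (assumes user_engagement holds the engagement query output)
select r.recommendation_type,
       round(avg(ue.user_engagement_score)::numeric, 2) as avg_engagement_score
from recommendations r
join user_engagement ue
    on r.user_id = ue.user_id
group by r.recommendation_type;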

Execution of Analysis

  • The analysis is provided in supportiv_analysis.py along with requirements.txt.
  • Python version used: 3.11.2.
  • Before executing, create a new folder data_set; the program reads the datasets into pandas dataframes from data_set/<file_name>.csv.
  • Install dependencies using: $> pip install -r requirements.txt
  • Run the analysis using: $> python supportiv_analysis.py
  • The execution automatically creates a new folder: user_engagement_visualizations
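
For reference, a hypothetical project layout (the CSV file names shown are illustrative; use whatever names supportiv_analysis.py actually reads):

    engagement-analysis/
        supportiv_analysis.py
        requirements.txt
        data_set/
            user_activity.csv             (hypothetical file names; match
            moderator_performance.csv      the paths used in the script)
            recommendations.csv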

Visualizations

In this section, I highlight some of the analytical insights uncovered by the analysis described in the earlier sections.

User engagement score trend week-by-week

  • During August, user engagement dropped significantly.
  • December and January are the highest-engagement months.

Average User engagement score by Recommendation Type:

  • Recommendations of [podcast, blog, video] together yield the highest user engagement.
  • However, as the boxplot shows, the frequency / density of recommending all three together is low.

Correlation Heatmap of engagement metrics

  • User engagement is highly correlated with session_length and messages_sent.

CTR distribution by feedback rating:

  • Feedback ratings of 2 and 3 are densely populated in the per_session_ctr range of 0.5 to 1.5.

Distribution of user engagement score

  • A user engagement score between 10 and 30 is most common among platform users.
  • It is important to pay attention to users whose engagement score falls between 5 and 10.

Messages sent vs Resources clicked by Recommendation Type

Pair Plot Of all correlated features

Conclusion

In conclusion, user engagement is a crucial business metric that shows how users react to different experiments and features. It's a key indicator of customer success. Every business aims to collect as much data as possible to capture the factors contributing to user engagement and customer satisfaction.
