User Engagement

Introduction

We are provided with two datasets, activity.csv and recommendations.csv. Activity captures the click stream and engagement of users on a platform where user sessions are run by moderators. Recommendations captures the creative recommendations moderators give to users to boost engagement and satisfaction. We also have a moderators dataset, covering the moderators responsible for managing user sessions such as 1:1s and offline feedback; however, that dataset is already aggregated.

Datasets

User Activity Data:

  • Captures active user participation via metrics such as average session length, messages sent, and resources clicked per session.
  • Timestamp data reveals preferred session times.
  • Feedback ratings indicate the user's satisfaction level.

Moderator Performance Data:

  • Captures moderator effectiveness by tracking key metrics such as the number of sessions moderated, average response time, and user satisfaction scores.

Recommendation Data:

  • Provides a recommendation id for each recommendation made to a user, along with the recommendation type suggested during the session.
  • Feedback score and click-through rate (CTR) help identify the user's interest in different recommendation types.

Data Model
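
The data model diagram is not reproduced here. As a stand-in, below is a minimal SQL sketch of the schema inferred from the columns referenced in the queries in this README; column types and the moderator_id column are assumptions:

--- inferred schema (a sketch; types and moderator_id are assumed)
create table user_activity (
    user_id           int,       -- unique per row in this dataset
    session_id        int,
    timestamp         timestamp, -- spans roughly 01/2024 to 01/2025
    session_length    int,       -- 5 to 141
    messages_sent     int,       -- 1 to 85
    feedback_rating   int,       -- 1 to 6
    resources_clicked int        -- 0 to 6
);

create table recommendations (
    recommendation_id   int,     -- unique per recommendation
    user_id             int,     -- many recommendations per user
    recommendation_type text,    -- Podcast, Blog, Video
    feedback_score      numeric, -- 1 to 5
    click_through_rate  numeric  -- 0.1 to 1
);

create table moderator_performance (
    moderator_id            int,     -- assumed; no foreign key links this table to the others
    chat_sessions_moderated int,     -- 17 to 493
    avg_response_time       numeric, -- 2 to 30.71 (assumed seconds)
    user_satisfaction_score numeric  -- 1 to 5
);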

Observations and Patterns

The provided datasets simulate business use cases that require analytical attention to achieve favourable results. During the initial stage of exploration, I made the following observations.

User Activity Data:

  • user_id column is unique in user_activity_data.csv.
  • We have a one-to-many mapping for a user_id in user_activity_data.csv <-> recommendations.csv.
  • The timestamp column in user activity data spans around 1 year (01/2024 to 01/2025).
  • Feedback Rating ranges from 1 to 6 and Resources Clicked from 0 to 6.
  • Messages Sent ranges from 1 to 85 and Session Length from 5 to 141.
  • Longer sessions with higher feedback ratings imply higher user satisfaction, whereas longer sessions with lower feedback ratings imply dissatisfaction.
  • The number of messages sent during a session implies the user's active engagement.

Moderator Performance Data:

  • Avg Response Time ranges from 2 to 30.71 (assumed to be seconds).
  • Chat Sessions Moderated ranges from 17 to 493.
  • Satisfaction Score falls in the range 1 to 5.
  • A higher average response time could be correlated with a lower user satisfaction score.
  • The datasets do not associate a moderator with any user or recommendation session, as there is no foreign key.

Recommendation Data:

  • Feedback score ranges from 1 to 5 and click-through rate from 0.1 to 1.
  • Podcast, Blog, and Video are the distinct recommendation types.
  • A higher number of resources clicked per session implies higher user engagement.

Analysis

User Engagement

  • Aggregate the recommendations dataset by user_id to gather avg_feedback_score, an array of recommendation_types, and avg_click_through_rate.
  • The result is then joined with the user_activity dataset on the user_id column.
  • We perform a left join, as a recommendation may or may not exist for a given user activity / session.
  • click_through_rate in the recommendations dataset is ambiguous in terms of how it is calculated, hence I define a new custom metric, PER_SESSION_CTR.
  • Further, we define another metric, USER_ENGAGEMENT_SCORE, as a weighted sum of all contributing features from the joined dataset.

PER_SESSION_CTR

Defined as (resources_clicked / recommendations_given). After joining the two datasets, we’ll have two cases:

  • Sessions with recommendations: resources_clicked can be divided by the number of recommendations given.
  • Sessions without recommendations: the recommendations_given denominator does not exist.

We need to normalize engagement across both scenarios. When recommendations are absent, resources_clicked alone isn't a perfect metric because it doesn't account for whether users had the opportunity to interact with recommendations. In this case, we scale the resources_clicked metric relative to the session length (or a better alternative feature) to create a normalized measure of engagement.
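
Matching the CASE logic in the query below, the metric is effectively piecewise:

    per_session_ctr = resources_clicked / recommendations_given    (session has recommendations)
    per_session_ctr = resources_clicked / session_length           (session has no recommendations)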

USER_ENGAGEMENT_SCORE

Finally, we compute the weighted sum of all the contributing features as user_engagement_score.
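
With equal 0.2 weights on each feature (as in the query below), this is:

    user_engagement_score = 0.2 * (session_length + messages_sent + feedback_rating + resources_clicked + per_session_ctr)

Note that because the features are unscaled, session_length (5 to 141) and messages_sent (1 to 85) dominate the score relative to the features with 0-to-6 ranges.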

--- User engagement
with recommendations_cte as (
    select user_id,
           array_agg(recommendation_type) as recommendation_types,
           round(avg(click_through_rate)::numeric, 2) as avg_recommendation_ctr,
           round(avg(feedback_score)::numeric, 2) as avg_feedback_score,
           count(recommendation_type) as recommendation_counts
    from recommendations
    group by user_id
),

user_ranking as (
    select ua.user_id,
           cast(ua.timestamp as date) as date,
           ua.session_id,
           ua.session_length,
           ua.messages_sent,
           ua.feedback_rating,
           ua.resources_clicked,
           rc.recommendation_types,
           avg_recommendation_ctr,
           -- Calculate per-session CTR, handling sessions with and without recommendations
           CASE WHEN rc.recommendation_types is null or array_length(rc.recommendation_types, 1) = 0
                    THEN ua.resources_clicked::numeric / ua.session_length -- no recommendations: normalize clicks by session length
                WHEN ua.resources_clicked = 0 THEN 0  -- no clicks, CTR = 0
                ELSE ua.resources_clicked::numeric / array_length(rc.recommendation_types, 1) -- clicks per recommendation given
           END AS per_session_ctr
    from user_activity ua
    left join recommendations_cte rc
        on ua.user_id = rc.user_id
    order by ua.user_id
)

select *,
      round(
       (session_length::numeric * 0.2) +
       (messages_sent::numeric * 0.2) +
       (feedback_rating::numeric * 0.2) +
       (resources_clicked::numeric * 0.2) +
       (per_session_ctr::numeric * 0.2), 2
      ) as user_engagement_score
from user_ranking;

Moderator Performance Score

To showcase top moderators, I define moderator_performance_score as a weighted sum of chat_sessions_moderated, avg_response_time, and user_satisfaction_score; a normalized variant is sketched after the query.

--- top performing moderators
select *
from (select *,
             round(
                     (chat_sessions_moderated::numeric * 0.33) +
                     (avg_response_time::numeric * 0.33) +
                     (user_satisfaction_score::numeric * 0.33), 2
             ) as moderator_performance_score
      from moderator_performance
      )t
order by moderator_performance_score desc;
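
Two caveats with this formula: chat_sessions_moderated (up to 493) dwarfs the other two metrics in scale, and a higher avg_response_time arguably indicates worse performance (see the observations above), yet it contributes positively. A min-max normalized variant that inverts response time is sketched below; this is an alternative, not the method used above:

--- sketch: min-max normalized moderator score (assumes Postgres; inverts response time)
select *,
       round((
           0.33 * (chat_sessions_moderated - min(chat_sessions_moderated) over ())::numeric
                / nullif(max(chat_sessions_moderated) over () - min(chat_sessions_moderated) over (), 0)
         + 0.33 * (1 - (avg_response_time - min(avg_response_time) over ())::numeric
                / nullif(max(avg_response_time) over () - min(avg_response_time) over (), 0))
         + 0.33 * (user_satisfaction_score - min(user_satisfaction_score) over ())::numeric
                / nullif(max(user_satisfaction_score) over () - min(user_satisfaction_score) over (), 0)
       )::numeric, 2) as moderator_performance_score
from moderator_performance
order by moderator_performance_score desc;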

Recommendations

I reviewed aggregated insights per recommendation_type. Combined with the user engagement results, these can be used to assess each recommendation type's contribution to engagement; a sketch of that join follows the query below.

--- recommendation aggregation
select recommendation_type,
       count(recommendation_type) as total_recommended,
       round(avg(feedback_score::numeric), 2) as avg_feedback_score,
       round(avg(click_through_rate)::numeric, 2) as avg_ctr
from recommendations
group by recommendation_type;
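
For the per-recommendation-type engagement view shown in the visualizations below, the engagement output can be joined back to recommendations. A minimal sketch, assuming the user engagement query above has been materialized as a table or view named user_engagement (a hypothetical name):

--- sketch: average engagement per recommendation_type (assumes user_engagement holds the engagement query output)
select r.recommendation_type,
       round(avg(ue.user_engagement_score)::numeric, 2) as avg_engagement_score
from recommendations r
join user_engagement ue
    on r.user_id = ue.user_id
group by r.recommendation_type;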

Execution of Analysis

  • The analysis is provided in supportiv_analysis.py along with requirements.txt.
  • Python version used: 3.11.2.
  • Before executing, create a new folder data_set; the program reads the datasets into pandas dataframes from data_set/<file_name>.csv.
  • Install dependencies using: $> pip install -r requirements.txt
  • Run the analysis using: $> python supportiv_analysis.py
  • The execution automatically creates a new folder: user_engagement_visualizations
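
For reference, a hypothetical project layout (the CSV file names shown are illustrative; use whatever names supportiv_analysis.py actually reads):

    engagement-analysis/
        supportiv_analysis.py
        requirements.txt
        data_set/
            user_activity.csv             (hypothetical file names; match
            moderator_performance.csv      the paths used in the script)
            recommendations.csv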

Visualizations

In this section, I highlight some of the analytical insights uncovered by the analysis described in the earlier sections.

User engagement score trend week-by-week

  • During August, user engagement dropped significantly.
  • December and January are the highest-engagement months.

Average User engagement score by Recommendation Type:

  • Recommendations of [podcast, blog, video] together yield the highest user engagement.
  • However, as the boxplot shows, the frequency / density of recommending all three together is low.

Correlation Heatmap of engagement metrics

  • User engagement is highly correlated with session_length and messages_sent.

CTR distribution by feedback rating:

  • Feedback ratings of 2 and 3 are densely populated in the per_session_ctr range of 0.5 to 1.5.

Distribution of user engagement score

  • A user engagement score between 10 and 30 is most common among platform users.
  • It is important to pay attention to users whose engagement score falls between 5 and 10.

Messages sent vs Resources clicked by Recommendation Type

Pair Plot Of all correlated features

Conclusion

In conclusion, user engagement is a crucial business metric that shows how users react to different experiments and features. It's a key indicator of customer success. Every business aims to collect as much data as possible to capture the factors contributing to user engagement and customer satisfaction.
