Call Center Agent Malpractice Detection (CCAMD)

This repository contains supplementary material for our manuscript, "A Novel Semi-supervised Framework for Call Center Agent Malpractice Detection via Neural Feature Learning", to be published in the journal Expert Systems with Applications.

Abstract

This work presents a practical solution to the problem of call center agent malpractice. We outline a semi-supervised framework comprising a non-linear power transformation and neural feature learning with k-means and agglomerative clustering. We put these building blocks together and tune their parameters to obtain the best performance. The data used in the experiments were obtained from our in-house call center and consist of recorded agent-customer conversations annotated using a convolutional neural network (CNN) based segmenter. The methods provide a means of tuning the parameters of the neural network to achieve a desirable result. We show that the proposed framework can significantly reduce the malpractice classification error of a clustering model (either k-means or agglomerative). Using the amount of silence per call as a key performance indicator, we show that, since deployment, the proposed system has increased the efficiency of quality control managers and thereby enhanced agents' performance at our call center.
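To make the notion of a clustering model's malpractice classification error concrete, here is a minimal sketch. It assumes a two-cluster assignment and Boolean malpractice labels, and it treats the error as the fraction of mislabeled samples under the better of the two possible cluster-to-label mappings. The function name `clustering_error` and the example values are hypothetical; the exact error definition used in the manuscript may differ.

```python
def clustering_error(cluster_ids, is_malpractice):
    """Misclassification rate of a two-cluster assignment against Boolean labels.

    Cluster ids are arbitrary, so either cluster could be the "malpractice"
    one. We score one naming and return the better of it and its complement.
    """
    n = len(cluster_ids)
    if n == 0 or n != len(is_malpractice):
        raise ValueError("inputs must be non-empty and of equal length")
    if len(set(cluster_ids)) > 2:
        raise ValueError("this sketch handles at most two clusters")

    # Tentatively call the cluster of the first sample "malpractice".
    first = cluster_ids[0]
    wrong = sum((c == first) != bool(y)
                for c, y in zip(cluster_ids, is_malpractice))
    rate = wrong / n

    # Swapping the two cluster names turns every hit into a miss and vice versa.
    return min(rate, 1.0 - rate)


# Invented example: six samples, two clusters, one sample in the wrong cluster.
clusters = [0, 0, 1, 1, 1, 0]
labels = [False, False, True, True, False, False]
error = clustering_error(clusters, labels)
```

Because k-means and agglomerative clustering return unlabeled groups, some rule of this kind is needed before cluster assignments can be compared with the malpractice flags in the validation data.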

Data

We share the data used in the study. The data were generated with inaSpeechSegmenter, a CNN-based audio segmentation tool. The tool segments a given audio file into four basic segment types: speech, silence, noise, and music. It can further detect gender (i.e., either male or female) in speech segments, but we did not consider gender in this study.

We segmented numerous call center recordings with this tool. Our data comprise only the percentages of speech, silence, noise, and music segments in each call center recording. Because our call center recordings are confidential, we share only the values we used to train our proposed machine learning architectures. We do, however, share our script, which converts a given GSM (Global System for Mobile Communications) audio file into segment information and hence segment percentages. We also share a sample .gsm file in which two company employees mimic a marketing call between a customer and a customer representative. The file reflects the vast majority of the features of a conventional phone call.
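As a rough illustration of how segment information can be turned into segment percentages, the sketch below works on a list of (label, start, end) tuples, which is the general shape of a segmenter's output. It is not the code in extract_features.py. The function name `segment_percentages`, the label mapping (in particular treating `noEnergy` as silence and `male`/`female` as speech), and the example timings are assumptions made for illustration.

```python
from collections import defaultdict

CATEGORIES = ("speech", "silence", "noise", "music")

# Assumed mapping from segmenter labels to the four categories in the data.
# Gendered speech labels are folded into "speech" since gender is not used.
LABEL_MAP = {
    "speech": "speech",
    "male": "speech",
    "female": "speech",
    "noEnergy": "silence",
    "noise": "noise",
    "music": "music",
}


def segment_percentages(segments):
    """Return the percentage of total labeled duration spent in each category.

    `segments` is a list of (label, start_seconds, end_seconds) tuples.
    """
    totals = defaultdict(float)
    for label, start, end in segments:
        category = LABEL_MAP.get(label)
        if category is None:
            continue  # skip labels outside the four categories
        totals[category] += end - start

    duration = sum(totals.values())
    if duration == 0:
        return {name: 0.0 for name in CATEGORIES}
    return {name: 100.0 * totals[name] / duration for name in CATEGORIES}


# Invented segmentation of a 20-second call.
example = [
    ("noEnergy", 0.0, 2.0),
    ("speech", 2.0, 12.0),
    ("music", 12.0, 17.0),
    ("noise", 17.0, 20.0),
]
percentages = segment_percentages(example)
```

Each recording then reduces to four numbers, which matches the speech, silence, noise, and music columns of the shared CSV files.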

Citation

@article{OzanIheme:2022,
  title={A Novel Semi-supervised Framework for Call Center Agent Malpractice Detection via Neural Feature Learning},
  author={Ozan, Şükrü and Iheme, Leonardo O.},
  journal={*****************},
  volume={**********},
  pages={************},
  year={*********},
  publisher={************}
}

File List

training.csv, in the data folder, holds the data we used to train our proposed frameworks. Each row represents one training sample. The file comprises 5 columns named record, speech, silence, noise, and music. The last four columns hold the percentage values calculated by preprocessing call center recordings with inaSpeechSegmenter.
- record : This column holds the index values of the data samples.
- speech : This column holds the percentages of speech segments in data samples.
- silence : This column holds the percentages of silence segments in data samples.
- noise : This column holds the percentages of noise segments in data samples.
- music : This column holds the percentages of music segments in data samples.
validation.csv, in the data folder, holds the validation data we used to test our proposed frameworks. Each row represents one validation sample. The file comprises 6 columns named record, speech, silence, noise, music, and malpractice. The speech, silence, noise, and music columns hold the percentage values calculated by preprocessing call center recordings with inaSpeechSegmenter; the additional malpractice column holds a label for each data sample.
- record : This column holds the index values of the data samples.
- speech : This column holds the percentages of speech segments in data samples.
- silence : This column holds the percentages of silence segments in data samples.
- noise : This column holds the percentages of noise segments in data samples.
- music : This column holds the percentages of music segments in data samples.
- malpractice : This column holds a Boolean flag, either TRUE or FALSE. If the value is TRUE, the data sample is considered malpractice.
20220411_141801.gsm is a hypothetical call center recording between two people, one acting as a customer and the other as a customer representative.
extract_features.py generates segment labels for an input audio file using inaSpeechSegmenter.
main_script.py reproduces the main results in our manuscript.
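The column layout described above can be read with the standard library alone. The following is a minimal sketch, assuming comma-separated files with a header row that uses the documented column names; the helper `read_samples` and the demo rows are hypothetical and are not taken from the real data.

```python
import csv
import io

PERCENT_COLUMNS = ("speech", "silence", "noise", "music")


def read_samples(handle):
    """Parse rows laid out like training.csv or validation.csv into dicts."""
    samples = []
    for row in csv.DictReader(handle):
        sample = {"record": row["record"]}
        for name in PERCENT_COLUMNS:
            sample[name] = float(row[name])
        # Only validation.csv carries the malpractice label.
        if "malpractice" in row:
            sample["malpractice"] = row["malpractice"].strip().upper() == "TRUE"
        samples.append(sample)
    return samples


# Invented rows in the documented validation layout.
demo = io.StringIO(
    "record,speech,silence,noise,music,malpractice\n"
    "0,70.0,20.0,5.0,5.0,FALSE\n"
    "1,30.0,60.0,5.0,5.0,TRUE\n"
)
samples = read_samples(demo)
flagged = [s["record"] for s in samples if s["malpractice"]]
```

With the real files, the same function could be called on an open file handle, for example `open("data/validation.csv", newline="")`, assuming that path matches the repository layout.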

