Below are some JSON profiles representing fictional customers of an e-commerce company. Each profile contains information about the customer, their orders, their transactions, the payment methods they used, and whether the customer is fraudulent. Your task is to:
- Transform the JSON profiles into a dataframe of feature vectors.
- Provide an exploratory analysis of the dataset that summarises and explains the key trends in the data, including which factors appear to be most important in predicting fraud.
- Construct a model to predict if a customer is fraudulent based on their profile.
- Report on the model's success and show which features are most important in that model.
Please use Python for this exercise. You can use whatever external software libraries you think are appropriate.
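As an illustration of the first step, here is a minimal sketch of flattening one nested profile into a flat feature row. The field names (`orders`, `transactions`, `paymentMethods`, `orderAmount`, `transactionFailed`, `fraudulent`) and the sample profile are hypothetical and chosen only for this example; the real profiles may be structured differently, and the features shown are not a recommended set.

```python
import json

# Hypothetical sample profile; the real field names and nesting may differ.
SAMPLE = json.loads("""
{
  "fraudulent": true,
  "customer": {"customerEmail": "a@example.com"},
  "orders": [{"orderAmount": 20}, {"orderAmount": 30}],
  "paymentMethods": [{"paymentMethodType": "card"}],
  "transactions": [
    {"transactionAmount": 20, "transactionFailed": false},
    {"transactionAmount": 30, "transactionFailed": true}
  ]
}
""")


def profile_to_features(profile):
    """Flatten one nested profile dict into a flat dict of numeric features."""
    orders = profile.get("orders", [])
    transactions = profile.get("transactions", [])
    payment_methods = profile.get("paymentMethods", [])

    # Count transactions flagged as failed (assumed boolean field).
    failed = sum(1 for t in transactions if t.get("transactionFailed"))

    return {
        "n_orders": len(orders),
        "n_transactions": len(transactions),
        "n_payment_methods": len(payment_methods),
        "total_order_amount": sum(o.get("orderAmount", 0) for o in orders),
        # Guard against division by zero for customers with no transactions.
        "failed_transaction_rate": failed / len(transactions) if transactions else 0.0,
        # Encode the label as 0/1 so it can be used directly as a target.
        "fraudulent": int(bool(profile.get("fraudulent", False))),
    }


features = profile_to_features(SAMPLE)
print(features)
```

Applying the function to every profile yields a list of flat dicts, which can be passed to `pandas.DataFrame(...)` to obtain one row per customer.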
We're looking for:
- Readable code
- High quality analysis
- Clear explanations of the analysis and conclusions
Please don't spend more than 3-4 hours on this task. You may find that there are more aspects of the data than you can realistically investigate in that time, and that's fine. If so, please describe what your next steps would be if more time were allocated to the task.