diff --git a/README b/README
new file mode 100644
index 00000000..ec3a6d2f
--- /dev/null
+++ b/README
@@ -0,0 +1,27 @@
+# Translation Delivery Time CLI
+
+This command line application calculates the moving average delivery time of translation events based on a specified window size.
+
+## Requirements
+
+- Python 3.7.x or higher
+
+## Installation
+
+1. Clone or download the repository to your local machine.
+2. Ensure you have Python installed.
+
+## Usage
+
+Run the application from the command line with the following command:
+python unbabel_cli.py --input_file events.json --window_size 10
+
+
+Replace `events.json` with the path to your input JSON file containing translation events and `10` with your desired window size in minutes.
+
+## Testing
+
+To test the application, run the unit tests provided in the `tests_unbabel_cli.py` file. Use the following command:
+python -m unittest tests_unbabel_cli.py
+
+DISCLAIMER: Unfortunately, the output does not exactly match the expected output; the code is what I could do with the time I had.
\ No newline at end of file
diff --git a/README.md b/README.md
deleted file mode 100644
index e326eb60..00000000
--- a/README.md
+++ /dev/null
@@ -1,86 +0,0 @@
-# Backend Engineering Challenge
-
-
-Welcome to our Engineering Challenge repository 🖖
-
-If you found this repository it probably means that you are participating in our recruitment process. Thank you for your time and energy. If that's not the case please take a look at our [openings](https://unbabel.com/careers/) and apply!
-
-Please fork this repo before you start working on the challenge, read it careful and take your time and think about the solution. Also, please fork this repository because we will evaluate the code on the fork.
-
-This is an opportunity for us both to work together and get to know each other in a more technical way. If you have any questions please open and issue and we'll reach out to help.
-
-Good luck!
-
-## Challenge Scenario
-
-At Unbabel we deal with a lot of translation data. One of the metrics we use for our clients' SLAs is the delivery time of a translation.
-
-In the context of this problem, and to keep things simple, our translation flow is going to be modeled as only one event.
-
-### *translation_delivered*
-
-Example:
-
-```json
-{
-	"timestamp": "2018-12-26 18:12:19.903159",
-	"translation_id": "5aa5b2f39f7254a75aa4",
-	"source_language": "en",
-	"target_language": "fr",
-	"client_name": "airliberty",
-	"event_name": "translation_delivered",
-	"duration": 20,
-	"nr_words": 100
-}
-```
-
-## Challenge Objective
-
-Your mission is to build a simple command line application that parses a stream of events and produces an aggregated output. In this case, we're interested in calculating, for every minute, a moving average of the translation delivery time for the last X minutes.
-
-If we want to count, for each minute, the moving average delivery time of all translations for the past 10 minutes we would call your application like (feel free to name it anything you like!).
-
-	unbabel_cli --input_file events.json --window_size 10
-
-The input file format would be something like:
-
-	{"timestamp": "2018-12-26 18:11:08.509654","translation_id": "5aa5b2f39f7254a75aa5","source_language": "en","target_language": "fr","client_name": "airliberty","event_name": "translation_delivered","nr_words": 30, "duration": 20}
-	{"timestamp": "2018-12-26 18:15:19.903159","translation_id": "5aa5b2f39f7254a75aa4","source_language": "en","target_language": "fr","client_name": "airliberty","event_name": "translation_delivered","nr_words": 30, "duration": 31}
-	{"timestamp": "2018-12-26 18:23:19.903159","translation_id": "5aa5b2f39f7254a75bb3","source_language": "en","target_language": "fr","client_name": "taxi-eats","event_name": "translation_delivered","nr_words": 100, "duration": 54}
-
-Assume that the lines in the input are ordered by the `timestamp` key, from lower (oldest) to higher values, just like in the example input above.
-
-The output file would be something in the following format.
-
-```
-{"date": "2018-12-26 18:11:00", "average_delivery_time": 0}
-{"date": "2018-12-26 18:12:00", "average_delivery_time": 20}
-{"date": "2018-12-26 18:13:00", "average_delivery_time": 20}
-{"date": "2018-12-26 18:14:00", "average_delivery_time": 20}
-{"date": "2018-12-26 18:15:00", "average_delivery_time": 20}
-{"date": "2018-12-26 18:16:00", "average_delivery_time": 25.5}
-{"date": "2018-12-26 18:17:00", "average_delivery_time": 25.5}
-{"date": "2018-12-26 18:18:00", "average_delivery_time": 25.5}
-{"date": "2018-12-26 18:19:00", "average_delivery_time": 25.5}
-{"date": "2018-12-26 18:20:00", "average_delivery_time": 25.5}
-{"date": "2018-12-26 18:21:00", "average_delivery_time": 25.5}
-{"date": "2018-12-26 18:22:00", "average_delivery_time": 31}
-{"date": "2018-12-26 18:23:00", "average_delivery_time": 31}
-{"date": "2018-12-26 18:24:00", "average_delivery_time": 42.5}
-```
-
-#### Notes
-
-Before jumping right into implementation we advise you to think about the solution first. We will evaluate, not only if your solution works but also the following aspects:
-
-+ Simple and easy to read code. Remember that [simple is not easy](https://www.infoq.com/presentations/Simple-Made-Easy)
-+ Comment your code. The easier it is to understand the complex parts, the faster and more positive the feedback will be
-+ Consider the optimizations you can do, given the order of the input lines
-+ Include a README.md that briefly describes how to build and run your code, as well as how to **test it**
-+ Be consistent in your code.
-
-Feel free to, in your solution, include some your considerations while doing this challenge. We want you to solve this challenge in the language you feel most comfortable with. Our machines run Python (3.7.x or higher) or Go (1.16.x or higher). If you are thinking of using any other programming language please reach out to us first 🙏.
-
-Also, if you have any problem please **open an issue**.
-
-Good luck and may the force be with you
diff --git a/events.json b/events.json
new file mode 100644
index 00000000..881e5706
--- /dev/null
+++ b/events.json
@@ -0,0 +1,3 @@
+{"timestamp": "2018-12-26 18:11:08.509654","translation_id": "5aa5b2f39f7254a75aa5","source_language": "en","target_language": "fr","client_name": "airliberty","event_name": "translation_delivered","nr_words": 30, "duration": 20}
+{"timestamp": "2018-12-26 18:15:19.903159","translation_id": "5aa5b2f39f7254a75aa4","source_language": "en","target_language": "fr","client_name": "airliberty","event_name": "translation_delivered","nr_words": 30, "duration": 31}
+{"timestamp": "2018-12-26 18:23:19.903159","translation_id": "5aa5b2f39f7254a75bb3","source_language": "en","target_language": "fr","client_name": "taxi-eats","event_name": "translation_delivered","nr_words": 100, "duration": 54}
\ No newline at end of file
diff --git a/output.json b/output.json
new file mode 100644
index 00000000..e0a6bd40
--- /dev/null
+++ b/output.json
@@ -0,0 +1,14 @@
+{"date": "2018-12-26 18:11:00", "average_delivery_time": 0}
+{"date": "2018-12-26 18:12:00", "average_delivery_time": 20.0}
+{"date": "2018-12-26 18:13:00", "average_delivery_time": 20.0}
+{"date": "2018-12-26 18:14:00", "average_delivery_time": 20.0}
+{"date": "2018-12-26 18:15:00", "average_delivery_time": 20.0}
+{"date": "2018-12-26 18:16:00", "average_delivery_time": 25.5}
+{"date": "2018-12-26 18:17:00", "average_delivery_time": 25.5}
+{"date": "2018-12-26 18:18:00", "average_delivery_time": 25.5}
+{"date": "2018-12-26 18:19:00", "average_delivery_time": 25.5}
+{"date": "2018-12-26 18:20:00", "average_delivery_time": 25.5}
+{"date": "2018-12-26 18:21:00", "average_delivery_time": 25.5}
+{"date": "2018-12-26 18:22:00", "average_delivery_time": 31.0}
+{"date": "2018-12-26 18:23:00", "average_delivery_time": 31.0}
+{"date": "2018-12-26 18:24:00", "average_delivery_time": 42.5}
diff --git a/tests_unbabel_cli.py b/tests_unbabel_cli.py
new file mode 100644
index 00000000..d4c8338e
--- /dev/null
+++ b/tests_unbabel_cli.py
@@ -0,0 +1,46 @@
+import unittest
+import json
+from datetime import datetime
+from unbabel_cli import parse_event, moving_average
+
+class TestMovingAverages(unittest.TestCase):
+    def setUp(self):
+        self.events = [
+            {"timestamp": "2018-12-26 18:11:08.509654", "duration": 20},
+            {"timestamp": "2018-12-26 18:15:19.903159", "duration": 31},
+            {"timestamp": "2018-12-26 18:23:19.903159", "duration": 54}
+        ]
+        self.parsed_events = [
+            {"timestamp": datetime(2018, 12, 26, 18, 11, 8, 509654), "duration": 20},
+            {"timestamp": datetime(2018, 12, 26, 18, 15, 19, 903159), "duration": 31},
+            {"timestamp": datetime(2018, 12, 26, 18, 23, 19, 903159), "duration": 54}
+        ]
+
+    def test_parse_event(self):
+        parsed = [parse_event(event) for event in self.events]
+        self.assertEqual(parsed, self.parsed_events)
+
+    def test_moving_average(self):
+        results = moving_average(self.parsed_events, 10)
+        expected_results = [
+            {"date": "2018-12-26 18:11:00", "average_delivery_time": 0},
+            {"date": "2018-12-26 18:12:00", "average_delivery_time": 20},
+            {"date": "2018-12-26 18:13:00", "average_delivery_time": 20},
+            {"date": "2018-12-26 18:14:00", "average_delivery_time": 20},
+            {"date": "2018-12-26 18:15:00", "average_delivery_time": 20},
+            {"date": "2018-12-26 18:16:00", "average_delivery_time": 25.5},
+            {"date": "2018-12-26 18:17:00", "average_delivery_time": 25.5},
+            {"date": "2018-12-26 18:18:00", "average_delivery_time": 25.5},
+            {"date": "2018-12-26 18:19:00", "average_delivery_time": 25.5},
+            {"date": "2018-12-26 18:20:00", "average_delivery_time": 25.5},
+            {"date": "2018-12-26 18:21:00", "average_delivery_time": 25.5},
+            {"date": "2018-12-26 18:22:00", "average_delivery_time": 31},
+            {"date": "2018-12-26 18:23:00", "average_delivery_time": 31},
+            {"date": "2018-12-26 18:24:00", "average_delivery_time": 42.5}
+        ]
+        self.assertEqual(results, expected_results)
+
+    # Add other tests here if needed
+
+if __name__ == '__main__':
+    unittest.main()
diff --git a/unbabel_cli.py b/unbabel_cli.py
new file mode 100644
index 00000000..8f728e9a
--- /dev/null
+++ b/unbabel_cli.py
@@ -0,0 +1,88 @@
+import json
+import argparse
+from datetime import datetime, timedelta
+
+# Function to parse each event from JSON format
+def parse_event(event):
+    return {
+        'timestamp': datetime.strptime(event['timestamp'], '%Y-%m-%d %H:%M:%S.%f'),
+        'duration': event['duration']
+    }
+
+# Function to calculate the moving average delivery time
+def moving_average(events, window_size):
+    # Sort events based on timestamp
+    events = sorted(events, key=lambda x: x['timestamp'])
+    # Define start and end times of the event stream
+    start_time = events[0]['timestamp'].replace(second=0, microsecond=0)
+    end_time = events[-1]['timestamp'].replace(second=0, microsecond=0) + timedelta(minutes=1)
+    # Define the window size
+    window = timedelta(minutes=window_size)
+    result = []
+
+    # Loop through each minute in the event stream
+    current_time = start_time
+    while current_time <= end_time:
+        # Start of the current window: the full window_size minutes before current_time
+        window_start = current_time - window
+        # Extract events within the current window
+        window_events = [event['duration'] for event in events if window_start <= event['timestamp'] <= current_time]
+        # Calculate the average delivery time for the events within the window
+        if window_events:
+            avg_duration = sum(window_events) / len(window_events)
+        else:
+            avg_duration = 0
+        # Append the result to the output list
+        result.append({'date': current_time.strftime('%Y-%m-%d %H:%M:%S'), 'average_delivery_time': avg_duration})
+        # Move to the next minute
+        current_time += timedelta(minutes=1)
+
+    return result
+
+
+
+# Main function to read input, calculate moving average, and write output
+def main(input_file, window_size):
+    try:
+        # Read events from the input file
+        events = []
+        with open(input_file, 'r') as file:
+            for line in file:
+                try:
+                    # Parse each line as JSON and append to events list
+                    event = json.loads(line.strip())
+                    events.append(parse_event(event))
+                except json.JSONDecodeError as e:
+                    raise ValueError(f"Invalid JSON in input file: {e}")
+
+        # Check if events list is empty
+        if not events:
+            raise ValueError("Input file is empty")
+
+        # Check if window size is valid
+        if not isinstance(window_size, int) or window_size <= 0:
+            raise ValueError("Window size must be a positive integer")
+
+        # Calculate moving average delivery time
+        averages = moving_average(events, window_size)
+
+        # Write output to file
+        with open('output.json', 'w') as file:
+            for avg in averages:
+                file.write(json.dumps(avg) + '\n')
+
+    except FileNotFoundError:
+        print(f"Error: Input file '{input_file}' not found")
+    except ValueError as e:
+        print(f"Error: {e}")
+
+
+# Command line argument parsing
+if __name__ == '__main__':
+    parser = argparse.ArgumentParser(description='Process translation delivery events.')
+    parser.add_argument('--input_file', type=str, required=True, help='Path to the input JSON file with events.')
+    parser.add_argument('--window_size', type=int, required=True, help='Size of the moving average window in minutes.')
+
+    args = parser.parse_args()
+    # Call main function with provided arguments
+    main(args.input_file, args.window_size)
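
Possible follow-up, not part of this patch: the challenge notes ask to "consider the optimizations you can do, given the order of the input lines", while `moving_average` rescans every event for every minute. Below is a minimal sketch of a sliding-window variant. It assumes the events are already parsed (dicts with a `datetime` `timestamp` and a numeric `duration`), already sorted by timestamp, and that the window for minute `t` covers `[t - window_size minutes, t]`. The name `moving_average_sliding` is hypothetical and does not exist in the repository.

```python
from collections import deque
from datetime import datetime, timedelta


def moving_average_sliding(events, window_size):
    """Per-minute moving average; assumes events are sorted by timestamp."""
    if not events:
        return []
    one_minute = timedelta(minutes=1)
    window = timedelta(minutes=window_size)
    current = events[0]['timestamp'].replace(second=0, microsecond=0)
    end = events[-1]['timestamp'].replace(second=0, microsecond=0) + one_minute

    in_window = deque()  # events currently inside [current - window, current]
    total = 0            # running sum of the durations held in in_window
    next_idx = 0         # index of the next event not yet admitted
    result = []
    while current <= end:
        # Admit every event delivered up to the current minute.
        while next_idx < len(events) and events[next_idx]['timestamp'] <= current:
            in_window.append(events[next_idx])
            total += events[next_idx]['duration']
            next_idx += 1
        # Evict events that fell out of the window on the old side.
        while in_window and in_window[0]['timestamp'] < current - window:
            total -= in_window.popleft()['duration']
        average = total / len(in_window) if in_window else 0
        result.append({'date': current.strftime('%Y-%m-%d %H:%M:%S'),
                       'average_delivery_time': average})
        current += one_minute
    return result
```

Each event is appended and removed at most once, so the work is proportional to the number of events plus the number of output minutes, instead of their product. The sketch skips the `sorted` call on purpose, so it is only valid under the ordering guarantee stated in the challenge.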