This is a Python project that cleans site metrics data. The input data is in the format shown in the /resources/test_exc.xlsx file. The application reads the Excel file with the pandas library, converts the metrics data to a DataFrame, cleans it, and transforms it into the cleaner format shown in the resources/output_exc.xlsx file. The cleaning steps are:
- Remove duplicates
- Convert the metric dates from columns to rows using the pandas melt function
- Add a day-of-the-month column
- Clean Date values from the pandas datetime format to the yyyy/mm/dd format
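
The steps above could be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the column names ("Site ID", "Metric", "Value", "Day of Month"), the sample values, and the function name `clean_metrics` are all assumptions, and the date headers are written as strings here, whereas headers read from a real Excel file may already be datetime objects.

```python
import pandas as pd

# Hypothetical wide-format input: one row per site and metric,
# one column per date. Rows 0 and 1 are deliberate duplicates.
wide = pd.DataFrame(
    {
        "Site ID": ["site 1", "site 1", "site 2"],
        "Metric": ["visits", "visits", "visits"],
        "2023-01-01": [10, 10, 7],
        "2023-01-02": [12, 12, 9],
    }
)


def clean_metrics(df, id_cols=("Site ID", "Metric")):
    """Illustrative cleaning pipeline (names are assumptions)."""
    id_cols = list(id_cols)
    # 1. Remove duplicate rows.
    deduped = df.drop_duplicates()
    # 2. Turn the date columns into rows.
    long = deduped.melt(id_vars=id_cols, var_name="Date", value_name="Value")
    # 3. Parse the dates and add a day-of-the-month column.
    long["Date"] = pd.to_datetime(long["Date"])
    long["Day of Month"] = long["Date"].dt.day
    # 4. Format the dates as yyyy/mm/dd strings.
    long["Date"] = long["Date"].dt.strftime("%Y/%m/%d")
    return long


cleaned = clean_metrics(wide)
print(cleaned)
```

Because `melt` treats every column that is not listed in `id_vars` as a date column, a sketch like this keeps working when sites or dates are added, provided the identifier columns keep the same names.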
python3 -m venv .venv
source .venv/bin/activate
pip3 install -r requirements.txt
cd src
python3 main.py
To run the tests, go to the tests folder from the root folder:
cd tests
pytest test_data_format_handler.py
- The number of sites can be dynamic
- The metrics can also be dynamic. One can add more metrics or remove some, and the code will still work
- The input data must be in the same format as the sample input file
- One shortcoming is that the data is not accurately sorted by Site ID. Site 10 comes right after site 1 instead of after site 9. This is because the site ID is a string, so it is sorted as text rather than as a number. This can be handled in a better way.
- Exception handling is not done due to time constraints
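
One possible way to address the Site ID sorting shortcoming is to sort on the numeric part of the ID instead of the raw string. The sketch below is only an assumption about how that could look: the "site N" ID format, the "Site ID" column name, and the helper `site_number` are hypothetical and not taken from the project code.

```python
import re

import pandas as pd


def site_number(site_id):
    """Return the first run of digits in a site ID as an int (hypothetical helper)."""
    match = re.search(r"\d+", str(site_id))
    # IDs without any digits sort first.
    return int(match.group()) if match else -1


# Hypothetical sample data with string site IDs.
df = pd.DataFrame({"Site ID": ["site 10", "site 2", "site 1"], "Value": [3, 2, 1]})

# Sort by the extracted number rather than by the string itself.
sorted_df = df.sort_values(
    "Site ID", key=lambda col: col.map(site_number)
).reset_index(drop=True)
print(sorted_df)
```

The `key` argument of `sort_values` needs pandas 1.1 or later. On older versions, a temporary numeric column could be added, sorted on, and then dropped.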