Layer | Description | Remarks |
---|---|---|
Data Ingestion/Streaming | Bring your data into your data platform. | |
Data Processing | Create your data processing pipelines. | Apache Spark vs MapReduce vs Flink vs Storm vs Kafka Streams |
Data Cataloging | Store your metadata. | |
Data Storage | Store your structured and unstructured data. | Data warehouses vs data lakes |
Data Consumption | Enable your user personas for purpose-built analytics and machine learning. | |
Security and Governance | Protect your data across the layers and manage data access. | |

Use Case | Processing Type | Remarks |
---|---|---|
⭐ Fraud Detection | Stream Processing | Fraud detection systems need to determine if the usage patterns of a credit card have unexpectedly changed, and block the card if it is likely to have been stolen. |
⭐ Financial Stock Market | Stream Processing | Trading systems need to examine price changes in a financial market and execute trades according to specified rules. |
Log analytics | Stream Processing | Log files generated by servers or applications |
User events in apps, such as clickstreams | Stream Processing | Customer interaction data from a web or mobile application |
Manufacturing Systems | Stream Processing | Manufacturing systems need to monitor the status of machines in a factory, and quickly identify the problem if there is a malfunction. |
Military Systems | Stream Processing | Military and intelligence systems need to track the activities of a potential aggressor, and raise the alarm if there are signs of an attack. |
Stream Analytics | Stream Processing | Measuring the rate of some type of event (how often it occurs per time interval); calculating the rolling average of a value over some time period; comparing current statistics to previous time intervals (e.g. to detect trends or to alert on metrics that are unusually high or low compared to the same time last week). |
⭐ Data from IoT sensors | Stream Processing | Internet of Things (IoT), ad tech, gaming etc. |
Payment Processing Systems | Stream Processing | |
⭐ ETL Pipeline | Batch Processing | |
Building indexes for search DBs | Batch Processing | Apache Hadoop can be used to build indexes for Lucene/Solr. |
Recommendation System | Batch Processing | Workflows of 50-100 MapReduce jobs are reportedly used for recommendation systems at Google. |
Ranking System | Batch Processing | |
Machine learning systems | Batch Processing | Examples: classifiers (spam filters, anomaly detection, image recognition, etc.) |

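
The three stream analytics operations listed in the table (event rate per interval, rolling average, comparison against an earlier interval) can be sketched in a few lines of plain Python. This is only an illustration, not how any particular stream processor implements windowing: the names `SlidingWindowStats` and `is_unusual`, the 10-second window, the 50% tolerance, and the sample latency values are all assumptions made up for the example, and events are assumed to arrive in timestamp order.

```python
from collections import deque


class SlidingWindowStats:
    """Hypothetical helper: track events from the last `window_seconds` seconds."""

    def __init__(self, window_seconds):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0       # running sum of the values currently in the window

    def add(self, timestamp, value):
        """Record one event, then evict events that have fallen out of the window."""
        self.events.append((timestamp, value))
        self.total += value
        cutoff = timestamp - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, old_value = self.events.popleft()
            self.total -= old_value

    def rate(self):
        """Events per second over the window."""
        return len(self.events) / self.window_seconds

    def rolling_average(self):
        """Mean value of the events in the window, or None if the window is empty."""
        if not self.events:
            return None
        return self.total / len(self.events)


def is_unusual(current, baseline, tolerance=0.5):
    """Flag a metric that deviates from its baseline by more than `tolerance` (a fraction)."""
    if baseline == 0:
        return current != 0
    return abs(current - baseline) / abs(baseline) > tolerance


# Made-up request latencies in milliseconds, keyed by arrival time in seconds.
stats = SlidingWindowStats(window_seconds=10)
for ts, latency_ms in [(1, 100), (2, 110), (5, 90), (12, 300)]:
    stats.add(ts, latency_ms)

print(stats.rate())                                         # 0.2
print(stats.rolling_average())                              # 195.0
print(is_unusual(stats.rolling_average(), baseline=100.0))  # True
```

When the event at `t=12` arrives, the events at `t=1` and `t=2` fall out of the 10-second window, so the statistics cover only the two most recent events. In a real deployment the baseline would come from a stored earlier interval (for example, the same time last week) rather than a hard-coded value.
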
Characteristic | Remarks |
---|---|
Data Volume | 100s of TB to PB-scale and higher |
Architecture | Parallel processing, often using Hadoop, Spark, or data warehouse platforms
Necessity | Processing of data sets too large for operational databases |
Nominally | Big data tech sometimes imposed on small data problems |