Layers in Big Data Architecture

| Layer | Description | Remarks |
|---|---|---|
| Data Ingestion/Streaming | Bring your data into your data platform. | |
| Data Processing | Create your data processing pipelines. | Apache Spark vs MapReduce vs Flink vs Storm vs Kafka Streams |
| Data Cataloging | Store your metadata. | |
| Data Storage | Store your structured and unstructured data. | Data warehouses vs data lakes |
| Data Consumption | Enable your user personas for purpose-built analytics and machine learning. | |
| Security and Governance | Protect your data across the layers and manage data access. | |

General Use Cases of Big Data Processing

| Use Case | Processing Type | Remarks |
|---|---|---|
| ⭐ Fraud Detection | Stream Processing | Fraud detection systems need to determine if the usage patterns of a credit card have unexpectedly changed, and block the card if it is likely to have been stolen. |
| ⭐ Financial Stock Market | Stream Processing | Trading systems need to examine price changes in a financial market and execute trades according to specified rules. |
| Log Analytics | Stream Processing | Log files generated by servers or applications. |
| User Events in Apps (e.g. Clickstreams) | Stream Processing | Customer interaction data from a web or mobile application. |
| Manufacturing Systems | Stream Processing | Manufacturing systems need to monitor the status of machines in a factory, and quickly identify the problem if there is a malfunction. |
| Military Systems | Stream Processing | Military and intelligence systems need to track the activities of a potential aggressor, and raise the alarm if there are signs of an attack. |
| Stream Analytics | Stream Processing | - Measuring the rate of some type of event (how often it occurs per time interval)<br>- Calculating the rolling average of a value over some time period<br>- Comparing current statistics to previous time intervals (e.g. to detect trends or to alert on metrics that are unusually high or low compared to the same time last week) |
| ⭐ Data from IoT Sensors | Stream Processing | Internet of Things (IoT), ad tech, gaming, etc. |
| Payment Processing Systems | Stream Processing | |
| ⭐ ETL Pipeline | Batch Processing | Read more |
| Building Indexes for Search DBs | Batch Processing | Apache Hadoop can be used to build indexes for Lucene/Solr. |
| Recommendation System | Batch Processing | 50-100 MapReduce jobs are used for a recommendation system at Google. |
| Ranking System | Batch Processing | |
| Machine Learning Systems | Batch Processing | Example: classifiers (spam filters, anomaly detection, image recognition, etc.) |
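
The three stream analytics operations listed in the table (event rate per interval, rolling average, and comparison against a previous interval) can be sketched in plain Python. This is a minimal, single-process illustration and not how a stream processor such as Flink or Kafka Streams implements windows. The names `SlidingWindowStats` and `is_unusual`, the 10-second window, and the 50% tolerance are assumptions chosen for the example.

```python
from collections import deque


class SlidingWindowStats:
    """Keeps the events of the last `window_seconds` and reports simple statistics."""

    def __init__(self, window_seconds: float):
        self.window_seconds = window_seconds
        self.events = deque()  # (timestamp, value) pairs, oldest first
        self.total = 0.0       # running sum of the values currently in the window

    def add(self, timestamp: float, value: float) -> None:
        """Record one event. Timestamps are assumed to arrive in non-decreasing order."""
        self.events.append((timestamp, value))
        self.total += value
        self._evict(timestamp)

    def _evict(self, now: float) -> None:
        # Drop events at or before the cutoff, so the window covers (now - window, now].
        cutoff = now - self.window_seconds
        while self.events and self.events[0][0] <= cutoff:
            _, old_value = self.events.popleft()
            self.total -= old_value

    def rate(self) -> float:
        """Events per second over the window."""
        return len(self.events) / self.window_seconds

    def rolling_average(self) -> float:
        """Mean value of the events in the window, or 0.0 if the window is empty."""
        return self.total / len(self.events) if self.events else 0.0


def is_unusual(current: float, previous: float, tolerance: float = 0.5) -> bool:
    """Flag `current` if it differs from `previous` by more than `tolerance` (a fraction)."""
    if previous == 0:
        return current != 0
    return abs(current - previous) / previous > tolerance


# Hypothetical payment amounts as (timestamp in seconds, amount) pairs.
stats = SlidingWindowStats(window_seconds=10)
for ts, amount in [(0, 100), (2, 200), (5, 300), (12, 400)]:
    stats.add(ts, amount)

print(stats.rate())             # events per second in the last 10 seconds
print(stats.rolling_average())  # mean amount in the last 10 seconds
print(is_unusual(stats.rate(), previous=0.5))  # compare with an earlier interval's rate
```

Keeping a running sum alongside the deque makes each update cheap, because only expired events are touched. A production system would also need to handle out-of-order events and state that is partitioned across machines, which this sketch ignores.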

Various Services in Data layers

How can we define big data?

| | Remarks |
|---|---|
| Data Volume | 100s of TB to PB-scale and higher |
| Architecture | Parallel processing, often using Hadoop, Spark, or data warehouse platforms |
| Necessity | Processing of data sets too large for operational databases |
| Nominally | Big data tech is sometimes imposed on small data problems |

Read more