A Simple Big Data Pipeline with MySQL
- anydataflow
- Oct 19, 2019
- 1 min read
Updated: Jan 22, 2020
An efficiently implemented data pipeline increases the performance of the data warehouse and makes it easier to generate high-quality KPIs.
As you already know, big data is very important these days for analysing huge amounts of data in less time and with less effort. Here we discuss a simple use case you can see all around in real life: connecting an RDBMS to the Hadoop/Spark ecosystem and presenting an analytical dashboard in a BI tool.
Architecture Diagram

This architecture looks simple at first glance, but once you go deep into the implementation you need to take care of many things: connectors, code quality, data type conversion, serialization, Spark optimization, Sqoop optimization, Hive storage optimization, and more.
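The Sqoop step alone exposes several of these knobs. Below is a minimal sketch (not our production job) that drives a sqoop import from Python; the host, database, table, and column names are placeholders, and the flags shown are where connector choice, parallelism, and type conversion get decided.

import subprocess

# Placeholder connection details; substitute your own MySQL host, DB, and table.
sqoop_import = [
    "sqoop", "import",
    "--connect", "jdbc:mysql://mysql-host:3306/sales",
    "--username", "etl_user",
    "--password-file", "/user/etl/.mysql-password",  # keep credentials off the command line
    "--table", "orders",
    "--split-by", "order_id",              # key Sqoop uses to parallelise the import
    "--num-mappers", "4",                  # Sqoop optimization: degree of parallelism
    "--as-parquetfile",                    # columnar files that Spark and Hive both read well
    "--map-column-java", "amount=Double",  # explicit RDBMS-to-Java type conversion
    "--target-dir", "/data/staging/orders",
]

subprocess.run(sqoop_import, check=True)   # fail the pipeline if the import fails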
Components Used
RDBMS
Hadoop cluster
Spark
Sqoop
Hive
Business Intelligence Tool
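To show how the Spark and Hive pieces fit together, here is a minimal PySpark sketch (the table, column, and path names are made up for illustration): it reads the data Sqoop landed, computes a small KPI, and writes it as a partitioned ORC Hive table so downstream dashboard queries scan only the partitions they need.

from pyspark.sql import SparkSession, functions as F

spark = (SparkSession.builder
         .appName("orders-daily-kpi")
         .enableHiveSupport()        # lets Spark read and write Hive tables
         .getOrCreate())

# Read what the Sqoop import landed on HDFS (Parquet in this sketch).
orders = spark.read.parquet("/data/staging/orders")

# Example KPI: daily revenue and distinct buyers per country.
daily_kpi = (orders
             .groupBy("order_date", "country")
             .agg(F.sum("amount").alias("revenue"),
                  F.countDistinct("customer_id").alias("buyers")))

# Hive storage optimization: columnar ORC, partitioned by date,
# so BI queries prune partitions instead of scanning everything.
(daily_kpi.write
          .mode("overwrite")
          .format("orc")
          .partitionBy("order_date")
          .saveAsTable("analytics.daily_kpi"))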
The BI layer can be any tool that connects to Hive/Spark over a JDBC/Thrift connection. You can also attach an MPP engine on top of Hive to answer queries in milliseconds; we used PrestoDB and got queries roughly 10x faster than Spark.
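As a concrete illustration of that connection path, the snippet below runs the same query over Thrift twice, once through HiveServer2 and once through a Presto coordinator, using the PyHive client; the hosts, ports, and table name are assumptions carried over from the sketch above.

from pyhive import hive, presto

QUERY = "SELECT country, revenue FROM analytics.daily_kpi WHERE order_date = '2019-10-01'"

# Thrift connection to HiveServer2 (default port 10000).
hive_conn = hive.connect(host="hive-server", port=10000, username="bi_user")
hive_cur = hive_conn.cursor()
hive_cur.execute(QUERY)
print(hive_cur.fetchall())

# Same query through the Presto coordinator (default port 8080), reading
# the same Hive table; this is the path that gives the fast responses.
presto_conn = presto.connect(host="presto-coordinator", port=8080,
                             username="bi_user", catalog="hive")
presto_cur = presto_conn.cursor()
presto_cur.execute(QUERY)
print(presto_cur.fetchall())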
Thanks for visiting...