
A Simple Big Data Pipeline with MySQL

Updated: Jan 22, 2020

An efficiently implemented data pipeline increases the performance of the data warehouse and makes it easier to generate quality KPIs.


As you already know, big data technologies are very important these days for analysing huge volumes of data in less time and with less effort. Here we discuss a simple use case that you can see all around in real life: connecting an RDBMS to the Hadoop/Spark ecosystem and showing an analytical dashboard in a BI tool.
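As a hedged illustration of that use case, the sketch below reads a MySQL table into Spark over JDBC. The host, database, table and credentials (sales_db, orders, etl_user) are placeholder assumptions, not values from this project.

```python
from pyspark.sql import SparkSession

# Start a Spark session for the ingestion step of the pipeline.
spark = (
    SparkSession.builder
    .appName("mysql-to-hadoop-pipeline")
    .getOrCreate()
)

# Read a MySQL table into a Spark DataFrame over JDBC.
orders_df = (
    spark.read.format("jdbc")
    .option("url", "jdbc:mysql://mysql-host:3306/sales_db")   # placeholder host/db
    .option("driver", "com.mysql.cj.jdbc.Driver")
    .option("dbtable", "orders")                               # placeholder table
    .option("user", "etl_user")                                # placeholder credentials
    .option("password", "etl_password")
    .load()
)

# Quick sanity check of the imported schema.
orders_df.printSchema()
```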


Architecture Diagram


This architecture looks simple at first glance, but when you go deeper into the implementation you need to take care of multiple things: connectors, code quality, data type conversion, serialization, Spark optimization, Sqoop optimization, Hive storage optimization and many more.
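To make one of those points concrete, here is a minimal sketch of Hive storage optimization: landing the staged data as a partitioned Parquet table instead of plain text, so analytical queries read a columnar format and prune partitions. The HDFS path, table name (warehouse.orders) and partition column (order_date) are assumptions for illustration.

```python
from pyspark.sql import SparkSession

# Hive support is needed to create managed Hive tables from Spark.
spark = (
    SparkSession.builder
    .appName("hive-storage-optimisation")
    .enableHiveSupport()
    .getOrCreate()
)

# Assume the data landed by Sqoop/Spark sits in HDFS as CSV (placeholder path).
staged_df = (
    spark.read
    .option("header", "true")
    .csv("hdfs:///staging/sales_db/orders")
)

# Store it as a partitioned Parquet Hive table: columnar storage plus
# partition pruning cuts the data scanned by downstream queries.
(
    staged_df.write
    .mode("overwrite")
    .format("parquet")
    .partitionBy("order_date")
    .saveAsTable("warehouse.orders")
)
```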


Components Used


  1. RDBMS

  2. Hadoop cluster

  3. Spark

  4. Sqoop

  5. Hive

  6. Business Intelligence Tool

The BI tool can be any tool that connects to Hive/Spark through a JDBC/Thrift connection. We can also attach an MPP engine on top of Hive to answer queries in milliseconds; we used PrestoDB and saw queries run roughly 10x faster than with Spark. A minimal connection sketch follows.
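For illustration, a script or notebook could query the warehouse through PrestoDB with PyHive, one common Python client; a BI tool would use its own JDBC/Thrift connector in the same way. The coordinator host, schema and table below are assumptions.

```python
from pyhive import presto  # or: from pyhive import hive for a direct HiveServer2 connection

# Connect to the Presto coordinator sitting on top of the Hive metastore.
conn = presto.connect(
    host="presto-coordinator",  # placeholder coordinator host
    port=8080,
    catalog="hive",
    schema="warehouse",
)

cursor = conn.cursor()
cursor.execute(
    "SELECT order_date, COUNT(*) AS orders FROM orders GROUP BY order_date"
)

# Print the daily order counts that would feed the BI dashboard.
for row in cursor.fetchall():
    print(row)
```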


Thanks for visiting...


 
 
 
