A Simple Big Data pipeline with MySql

  • Writer: anydataflow
  • Oct 19, 2019
  • 1 min read

Updated: Jan 22, 2020

An efficiently implemented data pipeline improves data warehouse performance and makes it easier to generate high-quality KPIs.


As you already know, big data tooling matters these days because it lets you analyse huge volumes of data in less time and with less effort. Here we discuss a simple use case you can see all around in real life: connecting an RDBMS to the Hadoop/Spark ecosystem and showing an analytical dashboard in a BI tool.


Architecture Diagram

This architecture looks simple at first glance, but once you go deep into the implementation you need to take care of multiple things: connectors, code quality, data type conversion, serialization, Spark optimization, Sqoop optimization, Hive storage optimization, and many more.
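To make the Sqoop step concrete, here is a minimal sketch that builds the `sqoop import` command moving a MySQL table into Hive. The host, database, table, and user names are made-up placeholders; the flags are standard `sqoop import` options.

```python
# Sketch: build a `sqoop import` command that copies one MySQL table into Hive.
# Host, credentials, and table names below are placeholders, not real endpoints.

def build_sqoop_import_cmd(host, db, table, user, num_mappers=4, split_by="id"):
    """Return the argv list for importing `db.table` from MySQL into Hive."""
    return [
        "sqoop", "import",
        "--connect", f"jdbc:mysql://{host}:3306/{db}",   # JDBC URL of the source RDBMS
        "--username", user,
        "--table", table,
        "--hive-import",                     # create/load a matching Hive table
        "--num-mappers", str(num_mappers),   # parallel map tasks for the copy
        "--split-by", split_by,              # column used to split work across mappers
    ]

cmd = build_sqoop_import_cmd("mysql.example.com", "sales", "orders", "etl_user")
print(" ".join(cmd))
```

In a real pipeline you would hand this list to `subprocess.run` on an edge node, and tune `--num-mappers` and `--split-by` per table, since that is where most Sqoop optimization happens.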


Components Used


  1. RDBMS

  2. Hadoop cluster

  3. Spark

  4. Sqoop

  5. Hive

  6. Business Intelligence Tool
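Data type conversion between these components is where such a pipeline often breaks first. Below is a sketch of one plausible MySQL-to-Hive column mapping used to generate table DDL; the mapping values are assumptions (the exact defaults depend on your Sqoop and Hive versions), so verify them against your own cluster.

```python
# Sketch: one plausible MySQL -> Hive column type mapping used to generate DDL.
# The mapping values are assumptions; verify them against your Sqoop/Hive versions.

MYSQL_TO_HIVE = {
    "INT": "INT",
    "BIGINT": "BIGINT",
    "VARCHAR": "STRING",      # older Hive versions have no bounded VARCHAR
    "DATETIME": "TIMESTAMP",
    "DECIMAL": "DECIMAL",
}

def hive_ddl(table, columns):
    """columns: list of (name, mysql_type) pairs -> a CREATE TABLE statement."""
    cols = ", ".join(f"{name} {MYSQL_TO_HIVE[mtype]}" for name, mtype in columns)
    return f"CREATE TABLE {table} ({cols}) STORED AS ORC"

ddl = hive_ddl("orders", [("id", "BIGINT"), ("created_at", "DATETIME")])
print(ddl)
```

Storing the table as ORC (or Parquet) rather than plain text is the kind of Hive storage optimization mentioned earlier.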

The BI layer can be any tool that connects to Hive/Spark over a JDBC/Thrift connection. You can also attach an MPP engine on top of Hive to serve queries in milliseconds; we used PrestoDB and saw query speeds roughly 10x faster than Spark.
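From the BI tool's point of view, the difference between querying Hive directly and querying Presto on top of it is mostly the JDBC URL. A sketch, assuming the standard HiveServer2 and Presto JDBC URL schemes (hostnames and ports are placeholders with the common defaults):

```python
# Sketch: JDBC URLs a BI tool would use for Hive (HiveServer2/Thrift) vs Presto.
# Hostnames are placeholders; 10000 and 8080 are the usual default ports.

def hive_jdbc_url(host, database="default", port=10000):
    """HiveServer2 JDBC URL: jdbc:hive2://host:port/database"""
    return f"jdbc:hive2://{host}:{port}/{database}"

def presto_jdbc_url(host, catalog="hive", schema="default", port=8080):
    """Presto JDBC URL; Presto exposes Hive tables through its `hive` catalog."""
    return f"jdbc:presto://{host}:{port}/{catalog}/{schema}"

print(hive_jdbc_url("warehouse.example.com"))    # batch reports straight from Hive
print(presto_jdbc_url("warehouse.example.com"))  # interactive, low-latency queries
```

Pointing the same dashboard at the Presto URL instead of the Hive one is how the MPP speed-up is attached without changing the warehouse itself.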


Thanks for visiting...


 
 
 
