
Mainframe (Cobol data) source in Spark

  • Writer: anydataflow
  • Oct 24, 2019
  • 1 min read

Updated: Jul 20, 2021

Apache Spark has become the first choice for data processing and analytics because it is scalable, easy to code, and supports many file formats such as Avro, Parquet, ORC, text, and JSON.


Mainframes mainly work with COBOL and store data in COBOL file formats, so reading COBOL files in Spark is a challenge.


Thanks to the team developing "Cobrix", which is available on GitHub: you can build it for your Spark version and use it to handle mainframe COBOL data efficiently.


The documentation is good, so it is easy to use:

1. Spark must be installed and running

2. Java and Maven must be installed

wget https://github.com/AbsaOSS/cobrix/archive/v0.5.0.zip


unzip v0.5.0.zip

cd cobrix-0.5.0

mvn package -DskipTests



spark-shell --jars /root/cobrix-0.5.0/examples/spark-cobol-app/target/spark-cobol-app-0.0.1-SNAPSHOT.jar


val df = spark.read.format("za.co.absa.cobrix.spark.cobol.source").option("encoding", "ebcdic").option("generate_record_id", true).option("copybook", "/copybook.txt").load("/datafile-1.bin")
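The "copybook" option points at a COBOL copybook that describes the record layout of the binary data file. As an illustration only, a minimal /copybook.txt for a fixed-length record could look like the following (the field names and PIC clauses here are hypothetical, not taken from any real dataset):

```cobol
       01  RECORD.
           05  CUST-ID          PIC S9(4)  COMP.
           05  CUST-NAME        PIC X(10).
           05  ACCT-BALANCE     PIC 9(5)   COMP-3.
```

Each PIC clause tells Cobrix how to decode the corresponding bytes: COMP is binary, COMP-3 is packed decimal, and X(10) is a 10-character EBCDIC text field.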

df.write.saveAsTable("mainframe")


df.show(5, false)





-Thanks




 
 
 
