Mainframe (Cobol data) source in Spark
- anydataflow
- Oct 24, 2019
- 1 min read
Updated: Jul 20, 2021
Apache Spark has become a first choice for data processing and analytics: it is scalable, easy to code, and supports many file formats such as Avro, Parquet, ORC, text, and JSON.
Mainframes, however, mostly run COBOL workloads and store their data in COBOL file formats, so reading those files in Spark remains a challenge.
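To see why mainframe files need special handling in the first place, here is a minimal sketch (in Python, purely for illustration; Cobrix does this for you inside Spark): text fields on a mainframe are EBCDIC-encoded, so reading them as ASCII/UTF-8 yields garbage. Python's built-in "cp037" codec covers the common US EBCDIC code page.

```python
# Text as a mainframe would store it: EBCDIC, not ASCII.
ebcdic_bytes = "HELLO".encode("cp037")

# The raw bytes look nothing like ASCII "HELLO" (b'HELLO' is b'\x48\x45\x4c\x4c\x4f').
print(ebcdic_bytes)               # b'\xc8\xc5\xd3\xd3\xd6'

# Decoding with the EBCDIC code page recovers the text.
print(ebcdic_bytes.decode("cp037"))  # HELLO
```

This is exactly the kind of conversion the `option("encoding", "ebcdic")` setting below asks Cobrix to perform on every character field.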
Thanks to the team developing "Cobrix", which is available on GitHub: you can build it for your Spark version and use it to handle mainframe COBOL data efficiently.
The documentation is good, so it is easy to use:
1. Spark must be installed and running
2. Java and Maven must be installed
wget https://github.com/AbsaOSS/cobrix/archive/v0.5.0.zip
unzip v0.5.0.zip
cd cobrix-0.5.0
mvn package -DskipTests
spark-shell --jars /root/cobrix-0.5.0/examples/spark-cobol-app/target/spark-cobol-app-0.0.1-SNAPSHOT.jar
val df = spark.read
  .format("za.co.absa.cobrix.spark.cobol.source")
  .option("encoding", "ebcdic")
  .option("generate_record_id", true)
  .option("copybook", "/copybook.txt")
  .load("/datafile-1.bin")
df.write.saveAsTable("mainframe")
df.show(5, false)
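Besides EBCDIC text, COBOL files often contain COMP-3 (packed decimal) numeric fields, which the copybook describes and Cobrix decodes automatically. As a rough sketch of what that decoding involves (a hypothetical Python helper, not Cobrix code): each byte stores two BCD digits, and the low nibble of the last byte carries the sign (0xC or 0xF for positive, 0xD for negative).

```python
def unpack_comp3(data: bytes, scale: int = 0) -> float:
    """Decode a COBOL COMP-3 (packed decimal) field.

    Each byte holds two binary-coded-decimal digits; the low nibble
    of the final byte is the sign. `scale` is the number of implied
    decimal places (the V99 part of a PIC clause).
    Illustrative helper only - not part of the Cobrix API.
    """
    digits = []
    for b in data[:-1]:
        digits += [b >> 4, b & 0x0F]     # two digits per byte
    digits.append(data[-1] >> 4)         # last byte: one digit ...
    sign_nibble = data[-1] & 0x0F        # ... plus the sign nibble
    value = int("".join(str(d) for d in digits))
    if sign_nibble == 0x0D:              # 0xD means negative
        value = -value
    return value / (10 ** scale)

# A PIC S9(3)V99 COMP-3 value of +123.45 is stored as 0x12 0x34 0x5C
print(unpack_comp3(b"\x12\x34\x5C", scale=2))  # 123.45
```

Cobrix reads these field layouts straight from the copybook, which is why the `copybook` option is mandatory.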
-Thanks