Mainframe (Cobol data) source in Spark
- anydataflow
- Oct 24, 2019
- 1 min read
Updated: Jul 20, 2021
Apache Spark has become a first choice for data processing and analytics: it is scalable, easy to code, and supports many file formats such as Avro, Parquet, ORC, text, and JSON.
Mainframes, however, mostly run COBOL workloads and store their data in COBOL file formats, so reading those files in Spark remains a challenge.
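To see why mainframe files need special handling in the first place, here is a minimal sketch (in Python, purely for illustration; Cobrix does this for you inside Spark): text fields on a mainframe are EBCDIC-encoded, so reading them as ASCII/UTF-8 yields garbage. Python's built-in "cp037" codec covers the common US EBCDIC code page.

```python
# Text as a mainframe would store it: EBCDIC, not ASCII.
ebcdic_bytes = "HELLO".encode("cp037")

# The raw bytes look nothing like ASCII "HELLO" (b'HELLO' is b'\x48\x45\x4c\x4c\x4f').
print(ebcdic_bytes)               # b'\xc8\xc5\xd3\xd3\xd6'

# Decoding with the EBCDIC code page recovers the text.
print(ebcdic_bytes.decode("cp037"))  # HELLO
```

This is exactly the kind of conversion the `option("encoding", "ebcdic")` setting below asks Cobrix to perform on every character field.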
Thanks to the team developing "Cobrix", which is available on GitHub: you can build it for your Spark version and use it to handle mainframe COBOL data efficiently.
The documentation is good, so it is easy to use:
1. Spark must be installed and running
2. Java and Maven must be installed
wget https://github.com/AbsaOSS/cobrix/archive/v0.5.0.zip
unzip v0.5.0.zip
cd cobrix-0.5.0
mvn package -DskipTests
spark-shell --jars /root/cobrix-0.5.0/examples/spark-cobol-app/target/spark-cobol-app-0.0.1-SNAPSHOT.jar
val df = spark.read
  .format("za.co.absa.cobrix.spark.cobol.source")
  .option("encoding", "ebcdic")
  .option("generate_record_id", true)
  .option("copybook", "/copybook.txt")
  .load("/datafile-1.bin")
df.write.saveAsTable("mainframe")
df.show(5, false)
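Besides EBCDIC text, COBOL files often contain COMP-3 (packed decimal) numeric fields, which the copybook describes and Cobrix decodes automatically. As a rough sketch of what that decoding involves (a hypothetical Python helper, not Cobrix code): each byte stores two BCD digits, and the low nibble of the last byte carries the sign (0xC or 0xF for positive, 0xD for negative).

```python
def unpack_comp3(data: bytes, scale: int = 0) -> float:
    """Decode a COBOL COMP-3 (packed decimal) field.

    Each byte holds two binary-coded-decimal digits; the low nibble
    of the final byte is the sign. `scale` is the number of implied
    decimal places (the V99 part of a PIC clause).
    Illustrative helper only - not part of the Cobrix API.
    """
    digits = []
    for b in data[:-1]:
        digits += [b >> 4, b & 0x0F]     # two digits per byte
    digits.append(data[-1] >> 4)         # last byte: one digit ...
    sign_nibble = data[-1] & 0x0F        # ... plus the sign nibble
    value = int("".join(str(d) for d in digits))
    if sign_nibble == 0x0D:              # 0xD means negative
        value = -value
    return value / (10 ** scale)

# A PIC S9(3)V99 COMP-3 value of +123.45 is stored as 0x12 0x34 0x5C
print(unpack_comp3(b"\x12\x34\x5C", scale=2))  # 123.45
```

Cobrix reads these field layouts straight from the copybook, which is why the `copybook` option is mandatory.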
-Thanks