Section 038 - Apache Spark using Python - Development Life Cycle using Python

15 videos • 1,736 views • by itversity

In this module we will learn how to develop a data engineering pipeline using Spark. The following topics are covered in this module:

- Setup Virtual Environment and Install PySpark
- Getting Started with PyCharm
- Passing Run Time Arguments
- Accessing OS Environment Variables
- Getting Started with Spark
- Create Function for Spark Session
- Setup Sample Data
- Read Data from Files
- Process Data using Spark APIs
- Write Data to Files
- Validating Writing Data to Files
- Productionizing the Code
- Setting up Data for Production Validation
- Running Application using YARN
- Detailed Validation of the Application
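
To give a feel for how the topics on run time arguments, OS environment variables, and a Spark session function might fit together, here is a minimal sketch. It is not the code used in the videos: the function names `resolve_env` and `get_spark_session`, the `ENVIRON` variable name, and the `DEV` default are all assumptions made for illustration.

```python
import os
import sys


def resolve_env(argv=None, environ=None):
    """Pick the target environment (hypothetical helper).

    Order of precedence: first command-line argument, then the
    ENVIRON OS environment variable (assumed name), then 'DEV'.
    """
    argv = sys.argv if argv is None else argv
    environ = os.environ if environ is None else environ
    if len(argv) > 1:
        return argv[1]
    return environ.get("ENVIRON", "DEV")


def get_spark_session(env, app_name):
    """Build a SparkSession (hypothetical helper).

    Assumption: DEV runs locally, anything else is submitted to YARN.
    """
    # Imported here so the argument handling above works even
    # where PySpark is not installed.
    from pyspark.sql import SparkSession

    master = "local[*]" if env == "DEV" else "yarn"
    return SparkSession.builder.master(master).appName(app_name).getOrCreate()
```

With PySpark installed, an application entry point could call `get_spark_session(resolve_env(), "demo-app")` and then read, process, and write data with the returned session. Keeping the environment lookup separate from session creation means the same code can run locally during development and on YARN in production without edits.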