Spark Installation and Configuration

Interaction with driver program

  • SparkContext
  • SparkSession
  • SQLContext
  • HiveContext

Installation Path:

https://sparkbyexamples.com/spark/apache-spark-installation-on-windows/


Then, in Jupyter:
!pip install pyspark
!pip install findspark


#locate the local Spark installation and add it to sys.path
import findspark
findspark.init()


#create a SparkSession
from pyspark.sql import SparkSession
spark = SparkSession.builder \
    .master("local") \
    .appName("Localspark") \
    .getOrCreate()


Note:
  • master() – When running on a cluster, pass the cluster's master URL as the argument to master(); "local" runs Spark on the local machine.
  • appName() – Sets your application name.
  • getOrCreate() – Returns the existing SparkSession if one already exists, and creates a new one otherwise.

#create a SparkContext (or reuse the existing one)
from pyspark import SparkContext
sc = SparkContext.getOrCreate()
sc


#get all configuration key/value pairs
sc.getConf().getAll()


#stop the SparkContext and SparkSession
sc.stop()
spark.stop()


Upgrade pip (the command below uses the pip bundled with a specific Python install):

C:\Users\mhtpr\AppData\Local\Programs\Python\Python310\python.exe -m pip install --upgrade pip




=================================