Interaction with the driver program – entry points (see the sketch below):
- SparkContext
- SparkSession
- SQLContext
- HiveContext
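Since Spark 2.0, SparkSession is the unified entry point: it covers what SQLContext and HiveContext used to do and exposes the underlying SparkContext. A minimal sketch of reaching them from one session (the app name is illustrative):
#a sketch: one session gives access to the other entry points
from pyspark.sql import SparkSession
spark = SparkSession.builder.master("local").appName("EntryPoints").getOrCreate()
sc = spark.sparkContext              #the underlying SparkContext
spark.sql("SELECT 1 AS one").show()  #SQL, formerly done via SQLContext/HiveContext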
Installation guide (Windows):
https://sparkbyexamples.com/spark/apache-spark-installation-on-windows/
Then, in Jupyter:
!pip install pyspark
!pip install findspark
#locate the local Spark/Java installation and add it to sys.path
import findspark
findspark.init()
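If Spark lives in a non-standard location, findspark.init() also accepts an explicit Spark home path; the path below is a placeholder, not a real install:
#optional: point findspark at a specific Spark home (placeholder path)
import findspark
findspark.init("C:\\spark\\spark-3.5.0-bin-hadoop3")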
#create a SparkSession
from pyspark.sql import SparkSession
spark = SparkSession.builder \
    .master("local") \
    .appName("Localspark") \
    .getOrCreate()
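Once created, a couple of quick checks confirm the session is live (in a notebook, evaluating spark by itself also renders a summary):
#sanity check the new session
print(spark.version)               #Spark version string
print(spark.sparkContext.master)   #-> local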
Note:
master() – if you are running on a cluster, pass your cluster's master URL as the argument to master(); "local" runs Spark on the local machine.
appName() – sets your application name.
getOrCreate() – returns the existing SparkSession if one already exists, and creates a new one if it does not.
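For illustration, a builder call against a standalone cluster might look like the sketch below; the master URL (spark://master-host:7077) and app name are placeholders, and on YARN you would pass "yarn" instead:
#a sketch for a cluster deployment (placeholder master URL)
from pyspark.sql import SparkSession
spark = SparkSession.builder \
    .master("spark://master-host:7077") \
    .appName("ClusterSpark") \
    .getOrCreate()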
#create a SparkContext (or reuse the one behind the session)
from pyspark import SparkContext
sc = SparkContext.getOrCreate()
sc
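A small RDD round trip is enough to confirm the context works (a sketch):
#parallelize a list, transform it, and collect the result
rdd = sc.parallelize([1, 2, 3, 4])
print(rdd.map(lambda x: x * 2).collect())   #-> [2, 4, 6, 8]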
#get all configuration
sc.getConf().getAll()
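getAll() returns a list of (key, value) tuples, so the configuration can be printed or filtered like any Python list, for example:
#print the configuration pairs in sorted order
for key, value in sorted(sc.getConf().getAll()):
    print(key, "=", value)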
#stop the context and the session when finished
sc.stop()
spark.stop()
Upgrade pip (note: this command upgrades pip itself, not Python)
C:\Users\mhtpr\AppData\Local\Programs\Python\Python310\python.exe -m pip install --upgrade pip
=================================