Spark Context & Spark Physical Structure & Stage Execution

  Data Store in 2 objects


1.RDD (Resilient distributed data)--->> Unstructured data store --->> Immutable
2.Data frame -->> structured data store(schema, tables)


Before RDD --->> After DATA FRAME --->>> After DATA SETS
  • RDD Using in --->> R, py-spark, Scala, java
  • Data frame Using in --->> R, py-spark, Scala, java
  • Data Sets using in--->> Scala/ java

Spark Context:

Spark Context is the primary point of entry for Spark capabilities. A Spark Context represents a Spark cluster's connection that is useful in building RDDs, accumulators, and broadcast variables on the cluster.

Available as ---->> "SC"



Spark
Physical Structure Front End










Spark Physical Structure Back End











Stage Execution