This article was originally published on my old blog.
1. Global Managed Table
A managed table is a Spark SQL table for which Spark manages both the data and the metadata. A global managed table is available across all clusters. When you drop the table, both the data and the metadata are dropped.
dataframe.write.saveAsTable("my_table")
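To see the managed-table lifecycle end to end, here is a minimal sketch, assuming a local SparkSession and a throwaway DataFrame (my_table and the column names are illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("managed-table-demo").getOrCreate()
dataframe = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])

# Spark stores the data under spark.sql.warehouse.dir and records the
# table's metadata in the metastore.
dataframe.write.mode("overwrite").saveAsTable("my_table")
print(spark.conf.get("spark.sql.warehouse.dir"))

# Dropping a managed table deletes the data files as well as the metadata.
spark.sql("DROP TABLE my_table")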
2. Global Unmanaged/External Table
Spark manages the metadata, while you control the data location. As soon as you add the 'path' option to the DataFrame writer, the table is treated as a global external/unmanaged table. When you drop the table, only the metadata is dropped; the data remains at the location you specified. A global unmanaged/external table is available across all clusters.
dataframe.write.option("path", "<your-storage-path>").saveAsTable("my_table")
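Here is a sketch of the drop behaviour, with /tmp/my_table_data standing in as an illustrative storage path (this also assumes the default parquet source format):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("external-table-demo").getOrCreate()
dataframe = spark.createDataFrame([(1, "a"), (2, "b")], ["id", "val"])

# The explicit 'path' option makes this an unmanaged/external table.
dataframe.write.mode("overwrite").option("path", "/tmp/my_table_data").saveAsTable("my_table")

# Dropping it removes only the metastore entry; the parquet files under
# /tmp/my_table_data survive and can be re-registered later:
spark.sql("DROP TABLE my_table")
spark.sql("CREATE TABLE my_table USING parquet LOCATION '/tmp/my_table_data'")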
3. Local Table (a.k.a. Temporary Table, a.k.a. Temporary View)
Scoped to the Spark session. A local table is not accessible from other clusters (or, if you are using Databricks notebooks, from other notebooks), and it is not registered in the metastore.
dataframe.createOrReplaceTempView("my_local_view")
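A quick sketch of the session scoping (my_local_view is an illustrative name):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("temp-view-demo").getOrCreate()
dataframe = spark.createDataFrame([(1, "a")], ["id", "val"])

dataframe.createOrReplaceTempView("my_local_view")
spark.sql("SELECT * FROM my_local_view").show()  # visible in this session

# A brand-new session on the same application cannot see it;
# the next line would raise AnalysisException if uncommented.
# spark.newSession().sql("SELECT * FROM my_local_view")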
4. Global Temporary View
Scoped to the Spark application. Global temporary views are tied to the system-preserved temporary database global_temp. The view can be shared across different Spark sessions (or, if you are using Databricks notebooks, across notebooks attached to the same cluster).
dataframe.createOrReplaceGlobalTempView("my_global_view")
It can be accessed as:
spark.read.table("global_temp.my_global_view")
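To make the application scoping concrete, here is a sketch that shares the view with a second session created via newSession() (names are illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("global-temp-view-demo").getOrCreate()
dataframe = spark.createDataFrame([(1, "a")], ["id", "val"])

dataframe.createOrReplaceGlobalTempView("my_global_view")

# A different session in the same application can read the view,
# but only through the reserved global_temp database.
other_session = spark.newSession()
other_session.read.table("global_temp.my_global_view").show()

# The view disappears once the Spark application terminates.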
5. Global Permanent View
Persists a DataFrame as a permanent view. The view definition is recorded in the underlying metastore. You can create a permanent view only on a global managed table or a global unmanaged table; you are not allowed to create one on top of a temporary view or a DataFrame. Note: permanent views are available only through the SQL API, not the DataFrame API.
spark.sql("CREATE VIEW permanent_view AS SELECT * FROM table")
There isn't a function like dataframe.createOrReplacePermanentView().
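Here is a sketch that contrasts the allowed and disallowed cases (my_table and the view names are illustrative):

from pyspark.sql import SparkSession

spark = SparkSession.builder.appName("permanent-view-demo").getOrCreate()
dataframe = spark.createDataFrame([(1, "a")], ["id", "val"])

# Allowed: a permanent view over a table that lives in the metastore.
dataframe.write.mode("overwrite").saveAsTable("my_table")
spark.sql("CREATE OR REPLACE VIEW permanent_view AS SELECT * FROM my_table")
spark.table("permanent_view").show()

# Not allowed: referencing a temporary view raises AnalysisException.
# dataframe.createOrReplaceTempView("tmp_view")
# spark.sql("CREATE VIEW bad_view AS SELECT * FROM tmp_view")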
Reference
Is it possible to create persistent view in Spark? - Stack Overflow