Global temporary views were introduced in the Spark 2.1.0 release. They are useful when you want to share data among different sessions and keep it alive until your application ends. By contrast, regular temporary views in Spark SQL are session-scoped and are dropped automatically when the session terminates.
All global temporary views are tied to a system-preserved temporary database, global_temp. The database name is reserved, so users are not allowed to create, use, or drop this database. The name can be changed through an internal SQL configuration, spark.sql.globalTempDatabase. Unlike a regular temporary view, a global temporary view must always be accessed by its qualified name. Below is an example.
    // In spark-shell, `spark` is predefined; toDF relies on the implicits import.
    import spark.implicits._

    /** Create global temp views */
    // option 1) using the SQL interface
    spark.sql("CREATE GLOBAL TEMP VIEW gView1 AS SELECT 1, 2")
    // option 2) using the Dataset interface
    // (a Seq of tuples, so that toDF can derive the two columns i and j)
    Seq((1, "a")).toDF("i", "j").createGlobalTempView("gView2")

    /** Access global temp views */
    // option 1) using the SQL interface
    spark.sql("SELECT * FROM global_temp.gView1").show()
    // option 2) using the Dataset interface
    spark.table("global_temp.gView2").show()

    /** Drop global temp views */
    // option 1) using the SQL interface
    spark.sql("DROP VIEW global_temp.gView1")
    // option 2) using the catalog interface
    spark.catalog.dropGlobalTempView("gView2")
Users can also insert data into global temporary views that are built on existing data source files. However, such usage is not encouraged, so this post does not show an example.
A typical usage scenario for global temporary views is the Thrift server. When spark.sql.hive.thriftServer.singleSession is set to false (the default), the Thrift server creates a separate session for each connection. Global temporary views can be used to share data among these sessions.
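The effect of separate sessions can be reproduced without a Thrift server. The following is a minimal sketch, assuming a local master and using spark.newSession() to stand in for a second client connection. The view names localView and sharedView and the helper isVisible are made up for illustration. It is written in spark-shell style, like the example above.

```scala
import scala.util.Try
import org.apache.spark.sql.SparkSession

// Assumption: a local master, used only for this demonstration.
// In spark-shell, getOrCreate() simply returns the existing session.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("global-temp-view-sessions")
  .getOrCreate()
import spark.implicits._

val df = Seq((1, "a")).toDF("i", "j")
df.createOrReplaceTempView("localView")   // session-scoped
df.createGlobalTempView("sharedView")     // application-scoped

// A second session that shares the same SparkContext.
val other = spark.newSession()

// Hypothetical helper: true if the session can resolve the view name.
def isVisible(session: SparkSession, name: String): Boolean =
  Try(session.table(name)).isSuccess

val localVisible  = isVisible(other, "localView")
val globalVisible = isVisible(other, "global_temp.sharedView")
println(s"localView visible: $localVisible, sharedView visible: $globalVisible")

// Clean up so the sketch can be run again in the same application.
spark.catalog.dropGlobalTempView("sharedView")
```

The second session should be able to resolve only the qualified global view; the session-scoped view stays private to the session that created it.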
Before Spark 2.1, the alternative was to create a persistent view whose metadata is stored in the catalog. Since the catalog is global, persistent views can be accessed from different sessions. To create or use persistent views, you must enable Hive support. This limitation is expected to be lifted in the next release (i.e., version 2.2).
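For comparison, the persistent-view alternative might look like the sketch below. It assumes the spark-hive module is on the classpath and that a session with Hive support can be created (in a shell that was started without Hive support, enableHiveSupport() has no effect on the already-running session). The view name pView is hypothetical.

```scala
import scala.util.Try
import org.apache.spark.sql.SparkSession

// Assumption: Hive classes are available; with default settings a local
// metastore is created in the working directory.
val spark = SparkSession.builder()
  .master("local[*]")
  .appName("persistent-view-sessions")
  .enableHiveSupport()
  .getOrCreate()

// The view's metadata is stored in the catalog, not in the session.
spark.sql("CREATE OR REPLACE VIEW pView AS SELECT 1 AS a, 2 AS b")

// A second session resolves the view through the shared catalog.
val other = spark.newSession()
val visibleInOtherSession = Try(other.table("pView").collect()).isSuccess
println(s"pView visible in other session: $visibleInOtherSession")

// Unlike a global temporary view, a persistent view outlives the
// application, so it has to be dropped explicitly.
spark.sql("DROP VIEW pView")
```

The trade-off is lifetime: the persistent view remains in the metastore until it is dropped, whereas a global temporary view disappears when the application ends.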