Introduction to Global Temp Views

Global temporary views were introduced in the Spark 2.1.0 release. This feature is useful when you want to share data among different sessions and keep it alive until your application ends. By contrast, regular temporary views in Spark SQL are session-scoped and are automatically dropped when the session terminates.

All global temporary views are tied to a system-preserved temporary database, global_temp. The database name is reserved, so users are not allowed to create, use, or drop this database. The name can be changed through an internal SQL configuration, spark.sql.globalTempDatabase. Unlike regular temporary views, a global temporary view must always be accessed by its qualified name, that is, prefixed with the database name.

Users are also allowed to insert data into global temporary views when the views are built on existing data source files. However, such usage is not encouraged, so this post does not show an example.

A typical usage scenario for global temporary views is the Thrift server. When spark.sql.hive.thriftServer.singleSession is set to false (the default), the Thrift server creates multiple sessions. Global temporary views can then be used to share data among those sessions.

Before Spark 2.1, the alternative was to create a persistent view whose metadata is stored in the catalog. Since the catalog is global, persistent views can be accessed from different sessions. To create or use persistent views, you must enable Hive support. This limitation is expected to be lifted in the next release (i.e., version 2.2).
