

# Apache Spark 3.0.1 Installation on Linux or WSL Guide

This article provides a step-by-step guide to installing the latest version of Apache Spark, 3.0.1, on a UNIX-like system (Linux) or on Windows Subsystem for Linux (WSL). There are two main steps: download the Apache Spark distribution, then set the environment variables. We recommend that you also set HADOOP_CONF_DIR to the appropriate directory, as this will minimize the amount of work you need to do to set up environment variables before running Spark applications.
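A minimal sketch of both steps, assuming the Hadoop 3.2 build of Spark 3.0.1, an install under the home directory, and a Hadoop client configuration in /etc/hadoop/conf (the URL and paths are assumptions; adjust them for your system):

```bash
# Download and unpack the Spark 3.0.1 distribution (Hadoop 3.2 build assumed).
wget https://archive.apache.org/dist/spark/spark-3.0.1/spark-3.0.1-bin-hadoop3.2.tgz
tar -xzf spark-3.0.1-bin-hadoop3.2.tgz -C ~

# Environment variables, e.g. appended to ~/.bashrc.
export SPARK_HOME=~/spark-3.0.1-bin-hadoop3.2
export PATH=$SPARK_HOME/bin:$PATH
export HADOOP_CONF_DIR=/etc/hadoop/conf   # point this at your Hadoop client config
```

After reloading the shell (source ~/.bashrc), spark-shell and spark-submit should resolve from PATH.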
Note: the following instructions are for a non-Kerberized cluster. First, create a java-opts file in the Spark client /conf directory (a sketch of its typical contents follows below). Next, edit the spark-defaults.conf file in the Spark client /conf directory and make sure the required values are specified, including hostname and port; for example, the History Server address ends in :18080. (Note: if you installed the tech preview, these values will already be in the file.)
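The guide does not show the contents of java-opts at this point. On Hortonworks (HDP) clusters, a java-opts file like this typically carries a single JVM option that pins the stack version; the sketch below is an assumption about those contents, and the version string is a placeholder, not a value from the original:

```
-Dhdp.version=2.2.x.x-xxxx
```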

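A sketch of the spark-defaults.conf values in question, assuming the standard History Server properties; the hostname is a placeholder, and 18080 is the default History Server port mentioned above:

```
# spark-defaults.conf in the Spark client /conf directory.
# The hostname is a placeholder; keep the port your History Server listens on.
spark.yarn.historyServer.address  historyserver.example.com:18080
spark.history.ui.port             18080
```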
Then create a spark-env.sh file in the Spark client /conf directory, and make sure the file has entries for the location where log files are stored (default: ${SPARK_HOME}/logs) and the location of the PID file (default: /tmp; this can be any directory where the spark user has R/W access). These settings are required for starting Spark services (for example, the History Service and the Thrift server): the user who starts Spark services needs to have read and write permissions to the log file and PID directory, and by default these files are in the $SPARK_HOME directory, typically owned by root in an RPM installation.

If you plan to use Hive with Spark, create a hive-site.xml file in the Spark client /conf directory. In this file, add the property and specify the Hive metastore as its value. (Note: if you installed the Spark tech preview you can skip this step.)

Finally, note that to use the Spark History Service, to run Hive queries as the spark user, or to run Spark jobs, the associated user must have sufficient HDFS access. Sketches of spark-env.sh and hive-site.xml follow below.
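First, a sketch of spark-env.sh with the two entries described above; the /var paths are assumptions (any directory where the spark user has R/W access will do):

```bash
# spark-env.sh in the Spark client /conf directory.
# Location where log files are stored (default: ${SPARK_HOME}/logs)
export SPARK_LOG_DIR=/var/log/spark
# Location of the pid file (default: /tmp)
# This can be any directory where the spark user has R/W access
export SPARK_PID_DIR=/var/run/spark
```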

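And a sketch of hive-site.xml. The guide does not name the property at this point; for a remote metastore it is normally hive.metastore.uris, shown here with a placeholder host and the default metastore port 9083:

```xml
<!-- hive-site.xml in the Spark client /conf directory -->
<configuration>
  <property>
    <!-- URI of the Hive metastore; the host is a placeholder. -->
    <name>hive.metastore.uris</name>
    <value>thrift://metastore.example.com:9083</value>
  </property>
</configuration>
```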