scala - Spark: how to not use AWS credentials explicitly in Spark application
In my Spark application, I have AWS credentials passed in via command-line arguments:
spark.sparkcontext.hadoopconfiguration.set("fs.s3.awsaccesskeyid", awsaccesskeyid) spark.sparkcontext.hadoopconfiguration.set("fs.s3.awssecretaccesskey", awssecretaccesskey) spark.sparkcontext.hadoopconfiguration.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.natives3filesystem")
However, in cluster mode, explicitly passing these credentials between nodes is a huge security issue, since they are passed as plain text.
How can I make the application work with an IAM role, or some other proper approach that doesn't need these two lines of code in the Spark app:
spark.sparkcontext.hadoopconfiguration.set("fs.s3.awsaccesskeyid", awsaccesskeyid) spark.sparkcontext.hadoopconfiguration.set("fs.s3.awssecretaccesskey", awssecretaccesskey)
You can add the following config to the core-site.xml of your Hadoop conf, so the credentials do not have to appear in your code base:
<?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <configuration> <property> <name>fs.s3n.awsaccesskeyid</name> <value>my_aws_access_key_id_here</value> </property> <property> <name>fs.s3n.awssecretaccesskey</name> <value>my_aws_secret_access_key_here</value> </property> </configuration>
To use the above file, export HADOOP_CONF_DIR=~/private/.aws/hadoop_conf before running Spark, or set it in conf/spark-env.sh.
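For completeness, here is a minimal sketch of what the application side could look like once the keys live in core-site.xml. The object name, bucket, and path are placeholders, not anything from the original question:

// Minimal sketch: with the keys in core-site.xml (found via HADOOP_CONF_DIR),
// the application no longer sets any credentials itself.
import org.apache.spark.sql.SparkSession

object S3ReadWithoutExplicitCredentials {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("s3-read-without-explicit-credentials")
      .getOrCreate()

    // fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey are resolved from
    // core-site.xml by the Hadoop configuration, not from application code.
    val df = spark.read.text("s3n://my-bucket/some/prefix/")
    println(s"Read ${df.count()} lines")

    spark.stop()
  }
}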
As for the IAM role approach, there is a bug open against Spark 1.6: https://issues.apache.org/jira/browse/SPARK-16363
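If you are running on EC2/EMR with an instance profile attached, one commonly suggested alternative (not part of the answer above, and dependent on your Hadoop/Spark versions) is to switch to the s3a connector and let its credential provider pick up the role credentials. The bucket name below is an illustrative placeholder:

import org.apache.spark.sql.SparkSession

object S3AWithInstanceProfile {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("s3a-with-instance-profile")
      .getOrCreate()

    // Ask the S3A connector to use the EC2 instance profile (IAM role)
    // instead of static keys. InstanceProfileCredentialsProvider comes from
    // the AWS SDK that hadoop-aws depends on; adjust to your versions.
    spark.sparkContext.hadoopConfiguration.set(
      "fs.s3a.aws.credentials.provider",
      "com.amazonaws.auth.InstanceProfileCredentialsProvider")

    // No access key or secret key anywhere in the application.
    val df = spark.read.text("s3a://my-bucket/some/prefix/")
    println(s"Read ${df.count()} lines")

    spark.stop()
  }
}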