scala - Spark: how to not use AWS credentials explicitly in Spark application
In my Spark application, I have AWS credentials passed in via command-line arguments:
spark.sparkcontext.hadoopconfiguration.set("fs.s3.awsaccesskeyid", awsaccesskeyid) spark.sparkcontext.hadoopconfiguration.set("fs.s3.awssecretaccesskey", awssecretaccesskey) spark.sparkcontext.hadoopconfiguration.set("fs.s3.impl", "org.apache.hadoop.fs.s3native.natives3filesystem")
However, in cluster mode, explicitly passing these credentials between nodes is a huge security issue, since they are passed as plain text.
How can I make the application work with an IAM role, or some other proper approach that doesn't need these two lines of code in the Spark app:
spark.sparkcontext.hadoopconfiguration.set("fs.s3.awsaccesskeyid", awsaccesskeyid) spark.sparkcontext.hadoopconfiguration.set("fs.s3.awssecretaccesskey", awssecretaccesskey)
You can add the following config to the core-site.xml of your Hadoop conf, so the credentials do not have to appear in your code base:
<?xml version="1.0"?> <?xml-stylesheet type="text/xsl" href="configuration.xsl"?> <configuration> <property> <name>fs.s3n.awsaccesskeyid</name> <value>my_aws_access_key_id_here</value> </property> <property> <name>fs.s3n.awssecretaccesskey</name> <value>my_aws_secret_access_key_here</value> </property> </configuration>
To use the above file, export HADOOP_CONF_DIR=~/private/.aws/hadoop_conf before running Spark, or set it in conf/spark-env.sh.
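For completeness, here is a minimal sketch of what the application side could look like once the keys live in core-site.xml. The object name, bucket, and path are placeholders, not anything from the original question:

// Minimal sketch: with the keys in core-site.xml (found via HADOOP_CONF_DIR),
// the application no longer sets any credentials itself.
import org.apache.spark.sql.SparkSession

object S3ReadWithoutExplicitCredentials {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("s3-read-without-explicit-credentials")
      .getOrCreate()

    // fs.s3n.awsAccessKeyId / fs.s3n.awsSecretAccessKey are resolved from
    // core-site.xml by the Hadoop configuration, not from application code.
    val df = spark.read.text("s3n://my-bucket/some/prefix/")
    println(s"Read ${df.count()} lines")

    spark.stop()
  }
}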
As for the IAM role approach, there is a bug open against Spark 1.6: https://issues.apache.org/jira/browse/SPARK-16363
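If you are running on EC2/EMR with an instance profile attached, one commonly suggested alternative (not part of the answer above, and dependent on your Hadoop/Spark versions) is to switch to the s3a connector and let its credential provider pick up the role credentials. The bucket name below is an illustrative placeholder:

import org.apache.spark.sql.SparkSession

object S3AWithInstanceProfile {
  def main(args: Array[String]): Unit = {
    val spark = SparkSession.builder()
      .appName("s3a-with-instance-profile")
      .getOrCreate()

    // Ask the S3A connector to use the EC2 instance profile (IAM role)
    // instead of static keys. InstanceProfileCredentialsProvider comes from
    // the AWS SDK that hadoop-aws depends on; adjust to your versions.
    spark.sparkContext.hadoopConfiguration.set(
      "fs.s3a.aws.credentials.provider",
      "com.amazonaws.auth.InstanceProfileCredentialsProvider")

    // No access key or secret key anywhere in the application.
    val df = spark.read.text("s3a://my-bucket/some/prefix/")
    println(s"Read ${df.count()} lines")

    spark.stop()
  }
}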