The default Hadoop file system is configured in core-site.xml through the fs.defaultFS property, for example:
<configuration>
<property>
<name>fs.defaultFS</name>
<value>hdfs://host:port</value>
</property>
</configuration>
To read a file from Spark, we first need a SparkContext.
import org.apache.spark.SparkContext
// Reuse the running SparkContext (spark-shell already provides one) or create it
val sc = SparkContext.getOrCreate()
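To check which default file system Spark actually picked up, one option is to read the property back from the Hadoop configuration attached to the SparkContext. This is a minimal sketch that assumes it runs in spark-shell (or any session where a SparkContext can be obtained); the variable name defaultFs is only illustrative.

```scala
import org.apache.spark.SparkContext

// Reuse the existing SparkContext (spark-shell already provides one).
val sc = SparkContext.getOrCreate()

// hadoopConfiguration is the Hadoop Configuration Spark hands to its file readers.
// fs.defaultFS falls back to Hadoop's built-in default (the local file system)
// when no core-site.xml is found on the classpath.
val defaultFs: String = sc.hadoopConfiguration.get("fs.defaultFS")
println(s"Default file system: $defaultFs")
```

If this prints file:/// rather than the hdfs:// address, Spark most likely did not find core-site.xml; pointing HADOOP_CONF_DIR at the Hadoop configuration directory is the usual fix.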
Then we can get an RDD of the file's lines by passing its full HDFS path:
val textFile = sc.textFile("hdfs://host:9000/user/ubuntu/books/alice.txt")
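Because fs.defaultFS already names the cluster, a path without a scheme and host should resolve against it. The sketch below assumes Spark has loaded the core-site.xml shown earlier with the value hdfs://host:9000; the names shortPathFile and localFile, and the local path, are illustrative only.

```scala
import org.apache.spark.SparkContext

val sc = SparkContext.getOrCreate()

// No scheme or host: Hadoop resolves the path against fs.defaultFS,
// so this is expected to point at the same file as the full hdfs:// URI.
val shortPathFile = sc.textFile("/user/ubuntu/books/alice.txt")

// An explicit scheme overrides the default, e.g. a file on the local disk
// (hypothetical path, for illustration only).
val localFile = sc.textFile("file:///tmp/alice.txt")
```

textFile is lazy, so neither call touches the file system until an action such as first() or count() runs.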
For example, get the first line of the file:
textFile.first()
String = The Project Gutenberg EBook of Alice’s Adventures in Wonderland, by Lewis Carroll
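Note that first() returns the first line of the file, which in other files may be blank. One possible way to skip blank lines is to filter before taking the first element. The helper name firstNonBlank and the in-memory sample lines are made up for illustration; the same function should work on the RDD returned by sc.textFile.

```scala
import org.apache.spark.SparkContext
import org.apache.spark.rdd.RDD

val sc = SparkContext.getOrCreate()

// Hypothetical helper: first line that is not empty or whitespace-only.
// Like first(), it throws if no such line exists.
def firstNonBlank(lines: RDD[String]): String =
  lines.filter(_.trim.nonEmpty).first()

// Made-up sample lines standing in for a file that starts with blank lines.
val sample = sc.parallelize(Seq("", "   ", "CHAPTER I", "Down the Rabbit-Hole"))
val line = firstNonBlank(sample)
println(line)
```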
Happy coding ~~