I'm getting an error when reading from and writing to Elasticsearch with Spark:
py4j.protocol.Py4JJavaError: An error occurred while calling z:org.apache.spark.api.python.PythonRDD.newAPIHadoopRDD.
: java.lang.ClassNotFoundException: org.elasticsearch.hadoop.mr.LinkedMapWritable
Here's the code:
from pyspark import SparkConf, SparkContext

conf = SparkConf().setMaster("local").setAppName("elasticsearch-hadoop")
sc = SparkContext(conf=conf)

# Read from the ES index "products"
es_rdd = sc.newAPIHadoopRDD(
    inputFormatClass="org.elasticsearch.hadoop.mr.EsInputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf={"es.resource": "products"})
print(es_rdd.first())

kcosmetics_availability = es_rdd.map(lambda item: ("key", {
    'id': item[0],  # the document _id from "products"
    'availability': item[1]['availability']
}))

# Write the results to "products/kcosmetics_stocks"
kcosmetics_availability.saveAsNewAPIHadoopFile(
    path='-',
    outputFormatClass="org.elasticsearch.hadoop.mr.EsOutputFormat",
    keyClass="org.apache.hadoop.io.NullWritable",
    valueClass="org.elasticsearch.hadoop.mr.LinkedMapWritable",
    conf={
        "es.index.auto.create": "true",  # create the target index if it does not exist
        "es.mapping.id": "id",           # use the "id" field as the document _id
        "es.resource": "products/kcosmetics_stocks"})
Following that error message, I went and installed elasticsearch-hadoop as well, and then got:
java.lang.Error: Multiple ES-Hadoop versions detected in the classpath; please use only one
jar:file:/home/andy/Desktop/spark-2.2.0-bin-hadoop2.7/jars/elasticsearch-spark-20_2.11-5.6.3.jar
jar:file:/home/andy/Desktop/spark-2.2.0-bin-hadoop2.7/jars/elasticsearch-hadoop-mr-5.6.3.jar
Spark version: 2.2; elasticsearch-spark version: 5.6.3.
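From the second error, my understanding is that only one ES-Hadoop artifact may be on the classpath at a time, and the elasticsearch-spark jar alone should already bundle org.elasticsearch.hadoop.mr.LinkedMapWritable, so keeping it and removing elasticsearch-hadoop-mr-5.6.3.jar from the jars/ directory seems like the fix. A quick sanity check for leftover duplicates (the jars directory is the one from my install):

import glob
import os

jars_dir = "/home/andy/Desktop/spark-2.2.0-bin-hadoop2.7/jars"

# List every ES-Hadoop-related jar that Spark will put on its classpath;
# after the cleanup this should print exactly one entry.
es_jars = sorted(os.path.basename(p)
                 for p in glob.glob(os.path.join(jars_dir, "elasticsearch-*.jar")))
print(es_jars)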