
Spark Streaming integration with Kafka fails to build

Environment

Ubuntu 14.04 64-bit
Java 1.8
Scala 2.11.7
Spark 1.5.1 (pre-built for Hadoop 2.6)
IntelliJ IDEA
sbt 0.13

Problem

I made some small changes to the Spark Streaming examples, but building with sbt keeps failing:

import org.apache.spark.streaming._
import org.apache.spark.streaming.kafka._
import org.apache.spark.SparkConf

//import org.apache.kafka.clients.producer.{ProducerConfig, KafkaProducer, ProducerRecord}

import java.util.HashMap

object KafkaDemo1 {
    def main(args: Array[String]) {
        if (args.length < 4) {
            System.err.println("Usage: KafkaWordCount <zkQuorum> <group> <topics> <numThreads>")
            System.exit(1)
        }

        //StreamingExamples.setStreamingLogLevels()

        val Array(zkQuorum, group, topics, numThreads) = args
        val sparkConf = new SparkConf().setAppName("demo1")
        val ssc = new StreamingContext(sparkConf, Seconds(2))
        ssc.checkpoint("checkpoint")

        val topicMap = topics.split(",").map((_, numThreads.toInt)).toMap
        val input = KafkaUtils.createStream(ssc, zkQuorum, group, topicMap).map(_._2)

        // DStream has no first()/take(); use the output operation print() instead
        input.count().print() // number of records in each batch

        val words = input.flatMap(_.split(" "))
        words.print(10) // print the first 10 words of each batch
        val wordCount = words.map(x => (x, 1L))
          .reduceByKeyAndWindow(_ + _, _ - _, Minutes(10), Seconds(2), 2)
        wordCount.print()

        ssc.start()
        ssc.awaitTermination()
    }
}
build.sbt:

name := "demo1"
version := "0.0.1"
scalaVersion := "2.11.7"

//libraryDependencies += "org.apache.spark" % "spark-core" % "1.5.1" % "provided"

libraryDependencies ++=    Seq(
    "org.apache.spark" % "spark-core_2.11" % "1.5.1" % "provided",
    "org.apache.spark" % "spark-streaming-kafka_2.10" % "1.5.1"
)
sbt output:

[success] Total time: 0 s, completed Oct 23, 2015 10:58:55 AM
[info] Updating {file:/home/adolph/workspace//spark-streaming-kafka/demo1/}demo1...
[info] Resolving org.apache.hadoop#hadoop-mapreduce-client-common;2.2.0 ...
[info] Resolving jline#jline;2.12.1 ...
[info] Done updating.
[info] Compiling 1 Scala source to /home/adolph/workspace/linkernetwork/cloud-data/spark-streaming-kafka/demo1/target/scala-2.11/classes...
[error] /home/adolph/workspace/linkernetwork/cloud-data/spark-streaming-kafka/demo1/demo1.scala:20: not found: type StreamingContext
[error]         val ssc = new StreamingContext(sparkConf, Seconds(2))
[error]                       ^
[error] /home/adolph/workspace/linkernetwork/cloud-data/spark-streaming-kafka/demo1/demo1.scala:20: not found: value Seconds
[error]         val ssc = new StreamingContext(sparkConf, Seconds(2))
[error]                                                   ^
[error] missing or invalid dependency detected while loading class file 'KafkaUtils.class'.
[error] Could not access term api in package org.apache.spark.streaming,
[error] because it (or its dependencies) are missing. Check your build definition for
[error] missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
[error] A full rebuild may help if 'KafkaUtils.class' was compiled against an incompatible version of org.apache.spark.streaming.
[error] missing or invalid dependency detected while loading class file 'KafkaUtils.class'.
[error] Could not access type JavaStreamingContext in value org.apache.spark.streaming.java,
[error] because it (or its dependencies) are missing. Check your build definition for
[error] missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
[error] A full rebuild may help if 'KafkaUtils.class' was compiled against an incompatible version of org.apache.spark.streaming.java.
[error] missing or invalid dependency detected while loading class file 'KafkaUtils.class'.
[error] Could not access type StreamingContext in package org.apache.spark.streaming,
[error] because it (or its dependencies) are missing. Check your build definition for
[error] missing or conflicting dependencies. (Re-run with `-Ylog-classpath` to see the problematic classpath.)
[error] A full rebuild may help if 'KafkaUtils.class' was compiled against an incompatible version of org.apache.spark.streaming.
[error] 5 errors found
[error] (compile:compileIncremental) Compilation failed
[error] Total time: 8 s, completed Oct 23, 2015 10:59:03 AM

Notes

The problem is probably in one of two places:

  1. The dependencies are not declared correctly.

  2. The packages imported in the code are wrong.

I use IntelliJ IDEA as my IDE, but I have only just started with it, don't yet know how to auto-import packages, and it keeps flagging errors, so:

  1. I would appreciate a tutorial on developing Scala applications in IDEA.

Could anyone help me find where the problem is?


I am on Scala 2.11 too, but my Spark is self-compiled, while the prebuilt packages on the official site are all built against 2.10. The jars you import must match the Scala version your Spark was compiled with.
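Applying this advice to the build.sbt above gives a minimal sketch that keeps every Spark artifact on one Scala version and adds the missing spark-streaming module (the errors about `org.apache.spark.streaming` suggest it was never on the classpath). The `%%` operator makes sbt append the project's Scala binary suffix (`_2.11`) automatically, so the 2.10/2.11 mix-up cannot recur:

```scala
name := "demo1"
version := "0.0.1"
scalaVersion := "2.11.7"

libraryDependencies ++= Seq(
  // %% appends "_2.11" automatically, keeping all modules on one Scala version
  "org.apache.spark" %% "spark-core"            % "1.5.1" % "provided",
  // spark-streaming supplies StreamingContext, Seconds and Minutes
  "org.apache.spark" %% "spark-streaming"       % "1.5.1" % "provided",
  "org.apache.spark" %% "spark-streaming-kafka" % "1.5.1"
)
```

The "provided" scope assumes the jars come from spark-submit at run time; drop it if you run the program directly from sbt.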


Here is an IDEA test case for writing Streaming applications; download it and it compiles and runs out of the box: https://github.com/jacksu/utils4s/blob/master/spark-streaming-demo/md/spark-streaming-kafka测试用例.md


1. Introduction to Spark and Kafka
2. Spark cluster installation
3. Spark RDD functions explained with hands-on analysis
4. A simple program with Spark's Java API
5. Spark RDD internals in depth
6. Spark machine learning and reading the API
7. Kafka architecture and cluster installation
8. Hands-on with the Kafka producer
9. The Kafka consumer dissected, hands-on
10. The advanced Kafka consumer in detail
11. Kafka data safety, and the Spark Kafka Streaming API
12. Integrating Spark + Kafka + MySQL
13. Designing ALS with Spark machine learning
14. Spark ALS collaborative filtering in Java
15. Recommending products to users with Spark ALS
16. Storing Spark machine-learning results in MySQL
17. Building an ALS model from a Kafka stream with Spark
18. Processing a Kafka stream with Spark to build an ALS model
19. Real-time recommendation from a Kafka stream with Spark
20. Spark learning summary, differences between Spark 2 and Spark 1, next-episode preview

"Spark + Kafka real-time streaming machine learning in practice", recorded by instructor 夜行侠
http://www.itjoin.org/course/...


The problem lies in the mismatch between your Scala version and the Scala version that Spark and the other libraries were built against.

Both Kafka and Spark are written in Scala, and they build against a stable Scala release rather than the latest one.

Here I was using Scala 2.11.7, the latest release on the official site, hence the compatibility problem.
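If the prebuilt 2.10 binaries are the obstacle, the Spark 1.5.x source tree ships a helper script for switching the build over to Scala 2.11. The commands below assume a checked-out Spark 1.5.1 source directory and mirror the official "Building Spark" instructions:

```
# rewrite the POMs from the default Scala 2.10 to 2.11
./dev/change-scala-version.sh 2.11

# build a Scala 2.11 distribution, skipping tests
mvn -Pyarn -Phadoop-2.6 -Dscala-2.11 -DskipTests clean package
```

Either way, the rule above holds: every Spark and Kafka artifact on the classpath must carry the same Scala binary suffix as the compiler that builds your application.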


Spark + Kafka real-time streaming machine learning in practice
Course page: http://www.xuetuwuyou.com/cou...
The course is from Xuetuwuyou: http://www.xuetuwuyou.com

I. Software and versions used in the course
① Spark 1.6.2
② Kafka 0.8.2.1
③ CentOS 6.5

II. Who the course is for
Students who want to learn Kafka, Spark real-time stream computing, and Spark machine learning

III. Course goals
① Master Kafka and how its architecture is implemented
② Apply Spark machine learning fluently
③ Combine Spark and Kafka confidently for real-time stream computing


Recommended companion course: "Spark Machine Learning in Practice Made Easy (User Behavior Analysis)"
Course page: http://www.xuetuwuyou.com/cou...
