<Errors and Solutions> A CDH 5.5 bug prevents ResourceManager from starting, and some reinstalled components fail to start
ResourceManager fails to start
Cloudera SCM agent log
[root@master1 cloudera-scm-agent]# pwd
/opt/cm-5.5.4/log/cloudera-scm-agent
[root@master1 cloudera-scm-agent]# ll
total 1864
-rw-r--r-- 1 root root 1852696 Dec 21 19:44 cloudera-scm-agent.log
Traceback (most recent call last):
File "/opt/cm-5.5.4/lib64/cmf/agent/src/cmf/agent.py", line 1449, in handle_heartbeat_processes
new_process.activate()
File "/opt/cm-5.5.4/lib64/cmf/agent/src/cmf/agent.py", line 2817, in activate
self.write_process_conf()
File "/opt/cm-5.5.4/lib64/cmf/agent/src/cmf/agent.py", line 2925, in write_process_conf
"source_parcel_environment", env))
File "/opt/cm-5.5.4/lib64/cmf/agent/src/cmf/util.py", line 373, in source
raise e
ValueError: dictionary update sequence element #91 has length 1; 2 is required
ResourceManager log
2016-12-20 09:41:08,914 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: RECEIVED SIGNAL 15: SIGTERM
2016-12-20 09:41:08,918 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2016-12-20 09:41:08,919 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2016-12-20 09:41:08,920 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2016-12-20 09:41:09,030 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Returning, interrupted : java.lang.InterruptedException
2016-12-20 09:41:09,051 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
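Solution
Edit the file named in the traceback, /opt/cm-5.5.4/lib64/cmf/agent/src/cmf/util.py. In the source function, pipe the output of env through grep so that lines containing { or } are dropped before the agent builds a dictionary from them:
pipe = subprocess.Popen(['/bin/bash', '-c', ". %s; %s; env | grep -v { | grep -v }" % (path, command)],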
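The following is a minimal sketch of why the grep filter helps, not the agent's actual code. It assumes the agent turns each line of the env output into a KEY=VALUE pair; the names parse_env, drop_brace_lines and SAMPLE_ENV are hypothetical, and the sample environment (an exported bash function whose body ends with a lone closing brace) is invented for illustration.

```python
def parse_env(output):
    """Turn `env`-style output into a dict, assuming one KEY=VALUE per line."""
    return dict(line.split("=", 1) for line in output.splitlines())


def drop_brace_lines(output):
    """Mimic `grep -v { | grep -v }`: discard every line containing a brace."""
    kept = [line for line in output.splitlines()
            if "{" not in line and "}" not in line]
    return "\n".join(kept)


# Hypothetical environment dump containing an exported bash function.
SAMPLE_ENV = "\n".join([
    "PATH=/usr/bin",
    "BASH_FUNC_module()=() {  eval `/usr/bin/modulecmd bash $*`",
    "}",
    "HOME=/root",
])

try:
    parse_env(SAMPLE_ENV)
except ValueError as exc:
    # The lone "}" line has no "=", so it yields a 1-element pair.
    print("unfiltered:", exc)

print("filtered:", parse_env(drop_brace_lines(SAMPLE_ENV)))
```

A line without "=" splits into a single element, which is the kind of input that makes dict() raise the "has length 1; 2 is required" error. Note that the filter only removes lines containing braces; a multi-line value whose continuation lines have neither a brace nor "=" would still break this kind of parsing.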
Services fail to start after being deleted from the cluster and reinstalled through Cloudera Manager
HBase HMaster fails to start
HMaster log
2016-12-21 22:00:51,889 FATAL org.apache.hadoop.hbase.master.HMaster: Failed to become active master
org.apache.hadoop.hbase.TableExistsException: hbase:namespace
at org.apache.hadoop.hbase.master.handler.CreateTableHandler.checkAndSetEnablingTable(CreateTableHandler.java:160)
at org.apache.hadoop.hbase.master.handler.CreateTableHandler.prepare(CreateTableHandler.java:133)
at org.apache.hadoop.hbase.master.TableNamespaceManager.createNamespaceTable(TableNamespaceManager.java:233)
at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:86)
at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:902)
at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:739)
at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:169)
at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1481)
at java.lang.Thread.run(Thread.java:745)
2016-12-21 22:00:51,899 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
2016-12-21 22:00:51,899 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
org.apache.hadoop.hbase.TableExistsException: hbase:namespace
at org.apache.hadoop.hbase.master.handler.CreateTableHandler.checkAndSetEnablingTable(CreateTableHandler.java:160)
at org.apache.hadoop.hbase.master.handler.CreateTableHandler.prepare(CreateTableHandler.java:133)
at org.apache.hadoop.hbase.master.TableNamespaceManager.createNamespaceTable(TableNamespaceManager.java:233)
at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:86)
at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:902)
at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:739)
at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:169)
at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1481)
at java.lang.Thread.run(Thread.java:745)
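Solution
The previous installation left its /hbase znode behind in ZooKeeper. Remove it with zkCli.sh, then start HBase again: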
[root@agent1 /]# find / -name zkCli*
/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/zookeeper/bin/zkCli.sh
[root@agent1 /]# sh /opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/zookeeper/bin/zkCli.sh
Connecting to localhost:2181
2016-12-21 22:03:43,684 [myid:] - INFO [main:Environment@100] - Client environment:zookeeper.version=3.4.5-cdh5.5.4--1, built on 04/25/2016 18:53 GMT
2016-12-21 22:03:43,689 [myid:] - INFO [main:Environment@100] - Client environment:host.name=agent1
2016-12-21 22:03:43,689 [myid:] - INFO [main:Environment@100] - Client environment:java.version=1.7.0_79
2016-12-21 22:03:43,691 [myid:] - INFO [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2016-12-21 22:03:43,691 [myid:] - INFO [main:Environment@100] - Client environment:java.home=/usr/java/jdk1.7.0_79/jre
2016-12-21 22:03:43,691 [myid:] - INFO [main:Environment@100] - Client environment:java.class.path=/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/zookeeper/bin/../build/classes:/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/zookeeper/bin/../build/lib/*.jar:/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/zookeeper/bin/../lib/slf4j-log4j12.jar:/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/zookeeper/bin/../lib/slf4j-log4j12-1.7.5.jar:/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/zookeeper/bin/../lib/slf4j-api-1.7.5.jar:/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/zookeeper/bin/../lib/netty-3.2.2.Final.jar:/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/zookeeper/bin/../lib/log4j-1.2.16.jar:/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/zookeeper/bin/../lib/jline-2.11.jar:/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/zookeeper/bin/../zookeeper-3.4.5-cdh5.5.4.jar:/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/zookeeper/bin/../src/java/lib/*.jar:/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/zookeeper/bin/../conf:.:/usr/java/jdk1.7.0_79//lib/dt.jar:/usr/java/jdk1.7.0_79//lib/tools.jar
2016-12-21 22:03:43,692 [myid:] - INFO [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2016-12-21 22:03:43,692 [myid:] - INFO [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2016-12-21 22:03:43,692 [myid:] - INFO [main:Environment@100] - Client environment:java.compiler=<NA>
2016-12-21 22:03:43,692 [myid:] - INFO [main:Environment@100] - Client environment:os.name=Linux
2016-12-21 22:03:43,692 [myid:] - INFO [main:Environment@100] - Client environment:os.arch=amd64
2016-12-21 22:03:43,692 [myid:] - INFO [main:Environment@100] - Client environment:os.version=2.6.32-431.el6.x86_64
2016-12-21 22:03:43,692 [myid:] - INFO [main:Environment@100] - Client environment:user.name=root
2016-12-21 22:03:43,692 [myid:] - INFO [main:Environment@100] - Client environment:user.home=/root
2016-12-21 22:03:43,693 [myid:] - INFO [main:Environment@100] - Client environment:user.dir=/
2016-12-21 22:03:43,694 [myid:] - INFO [main:ZooKeeper@438] - Initiating client connection, connectString=localhost:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@279ac931
Welcome to ZooKeeper!
2016-12-21 22:03:43,720 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@975] - Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
2016-12-21 22:03:43,728 [myid:] - WARN [main-SendThread(localhost:2181):ClientCnxn$SendThread@1102] - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
JLine support is enabled
[zk: localhost:2181(CONNECTING) 0] 2016-12-21 22:03:43,833 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@975] - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2016-12-21 22:03:43,835 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@852] - Socket connection established, initiating session, client: /127.0.0.1:52794, server: localhost/127.0.0.1:2181
2016-12-21 22:03:43,846 [myid:] - INFO [main-SendThread(localhost:2181):ClientCnxn$SendThread@1235] - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x25921ad67c1001a, negotiated timeout = 30000
WATCHER::
WatchedEvent state:SyncConnected type:None path:null
[zk: localhost:2181(CONNECTED) 0]
[zk: localhost:2181(CONNECTED) 0] ls /
[isr_change_notification, hbase, zookeeper, admin, consumers, config, hive_zookeeper_namespace_hive, brokers, controller_epoch]
[zk: localhost:2181(CONNECTED) 1] ls /hbase
[meta-region-server, backup-masters, table, draining, region-in-transition, running, table-lock, namespace, hbaseid, online-snapshot, replication, splitWAL, recovering-regions, rs, flush-table-proc]
[zk: localhost:2181(CONNECTED) 2] rmr /hbase
[zk: localhost:2181(CONNECTED) 3] ls /
[isr_change_notification, zookeeper, admin, consumers, config, hive_zookeeper_namespace_hive, brokers, controller_epoch]
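The same cleanup could be scripted instead of typed into the interactive shell. The sketch below only builds the command line; it assumes zkCli.sh accepts a one-shot command after -server host:port, and the helper names build_rmr_command and remove_znode as well as the PROTECTED set are hypothetical. It does not check whether HBase is stopped.

```python
import subprocess

# Znodes this sketch refuses to delete (an assumption, adjust as needed).
PROTECTED = {"/", "/zookeeper"}


def build_rmr_command(zkcli, server, znode):
    """Return the argv list for a one-shot recursive delete of `znode`."""
    if not znode.startswith("/") or znode in PROTECTED:
        raise ValueError("refusing to delete %r" % znode)
    return [zkcli, "-server", server, "rmr", znode]


def remove_znode(zkcli, server, znode):
    """Run zkCli.sh with the built command and return its exit code."""
    return subprocess.call(build_rmr_command(zkcli, server, znode))


if __name__ == "__main__":
    # Prints the command only; call remove_znode() to actually run it.
    print(build_rmr_command(
        "/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/zookeeper/bin/zkCli.sh",
        "localhost:2181",
        "/hbase",
    ))
```

Keeping command construction separate from execution makes it possible to review the exact command before anything is deleted.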
Kafka broker fails to start
Error log
Dec 21, 10:22:12.744 PM INFO kafka.server.KafkaServer
starting
Dec 21, 10:22:12.751 PM INFO kafka.server.KafkaServer
Connecting to zookeeper on agent1:2181,agent2:2181,agent3:2181
Dec 21, 10:22:12.752 PM INFO org.apache.zookeeper.ZooKeeper
Initiating client connection, connectString=agent1:2181,agent2:2181,agent3:2181 sessionTimeout=6000 watcher=org.I0Itec.zkclient.ZkClient@37912c1a
Dec 21, 10:22:12.752 PM INFO org.I0Itec.zkclient.ZkEventThread
Starting ZkClient event thread.
Dec 21, 10:22:12.753 PM INFO org.I0Itec.zkclient.ZkClient
Waiting for keeper state SyncConnected
Dec 21, 10:22:12.753 PM INFO org.apache.zookeeper.ClientCnxn
Opening socket connection to server agent2/172.30.200.22:2181. Will not attempt to authenticate using SASL (unknown error)
Dec 21, 10:22:12.754 PM INFO org.apache.zookeeper.ClientCnxn
Socket connection established to agent2/172.30.200.22:2181, initiating session
Dec 21, 10:22:12.758 PM INFO org.apache.zookeeper.ClientCnxn
Session establishment complete on server agent2/172.30.200.22:2181, sessionid = 0x15921ad678700b0, negotiated timeout = 6000
Dec 21, 10:22:12.758 PM INFO org.I0Itec.zkclient.ZkClient
zookeeper state changed (SyncConnected)
Dec 21, 10:22:12.801 PM INFO kafka.log.LogManager
Loading logs.
Dec 21, 10:22:12.808 PM INFO kafka.log.LogManager
Logs loading complete.
Dec 21, 10:22:12.889 PM INFO kafka.log.LogManager
Starting log cleanup with a period of 300000 ms.
Dec 21, 10:22:12.895 PM INFO kafka.log.LogManager
Starting log flusher with a default period of 9223372036854775807 ms.
Dec 21, 10:22:12.896 PM INFO kafka.log.LogCleaner
Starting the log cleaner
Dec 21, 10:22:12.898 PM INFO kafka.log.LogCleaner
[kafka-log-cleaner-thread-0], Starting
Dec 21, 10:22:12.902 PM FATAL kafka.server.KafkaServer
Fatal error during KafkaServer startup. Prepare to shutdown
kafka.common.InconsistentBrokerIdException: Configured broker.id 219 doesn't match stored broker.id 157 in meta.properties. If you moved your data, make sure your configured broker.id matches. If you intend to create a new broker, you should remove all data in your data directories (log.dirs).
at kafka.server.KafkaServer.getBrokerId(KafkaServer.scala:635)
at kafka.server.KafkaServer.startup(KafkaServer.scala:184)
at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:37)
at kafka.Kafka$.main(Kafka.scala:67)
at com.cloudera.kafka.wrap.Kafka$.main(Kafka.scala:76)
at com.cloudera.kafka.wrap.Kafka.main(Kafka.scala)
Dec 21, 10:22:12.905 PM INFO kafka.server.KafkaServer
shutting down
Dec 21, 10:22:12.910 PM INFO kafka.log.LogManager
Shutting down.
Dec 21, 10:22:12.911 PM INFO kafka.log.LogCleaner
Shutting down the log cleaner.
Dec 21, 10:22:12.912 PM INFO kafka.log.LogCleaner
[kafka-log-cleaner-thread-0], Shutting down
Dec 21, 10:22:12.913 PM INFO kafka.log.LogCleaner
[kafka-log-cleaner-thread-0], Shutdown completed
Dec 21, 10:22:12.913 PM INFO kafka.log.LogCleaner
[kafka-log-cleaner-thread-0], Stopped
Dec 21, 10:22:12.922 PM INFO kafka.log.LogManager
Shutdown complete.
Dec 21, 10:22:12.923 PM INFO org.I0Itec.zkclient.ZkEventThread
Terminate ZkClient event thread.
Dec 21, 10:22:12.927 PM INFO org.apache.zookeeper.ClientCnxn
EventThread shut down
Dec 21, 10:22:12.927 PM INFO org.apache.zookeeper.ZooKeeper
Session: 0x15921ad678700b0 closed
Dec 21, 10:22:12.929 PM INFO kafka.server.KafkaServer
shut down completed
Dec 21, 10:22:12.930 PM FATAL kafka.server.KafkaServerStartable
Fatal error during KafkaServerStartable startup. Prepare to shutdown
kafka.common.InconsistentBrokerIdException: Configured broker.id 219 doesn't match stored broker.id 157 in meta.properties. If you moved your data, make sure your configured broker.id matches. If you intend to create a new broker, you should remove all data in your data directories (log.dirs).
at kafka.server.KafkaServer.getBrokerId(KafkaServer.scala:635)
at kafka.server.KafkaServer.startup(KafkaServer.scala:184)
at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:37)
at kafka.Kafka$.main(Kafka.scala:67)
at com.cloudera.kafka.wrap.Kafka$.main(Kafka.scala:76)
at com.cloudera.kafka.wrap.Kafka.main(Kafka.scala)
Dec 21, 10:22:12.932 PM INFO kafka.server.KafkaServer
shutting down
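Solution
Delete the stale meta.properties file from the Kafka data directories (log.dirs). It still records the broker.id of the previous installation (157), which conflicts with the newly configured broker.id (219).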
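Before deleting anything, it may help to confirm which data directory holds the conflicting ID. The sketch below parses the text of a meta.properties file and compares the stored broker.id with the configured one. The function names and the SAMPLE_META contents are hypothetical; only the broker.id key and the values 157 and 219 come from the log above.

```python
def stored_broker_id(meta_text):
    """Return the broker.id recorded in meta.properties text, or None."""
    for raw in meta_text.splitlines():
        line = raw.strip()
        if not line or line.startswith("#"):
            continue  # skip blank lines and comments
        key, sep, value = line.partition("=")
        if sep and key.strip() == "broker.id":
            return int(value.strip())
    return None


def broker_id_conflict(meta_text, configured_id):
    """True when a stored broker.id exists and differs from the configured one."""
    stored = stored_broker_id(meta_text)
    return stored is not None and stored != configured_id


# Hypothetical file contents for illustration only.
SAMPLE_META = "#\n#Wed Dec 21 22:00:00 CST 2016\nversion=0\nbroker.id=157\n"

if __name__ == "__main__":
    print(stored_broker_id(SAMPLE_META))
    print(broker_id_conflict(SAMPLE_META, 219))
```

In practice the text would be read from the meta.properties file in each directory listed in log.dirs; a directory without a conflict can be left alone.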