<Errors and Fixes> A CDH 5.5 bug prevents ResourceManager from starting, and some reinstalled components fail to start

Date: Dec. 24, 2016

ResourceManager fails to start

SCM agent log

[root@master1 cloudera-scm-agent]# pwd
/opt/cm-5.5.4/log/cloudera-scm-agent
[root@master1 cloudera-scm-agent]# ll
total 1864
-rw-r--r-- 1 root root 1852696 Dec 21 19:44 cloudera-scm-agent.log


Traceback (most recent call last):
  File "/opt/cm-5.5.4/lib64/cmf/agent/src/cmf/agent.py", line 1449, in handle_heartbeat_processes
    new_process.activate()
  File "/opt/cm-5.5.4/lib64/cmf/agent/src/cmf/agent.py", line 2817, in activate
    self.write_process_conf()
  File "/opt/cm-5.5.4/lib64/cmf/agent/src/cmf/agent.py", line 2925, in write_process_conf
    "source_parcel_environment", env))
  File "/opt/cm-5.5.4/lib64/cmf/agent/src/cmf/util.py", line 373, in source
    raise e
ValueError: dictionary update sequence element #91 has length 1; 2 is required

ResourceManager log

2016-12-20 09:41:08,914 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: RECEIVED SIGNAL 15: SIGTERM
2016-12-20 09:41:08,918 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2016-12-20 09:41:08,919 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2016-12-20 09:41:08,920 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted
2016-12-20 09:41:09,030 ERROR org.apache.hadoop.yarn.server.resourcemanager.ResourceManager: Returning, interrupted : java.lang.InterruptedException
2016-12-20 09:41:09,051 ERROR org.apache.hadoop.security.token.delegation.AbstractDelegationTokenSecretManager: ExpiredTokenRemover received java.lang.InterruptedException: sleep interrupted

Fix

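Edit the file named in the traceback, /opt/cm-5.5.4/lib64/cmf/agent/src/cmf/util.py, and pipe the env output through grep -v so that lines containing { or } are dropped: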

pipe = subprocess.Popen(['/bin/bash', '-c', ". %s; %s; env | grep -v {|grep -v}" % (path, command)],
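The ValueError in the traceback is what Python raises when dict() receives an element that is not a key/value pair. A minimal sketch of why the grep filter helps, assuming the agent builds its environment dict by splitting each line of the env output on the first "=" (the function names and the sample output below are hypothetical, not taken from util.py):

```python
def parse_env(output):
    """Build a dict from env-style output, assuming one KEY=VALUE pair per line."""
    return dict(line.split("=", 1) for line in output.splitlines())


def drop_brace_lines(output):
    """Mimic `grep -v { | grep -v }`: drop every line that contains a brace."""
    return "\n".join(
        line for line in output.splitlines()
        if "{" not in line and "}" not in line
    )


# Hypothetical env output: an exported bash function spans several lines,
# so its closing "}" line carries no "=" at all.
sample = (
    "PATH=/usr/bin\n"
    "BASH_FUNC_module()=() {  eval `modulecmd bash $*`\n"
    "}\n"
    "HOME=/root"
)

try:
    parse_env(sample)
except ValueError as exc:
    # Same family of error as in the agent traceback
    print("unfiltered:", exc)

print("filtered:", parse_env(drop_brace_lines(sample)))
```

The filter is blunt: it also discards any legitimate variable whose value happens to contain a brace, which appears to be an acceptable trade-off for getting the agent to start processes again.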

Reinstalling services through Cloudera Manager after deleting cluster component services

HBase HMaster fails to start

HMaster log

2016-12-21 22:00:51,889 FATAL org.apache.hadoop.hbase.master.HMaster: Failed to become active master
org.apache.hadoop.hbase.TableExistsException: hbase:namespace
    at org.apache.hadoop.hbase.master.handler.CreateTableHandler.checkAndSetEnablingTable(CreateTableHandler.java:160)
    at org.apache.hadoop.hbase.master.handler.CreateTableHandler.prepare(CreateTableHandler.java:133)
    at org.apache.hadoop.hbase.master.TableNamespaceManager.createNamespaceTable(TableNamespaceManager.java:233)
    at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:86)
    at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:902)
    at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:739)
    at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:169)
    at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1481)
    at java.lang.Thread.run(Thread.java:745)
2016-12-21 22:00:51,899 FATAL org.apache.hadoop.hbase.master.HMaster: Master server abort: loaded coprocessors are: []
2016-12-21 22:00:51,899 FATAL org.apache.hadoop.hbase.master.HMaster: Unhandled exception. Starting shutdown.
org.apache.hadoop.hbase.TableExistsException: hbase:namespace
    at org.apache.hadoop.hbase.master.handler.CreateTableHandler.checkAndSetEnablingTable(CreateTableHandler.java:160)
    at org.apache.hadoop.hbase.master.handler.CreateTableHandler.prepare(CreateTableHandler.java:133)
    at org.apache.hadoop.hbase.master.TableNamespaceManager.createNamespaceTable(TableNamespaceManager.java:233)
    at org.apache.hadoop.hbase.master.TableNamespaceManager.start(TableNamespaceManager.java:86)
    at org.apache.hadoop.hbase.master.HMaster.initNamespace(HMaster.java:902)
    at org.apache.hadoop.hbase.master.HMaster.finishActiveMasterInitialization(HMaster.java:739)
    at org.apache.hadoop.hbase.master.HMaster.access$500(HMaster.java:169)
    at org.apache.hadoop.hbase.master.HMaster$1.run(HMaster.java:1481)
    at java.lang.Thread.run(Thread.java:745)

Fix

[root@agent1 /]# find / -name zkCli*
/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/zookeeper/bin/zkCli.sh
[root@agent1 /]# sh /opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/zookeeper/bin/zkCli.sh
Connecting to localhost:2181
2016-12-21 22:03:43,684 [myid:] - INFO  [main:Environment@100] - Client environment:zookeeper.version=3.4.5-cdh5.5.4--1, built on 04/25/2016 18:53 GMT
2016-12-21 22:03:43,689 [myid:] - INFO  [main:Environment@100] - Client environment:host.name=agent1
2016-12-21 22:03:43,689 [myid:] - INFO  [main:Environment@100] - Client environment:java.version=1.7.0_79
2016-12-21 22:03:43,691 [myid:] - INFO  [main:Environment@100] - Client environment:java.vendor=Oracle Corporation
2016-12-21 22:03:43,691 [myid:] - INFO  [main:Environment@100] - Client environment:java.home=/usr/java/jdk1.7.0_79/jre
2016-12-21 22:03:43,691 [myid:] - INFO  [main:Environment@100] - Client environment:java.class.path=/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/zookeeper/bin/../build/classes:/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/zookeeper/bin/../build/lib/*.jar:/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/zookeeper/bin/../lib/slf4j-log4j12.jar:/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/zookeeper/bin/../lib/slf4j-log4j12-1.7.5.jar:/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/zookeeper/bin/../lib/slf4j-api-1.7.5.jar:/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/zookeeper/bin/../lib/netty-3.2.2.Final.jar:/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/zookeeper/bin/../lib/log4j-1.2.16.jar:/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/zookeeper/bin/../lib/jline-2.11.jar:/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/zookeeper/bin/../zookeeper-3.4.5-cdh5.5.4.jar:/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/zookeeper/bin/../src/java/lib/*.jar:/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/zookeeper/bin/../conf:.:/usr/java/jdk1.7.0_79//lib/dt.jar:/usr/java/jdk1.7.0_79//lib/tools.jar
2016-12-21 22:03:43,692 [myid:] - INFO  [main:Environment@100] - Client environment:java.library.path=/usr/java/packages/lib/amd64:/usr/lib64:/lib64:/lib:/usr/lib
2016-12-21 22:03:43,692 [myid:] - INFO  [main:Environment@100] - Client environment:java.io.tmpdir=/tmp
2016-12-21 22:03:43,692 [myid:] - INFO  [main:Environment@100] - Client environment:java.compiler=<NA>
2016-12-21 22:03:43,692 [myid:] - INFO  [main:Environment@100] - Client environment:os.name=Linux
2016-12-21 22:03:43,692 [myid:] - INFO  [main:Environment@100] - Client environment:os.arch=amd64
2016-12-21 22:03:43,692 [myid:] - INFO  [main:Environment@100] - Client environment:os.version=2.6.32-431.el6.x86_64
2016-12-21 22:03:43,692 [myid:] - INFO  [main:Environment@100] - Client environment:user.name=root
2016-12-21 22:03:43,692 [myid:] - INFO  [main:Environment@100] - Client environment:user.home=/root
2016-12-21 22:03:43,693 [myid:] - INFO  [main:Environment@100] - Client environment:user.dir=/
2016-12-21 22:03:43,694 [myid:] - INFO  [main:ZooKeeper@438] - Initiating client connection, connectString=localhost:2181 sessionTimeout=30000 watcher=org.apache.zookeeper.ZooKeeperMain$MyWatcher@279ac931
Welcome to ZooKeeper!
2016-12-21 22:03:43,720 [myid:] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@975] - Opening socket connection to server localhost/0:0:0:0:0:0:0:1:2181. Will not attempt to authenticate using SASL (unknown error)
2016-12-21 22:03:43,728 [myid:] - WARN  [main-SendThread(localhost:2181):ClientCnxn$SendThread@1102] - Session 0x0 for server null, unexpected error, closing socket connection and attempting reconnect
java.net.ConnectException: Connection refused
    at sun.nio.ch.SocketChannelImpl.checkConnect(Native Method)
    at sun.nio.ch.SocketChannelImpl.finishConnect(SocketChannelImpl.java:739)
    at org.apache.zookeeper.ClientCnxnSocketNIO.doTransport(ClientCnxnSocketNIO.java:350)
    at org.apache.zookeeper.ClientCnxn$SendThread.run(ClientCnxn.java:1081)
JLine support is enabled
[zk: localhost:2181(CONNECTING) 0] 2016-12-21 22:03:43,833 [myid:] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@975] - Opening socket connection to server localhost/127.0.0.1:2181. Will not attempt to authenticate using SASL (unknown error)
2016-12-21 22:03:43,835 [myid:] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@852] - Socket connection established, initiating session, client: /127.0.0.1:52794, server: localhost/127.0.0.1:2181
2016-12-21 22:03:43,846 [myid:] - INFO  [main-SendThread(localhost:2181):ClientCnxn$SendThread@1235] - Session establishment complete on server localhost/127.0.0.1:2181, sessionid = 0x25921ad67c1001a, negotiated timeout = 30000

WATCHER::

WatchedEvent state:SyncConnected type:None path:null

[zk: localhost:2181(CONNECTED) 0] 
[zk: localhost:2181(CONNECTED) 0] 
[zk: localhost:2181(CONNECTED) 0] 
[zk: localhost:2181(CONNECTED) 0] 
[zk: localhost:2181(CONNECTED) 0] ls /
[isr_change_notification, hbase, zookeeper, admin, consumers, config, hive_zookeeper_namespace_hive, brokers, controller_epoch]
[zk: localhost:2181(CONNECTED) 1] ls /hbase
[meta-region-server, backup-masters, table, draining, region-in-transition, running, table-lock, namespace, hbaseid, online-snapshot, replication, splitWAL, recovering-regions, rs, flush-table-proc]
[zk: localhost:2181(CONNECTED) 2] rmr /hbase
[zk: localhost:2181(CONNECTED) 3] ls /
[isr_change_notification, zookeeper, admin, consumers, config, hive_zookeeper_namespace_hive, brokers, controller_epoch]
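The session above removes the stale /hbase znode left over from the previous installation, so the reinstalled HMaster no longer finds an existing hbase:namespace entry. The same step could be scripted roughly as follows. This is a sketch only: it assumes that zkCli.sh accepts a command after the -server option and runs it non-interactively, and the helper names are hypothetical. It defaults to a dry run because rmr is destructive.

```python
import subprocess

# Path taken from the find output in the session above
ZKCLI = "/opt/cloudera/parcels/CDH-5.5.4-1.cdh5.5.4.p0.9/lib/zookeeper/bin/zkCli.sh"


def zkcli_command(server, *args):
    """Build the argv for a one-shot zkCli.sh call (assumed non-interactive form)."""
    return ["sh", ZKCLI, "-server", server] + list(args)


def remove_hbase_znode(server="localhost:2181", dry_run=True):
    """Recursively delete /hbase; with dry_run=True only return the command."""
    cmd = zkcli_command(server, "rmr", "/hbase")
    if not dry_run:
        # Raises CalledProcessError if zkCli.sh exits with a non-zero status
        subprocess.check_call(cmd)
    return cmd


# Print the command that would be executed, without touching ZooKeeper
print(" ".join(remove_hbase_znode()))
```

Calling remove_hbase_znode(dry_run=False) would actually delete the znode, so it is worth running ls /hbase first, as in the session, to confirm what is about to be removed.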

Kafka broker fails to start

Error log

Dec 21, 10:22:12.744 PM  INFO   kafka.server.KafkaServer  starting
Dec 21, 10:22:12.751 PM  INFO   kafka.server.KafkaServer  Connecting to zookeeper on agent1:2181,agent2:2181,agent3:2181
Dec 21, 10:22:12.752 PM  INFO   org.apache.zookeeper.ZooKeeper  Initiating client connection, connectString=agent1:2181,agent2:2181,agent3:2181 sessionTimeout=6000 watcher=org.I0Itec.zkclient.ZkClient@37912c1a
Dec 21, 10:22:12.752 PM  INFO   org.I0Itec.zkclient.ZkEventThread  Starting ZkClient event thread.
Dec 21, 10:22:12.753 PM  INFO   org.I0Itec.zkclient.ZkClient  Waiting for keeper state SyncConnected
Dec 21, 10:22:12.753 PM  INFO   org.apache.zookeeper.ClientCnxn  Opening socket connection to server agent2/172.30.200.22:2181. Will not attempt to authenticate using SASL (unknown error)
Dec 21, 10:22:12.754 PM  INFO   org.apache.zookeeper.ClientCnxn  Socket connection established to agent2/172.30.200.22:2181, initiating session
Dec 21, 10:22:12.758 PM  INFO   org.apache.zookeeper.ClientCnxn  Session establishment complete on server agent2/172.30.200.22:2181, sessionid = 0x15921ad678700b0, negotiated timeout = 6000
Dec 21, 10:22:12.758 PM  INFO   org.I0Itec.zkclient.ZkClient  zookeeper state changed (SyncConnected)
Dec 21, 10:22:12.801 PM  INFO   kafka.log.LogManager  Loading logs.
Dec 21, 10:22:12.808 PM  INFO   kafka.log.LogManager  Logs loading complete.
Dec 21, 10:22:12.889 PM  INFO   kafka.log.LogManager  Starting log cleanup with a period of 300000 ms.
Dec 21, 10:22:12.895 PM  INFO   kafka.log.LogManager  Starting log flusher with a default period of 9223372036854775807 ms.
Dec 21, 10:22:12.896 PM  INFO   kafka.log.LogCleaner  Starting the log cleaner
Dec 21, 10:22:12.898 PM  INFO   kafka.log.LogCleaner  [kafka-log-cleaner-thread-0], Starting
Dec 21, 10:22:12.902 PM  FATAL  kafka.server.KafkaServer  Fatal error during KafkaServer startup. Prepare to shutdown
kafka.common.InconsistentBrokerIdException: Configured broker.id 219 doesn't match stored broker.id 157 in meta.properties. If you moved your data, make sure your configured broker.id matches. If you intend to create a new broker, you should remove all data in your data directories (log.dirs).
    at kafka.server.KafkaServer.getBrokerId(KafkaServer.scala:635)
    at kafka.server.KafkaServer.startup(KafkaServer.scala:184)
    at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:37)
    at kafka.Kafka$.main(Kafka.scala:67)
    at com.cloudera.kafka.wrap.Kafka$.main(Kafka.scala:76)
    at com.cloudera.kafka.wrap.Kafka.main(Kafka.scala)
Dec 21, 10:22:12.905 PM  INFO   kafka.server.KafkaServer  shutting down
Dec 21, 10:22:12.910 PM  INFO   kafka.log.LogManager  Shutting down.
Dec 21, 10:22:12.911 PM  INFO   kafka.log.LogCleaner  Shutting down the log cleaner.
Dec 21, 10:22:12.912 PM  INFO   kafka.log.LogCleaner  [kafka-log-cleaner-thread-0], Shutting down
Dec 21, 10:22:12.913 PM  INFO   kafka.log.LogCleaner  [kafka-log-cleaner-thread-0], Shutdown completed
Dec 21, 10:22:12.913 PM  INFO   kafka.log.LogCleaner  [kafka-log-cleaner-thread-0], Stopped
Dec 21, 10:22:12.922 PM  INFO   kafka.log.LogManager  Shutdown complete.
Dec 21, 10:22:12.923 PM  INFO   org.I0Itec.zkclient.ZkEventThread  Terminate ZkClient event thread.
Dec 21, 10:22:12.927 PM  INFO   org.apache.zookeeper.ClientCnxn  EventThread shut down
Dec 21, 10:22:12.927 PM  INFO   org.apache.zookeeper.ZooKeeper  Session: 0x15921ad678700b0 closed
Dec 21, 10:22:12.929 PM  INFO   kafka.server.KafkaServer  shut down completed
Dec 21, 10:22:12.930 PM  FATAL  kafka.server.KafkaServerStartable  Fatal error during KafkaServerStartable startup. Prepare to shutdown
kafka.common.InconsistentBrokerIdException: Configured broker.id 219 doesn't match stored broker.id 157 in meta.properties. If you moved your data, make sure your configured broker.id matches. If you intend to create a new broker, you should remove all data in your data directories (log.dirs).
    at kafka.server.KafkaServer.getBrokerId(KafkaServer.scala:635)
    at kafka.server.KafkaServer.startup(KafkaServer.scala:184)
    at kafka.server.KafkaServerStartable.startup(KafkaServerStartable.scala:37)
    at kafka.Kafka$.main(Kafka.scala:67)
    at com.cloudera.kafka.wrap.Kafka$.main(Kafka.scala:76)
    at com.cloudera.kafka.wrap.Kafka.main(Kafka.scala)
Dec 21, 10:22:12.932 PM  INFO   kafka.server.KafkaServer  shutting down

Fix

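Delete the stale meta.properties file from the Kafka data directory (log.dirs). It still records the old broker.id 157, which conflicts with the newly configured broker.id 219.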
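Before deleting anything, it can help to confirm that the stored id really differs from the configured one. The sketch below assumes meta.properties is a plain key=value file containing a broker.id line, as the exception message suggests. The helper names are hypothetical, and the demo runs against a throwaway temporary directory rather than a real log.dirs entry.

```python
import os
import tempfile


def stored_broker_id(data_dir):
    """Return the broker.id recorded in <data_dir>/meta.properties, or None."""
    path = os.path.join(data_dir, "meta.properties")
    if not os.path.exists(path):
        return None
    with open(path) as fh:
        for line in fh:
            line = line.strip()
            # Skip comments and anything that is not a key=value pair
            if line.startswith("#") or "=" not in line:
                continue
            key, value = line.split("=", 1)
            if key.strip() == "broker.id":
                return int(value.strip())
    return None


def remove_stale_meta(data_dir, configured_id):
    """Delete meta.properties only if its broker.id differs from configured_id."""
    stored = stored_broker_id(data_dir)
    if stored is None or stored == configured_id:
        return False
    os.remove(os.path.join(data_dir, "meta.properties"))
    return True


# Demo: a temporary directory stands in for a Kafka data directory
demo_dir = tempfile.mkdtemp()
with open(os.path.join(demo_dir, "meta.properties"), "w") as fh:
    fh.write("#demo file\nversion=0\nbroker.id=157\n")

print("stored id:", stored_broker_id(demo_dir))
print("removed:", remove_stale_meta(demo_dir, 219))
```

On a real broker, data_dir would be each directory listed in log.dirs, and configured_id would be the broker.id that Cloudera Manager assigned to the reinstalled role (219 in the log above).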