Errors and Solutions: Common Hadoop Cluster Problems

Date: Nov. 7, 2016

1. HDFS enters safe mode at startup

Error message:

 "RemoteException": {
    "exception": "SafeModeException", 
    "javaClassName": "org.apache.hadoop.hdfs.server.namenode.SafeModeException", 
    "message": "Cannot set permission for /ats/done. Name node is in safe mode.\nThe reported blocks 8 needs additional 13 blocks to reach the threshold 1.0000 of total blocks 20.\nThe number of live datanodes 1 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached."

Solution:

[root@root ~]# su hdfs
[hdfs@root root]$ hadoop dfsadmin -safemode leave
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.

Safe mode is OFF
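
To confirm that the NameNode has actually left safe mode, the non-deprecated hdfs command can also be used (a quick check; the prompt is only illustrative):

[hdfs@root root]$ hdfs dfsadmin -safemode get
Safe mode is OFF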

2. DataNode IO timeout

Error message:

java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write
2014-05-06 14:28:09,386 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: hadoop-datanode1:50010:DataXceiver error processing READ_BLOCK operation src: /192.168.1.191:48854 dest: /192.168.1.191:50010
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.191:50010 remote=/192.168.1.191:48854]
at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:546)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:710)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:340)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:101)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:65)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
at java.lang.Thread.run(Thread.java:722)

Solution:

Edit the Hadoop configuration file hdfs-site.xml and add the two properties dfs.datanode.socket.write.timeout and dfs.socket.timeout (both values are in milliseconds).
<property>
<name>dfs.datanode.socket.write.timeout</name>
<value>6000000</value>
</property>

<property>
<name>dfs.socket.timeout</name>
<value>6000000</value>
</property>
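
Both values are only read at startup, so restart the DataNodes afterwards. A quick way to verify which value the configuration actually carries (assuming the client reads the same hdfs-site.xml) is the getconf tool:

[hdfs@root root]$ hdfs getconf -confKey dfs.datanode.socket.write.timeout
6000000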

3. DataNode fails to start because of a damaged disk

Error message:

2016-06-29 01:00:53,541 FATAL datanode.DataNode (DataNode.java:secureMain(2533)) - Exception in secureMain
java.net.BindException: Problem binding to [0.0.0.0:50010] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException

Solution:

In hdfs-site.xml, set the following property to 1:

<property>
<name>dfs.datanode.failed.volumes.tolerated</name>
<value>1</value>
</property>

The DataNode can then start with one failed volume; identify which disk is damaged and replace it.
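
One rough way to locate the damaged disk is to check the kernel log for I/O errors and the SMART health of each data drive (a sketch only; /dev/sdb is a placeholder, check every device behind dfs.datanode.data.dir):

[root@root ~]# dmesg | grep -i "i/o error"
[root@root ~]# smartctl -H /dev/sdb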

4. A user-written Hadoop program fails

[15:10:41,949][ INFO][main][org.apache.hadoop.mapred.JobClient:1330] – Task Id : attempt_201202281244_0003_m_000000_1, Status : FAILED
Error: java.lang.ClassNotFoundException: com.sca.commons.ScaException
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:819)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:864)
at org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper.getMapperClass(MultithreadedMapper.java:95)
at org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper.run(MultithreadedMapper.java:127)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
at org.apache.hadoop.mapred.Child.main(Child.java:249)

Solution:

This is usually caused by the job depending on a third-party JAR (here the class com.sca.commons.ScaException cannot be found on the task classpath). Put that JAR under the HADOOP_HOME directory on every NodeManager node.
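
A minimal sketch of pushing the JAR to all NodeManager nodes; sca-commons.jar and nodemanager-hosts.txt are hypothetical names, and it assumes HADOOP_HOME points to the same path on every node:

# copy the third-party JAR to every NodeManager host
for host in $(cat nodemanager-hosts.txt); do
    scp /opt/jars/sca-commons.jar "${host}:${HADOOP_HOME}/lib/"
done

Alternatively, if the job uses the generic options parser, the JAR can be shipped with the job itself via the -libjars option of the hadoop jar command.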

5. Errors when analyzing a large number of files

Error message:

ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 192-168-11-58:50010:DataXceiver error processing WRITE_BLOCK operation  src: 
or
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out 

Solution:

1. Add the following to /etc/security/limits.conf:

# End of file
*               -       nofile          1000000
*               -       nproc           1000000

2. Add the following to hdfs-site.xml:

<property>
    <name>dfs.datanode.max.transfer.threads</name>
    <value>8192</value>
</property>
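
The nofile/nproc limits only apply to sessions started after the change, and the DataNode must be restarted to pick up the new transfer thread count. A quick check of the effective limit (assuming the DataNode runs as the hdfs user):

[root@root ~]# su - hdfs -c 'ulimit -n'
1000000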

6. DataNode fails to start because of unsynchronized time

Error message:

2016-10-21 02:11:03,822 ERROR datanode.DataNode (DataXceiver.java:run(257)) - DZT08:50010:DataXceiver error processing unknown operation src: /127.0.0.1:33294 dst: /127.0.0.1:50010
java.io.EOFException
at java.io.DataInputStream.readShort(DataInputStream.java:315)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:216)
at java.lang.Thread.run(Thread.java:744)

Solution:

Synchronize the clocks across all nodes and configure an NTP server.
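
A minimal sketch: a one-off synchronization followed by enabling the NTP daemon (ntp.example.com is a placeholder for your own NTP server; on systems without systemd use service/chkconfig instead):

[root@root ~]# ntpdate ntp.example.com
[root@root ~]# systemctl enable ntpd && systemctl start ntpd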

7. Hive dynamic partition exception

Error message:

[Fatal Error] Operator FS_2 (id=2): Number of dynamic partitions
Exceeded hive.exec.max.dynamic.partitions.pernode

Solution:

hive> set hive.exec.max.dynamic.partitions.pernode=10000;
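
This set command only affects the current session. When the job also exceeds the overall dynamic partition limits, the related settings usually need to be raised together (the values below are illustrative):

hive> set hive.exec.dynamic.partition=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> set hive.exec.max.dynamic.partitions=100000;
hive> set hive.exec.max.dynamic.partitions.pernode=10000;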

8. Hive metastore connection timeout

Error message:

FAILED: SemanticException org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out

Solution:

hive> set hive.metastore.client.socket.timeout=500;
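
To make the change permanent rather than per-session, the same property can also be set in hive-site.xml (in older Hive releases the value is interpreted as seconds; newer releases also accept a unit suffix such as 500s):

<property>
    <name>hive.metastore.client.socket.timeout</name>
    <value>500</value>
</property>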