Errors and Solutions: Common Hadoop Cluster Problems
1. HDFS enters safe mode at startup
Error message:
"RemoteException": {
"exception": "SafeModeException",
"javaClassName": "org.apache.hadoop.hdfs.server.namenode.SafeModeException",
"message": "Cannot set permission for /ats/done. Name node is in safe mode.\nThe reported blocks 8 needs additional 13 blocks to reach the threshold 1.0000 of total blocks 20.\nThe number of live datanodes 1 has reached the minimum number 0. Safe mode will be turned off automatically once the thresholds have been reached."
Solution:
[root@root ~]# su hdfs
[hdfs@root root]$ hadoop dfsadmin -safemode leave
DEPRECATED: Use of this script to execute hdfs command is deprecated.
Instead use the hdfs command for it.
Safe mode is OFF
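Forcing the NameNode out of safe mode before enough blocks have been reported can leave missing-block warnings behind, so it is worth checking the state first and, where possible, letting the NameNode leave safe mode on its own. A minimal sketch using the standard hdfs dfsadmin subcommands (run as the hdfs user):
[hdfs@root root]$ hdfs dfsadmin -safemode get     # show whether safe mode is currently ON or OFF
[hdfs@root root]$ hdfs dfsadmin -safemode wait    # block until the NameNode leaves safe mode by itself
[hdfs@root root]$ hdfs dfsadmin -safemode leave   # force it off only if it never leaves on its own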
2. DataNode I/O timeout at startup
Error message:
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write
2014-05-06 14:28:09,386 ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: hadoop-datanode1:50010:DataXceiver error processing READ_BLOCK operation src: /192.168.1.191:48854 dest: /192.168.1.191:50010
java.net.SocketTimeoutException: 480000 millis timeout while waiting for channel to be ready for write. ch : java.nio.channels.SocketChannel[connected local=/192.168.1.191:50010 remote=/192.168.1.191:48854]
at org.apache.hadoop.net.SocketIOWithTimeout.waitForIO(SocketIOWithTimeout.java:246)
at org.apache.hadoop.net.SocketOutputStream.waitForWritable(SocketOutputStream.java:172)
at org.apache.hadoop.net.SocketOutputStream.transferToFully(SocketOutputStream.java:220)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendPacket(BlockSender.java:546)
at org.apache.hadoop.hdfs.server.datanode.BlockSender.sendBlock(BlockSender.java:710)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.readBlock(DataXceiver.java:340)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.opReadBlock(Receiver.java:101)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.processOp(Receiver.java:65)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:221)
at java.lang.Thread.run(Thread.java:722)
Solution:
Edit the Hadoop configuration file hdfs-site.xml and add the dfs.datanode.socket.write.timeout and dfs.socket.timeout properties (both values are in milliseconds).
<property>
<name>dfs.datanode.socket.write.timeout</name>
<value>6000000</value>
</property>
<property>
<name>dfs.socket.timeout</name>
<value>6000000</value>
</property>
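6000000 ms is 100 minutes; choose a value that matches how slow your disks and network actually are. The DataNodes must be restarted for the new timeouts to take effect. A minimal sketch, assuming a Hadoop 2.x layout where hadoop-daemon.sh lives under $HADOOP_HOME/sbin:
# on each DataNode, after distributing the updated hdfs-site.xml
$HADOOP_HOME/sbin/hadoop-daemon.sh stop datanode
$HADOOP_HOME/sbin/hadoop-daemon.sh start datanode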
3. DataNode fails to start because of a damaged disk
Error message:
2016-06-29 01:00:53,541 FATAL datanode.DataNode (DataNode.java:secureMain(2533)) - Exception in secureMain
java.net.BindException: Problem binding to [0.0.0.0:50010] java.net.BindException: Address already in use; For more details see: http://wiki.apache.org/hadoop/BindException
Solution:
In hdfs-site.xml, set the following property to 1:
<property>
<name>dfs.datanode.failed.volumes.tolerated</name>
<value>1</value>
</property>
With this in place the DataNode can start even with one failed volume. Then identify which disk has failed and replace it.
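A quick way to locate the bad volume is to look for failed-volume messages in the DataNode log and check the kernel log for I/O errors. This is only a sketch: the log path and the data directories (/data1 ... /data4) are assumptions, adjust them to your installation and your dfs.datanode.data.dir setting.
# look for volume failures reported by the DataNode (log path is an assumption)
grep -i "volume" /var/log/hadoop/hdfs/hadoop-hdfs-datanode-*.log | grep -i "fail"
# check the kernel log for disk I/O errors
dmesg | grep -i "I/O error"
# verify each configured data directory is still mounted and writable
for d in /data1 /data2 /data3 /data4; do touch $d/.probe && rm -f $d/.probe || echo "$d is not writable"; done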
4. A custom Hadoop program fails (ClassNotFoundException)
Error message:
[15:10:41,949][ INFO][main][org.apache.hadoop.mapred.JobClient:1330] – Task Id : attempt_201202281244_0003_m_000000_1, Status : FAILED
Error: java.lang.ClassNotFoundException: com.sca.commons.ScaException
at java.net.URLClassLoader$1.run(URLClassLoader.java:202)
at java.security.AccessController.doPrivileged(Native Method)
at java.net.URLClassLoader.findClass(URLClassLoader.java:190)
at java.lang.ClassLoader.loadClass(ClassLoader.java:306)
at sun.misc.Launcher$AppClassLoader.loadClass(Launcher.java:301)
at java.lang.ClassLoader.loadClass(ClassLoader.java:247)
at java.lang.Class.forName0(Native Method)
at java.lang.Class.forName(Class.java:247)
at org.apache.hadoop.conf.Configuration.getClassByName(Configuration.java:819)
at org.apache.hadoop.conf.Configuration.getClass(Configuration.java:864)
at org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper.getMapperClass(MultithreadedMapper.java:95)
at org.apache.hadoop.mapreduce.lib.map.MultithreadedMapper.run(MultithreadedMapper.java:127)
at org.apache.hadoop.mapred.MapTask.runNewMapper(MapTask.java:764)
at org.apache.hadoop.mapred.MapTask.run(MapTask.java:370)
at org.apache.hadoop.mapred.Child$4.run(Child.java:255)
at java.security.AccessController.doPrivileged(Native Method)
at javax.security.auth.Subject.doAs(Subject.java:396)
at org.apache.hadoop.security.UserGroupInformation.doAs(UserGroupInformation.java:1083)
at org.apache.hadoop.mapred.Child.main(Child.java:249)
Solution:
This error is usually caused by the job depending on a third-party JAR that is not on the task classpath. Copy that JAR into the HADOOP_HOME directory on every NodeManager node; alternatively, ship it with the job as shown below.
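Copying JARs to every node works but is easy to forget when nodes are added or upgraded. If the driver class uses ToolRunner/GenericOptionsParser, the standard -libjars option distributes the JAR with the job via the distributed cache instead. A sketch, where myjob.jar, com.example.MyDriver, and the paths are placeholders:
# ship the third-party JAR with the job
hadoop jar myjob.jar com.example.MyDriver -libjars /path/to/third-party.jar /input /output
# and make it visible on the client-side classpath as well
export HADOOP_CLASSPATH=/path/to/third-party.jar:$HADOOP_CLASSPATH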
5. Errors when analyzing a large number of files
Error message:
ERROR org.apache.hadoop.hdfs.server.datanode.DataNode: 192-168-11-58:50010:DataXceiver error processing WRITE_BLOCK operation src:
or
Shuffle Error: Exceeded MAX_FAILED_UNIQUE_FETCHES; bailing-out
Solution:
1. Raise the OS open-file and process limits by adding the following to /etc/security/limits.conf:
# End of file
* - nofile 1000000
* - nproc 1000000
2. In hdfs-site.xml, raise dfs.datanode.max.transfer.threads:
<property>
<name>dfs.datanode.max.transfer.threads</name>
<value>8192</value>
</property>
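The limits.conf change only applies to new login sessions, so the HDFS and YARN daemons need to be restarted from a fresh session (or the host rebooted) before it takes effect. A quick check, assuming the DataNode runs as the hdfs user (the PID below is a placeholder):
# confirm the new limits are visible to the hdfs user
su - hdfs -c 'ulimit -n; ulimit -u'
# or inspect the limits of the running DataNode process
cat /proc/<datanode-pid>/limits | grep -E 'open files|processes'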
6. DataNode fails to start because its clock is out of sync
Error message:
2016-10-21 02:11:03,822 ERROR datanode.DataNode (DataXceiver.java:run(257)) - DZT08:50010:DataXceiver error processing unknown operation src: /127.0.0.1:33294 dst: /127.0.0.1:50010
java.io.EOFException
at java.io.DataInputStream.readShort(DataInputStream.java:315)
at org.apache.hadoop.hdfs.protocol.datatransfer.Receiver.readOp(Receiver.java:58)
at org.apache.hadoop.hdfs.server.datanode.DataXceiver.run(DataXceiver.java:216)
at java.lang.Thread.run(Thread.java:744)
Solution:
Synchronize the system time across all nodes and configure an NTP server so the clocks stay in sync.
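A minimal sketch of a one-time correction plus ongoing synchronization on each node, assuming an ntpd-based setup and an NTP server reachable as ntp.example.com (replace with your own time source; some distributions use chrony instead):
# one-time correction against the NTP server
ntpdate ntp.example.com
# keep the clock in sync from now on
systemctl enable ntpd
systemctl start ntpd
# verify the offset
ntpq -p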
7. Hive dynamic partition exception
Error message:
[Fatal Error] Operator FS_2 (id=2): Number of dynamic partitions exceeded hive.exec.max.dynamic.partitions.pernode
Solution:
hive> set hive.exec.max.dynamic.partitions.pernode=10000;
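The per-node limit is capped by the job-wide limit, so it usually makes sense to raise hive.exec.max.dynamic.partitions as well, and dynamic partitioning must be enabled in nonstrict mode if all partition columns are dynamic. These are standard Hive session settings; the values below are examples rather than recommendations:
hive> set hive.exec.dynamic.partition=true;
hive> set hive.exec.dynamic.partition.mode=nonstrict;
hive> set hive.exec.max.dynamic.partitions=10000;
hive> set hive.exec.max.dynamic.partitions.pernode=10000;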
8. Hive metastore connection timeout
Error message:
FAILED: SemanticException org.apache.thrift.transport.TTransportException: java.net.SocketTimeoutException: Read timed out
Solution:
hive> set hive.metastore.client.socket.timeout=500;
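The session-level set only lasts for the current Hive session; to make the change permanent, the same property can be placed in hive-site.xml on the client side. In older Hive releases the value is typically interpreted as seconds, while newer ones also accept a time suffix such as 500s, so check your version. A sketch:
<property>
  <name>hive.metastore.client.socket.timeout</name>
  <value>500</value>
</property>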