<Error and Solution> Hive join query fails due to a memory problem
SQL
hive> SELECT tmp.book_category, ROUND(tmp.revenue, 2) AS revenue
> FROM (
> SELECT books.category AS book_category, SUM(books.price * transactions.quantity) AS revenue
> FROM books JOIN transactions ON (
> transactions.book_id = books.id
> AND YEAR(transactions.transaction_date) BETWEEN 2008 AND 2010
> )
> GROUP BY books.category
> ) tmp
> ORDER BY revenue DESC LIMIT 10;
Error message
Query ID = hdfs_20161211135159_c666e369-c56e-4ee8-a16e-c366e32d4b6a
Total jobs = 2
SLF4J: Class path contains multiple SLF4J bindings.
SLF4J: Found binding in [jar:file:/usr/crh/4.9.2.5-1051/hadoop/lib/slf4j-log4j12-1.7.5.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: Found binding in [jar:file:/usr/crh/4.9.2.5-1051/spark/lib/spark-assembly-1.6.2-hadoop2.6.4.jar!/org/slf4j/impl/StaticLoggerBinder.class]
SLF4J: See http://www.slf4j.org/codes.html#multiple_bindings for an explanation.
SLF4J: Actual binding is of type [org.slf4j.impl.Log4jLoggerFactory]
Execution log at: /tmp/hdfs/hdfs_20161211135159_c666e369-c56e-4ee8-a16e-c366e32d4b6a.log
2016-12-11 13:52:05 Starting to launch local task to process map join; maximum memory = 1056964608
2016-12-11 13:52:08 Processing rows: 200000 Hashtable size: 199999 Memory usage: 89726264 percentage: 0.085
2016-12-11 13:52:08 Processing rows: 300000 Hashtable size: 299999 Memory usage: 129972400 percentage: 0.123
2016-12-11 13:52:08 Processing rows: 400000 Hashtable size: 399999 Memory usage: 162869288 percentage: 0.154
2016-12-11 13:52:09 Processing rows: 500000 Hashtable size: 499999 Memory usage: 180633424 percentage: 0.17
2016-12-11 13:52:09 Processing rows: 600000 Hashtable size: 599999 Memory usage: 221480896 percentage: 0.209
2016-12-11 13:52:10 Processing rows: 700000 Hashtable size: 699999 Memory usage: 254688600 percentage: 0.24
2016-12-11 13:52:10 Processing rows: 800000 Hashtable size: 799999 Memory usage: 285144816 percentage: 0.269
2016-12-11 13:52:10 Processing rows: 900000 Hashtable size: 899999 Memory usage: 313415848 percentage: 0.295
2016-12-11 13:52:11 Processing rows: 1000000 Hashtable size: 999999 Memory usage: 340383632 percentage: 0.321
2016-12-11 13:52:11 Processing rows: 1100000 Hashtable size: 1099999 Memory usage: 381282200 percentage: 0.359
2016-12-11 13:52:12 Processing rows: 1200000 Hashtable size: 1199999 Memory usage: 408453936 percentage: 0.385
2016-12-11 13:52:12 Processing rows: 1300000 Hashtable size: 1299999 Memory usage: 437863080 percentage: 0.413
2016-12-11 13:52:13 Processing rows: 1400000 Hashtable size: 1399999 Memory usage: 465089472 percentage: 0.439
2016-12-11 13:52:13 Processing rows: 1500000 Hashtable size: 1499999 Memory usage: 489292472 percentage: 0.461
2016-12-11 13:52:14 Processing rows: 1600000 Hashtable size: 1599999 Memory usage: 532300800 percentage: 0.502
2016-12-11 13:52:14 Processing rows: 1700000 Hashtable size: 1699999 Memory usage: 555522448 percentage: 0.524
2016-12-11 13:52:14 Processing rows: 1800000 Hashtable size: 1799999 Memory usage: 578111984 percentage: 0.545
2016-12-11 13:52:15 Processing rows: 1900000 Hashtable size: 1899999 Memory usage: 621706968 percentage: 0.586
Execution failed with exit status: 3
Obtaining error information
Task failed!
Task ID:
Stage-6
Logs:
/tmp/hdfs/hive.log
FAILED: Execution Error, return code 3 from org.apache.hadoop.hive.ql.exec.mr.MapredLocalTask
Solution
The local task that builds the map-join hashtable exceeded its memory limit. Adjust the parameter hive.mapjoin.localtask.max.memory.usage or hive.auto.convert.join:
- hive.mapjoin.localtask.max.memory.usage is the fraction of the local task's heap that the map-join hashtable may use; the default is 0.90. Raising it is worth trying when the map-join (small) table only slightly exceeds the limit.
- hive.auto.convert.join controls whether Hive automatically converts joins into map joins; the default is true. Be careful when turning it off: if your SQL is long and contains many joins, disabling map join can increase the number of tasks roughly tenfold, and once the containers are saturated the job will likewise be very slow.
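Taken together, the two approaches look like this at the Hive CLI (the values below are illustrative; tune them to your cluster):

```sql
-- Option 1: give the local map-join task a larger share of the heap
-- (only worth trying when the small table almost fits already)
set hive.mapjoin.localtask.max.memory.usage=0.95;  -- default 0.90

-- Option 2: disable automatic map-join conversion entirely, forcing a
-- common (shuffle) join that does not build an in-memory hashtable
set hive.auto.convert.join=false;                  -- default true
```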
Applying the solution
Because the two tables joined in my SQL were similar in size, I chose set hive.auto.convert.join=false; to work around the problem:
hive> set hive.auto.convert.join=false;
hive> SELECT tmp.book_category, ROUND(tmp.revenue, 2) AS revenue
> FROM (
> SELECT books.category AS book_category, SUM(books.price * transactions.quantity) AS revenue
> FROM books JOIN transactions ON (
> transactions.book_id = books.id
> AND YEAR(transactions.transaction_date) BETWEEN 2008 AND 2010
> )
> GROUP BY books.category
> ) tmp
> ORDER BY revenue DESC LIMIT 10;
Query ID = hdfs_20161211140910_4b8fa815-c8d1-407d-838f-fcb022843eb8
Total jobs = 3
Launching Job 1 out of 3
Number of reduce tasks not specified. Estimated from input data size: 33
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1481426664220_0010, Tracking URL = http://large01:8088/proxy/application_1481426664220_0010/
Kill Command = /usr/crh/current/hadoop-client/bin/hadoop job -kill job_1481426664220_0010
Hadoop job information for Stage-1: number of mappers: 10; number of reducers: 33
2016-12-11 14:09:23,426 Stage-1 map = 0%, reduce = 0%
2016-12-11 14:09:32,910 Stage-1 map = 20%, reduce = 0%, Cumulative CPU 9.65 sec
2016-12-11 14:09:38,186 Stage-1 map = 23%, reduce = 0%, Cumulative CPU 102.32 sec
2016-12-11 14:09:39,262 Stage-1 map = 33%, reduce = 0%, Cumulative CPU 130.98 sec
2016-12-11 14:09:41,367 Stage-1 map = 37%, reduce = 0%, Cumulative CPU 140.66 sec
2016-12-11 14:09:42,419 Stage-1 map = 47%, reduce = 0%, Cumulative CPU 167.49 sec
2016-12-11 14:09:44,525 Stage-1 map = 50%, reduce = 0%, Cumulative CPU 178.92 sec
2016-12-11 14:09:45,583 Stage-1 map = 60%, reduce = 1%, Cumulative CPU 211.87 sec
2016-12-11 14:09:46,641 Stage-1 map = 60%, reduce = 2%, Cumulative CPU 214.26 sec
2016-12-11 14:09:47,716 Stage-1 map = 63%, reduce = 2%, Cumulative CPU 225.78 sec
2016-12-11 14:09:48,769 Stage-1 map = 83%, reduce = 4%, Cumulative CPU 255.0 sec
2016-12-11 14:09:49,825 Stage-1 map = 87%, reduce = 7%, Cumulative CPU 265.91 sec
2016-12-11 14:09:50,879 Stage-1 map = 87%, reduce = 11%, Cumulative CPU 276.34 sec
2016-12-11 14:09:51,935 Stage-1 map = 87%, reduce = 14%, Cumulative CPU 294.43 sec
2016-12-11 14:09:52,991 Stage-1 map = 87%, reduce = 16%, Cumulative CPU 301.87 sec
2016-12-11 14:09:54,046 Stage-1 map = 87%, reduce = 17%, Cumulative CPU 310.65 sec
2016-12-11 14:09:55,101 Stage-1 map = 90%, reduce = 17%, Cumulative CPU 327.6 sec
2016-12-11 14:09:56,180 Stage-1 map = 93%, reduce = 18%, Cumulative CPU 335.48 sec
2016-12-11 14:09:57,236 Stage-1 map = 97%, reduce = 19%, Cumulative CPU 343.41 sec
2016-12-11 14:09:58,308 Stage-1 map = 100%, reduce = 21%, Cumulative CPU 352.0 sec
2016-12-11 14:09:59,362 Stage-1 map = 100%, reduce = 35%, Cumulative CPU 370.22 sec
2016-12-11 14:10:00,414 Stage-1 map = 100%, reduce = 46%, Cumulative CPU 409.78 sec
2016-12-11 14:10:01,468 Stage-1 map = 100%, reduce = 60%, Cumulative CPU 478.81 sec
2016-12-11 14:10:02,520 Stage-1 map = 100%, reduce = 69%, Cumulative CPU 547.4 sec
2016-12-11 14:10:03,573 Stage-1 map = 100%, reduce = 79%, Cumulative CPU 611.92 sec
2016-12-11 14:10:04,627 Stage-1 map = 100%, reduce = 91%, Cumulative CPU 689.11 sec
2016-12-11 14:10:08,843 Stage-1 map = 100%, reduce = 97%, Cumulative CPU 708.46 sec
2016-12-11 14:10:14,089 Stage-1 map = 100%, reduce = 100%, Cumulative CPU 718.24 sec
MapReduce Total cumulative CPU time: 11 minutes 58 seconds 240 msec
Ended Job = job_1481426664220_0010
Launching Job 2 out of 3
Number of reduce tasks not specified. Estimated from input data size: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1481426664220_0011, Tracking URL = http://large01:8088/proxy/application_1481426664220_0011/
Kill Command = /usr/crh/current/hadoop-client/bin/hadoop job -kill job_1481426664220_0011
Hadoop job information for Stage-2: number of mappers: 1; number of reducers: 1
2016-12-11 14:10:29,593 Stage-2 map = 0%, reduce = 0%
2016-12-11 14:10:37,973 Stage-2 map = 100%, reduce = 0%, Cumulative CPU 4.87 sec
2016-12-11 14:10:46,371 Stage-2 map = 100%, reduce = 100%, Cumulative CPU 9.54 sec
MapReduce Total cumulative CPU time: 9 seconds 540 msec
Ended Job = job_1481426664220_0011
Launching Job 3 out of 3
Number of reduce tasks determined at compile time: 1
In order to change the average load for a reducer (in bytes):
set hive.exec.reducers.bytes.per.reducer=<number>
In order to limit the maximum number of reducers:
set hive.exec.reducers.max=<number>
In order to set a constant number of reducers:
set mapreduce.job.reduces=<number>
Starting Job = job_1481426664220_0012, Tracking URL = http://large01:8088/proxy/application_1481426664220_0012/
Kill Command = /usr/crh/current/hadoop-client/bin/hadoop job -kill job_1481426664220_0012
Hadoop job information for Stage-3: number of mappers: 1; number of reducers: 1
2016-12-11 14:11:02,550 Stage-3 map = 0%, reduce = 0%
2016-12-11 14:11:10,974 Stage-3 map = 100%, reduce = 0%, Cumulative CPU 3.51 sec
2016-12-11 14:11:19,389 Stage-3 map = 100%, reduce = 100%, Cumulative CPU 7.33 sec
MapReduce Total cumulative CPU time: 7 seconds 330 msec
Ended Job = job_1481426664220_0012
MapReduce Jobs Launched:
Stage-Stage-1: Map: 10 Reduce: 33 Cumulative CPU: 718.24 sec HDFS Read: 2149544516 HDFS Write: 66660 SUCCESS
Stage-Stage-2: Map: 1 Reduce: 1 Cumulative CPU: 9.54 sec HDFS Read: 79780 HDFS Write: 2020 SUCCESS
Stage-Stage-3: Map: 1 Reduce: 1 Cumulative CPU: 7.33 sec HDFS Read: 6537 HDFS Write: 280 SUCCESS
Total MapReduce CPU Time Spent: 12 minutes 15 seconds 110 msec
OK
GAMES 2.0331278255E8
COMICS-GRAPHIC-NOVELS 2.0302505306E8
HISTORY 2.027993259E8
TRUE-CRIME 2.0275285849E8
BIOGRAPHY-AUTOBIOGRAPHY 2.0259484642E8
MUSIC 2.0237437571E8
LITERARY-COLLECTIONS 2.0206328537E8
COMPUTERS 2.0199008218E8
NATURE 2.0193277083E8
PERFORMING-ARTS 2.0180201212E8
Time taken: 132.131 seconds, Fetched: 10 row(s)
I also tried set hive.mapjoin.localtask.max.memory.usage=0.99;, which had no effect here; raising that limit is worth testing only when the two tables differ greatly in size.
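When the tables do differ greatly in size and you want to keep map joins available for other queries, another knob worth knowing is the auto-conversion size threshold. This is a hedged sketch: the property name and its ~25 MB default may vary across Hive versions, so check your distribution's documentation.

```sql
-- Keep automatic map-join conversion on, but only convert when the
-- small table's on-disk size is below the threshold (in bytes);
-- joins with larger tables fall back to a common join automatically.
set hive.auto.convert.join=true;
set hive.mapjoin.smalltable.filesize=25000000;  -- ~25 MB, the usual default
```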