Hadoop Validation

Background

After installing the CDH distribution of Hadoop, we need to verify that the installation works correctly.

HDFS

MR Validation

Prepare the files

echo  "Hello World, Gooebyd World. " > file0
echo  "Hello Hadoop, Goodbye Hadoop. " > file1

Create the directories

sudo -u hdfs hadoop fs -mkdir -p /user/test/wordcount/input
sudo -u hdfs hadoop fs -chown -R test /user/test

Upload the files

hadoop fs -put file0 /user/test/wordcount/input
hadoop fs -put file1 /user/test/wordcount/input
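The upload only behaves as intended when the target is an existing HDFS directory; otherwise -put either fails or creates a plain file named input. A minimal sketch of a guarded upload follows. The function name upload_to_hdfs is hypothetical, and only the standard hadoop fs subcommands -mkdir -p, -test -d and -put are assumed.

```shell
#!/bin/sh
# upload_to_hdfs is a hypothetical helper, not part of Hadoop.
# Usage: upload_to_hdfs HDFS_DIR LOCAL_FILE...
upload_to_hdfs() {
    target_dir=$1
    shift
    # -mkdir -p creates missing parents and succeeds if the directory exists.
    hadoop fs -mkdir -p "$target_dir" || return 1
    # -test -d exits 0 only when the path exists and is a directory.
    hadoop fs -test -d "$target_dir" || return 1
    for f in "$@"; do
        # The trailing slash makes it explicit that the target is a directory.
        hadoop fs -put "$f" "$target_dir/" || return 1
    done
}
```

For the files in this post the call would be: upload_to_hdfs /user/test/wordcount/input file0 file1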

Run the MR job

hadoop jar /home/admin/hadoop/cloudera/parcels/CDH-5.14.2-1.cdh5.14.2.p0.3/jars/hadoop-examples.jar wordcount /user/test/wordcount/input /user/test/wordcount/output
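A MapReduce job refuses to start when its output directory already exists, so running the command a second time fails. The sketch below removes the old output before resubmitting. The wrapper name rerun_wordcount is hypothetical; the jar path is passed in as an argument rather than assumed.

```shell
#!/bin/sh
# rerun_wordcount is a hypothetical wrapper, not part of Hadoop.
# Usage: rerun_wordcount EXAMPLES_JAR HDFS_INPUT HDFS_OUTPUT
rerun_wordcount() {
    jar=$1
    input=$2
    output=$3
    # -f suppresses the error when the output directory does not exist yet.
    hadoop fs -rm -r -f "$output" || return 1
    hadoop jar "$jar" wordcount "$input" "$output"
}
```

Note that this deletes whatever is under the output path, so it is only suitable for throwaway validation data like the files used here.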

View the results

hadoop fs -getmerge /user/test/wordcount/output output.txt
$ cat output.txt
Hadoop,	1
Hadoop.	1
Goodbye	1
World,	1
World.	1
Gooebyd	1
Hello	2
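To cross-check the job output without the cluster, the same count can be reproduced locally with standard tools. This is a rough sketch that assumes the example job splits tokens on whitespace only, which is why "World," and "World." are counted separately above. The helper name local_wordcount is hypothetical.

```shell
#!/bin/sh
# local_wordcount is a hypothetical helper: count whitespace-separated
# tokens in the given files and print "token<TAB>count", sorted by token.
local_wordcount() {
    cat "$@" |
        tr -s '[:space:]' '\n' |   # one token per line
        grep -v '^$' |             # drop any empty line
        LC_ALL=C sort |            # byte order, independent of locale
        uniq -c |                  # prefix each token with its count
        awk '{ printf "%s\t%s\n", $2, $1 }'
}
```

Running local_wordcount file0 file1 and comparing it with the output of LC_ALL=C sort output.txt should show the same seven lines, since getmerge does not guarantee a global ordering.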

Hive Validation

Prepare the file

echo "Hello, Hive" > file2

Create the directory

hadoop fs -mkdir -p hive/input

Upload the file

hadoop fs -put file2 hive/input

Create the table

hive

hive> create external table test(
    >     name string,
    >     company string
    > ) row format delimited
    > fields terminated by ','
    > location '/user/test/hive/input';
OK
Time taken: 0.317 seconds

List the tables

hive> show tables;
OK
test
Time taken: 0.072 seconds, Fetched: 1 row(s)

Show the table schema

hive> desc test;
OK
name                	string
company             	string
Time taken: 0.54 seconds, Fetched: 2 row(s)

Show the table contents

hive> select * from test;
OK
Hello	 Hive
Time taken: 0.425 seconds, Fetched: 1 row(s)
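The second column comes back as " Hive" with a leading space because the line is split on the comma alone and nothing trims the fields. The sketch below imitates that splitting locally with awk; it is only an illustration of the delimiter behaviour, not how Hive is implemented, and split_fields is a hypothetical helper name.

```shell
#!/bin/sh
# split_fields is a hypothetical helper: split one line on ',' without
# trimming and print every field wrapped in brackets.
split_fields() {
    printf '%s\n' "$1" |
        awk -F',' '{ for (i = 1; i <= NF; i++) printf "[%s]", $i; print "" }'
}

split_fields "Hello, Hive"   # prints [Hello][ Hive]
```

Writing the test file as "Hello,Hive" (no space after the comma) would avoid the stray space in the company column.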

Run SQL

$ hive -e "select count(*) from test" 2>/dev/null
1

Drop the table

$ hive -e "drop table test"

HBase Validation

TBD

Impala Hive Validation

Enter the shell: impala-shell

Create a database

> create database test;
Query: create database test
Fetched 0 row(s) in 0.39s
> use test;
Query: use test

Create a table

> create table t1 (x int);
Query: create table t1 (x int)
Fetched 0 row(s) in 0.72s

Insert data

> insert into t1 values (1), (3), (2), (4);

Query the data

> select * from t1;

Result

+---+
| x |
+---+
| 1 |
| 3 |
| 2 |
| 4 |
+---+
Fetched 4 row(s) in 0.48s
> select min(x), max(x), sum(x), avg(x) from t1;
+--------+--------+--------+--------+
| min(x) | max(x) | sum(x) | avg(x) |
+--------+--------+--------+--------+
| 1      | 4      | 10     | 2.5    |
+--------+--------+--------+--------+
Fetched 1 row(s) in 0.14s
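The expected aggregates can be confirmed by hand for such a small table. As a quick sanity check, the sketch below computes the same four values locally from the inserted rows; summarize is a hypothetical helper name and nothing here talks to Impala.

```shell
#!/bin/sh
# summarize is a hypothetical helper: read one integer per line on stdin
# and print "min max sum avg".
summarize() {
    awk 'NR == 1 { min = $1; max = $1 }
         { if ($1 < min) min = $1; if ($1 > max) max = $1; sum += $1 }
         END { if (NR > 0) print min, max, sum, sum / NR }'
}

printf '%s\n' 1 3 2 4 | summarize   # prints 1 4 10 2.5
```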

Join

> create table t2 (id int, word string);

Fetched 0 row(s) in 0.39s
> insert into t2 values (1, "one"), (3, "three"), (5, 'five');

Modified 3 row(s) in 4.19s
> select word from t1 join t2 on (t1.x = t2.id);

+-------+
| word  |
+-------+
| one   |
| three |
+-------+
Fetched 2 row(s) in 0.45s
> select count(distinct word) from t2;

+----------------------+
| count(distinct word) |
+----------------------+
| 3                    |
+----------------------+
Fetched 1 row(s) in 0.24s

For more, see "impala初体验" (a first look at Impala).

Impala Kudu Validation

Create a database

> CREATE DATABASE impala_kudu;

Fetched 0 row(s) in 0.25s
> USE impala_kudu;

Fetched 0 row(s) in 0.00s

Create a table

CREATE TABLE my_first_table (
  id BIGINT PRIMARY KEY,
  name STRING
)
PARTITION BY HASH PARTITIONS 5
STORED AS KUDU  
TBLPROPERTIES('kudu.master_addresses' = 'kudu.master.test.com:7051') ;

Result
Fetched 0 row(s) in 1.72s

Write

> insert into my_first_table values(100,'张三');

Modified 1 row(s), 0 row error(s) in 4.07s

Query

select * from my_first_table;

+-----+------+
| id  | name |
+-----+------+
| 100 | 张三 |
+-----+------+
Fetched 1 row(s) in 1.06s

FAQ

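Q: Time zone problem
A:
By default Impala does not use China's time zone; it converts in UTC, so the result of from_unixtime is 8 hours behind China Standard Time. Fix: start Impala with -use_local_tz_for_unix_timestamp_conversions=true

In CDH (Cloudera Manager): Impala -> Configuration -> Impala Daemon -> Impala Daemon Command Line Argument Advanced Configuration Snippet (Safety Valve)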

 -use_local_tz_for_unix_timestamp_conversions=true
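The size of the error is easy to see outside Impala. The sketch below formats the same Unix timestamp once in UTC and once at UTC+8, assuming a date command that accepts "-d @SECONDS" (GNU coreutils does). The helper name format_epoch is hypothetical, and the POSIX-style zone string CST-8 stands in for China Standard Time so the example does not depend on an installed time zone database.

```shell
#!/bin/sh
# format_epoch is a hypothetical helper: format a Unix timestamp under a
# given TZ value. Assumes date(1) supports "-d @SECONDS" (GNU coreutils).
# Usage: format_epoch SECONDS TZ_STRING
format_epoch() {
    TZ=$2 date -d "@$1" '+%Y-%m-%d %H:%M:%S'
}

format_epoch 0 UTC0    # prints 1970-01-01 00:00:00
format_epoch 0 CST-8   # prints 1970-01-01 08:00:00
```

If from_unixtime in Impala returns the first form for a timestamp while the local clock expects the second, the flag above is not yet in effect on that Impala Daemon.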
