1. Create a table in Hive to hold the data to be processed:
$hive> create table textlines(line string);
Load the data:
$hive> load data inpath '/tmp/input' overwrite into table textlines;
To load from the local filesystem instead, use load data local inpath '...'; in that case the path here would look like /tmp/input/wordcount.
Check that the load succeeded:
$hive> select * from textlines;

2. Implement the map/reduce computation with Hive
Here is the word count HQL given in Programming Hive:
SELECT word, count(1) AS count FROM (SELECT explode(split(line, '[ \t]+')) AS word FROM docs) w GROUP BY word ORDER BY word;
I split the statement above into two statements:
SELECT explode(split(line, '\s')) AS word FROM docs;
SELECT word, count(1) FROM w GROUP BY word ORDER BY word;
So I first create a words table to hold the result of the first query:
$hive> create table words(word STRING);
$hive> insert overwrite table words select explode(split(line, '\s')) word from textlines;
Note: the key expression is select explode(split(line, '\s')). This statement triggers a Hadoop MapReduce job. split(line, '\s') splits line (the column of the textlines table) into an array on the regular expression \s, which matches whitespace, and explode() then turns each element of that array into a row of its own. This HQL statement plays the role of the Map function in Hadoop.
Now handle the second HQL statement: SELECT word, count(1) FROM w GROUP BY word ORDER BY word;
$hive> SELECT word, count(1) FROM words GROUP BY word ORDER BY word;
Note: this statement plays the role of the Reduce function in Hadoop.

3. Export the result we computed:
$hive> INSERT OVERWRITE LOCAL DIRECTORY '/tmp/wordcount_result' SELECT word, count(1) FROM words GROUP BY word ORDER BY word;
Dropping LOCAL writes the output to a directory on HDFS instead.
Next, use Sqoop to store the result from HDFS into MySQL.
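As a rough sketch of that last step (not covered in detail above): assuming the result was written to the HDFS directory /tmp/wordcount_result with Hive's default field delimiter \001 (Ctrl-A), and that a MySQL table wordcount(word VARCHAR, count INT) already exists in a database named, say, test, a Sqoop export could look roughly like this:

# export the HDFS output directory into the MySQL table wordcount
$ sqoop export \
    --connect jdbc:mysql://localhost:3306/test \
    --username root --password '***' \
    --table wordcount \
    --export-dir /tmp/wordcount_result \
    --input-fields-terminated-by '\001'

The connection URL, credentials, database and table names here are placeholders; adjust them to your own environment.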