While working through chapter 3.1 of Hadoop in Action, I hit an error writing a FileMerge program with the HDFS Java API. It seemed baffling at first, but checking the API docs revealed the cause.
Goal: merge all the files under a directory on the local file system into a single file on HDFS.
The program:
package com.test.study;

import java.io.IOException;

import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FSDataInputStream;
import org.apache.hadoop.fs.FSDataOutputStream;
import org.apache.hadoop.fs.FileStatus;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.Path;

public class FileMerge {

    public static void main(String[] args) {
        Path inputDir = new Path(args[0]);
        Path outputFile = new Path(args[1]);

        try {
            mergeFile(inputDir, outputFile);
        } catch (IOException e) {
            e.printStackTrace();
        }
    }

    public static void mergeFile(Path inputDir, Path outputFile) throws IOException {
        Configuration conf = new Configuration();
        FileSystem local = FileSystem.getLocal(conf);
        FileSystem hdfs = FileSystem.get(conf);

        // open output file stream
        FSDataOutputStream out = null;
        if (!hdfs.exists(outputFile)) {
            out = hdfs.create(outputFile);
        } else {
            System.out.println("output file [" + outputFile.getName() + "] has existed.");
            return;
        }

        // copy every regular file under inputDir into the single output file
        FileStatus[] inputFiles = local.listStatus(inputDir);
        FSDataInputStream input = null;
        byte[] buffer = new byte[1024];
        int length = -1;
        for (FileStatus fileStatus : inputFiles) {
            if (fileStatus.isDir()) {
                continue;
            }
            input = local.open(fileStatus.getPath());
            while ((length = input.read(buffer)) > 0) {
                out.write(buffer, 0, length);
            }
            input.close();
        }

        out.close();
    }
}
Running the program with java com.test.study.FileMerge /home/walkerJong/hadoop-1.2/logs /user/walkerJong/hadoop.log fails with the following error:
java.io.IOException: Mkdirs failed to create /user/walkerJong
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:378)
    at org.apache.hadoop.fs.ChecksumFileSystem.create(ChecksumFileSystem.java:364)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:564)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:545)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:452)
    at org.apache.hadoop.fs.FileSystem.create(FileSystem.java:444)
    at com.b5m.study.FileMerge.mergeFile(FileMerge.java:33)
    at com.b5m.study.FileMerge.main(FileMerge.java:19)
I was stumped, and even reading the Hadoop source didn't show me where things went wrong.
Finally, right after FileSystem hdfs = FileSystem.get(conf); I added System.out.println(hdfs.getName()); and it printed, of all things: file:///
So the FileSystem obtained was clearly the local file system, which also explains the error: the local ChecksumFileSystem was trying to create the directory /user/walkerJong on the local disk.
The bug must therefore be in FileSystem hdfs = FileSystem.get(conf); we wanted the HDFS file system, but at runtime we got the local one.
The FileSystem.get() API docs explain why:
FileSystem.get(Configuration conf) returns the file system named in the configuration (read from conf/core-site.xml); if no default is specified there, it returns the local file system. (So that was it.)
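This suggests an alternative fix: set the default file system on the Configuration before calling FileSystem.get(conf). A minimal sketch, assuming a Hadoop 1.x cluster at hdfs://localhost:9000 (in Hadoop 2.x the property is fs.defaultFS rather than fs.default.name); it needs the Hadoop jars and a running NameNode:

```java
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;

public class DefaultFsDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        // Without this (or a core-site.xml on the classpath), get(conf)
        // silently falls back to the local file system, file:///
        conf.set("fs.default.name", "hdfs://localhost:9000");
        FileSystem hdfs = FileSystem.get(conf);
        System.out.println(hdfs.getUri()); // should now report the hdfs:// scheme
    }
}
```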
FileSystem.get(URI uri, Configuration conf) determines the file system from the given uri and conf.
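In other words, it is the scheme of the URI that selects the implementation (hdfs:// versus file://). A quick Hadoop-free check with java.net.URI shows which scheme each string carries:

```java
import java.net.URI;

public class SchemeCheck {
    public static void main(String[] args) {
        // FileSystem.get(uri, conf) dispatches on the URI scheme,
        // so these two URIs select different file systems.
        URI hdfsUri = URI.create("hdfs://localhost:9000/");
        URI localUri = URI.create("file:///");
        System.out.println(hdfsUri.getScheme());  // hdfs
        System.out.println(localUri.getScheme()); // file
    }
}
```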
The fix (this also needs an import java.net.URI;): FileSystem hdfs = FileSystem.get(URI.create("hdfs://localhost:9000/"), conf);
Run the program again: OK.
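As a side note, Hadoop 1.x also ships a utility that performs this merge in one call, org.apache.hadoop.fs.FileUtil.copyMerge. A sketch under the same hdfs://localhost:9000 assumption (verify the signature against your Hadoop version; it was removed in 3.x):

```java
import java.net.URI;
import org.apache.hadoop.conf.Configuration;
import org.apache.hadoop.fs.FileSystem;
import org.apache.hadoop.fs.FileUtil;
import org.apache.hadoop.fs.Path;

public class CopyMergeDemo {
    public static void main(String[] args) throws Exception {
        Configuration conf = new Configuration();
        FileSystem local = FileSystem.getLocal(conf);
        FileSystem hdfs = FileSystem.get(URI.create("hdfs://localhost:9000/"), conf);
        // Merge all files under args[0] (local) into the single file args[1] (HDFS);
        // false = keep the source files, null = no separator between files.
        FileUtil.copyMerge(local, new Path(args[0]),
                           hdfs, new Path(args[1]),
                           false, conf, null);
    }
}
```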
Reference:
http://blog.csdn.net/walkerJong/article/details/37763777