

HDFS stands for Hadoop Distributed File System.
HDFS is designed to overcome the limitations of NFS. It can store large amounts of data. It is fault tolerant: if a machine in the cluster fails, service is not disrupted. If too many clients access the data and performance becomes a bottleneck, adding more nodes to the cluster remedies it. Finally, HDFS integrates well with Hadoop MapReduce.
HDFS is modelled after GFS (Google File System).
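
The fault tolerance mentioned above comes from splitting files into blocks and storing each block on several machines. The following Python sketch is only a toy model of that idea, not HDFS code: the 128 MiB block size and the replication factor of 3 are assumed values (they match common HDFS defaults but are not stated in these notes), the node names and both functions are hypothetical, and the round-robin placement is a simplification (real HDFS placement is rack-aware).

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size (128 MiB)
REPLICATION = 3                 # assumed replication factor


def place_blocks(file_size, nodes, block_size=BLOCK_SIZE, replication=REPLICATION):
    """Return {block_index: [node, ...]} using simple round-robin placement."""
    if replication > len(nodes):
        raise ValueError("need at least as many nodes as replicas")
    n_blocks = math.ceil(file_size / block_size)
    placement = {}
    for b in range(n_blocks):
        # Each replica of a block goes to a different node.
        placement[b] = [nodes[(b + r) % len(nodes)] for r in range(replication)]
    return placement


def readable_after_failure(placement, failed):
    """True if every block still has at least one replica on a live node."""
    return all(
        any(node not in failed for node in replicas)
        for replicas in placement.values()
    )


nodes = ["node1", "node2", "node3", "node4"]          # hypothetical cluster
placement = place_blocks(1024 * 1024 * 1024, nodes)   # a 1 GiB file

print(len(placement))                                               # 8
print(readable_after_failure(placement, {"node2"}))                 # True
print(readable_after_failure(placement, {"node1", "node2", "node3"}))  # False
```

With three replicas per block, losing a single node leaves every block readable. Only when all nodes holding some block fail at once does the file become unavailable.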


HDFS is optimized for high-throughput sequential reads and writes over large files.


Inefficient for small files (but the goal was to handle big files anyway).
Lack of (transparent) compression.
No random writes to files: data can only be appended to files.
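
One way to see why small files are inefficient: the NameNode keeps metadata for every file and every block in memory (background that is not stated in these notes), so the same amount of data costs far more metadata when it is spread over many small files. The sketch below is a rough illustration under that assumption. The function name is hypothetical, the 128 MiB block size is an assumed value, and counting exactly one object per file and one per block is a simplification.

```python
import math

BLOCK_SIZE = 128 * 1024 * 1024  # assumed block size (128 MiB)


def metadata_objects(file_sizes, block_size=BLOCK_SIZE):
    """Count one object per file plus one per block the file occupies."""
    total = 0
    for size in file_sizes:
        blocks = math.ceil(size / block_size)
        total += 1 + blocks
    return total


one_gib = 1024 * 1024 * 1024

# 1 GiB stored as a single file: 1 file entry + 8 blocks.
one_large = metadata_objects([one_gib])

# The same 1 GiB stored as 8192 files of 128 KiB: each needs its own
# file entry and its own (mostly empty) block.
many_small = metadata_objects([one_gib // 8192] * 8192)

print(one_large, many_small)  # 9 16384
```

In this toy count, the same gigabyte needs about 1800 times more metadata objects when stored as small files, which is why packing small files into larger ones is usually preferred.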

See also

Oracle has a driver to access HDFS data in external tables.
SQL Server: Big Data Clusters
