Overview#

A Data-lake is a method of storing data within a system that facilitates the colocation of data in varying schemas and structural forms, usually as object blobs or files in a file system.

The concept of a Data-lake is to have a single data store for all data in the organizational entity, ranging from raw data (an exact copy of the source data store) to transformed data, which is used for various tasks including reporting, visualization, data security analytics and machine learning.
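The raw-to-transformed range described above can be illustrated with a minimal sketch. It assumes a local directory stands in for the Data-lake and that the source is a CSV file; the zone names `raw/` and `transformed/` and the function names `ingest_raw` and `transform_to_json` are hypothetical and not part of any particular product.

```python
import csv
import json
import shutil
import tempfile
from pathlib import Path


def ingest_raw(source_file: Path, lake_root: Path) -> Path:
    """Copy a source file byte-for-byte into the (hypothetical) raw/ zone."""
    raw_path = lake_root / "raw" / source_file.name
    raw_path.parent.mkdir(parents=True, exist_ok=True)
    shutil.copyfile(source_file, raw_path)
    return raw_path


def transform_to_json(raw_path: Path, lake_root: Path) -> Path:
    """Derive a JSON version in the (hypothetical) transformed/ zone.

    The raw copy is only read, never modified.
    """
    out_path = lake_root / "transformed" / (raw_path.stem + ".json")
    out_path.parent.mkdir(parents=True, exist_ok=True)
    with raw_path.open(newline="") as f:
        rows = list(csv.DictReader(f))
    out_path.write_text(json.dumps(rows, indent=2))
    return out_path


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        source = tmp_path / "orders.csv"  # stand-in for a source data store export
        source.write_text("id,amount\n1,9.50\n2,12.00\n")

        lake = tmp_path / "lake"
        raw = ingest_raw(source, lake)
        transformed = transform_to_json(raw, lake)
        print(raw.relative_to(lake), transformed.relative_to(lake))
```

Keeping the raw copy untouched means the transformed data can be regenerated later if the reporting or analytics requirements change.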

A Data-lake includes structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs) and even binary data (images, audio, video), thus creating a centralized data store that accommodates all forms of data.
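As a rough illustration of these categories, the sketch below tags an object by its file extension. The mapping and the `classify` function are hypothetical. A real Data-lake would more likely rely on a metadata catalog than on extensions alone, and structured data exported from relational databases has no single characteristic extension, so it is left out here.

```python
from pathlib import PurePosixPath

# Hypothetical extension-to-category mapping, following the categories named above.
CATEGORY_BY_EXTENSION = {
    ".csv": "semi-structured",
    ".log": "semi-structured",
    ".xml": "semi-structured",
    ".json": "semi-structured",
    ".eml": "unstructured",
    ".docx": "unstructured",
    ".pdf": "unstructured",
    ".png": "binary",
    ".jpg": "binary",
    ".mp3": "binary",
    ".mp4": "binary",
}


def classify(object_key: str) -> str:
    """Return the category for an object key, or "unknown" if the extension is not mapped."""
    suffix = PurePosixPath(object_key).suffix.lower()
    return CATEGORY_BY_EXTENSION.get(suffix, "unknown")


if __name__ == "__main__":
    for key in ["raw/crm/export.csv", "raw/scans/invoice.pdf", "raw/media/intro.mp4"]:
        print(key, classify(key))
```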

A Data-lake is an organizational entity resource.

One example of a Data-lake is the Hadoop Distributed File System (HDFS) used within Apache Hadoop.

Generally, a Data-lake has less structure than a data warehouse, but there is no formally defined difference.

Apache Hadoop, Azure Storage and the Amazon Simple Storage Service (Amazon S3) platform can be used to build Data-lake repositories.

This page (revision-6) was last changed on 30-Jul-2017 13:20 by jim