Overview

A Data lake is a method of storing data in a system or repository in its natural format. This allows data in various schemata and structural forms, usually object blobs or files, to be collocated.

The idea behind a Data lake is to have a single Data Store for all data in the organizational Entity, ranging from raw data (an exact copy of the source Data Store) to transformed data used for tasks such as reporting, visualization, analytics and Machine Learning.
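The raw-to-transformed range described above can be illustrated with a minimal sketch. It assumes a plain directory stands in for the Data lake; the zone names `raw` and `transformed`, the helper functions, and the sample file `orders.csv` are all hypothetical and chosen only for illustration.

```python
import csv
import json
import shutil
import tempfile
from pathlib import Path


def ingest_raw(source: Path, lake_root: Path) -> Path:
    """Copy a source file byte-for-byte into the (hypothetical) raw zone."""
    raw_dir = lake_root / "raw"
    raw_dir.mkdir(parents=True, exist_ok=True)
    target = raw_dir / source.name
    shutil.copyfile(source, target)
    return target


def transform_to_json(raw_file: Path, lake_root: Path) -> Path:
    """Derive a JSON file from a raw CSV; the raw copy is left untouched."""
    out_dir = lake_root / "transformed"
    out_dir.mkdir(parents=True, exist_ok=True)
    with raw_file.open(newline="") as handle:
        rows = list(csv.DictReader(handle))
    target = out_dir / (raw_file.stem + ".json")
    target.write_text(json.dumps(rows, indent=2))
    return target


def demo() -> dict:
    """Run both steps in a temporary directory and report what was stored."""
    with tempfile.TemporaryDirectory() as tmp:
        tmp_path = Path(tmp)
        source = tmp_path / "orders.csv"  # made-up sample data
        source.write_text("id,amount\n1,9.50\n2,20.00\n")
        lake = tmp_path / "lake"
        raw = ingest_raw(source, lake)
        transformed = transform_to_json(raw, lake)
        return {
            "raw_is_exact_copy": raw.read_bytes() == source.read_bytes(),
            "rows": json.loads(transformed.read_text()),
        }


if __name__ == "__main__":
    print(demo())
```

Keeping the raw copy unchanged means the transformed data can be regenerated later if the reporting or analytics requirements change.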

A Data lake can include structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs) and even binary data (images, audio, video), thus creating a centralized Data Store that accommodates all forms of data.
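As a rough sketch of how such mixed content might be catalogued, the example below groups object keys by the four categories named above. The extension-to-category mapping, the function names, and the sample keys are assumptions made for illustration; real systems usually rely on richer metadata than a file extension.

```python
from pathlib import PurePosixPath

# Assumed mapping from file extension to the categories named in the text.
# The extensions are illustrative, not an exhaustive or standard list.
CATEGORY_BY_EXTENSION = {
    ".db": "structured",
    ".csv": "semi-structured",
    ".log": "semi-structured",
    ".xml": "semi-structured",
    ".json": "semi-structured",
    ".eml": "unstructured",
    ".docx": "unstructured",
    ".pdf": "unstructured",
    ".png": "binary",
    ".mp3": "binary",
    ".mp4": "binary",
}


def classify(object_key: str) -> str:
    """Return the category for an object key, or "unknown" if unmapped."""
    suffix = PurePosixPath(object_key).suffix.lower()
    return CATEGORY_BY_EXTENSION.get(suffix, "unknown")


def build_catalog(object_keys):
    """Group object keys by category without altering the objects themselves."""
    catalog = {}
    for key in object_keys:
        catalog.setdefault(classify(key), []).append(key)
    return catalog


if __name__ == "__main__":
    keys = ["sales/orders.db", "web/access.log", "hr/contract.PDF", "media/intro.mp4"]
    print(build_catalog(keys))
```

The objects stay in their natural format; only the catalog records what kind of data each one holds.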

A Data lake is an organizational Entity resource.

One example of a Data lake is the Hadoop Distributed File System (HDFS) used within Apache Hadoop.

Typically, a Data lake has less structure than a Data warehouse.

This page (revision-6) was last changed on 05-Jul-2017 08:22 by jim