A data lake is a method of storing data in a system that allows data in varied schemas and structural forms to be kept together, usually as object blobs or files in a file system.

The term data lake has no definitive meaning and is often used as a buzzword.

The idea behind a data lake is a single data store holding all of an organization's data. This ranges from raw data (an exact copy of the source data store) to transformed data used for tasks such as reporting, visualization, data security analytics, and machine learning.
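The split between raw and transformed data can be illustrated with a small in-memory sketch. It assumes a hypothetical lake made of two zones, "raw" and "curated", each mapping object keys to bytes. The zone names, the object keys, the order data, and the functions `ingest_raw` and `curate_orders` are all illustrative assumptions, not part of any product.

```python
import csv
import hashlib
import io
import json

# Hypothetical in-memory lake: zone name -> {object key -> raw bytes}.
lake: dict[str, dict[str, bytes]] = {"raw": {}, "curated": {}}


def ingest_raw(key: str, payload: bytes) -> str:
    """Store an exact, unmodified copy of the source data in the raw zone."""
    lake["raw"][key] = payload
    # Return a checksum so later jobs can verify the raw copy was not altered.
    return hashlib.sha256(payload).hexdigest()


def curate_orders(raw_key: str, curated_key: str) -> None:
    """Derive a transformed dataset (here a reporting summary) from a raw object."""
    text = lake["raw"][raw_key].decode("utf-8")
    rows = list(csv.DictReader(io.StringIO(text)))
    total = sum(float(r["amount"]) for r in rows)
    summary = {"order_count": len(rows), "total_amount": total}
    # The raw object is left untouched; the result goes to a separate zone.
    lake["curated"][curated_key] = json.dumps(summary).encode("utf-8")


source = b"order_id,amount\n1,10.50\n2,4.25\n"
digest = ingest_raw("crm/orders/2024-01-01.csv", source)
curate_orders("crm/orders/2024-01-01.csv", "reports/orders_summary.json")
print(json.loads(lake["curated"]["reports/orders_summary.json"]))
# prints {'order_count': 2, 'total_amount': 14.75}
```

Keeping the raw copy byte-for-byte means any curated dataset can be rebuilt later if the transformation logic changes.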

A data lake includes structured data from relational databases (rows and columns), semi-structured data (CSV, logs, XML, JSON), unstructured data (emails, documents, PDFs), and even binary data (images, audio, video), creating a centralized data store that accommodates all forms of data.
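The point that one store accepts every form of data can be sketched as follows. This minimal example assumes a hypothetical store that keeps every payload as opaque bytes plus a descriptive category label taken from the four categories above. The `put` function, the `StoredObject` class, the keys, and the sample payloads are illustrative assumptions.

```python
from collections import Counter
from dataclasses import dataclass

# Category labels follow the four forms of data listed in the text.
KINDS = {"structured", "semi-structured", "unstructured", "binary"}


@dataclass
class StoredObject:
    payload: bytes  # content kept as opaque bytes, whatever its form
    kind: str       # descriptive label only; the store never parses payload


# Hypothetical single store for all kinds of data.
store: dict[str, StoredObject] = {}


def put(key: str, payload: bytes, kind: str) -> None:
    """Accept any payload; only the category label is validated."""
    if kind not in KINDS:
        raise ValueError(f"unknown kind: {kind}")
    store[key] = StoredObject(payload, kind)


# Rows exported from a relational table (illustrative serialization).
put("db/customers/rows.txt", repr([(1, "Ada"), (2, "Grace")]).encode(), "structured")
put("app/events.json", b'{"event": "login", "user": 1}', "semi-structured")
put("app/server.log", b"2024-01-01T09:00:00Z INFO user=1 action=login\n", "semi-structured")
put("mail/note.txt", "Quarterly review moved to Friday.".encode("utf-8"), "unstructured")
# The 8-byte PNG file signature stands in for an image.
put("media/pixel.png", bytes([0x89, 0x50, 0x4E, 0x47, 0x0D, 0x0A, 0x1A, 0x0A]), "binary")

print(Counter(obj.kind for obj in store.values()))
```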

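A data lake offers unified object storage in which the schema of each individual object is not part of the data lake. Whatever reads the data applies a schema to it, an approach often called schema-on-read.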
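The following is a minimal schema-on-read sketch. It assumes a hypothetical lake holding JSON Lines objects whose records do not all share the same fields. The lake stores only bytes, and the schema (field name to type converter) is supplied by the reader at query time. The object keys, field names, and `read_with_schema` are illustrative assumptions.

```python
import json
from typing import Any, Callable

# Raw objects: bytes only, no schema attached or enforced on write.
raw_objects: dict[str, bytes] = {
    "events/part-0.jsonl": b'{"user": "1", "ms": "120"}\n{"user": "2"}\n',
    "events/part-1.jsonl": b'{"user": "3", "ms": "95", "extra": true}\n',
}

# The schema belongs to the reader: field name -> converter applied on read.
Schema = dict[str, Callable[[Any], Any]]


def read_with_schema(prefix: str, schema: Schema) -> list[dict[str, Any]]:
    records = []
    for key in sorted(k for k in raw_objects if k.startswith(prefix)):
        for line in raw_objects[key].decode("utf-8").splitlines():
            if not line.strip():
                continue
            doc = json.loads(line)
            # Project onto the schema: convert known fields, use None for
            # missing ones, and ignore fields the schema does not mention.
            records.append(
                {name: (conv(doc[name]) if name in doc else None)
                 for name, conv in schema.items()}
            )
    return records


rows = read_with_schema("events/", {"user": int, "ms": int})
print(rows)
# prints [{'user': 1, 'ms': 120}, {'user': 2, 'ms': None}, {'user': 3, 'ms': 95}]
```

Another reader could apply a different schema, such as `{"user": str}`, to the same bytes without changing what is stored. A schema-on-write system would instead validate records when they are loaded.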

A data lake is a resource of the organization as a whole.

One example of a data lake is the Hadoop Distributed File System (HDFS) used within Apache Hadoop.

Generally, a data lake has less structure than a data warehouse, but there is no formally defined difference between the two.

Apache Hadoop, Azure Storage, Google Cloud Storage, and the AWS Simple Storage Service (S3) can be used to build data lake repositories.
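Object stores such as S3 address data by key, and keys containing "/" are commonly used to mimic directories. The sketch below assumes a local temporary directory as a stand-in for such a store, so it runs without any cloud SDK. It also assumes a hypothetical key layout of zone, source, dataset, and a Hive-style `dt=YYYY-MM-DD` partition segment. That layout is one common convention, not something these platforms require.

```python
import tempfile
from datetime import date
from pathlib import Path


def object_key(zone: str, source: str, dataset: str, day: date, filename: str) -> str:
    """Build a hypothetical object key; 'dt=...' is a Hive-style partition segment."""
    return f"{zone}/{source}/{dataset}/dt={day.isoformat()}/{filename}"


with tempfile.TemporaryDirectory() as root:
    key = object_key("raw", "crm", "orders", date(2024, 1, 1), "orders.csv")
    # A local file path stands in for an object in cloud storage.
    path = Path(root, key)
    path.parent.mkdir(parents=True, exist_ok=True)
    path.write_bytes(b"order_id,amount\n1,10.50\n")
    # Listing by pattern stands in for listing objects under a key prefix.
    found = [p.relative_to(root).as_posix() for p in Path(root).rglob("*.csv")]

print(found)
# prints ['raw/crm/orders/dt=2024-01-01/orders.csv']
```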

More Information

There may be more information on this subject in one of the following: