Data management is the process of collecting, storing, organizing, maintaining, and consuming an organization’s data in a secure, efficient, and cost-effective way. Effective data management is crucial for business applications, which rely on it to provide analytical information and to help drive operational decision-making.
Data management requires a combination of different functions that collectively aim to ensure that the data in corporate systems is accurate, available, and accessible.
ETL stands for extract, transform, and load.
An ETL system extracts data from one or more source systems, enforces data quality and consistency standards, transforms the data so that data from different sources can be combined, and finally delivers it to the target system, so that application developers can build applications and end users can make data-driven decisions.
The first phase of an ETL process extracts the data from the source systems. Extraction includes data validation, which confirms that the data pulled from the sources has the expected values. In the second phase, data transformation, rules are applied to the extracted data to prepare it for loading into the target. Data cleansing is an important part of transformation; it aims to ensure that only valid data is loaded into the target. One or more of the following transformation types are required to create an efficient ETL system.
The third phase loads data into the target system. Depending on the requirements, this process varies widely. Because the load phase interacts with the target system, the constraints defined in the target system apply, and they also contribute to the overall data quality of the ETL process.
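The three phases described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not a reference implementation: the two CSV sources, the column names, the `customer_spend` table, and the functions `extract`, `transform`, and `load` are all hypothetical, and an in-memory SQLite database stands in for the target system.

```python
import csv
import io
import sqlite3

# Hypothetical source data standing in for two source systems.
CRM_CSV = "customer_id,email,country\n1,Ann@Example.com,us\n2,,de\n3,bob@example.com,FR\n"
BILLING_CSV = "cust,amount\n1,19.99\n3,abc\n3,5.00\n4,7.50\n"


def extract(text, required_columns):
    """Extract rows and validate that the expected columns are present."""
    reader = csv.DictReader(io.StringIO(text))
    missing = set(required_columns) - set(reader.fieldnames or [])
    if missing:
        raise ValueError(f"source is missing columns: {sorted(missing)}")
    return list(reader)


def transform(customers, payments):
    """Cleanse both sources and combine them into one row per customer."""
    clean_customers = {}
    for row in customers:
        email = row["email"].strip().lower()
        if not email:
            continue  # cleansing: skip customers without an email address
        country = row["country"].strip().upper()
        clean_customers[int(row["customer_id"])] = (email, country)

    totals = {}
    for row in payments:
        try:
            amount = float(row["amount"])
        except ValueError:
            continue  # cleansing: skip payments with a non-numeric amount
        cust_id = int(row["cust"])
        totals[cust_id] = totals.get(cust_id, 0.0) + amount

    # Payments for unknown customers are ignored; customers without
    # payments get a total of 0.0.
    return [
        (cust_id, email, country, round(totals.get(cust_id, 0.0), 2))
        for cust_id, (email, country) in sorted(clean_customers.items())
    ]


def load(conn, rows):
    """Load rows into the target; its constraints reject invalid rows."""
    conn.execute(
        """CREATE TABLE IF NOT EXISTS customer_spend (
               customer_id INTEGER PRIMARY KEY,
               email       TEXT NOT NULL UNIQUE,
               country     TEXT NOT NULL CHECK (length(country) = 2),
               total_spend REAL NOT NULL CHECK (total_spend >= 0)
           )"""
    )
    loaded = 0
    for row in rows:
        try:
            with conn:  # commit on success, roll back on error
                conn.execute("INSERT INTO customer_spend VALUES (?, ?, ?, ?)", row)
            loaded += 1
        except sqlite3.IntegrityError:
            pass  # the target's constraints rejected this row
    return loaded


if __name__ == "__main__":
    target = sqlite3.connect(":memory:")
    customers = extract(CRM_CSV, ["customer_id", "email", "country"])
    payments = extract(BILLING_CSV, ["cust", "amount"])
    print(load(target, transform(customers, payments)), "rows loaded")
```

In this sketch, validation at extraction is limited to checking column names, cleansing happens in `transform`, and the `CHECK`, `UNIQUE`, and `NOT NULL` constraints on the target table act as a last line of defense during the load. A real pipeline would typically also log or quarantine rejected rows rather than silently dropping them.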
Data architecture is the combination of rules, policies, standards, and models that define the type of data an organization creates and collects, and how that data is used, stored, managed, and integrated. Data architecture provides a formal approach to creating and managing data flows and to defining how data is processed across an organization’s IT systems and applications.
Without the guidance of a properly implemented data architecture design, common data operations might be implemented in different ways, making it difficult to understand and control the flow of data within such systems. Properly executed, the data architecture phase of information system planning forces an organization to precisely specify and describe both internal and external information flows.
Data architecture must cover all the processes and methodologies that address data at rest, data in motion, and data sets, and how these relate to data-dependent processes and applications. It includes the primary data entities, data types, and sources that are essential to an organization’s data sourcing and management needs.
Enterprise data architecture consists of three different layers or processes: