Sherlock Holmes is a classic fictional character created by Sir Arthur Conan Doyle (1859-1930). Most of us are familiar with Sherlock Holmes, either through the classic stories or through the movies and TV series. Holmes is known for his deep knowledge in a variety of fields, his powers of observation, his powers of deduction, and above all his logical reasoning. It is the combination of those characteristics that separates Holmes from Inspector Lestrade, Inspector Gregson, Inspector Bradstreet, and even his own partner, Dr. Watson. It is the combination of those characteristics that gives him the ability to solve the most bizarre problems and some of the most cunning crimes.
Nowadays, almost a century and a half after Holmes's creation, we may not have to investigate crimes for a living, at least not all of us, but we must solve complicated business problems.
Translated into professional environments: we may not individually have the breadth and depth of knowledge Holmes had, but we have access to a vast variety of huge datasets, datasets that grow constantly.
We may not have the chance to directly observe the evidence of the business problems we try to solve, but we usually have access to all the information needed to investigate them.
For the business problems we must solve, both knowledge and observation are available to everyone. But knowledge and observation were also available to all the inspectors as well as to Holmes, and not all of them solved the mysterious cases. It is the other two characteristics that make the great difference, the difference between solving the problem and not. In modern data-driven professional environments, both of those characteristics can share the same name: "Data Integration" and "Data Management" are two of the many names given to the process.
The idea is simple, elementary if you prefer. From the huge available datasets, we must deduce only the information needed to solve the problem. Getting more information is not just a waste of computing time and resources; it can also confuse us in ways that prevent us from solving the problem or, even worse, lead us to faulty conclusions.
After data deduction, we need logical reasoning to combine all the available information in a correct, logical way. The plethora of information available nowadays demands from us deep knowledge of the data we combine. Collecting data is no longer considered a difficult problem. There are many tested and trusted methods that ensure the needed quantity of information will be available at any time. Every one of us interacts with systems that create structured datasets in almost any daily process, and we also constantly create unstructured datasets using tons of apps and smart devices. The problem has shifted from creating big datasets to creating useful datasets, useful datasets that, combined, will lead us to the desired solutions. After the great increase in generated and available datasets we experienced in the last decade, it is now crystal clear that the quality of information is more important than the quantity. Data Integration is the way to secure the desired quality. Quality data will point out the solution.
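The idea can be sketched in a few lines of code. This is a minimal, hypothetical illustration, not any particular tool's API: two small invented datasets are integrated on a shared key, and only the fields needed to answer one question are kept, while irrelevant columns are deliberately dropped.

```python
# Hypothetical datasets: orders and customers sharing a "customer_id" key.
orders = [
    {"order_id": 1, "customer_id": "C1", "amount": 120.0, "notes": "rush"},
    {"order_id": 2, "customer_id": "C2", "amount": 75.5, "notes": ""},
    {"order_id": 3, "customer_id": "C1", "amount": 30.0, "notes": "gift"},
]
customers = [
    {"customer_id": "C1", "name": "Acme Ltd", "phone": "555-0101"},
    {"customer_id": "C2", "name": "Beta GmbH", "phone": "555-0102"},
]

# Index customers by their key so the join is a simple lookup.
by_id = {c["customer_id"]: c for c in customers}

# Integrate, keeping only what the question needs (name and amount);
# fields like "notes" and "phone" are left out on purpose.
integrated = [
    {"name": by_id[o["customer_id"]]["name"], "amount": o["amount"]}
    for o in orders
]

# The reduced, integrated data answers the question directly:
# total amount per customer.
totals = {}
for row in integrated:
    totals[row["name"]] = totals.get(row["name"], 0.0) + row["amount"]

print(totals)  # {'Acme Ltd': 150.0, 'Beta GmbH': 75.5}
```

The point is not the code itself but the discipline it encodes: join only what is needed, carry only the columns that serve the question, and the answer falls out of the quality data rather than the quantity.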
As Holmes used to say, "Elementary, my dear Watson."
Angelos Matakiadis, Senior BI Consultant, WITSIDE