back to briefing papers

Data Warehousing & BIS

by Stuart Cooke

Introduction

This paper is intended to provide business and IS managers with an insight into the realities of Data Warehousing. The paper starts by explaining what Data Warehousing entails and why there is now so much interest in it. It then goes on to provide advice on the major benefits and pitfalls, finally outlining CEC Europe’s approach to a) defining a Data Warehousing strategy, and b) how to go about vendor selection and implementation.

 New Data for Old

The flattening of organisational structures which has taken place in most companies over the past three years and the move towards collaborative working mean that more employees than ever before are directly involved in corporate decision making. Unfortunately, the vital information needed to support this process has been far from comprehensive as traditional methods of information retrieval have failed to keep pace and existing data sources have proved conflicting and unreliable. Recently however, the convergence of these cultural shifts and the appearance of new data storage and analysis technologies has stirred great interest in Data Warehousing. For the first time organisations are starting to move away from the use of technology simply to obtain ever greater operational efficiency to using it to achieve greater flexibility, sharper market response and as a means of innovation.

What is Data Warehousing?

Data Warehousing is not new; the concept has been around for many years but only recently has the technology been available to support it. Data Warehousing is the embodiment of two simple concepts:

  • The integration of a wide variety of corporate and external data sources to provide a consistent, accurate, logical, shared picture of business information.

  • The provision of flexible Decision Support Tools to analyse that information.

Conceptually, a Data Warehouse looks like this:

The Data Warehouse

Information Sources always include the core operational systems which form the backbone of day-to-day activities, such as accounting, order entry and inventory control. It is these systems which have traditionally provided management information to support decision making. There is however a limit to what they can do in this area as they are always designed first and foremost for operational performance and tend not to provide information in a format useful for interpretative analysis.

Decision Support Tools are used to analyse the information stored in the warehouse, typically to identify trends and new business opportunities. These tools could be a simple spreadsheet, a graphical query tool or even a neural network. Neural nets are often able to identify associations, trends and niches which the end-user would never have thought of looking for themselves.

The Data Warehouse itself is the bridge between the operational systems and the decision support tools. It holds a copy of much of the operational system data in a logical structure which is more conducive to analysis. The Data Warehouse, which will be refreshed in scheduled bursts from operational systems and from relevant external data sources, provides a single, consistent view of corporate data, leaving operational systems unaffected.

Benefits

    • Data Warehouses are notoriously difficult to cost justify. The primary benefit of a Data Warehouse is the capability for better decision making; this is pretty difficult to quantify at the outset of a Data Warehousing project. The major benefits that will accrue therefore are likely to be longer term and implementors should not generally expect to see direct short term cost benefits. There are, however, some very good reasons for adopting Data Warehousing, not least the fact that your business is likely to be left out in the cold if you don’t:
      Data Warehousing delivers the capability for better, faster decision making.

    • Data Warehousing enables the identification of business trends, patterns and new opportunities. Organisations that have adopted Data Warehousing have been able to identify new markets and opportunities that were previously hidden from them.
    • Because Data Warehousing enables a better understanding of an organisation’s information, it focuses attention on the quality of that data and provides the business with a real incentive to improve it.
    • Because operational systems are relieved of the need to support enquiries, they become quicker and easier to operate.
    • Because the Data Warehouse is not required to support operational systems, high volumes of operational transactions will not slow its response.
    • Because operational systems are relieved of the need to support enquiries, they can be developed and delivered more quickly.
    • Because operational systems are relieved of the need to support enquiries, they are no longer impacted by changes in the way in which information is analysed and delivered.

    Risks & Pitfalls

    Experience suggests:

      • A Data Warehouse can consume a great deal of computing and human resources: more than most projects anticipate.

      • A Data Warehouse development which has too broad a scope and therefore too long a lead time will suffer from a loss of momentum and credibility which will be hard to recover.
      • A Data Warehouse established on an inadequately defined business information model (meta-model) will fail.
      • A Data Warehouse established without the full support and involvement of the business community will fail.
      • A Data Warehouse project that does not fully address the inevitable and fundamental issue of business data ownership will fail.
      • The learning curve for the users of a Data Warehouse can be steep. Typically, it must include an understanding of new concepts, terminology, data structures and data definitions.
      • The integration of the multiplicity of data sources which exist in most organisations can be problematic at several levels: Physical connectivity, conflicting data definitions, timeliness and the quality of the information itself.
      • There is a multiplicity of vendors offering partial or total Data Warehousing solutions. A Data Warehousing project which selects too many vendors on the basis that each one is ‘best of breed’ in their respective field is likely to fail.

      Emerging Best Practice

      Experience suggests:

      • You should be careful to ensure your development is not over ambitious. Given the lead times associated with delivery of business benefits, consider creating subsets of the ultimate warehouse focused around specific subject areas such as customers or products. These subsets, known as Data Marts, can be implemented incrementally within the context of a commonly understood meta-model.

      • You should always ensure that the project is sponsored by a senior business manager.
      • You should involve the business community from the outset, ensuring strong participation from every corner of the business, even those who are not expected to be users at the outset.
      • You should focus the implementation around a commonly agreed and understood Business Information Model.
      • You should address the issues of data ownership and terminology as they occur, and not attempt to defer them to a later date.
      • You should take the opportunity to improve the quality of the data which resides in your operational systems. It is not uncommon to offer financial incentives to staff to improve the quality of information on, for example, customers or suppliers prior to implementation.
      • You should go to great lengths to minimise the actual number of data sources which have to be accessed by the warehouse.
      • You should strike a balance between the number of vendors from which you could source components and the number you actually need. Going for a single vendor solution is not however always the best option.

      The CEC Approach

      A successful outcome requires a structured approach:

      CEC Approach to Data Warehousing

      Step 1: Visioning
      Establish the business objectives to be addressed, the key performance indicators to be achieved and the business benefits to be delivered. Involve the Executive Sponsor and a Visioning Team of senior business and IT managers.

      Step 2: Technology Analysis
      Examine all potential data sources, both internal and external. Determine the optimum set and identify the steps that will be required for cleaning, enhancement and extraction.

      Step 3: Solution Options
      Review the marketplace to identify the likely providers of hardware, software and services.

      Step 4: Information Analysis
      Create a comprehensive Business Information Model for the warehouse. Estimate the frequency, volumes and ownership of data going into the warehouse. Involve a diverse selection of participants from the business community and obtain consensus.

      Step 5: Solution Evaluation
      Evaluate vendors to determine best fit.

      Step 6: Implementation Programme
      Develop an implementation programme and deliver.

      G lossary
       

      Data Mining The progressive analysis of data by identifying associations, patterns and clusters.
      Datamart A small Data Warehouse, typically focused around a single subject area such as customers or products.
      DBMS Database Management System: The complete collection of software that manages access to, and updating of, a database.
      DSS Decision Support System: A refined management information system where the emphasis is on providing business managers with key information for corporate decision making.
      EIS Executive Information System: A component of a DSS which intelligently identifies business critical information, trends and patterns by applying criteria and conditions to data sources. The EIS can automatically notify the user when these criteria have been met.
      Information Catalogue An aid to locating information in the Data Warehouse such as pre-defined queries and reports. The catalogue could also invoke a number of DSS tools.
      Metadata The data definitions of the Data Warehouse. This describes information on the origin and history of the data in the warehouse.
      Middleware The software and hardware which acts as a glue cementing the components of a Data Warehouse together.
      MPP Massively Parallel Processors: A computer with multiple processors, each with its own memory and disk storage units.
      Neural Net Intelligent software which allows a system to learn to recognise features or characteristics of situations. Neural Nets could be used to identify patterns and trends in data.
      OLAP OnLine Analytical Processing: DSS which provides multidimensional query and analysis facilities.
      PRDBMS Post Relational Database Management System: As RDBMS but implemented on an architecture purpose built for fast data access.
      RDBMS Relational Database Management System: A DBMS that provides tools for joining tables together and selecting items from within tables.
      Scrubbing The cleaning of source data to remove inconsistencies or inaccuracies.
      SMP Symmetric Multi-Processors: A computer with multiple processors that have a common memory and share disk storage units.
      SQL Structured Query Language: A simplified programming language for querying databases.

      For more information please e-mail us or call Consulting on 44 (0) 20 7251-4646  

      back to briefing papers
      © Copyright 1998, CEC Europe Limited