Data Warehousing & BIS
by Stuart Cooke
Introduction
This paper is intended to provide business and IS managers with an insight into the realities of Data Warehousing. The paper starts by explaining what Data Warehousing entails and why there is now so much interest in it. It then goes on to provide advice on the major benefits and pitfalls, finally outlining CEC Europes approach to a) defining a Data Warehousing strategy, and b) how to go about vendor selection and implementation.
New Data for
Old
The flattening of organisational structures which has taken
place in most companies over the past three years and the move
towards collaborative working mean that more employees than ever
before are directly involved in corporate decision making.
Unfortunately, the vital information needed to support this
process has been far from comprehensive as traditional methods of
information retrieval have failed to keep pace and existing data
sources have proved conflicting and unreliable. Recently however,
the convergence of these cultural shifts and the appearance of
new data storage and analysis technologies has stirred great
interest in Data Warehousing. For the first time organisations
are starting to move away from the use of technology simply to
obtain ever greater operational efficiency to using it to achieve
greater flexibility, sharper market response and as a means of
innovation.
Data Warehousing is not new; the concept has been around for many years but only recently has the technology been available to support it. Data Warehousing is the embodiment of two simple concepts:
-
The integration of a wide variety of corporate and external data sources to provide a consistent, accurate, logical, shared picture of business information.
-
The provision of flexible Decision Support Tools to analyse that information.
Conceptually, a Data Warehouse looks like this:

Information Sources always include the core
operational systems which form the backbone of day-to-day
activities, such as accounting, order entry and inventory
control. It is these systems which have traditionally
provided management information to support decision
making. There is however a limit to what they can do in
this area as they are always designed first and foremost
for operational performance and tend not to provide
information in a format useful for interpretative
analysis.
Decision Support Tools are used to analyse the
information stored in the warehouse, typically to
identify trends and new business opportunities. These
tools could be a simple spreadsheet, a graphical query
tool or even a neural network. Neural nets are often able
to identify associations, trends and niches which the
end-user would never have thought of looking for
themselves.
The Data Warehouse itself is the bridge between
the operational systems and the decision support tools.
It holds a copy of much of the operational system data in
a logical structure which is more conducive to analysis.
The Data Warehouse, which will be refreshed in scheduled
bursts from operational systems and from relevant
external data sources, provides a single, consistent view
of corporate data, leaving operational systems
unaffected.
-
Data Warehouses are notoriously difficult to cost justify. The primary benefit of a Data Warehouse is the capability for better decision making; this is pretty difficult to quantify at the outset of a Data Warehousing project. The major benefits that will accrue therefore are likely to be longer term and implementors should not generally expect to see direct short term cost benefits. There are, however, some very good reasons for adopting Data Warehousing, not least the fact that your business is likely to be left out in the cold if you dont:
Data Warehousing delivers the capability for better, faster decision making. - Data Warehousing enables the identification of business trends, patterns and new opportunities. Organisations that have adopted Data Warehousing have been able to identify new markets and opportunities that were previously hidden from them.
- Because Data Warehousing enables a better understanding of an organisations information, it focuses attention on the quality of that data and provides the business with a real incentive to improve it.
- Because operational systems are relieved of the need to support enquiries, they become quicker and easier to operate.
- Because the Data Warehouse is not required to support operational systems, high volumes of operational transactions will not slow its response.
- Because operational systems are relieved of the need to support enquiries, they can be developed and delivered more quickly.
- Because operational systems are relieved of the need to support enquiries, they are no longer impacted by changes in the way in which information is analysed and delivered.
-
A Data Warehouse can consume a great deal of computing and human resources: more than most projects anticipate.
- A Data Warehouse development which has too broad a scope and therefore too long a lead time will suffer from a loss of momentum and credibility which will be hard to recover.
- A Data Warehouse established on an inadequately defined business information model (meta-model) will fail.
- A Data Warehouse established without the full support and involvement of the business community will fail.
- A Data Warehouse project that does not fully address the inevitable and fundamental issue of business data ownership will fail.
- The learning curve for the users of a Data Warehouse can be steep. Typically, it must include an understanding of new concepts, terminology, data structures and data definitions.
- The integration of the multiplicity of data sources which exist in most organisations can be problematic at several levels: Physical connectivity, conflicting data definitions, timeliness and the quality of the information itself.
- There is a multiplicity of vendors offering partial or total Data Warehousing solutions. A Data Warehousing project which selects too many vendors on the basis that each one is best of breed in their respective field is likely to fail.
-
You should be careful to ensure your development is not over ambitious. Given the lead times associated with delivery of business benefits, consider creating subsets of the ultimate warehouse focused around specific subject areas such as customers or products. These subsets, known as Data Marts, can be implemented incrementally within the context of a commonly understood meta-model.
- You should always ensure that the project is sponsored by a senior business manager.
- You should involve the business community from the outset, ensuring strong participation from every corner of the business, even those who are not expected to be users at the outset.
- You should focus the implementation around a commonly agreed and understood Business Information Model.
- You should address the issues of data ownership and terminology as they occur, and not attempt to defer them to a later date.
- You should take the opportunity to improve the quality of the data which resides in your operational systems. It is not uncommon to offer financial incentives to staff to improve the quality of information on, for example, customers or suppliers prior to implementation.
- You should go to great lengths to minimise the actual number of data sources which have to be accessed by the warehouse.
- You should strike a balance between the number of vendors from which you could source components and the number you actually need. Going for a single vendor solution is not however always the best option.
Experience suggests:
Experience suggests:
The CEC Approach
A successful outcome requires a structured approach:

Step 1: Visioning
Establish the business objectives to be addressed, the key
performance indicators to be achieved and the business benefits
to be delivered. Involve the Executive Sponsor and a Visioning
Team of senior business and IT managers.
Step 2: Technology Analysis
Examine all potential data sources, both internal and
external. Determine the optimum set and identify the steps that
will be required for cleaning, enhancement and extraction.
Step 3: Solution Options
Review the marketplace to identify the likely providers of
hardware, software and services.
Step 4: Information Analysis
Create a comprehensive Business Information Model for the
warehouse. Estimate the frequency, volumes and ownership of data
going into the warehouse. Involve a diverse selection of
participants from the business community and obtain consensus.
Step 5: Solution Evaluation
Evaluate vendors to determine best fit.
Step 6: Implementation Programme
Develop an implementation programme and deliver.
| Data Mining | The progressive analysis of data by identifying associations, patterns and clusters. |
| Datamart | A small Data Warehouse, typically focused around a single subject area such as customers or products. |
| DBMS | Database Management System: The complete collection of software that manages access to, and updating of, a database. |
| DSS | Decision Support System: A refined management information system where the emphasis is on providing business managers with key information for corporate decision making. |
| EIS | Executive Information System: A component of a DSS which intelligently identifies business critical information, trends and patterns by applying criteria and conditions to data sources. The EIS can automatically notify the user when these criteria have been met. |
| Information Catalogue | An aid to locating information in the Data Warehouse such as pre-defined queries and reports. The catalogue could also invoke a number of DSS tools. |
| Metadata | The data definitions of the Data Warehouse. This describes information on the origin and history of the data in the warehouse. |
| Middleware | The software and hardware which acts as a glue cementing the components of a Data Warehouse together. |
| MPP | Massively Parallel Processors: A computer with multiple processors, each with its own memory and disk storage units. |
| Neural Net | Intelligent software which allows a system to learn to recognise features or characteristics of situations. Neural Nets could be used to identify patterns and trends in data. |
| OLAP | OnLine Analytical Processing: DSS which provides multidimensional query and analysis facilities. |
| PRDBMS | Post Relational Database Management System: As RDBMS but implemented on an architecture purpose built for fast data access. |
| RDBMS | Relational Database Management System: A DBMS that provides tools for joining tables together and selecting items from within tables. |
| Scrubbing | The cleaning of source data to remove inconsistencies or inaccuracies. |
| SMP | Symmetric Multi-Processors: A computer with multiple processors that have a common memory and share disk storage units. |
| SQL | Structured Query Language: A simplified programming language for querying databases. |
For more information please e-mail us or call Consulting on 44 (0) 20 7251-4646
back
to briefing papers
© Copyright 1998, CEC Europe Limited