OLAP (On-line Analytical
Processing) is characterized by a relatively low volume of
transactions. Queries are often very complex and involve aggregations. For OLAP
systems, response time is the measure of effectiveness. OLAP applications are
widely used in data mining. An OLAP database holds aggregated,
historical data stored in multi-dimensional schemas (usually a star schema).
Multidimensional databases
Multidimensional structure is defined as “a variation of the relational
model that uses multidimensional structures to organize data and express the
relationships between data”. The structure is broken into cubes, and each cube
stores and provides access to the data within its own confines. “Each cell
within a multidimensional structure contains aggregated data related to
elements along each of its dimensions”. Even when data is manipulated, it remains easy
to access and continues to form a compact database format, and the data
remains interrelated. Multidimensional structure is quite popular for
analytical databases that use online analytical processing (OLAP) applications
(O’Brien & Marakas, 2009). Analytical databases use this structure because
of its ability to deliver answers to complex business queries swiftly. Data
can be viewed from different angles, which gives a broader perspective on a
problem than other models offer.
Aggregations
It has been claimed that for complex queries OLAP cubes can produce an
answer in around 0.1% of the time required for the same query on OLTP relational data. The most important mechanism
that allows OLAP to achieve such performance is the use of aggregations. Aggregations are built from the fact table by
changing the granularity on specific dimensions and aggregating data up along
these dimensions. The number of possible aggregations is the number of
possible combinations of dimension granularities.
The combination of all possible aggregations and the base data contains the
answers to every query which can be answered from the data.
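As a rough illustration of how quickly the number of aggregations grows, the sketch below enumerates every combination of granularities for three dimensions. The dimension names and hierarchy levels are assumptions made up for this example; "ALL" stands for a dimension that has been aggregated away entirely.

```python
from itertools import product
from math import prod

# Hypothetical dimension hierarchies, finest level first. "ALL" means the
# dimension is fully aggregated away.
hierarchies = {
    "time": ["day", "month", "year", "ALL"],
    "product": ["item", "category", "ALL"],
    "store": ["store", "region", "ALL"],
}

# One aggregation (view) per choice of one level from each dimension.
views = list(product(*hierarchies.values()))

# The count equals the product of the number of levels per dimension.
view_count = prod(len(levels) for levels in hierarchies.values())

print(view_count)  # 4 * 3 * 3 = 36, counting the base data (day, item, store)
```

Under these assumptions, adding a single four-level dimension would multiply the count by four, which is why materializing every aggregation is rarely practical.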
Because there are usually many aggregations that can be calculated, often
only a predetermined number are fully calculated; the remainder are computed on
demand. The problem of deciding which aggregations (views) to calculate is
known as the view selection problem.
View selection can be constrained by the total size of the selected set of
aggregations, the time to update them from changes in the base data, or both.
The objective of view selection is typically to minimize the average time to
answer OLAP queries, although some studies also minimize the update time. View
selection is NP-complete. Many approaches to the problem have been explored,
including greedy algorithms, randomized search, genetic algorithms, and the A*
search algorithm.
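To make the greedy approach concrete, here is a minimal sketch under simplifying assumptions: each view is identified by the set of dimensions it keeps, a view can answer any query over a subset of its dimensions, the cost of a query is the row count of the smallest materialized view that can answer it, and the budget is a fixed number of views rather than a size or update-time limit. The row counts and the function name `greedy_select` are invented for illustration and do not reproduce any particular published algorithm.

```python
dims = ("product", "store", "time")
base = frozenset(dims)

# Hypothetical row counts for each group-by view, keyed by the dimensions kept.
sizes = {
    base: 1000,  # base data, always materialized
    frozenset({"product", "store"}): 500,
    frozenset({"product", "time"}): 800,
    frozenset({"store", "time"}): 300,
    frozenset({"product"}): 50,
    frozenset({"store"}): 10,
    frozenset({"time"}): 20,
    frozenset(): 1,
}

def greedy_select(sizes, base, k):
    """Pick up to k views beyond the base, taking the best benefit each round."""
    selected = [base]
    # Current cost of each query: initially everything is answered from the base.
    cost = {view: sizes[base] for view in sizes}

    def benefit(candidate):
        # Rows saved over all queries that the candidate view could answer.
        return sum(
            max(0, cost[q] - sizes[candidate]) for q in sizes if q <= candidate
        )

    for _ in range(k):
        candidates = [v for v in sizes if v not in selected]
        if not candidates:
            break
        best = max(candidates, key=benefit)
        if benefit(best) <= 0:
            break
        selected.append(best)
        # Queries covered by the new view may now be cheaper to answer.
        for q in sizes:
            if q <= best:
                cost[q] = min(cost[q], sizes[best])
    return selected, cost

selected, cost = greedy_select(sizes, base, k=2)
print([sorted(v) for v in selected])
print(sum(cost.values()))
```

With these made-up sizes, the first round picks the (store, time) view and the second picks the (product) view, reducing the total cost over the eight queries from 8000 to 4000 rows. A greedy choice like this is a heuristic and is not guaranteed to find the optimal set.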
Types
OLAP systems have been traditionally categorized
using the following taxonomy.
Multidimensional
MOLAP (multidimensional OLAP) is the 'classic' form of OLAP and is
sometimes referred to as just OLAP. MOLAP stores data in optimized
multi-dimensional array storage, rather than in a relational database.
It therefore requires the pre-computation and storage of information in the
cube, an operation known as processing.
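The idea of processing can be sketched with a small dense array. The example below is only an illustration, not how any particular MOLAP engine stores data: the dimension members, the fact rows, and the function name `process_cube` are assumptions, and each axis carries one extra slot that holds the pre-computed total.

```python
# Hypothetical dimension members; their list positions serve as array indices.
products = ["apples", "pears"]
quarters = ["Q1", "Q2", "Q3"]

# Hypothetical fact rows: (product, quarter, units sold).
facts = [
    ("apples", "Q1", 10),
    ("apples", "Q2", 7),
    ("pears", "Q1", 4),
    ("pears", "Q3", 9),
    ("apples", "Q1", 5),
]

def process_cube(facts):
    """Load facts into a dense 2-D array with an extra 'total' slot per axis."""
    rows, cols = len(products) + 1, len(quarters) + 1
    cube = [[0] * cols for _ in range(rows)]
    for product, quarter, units in facts:
        i, j = products.index(product), quarters.index(quarter)
        cube[i][j] += units    # base cell
        cube[i][-1] += units   # total for the product across all quarters
        cube[-1][j] += units   # total for the quarter across all products
        cube[-1][-1] += units  # grand total
    return cube

cube = process_cube(facts)

# After processing, a query is an array lookup rather than a scan of the facts.
apples_total = cube[products.index("apples")][-1]
q1_total = cube[-1][quarters.index("Q1")]
print(apples_total, q1_total, cube[-1][-1])  # 22 19 35
```

The cost of this design is paid up front: every change to the fact data means the affected cells and totals have to be processed again.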
Relational
ROLAP works directly with relational databases. The base data and the dimension
tables are stored as relational tables, and new tables are created to hold the
aggregated information. ROLAP depends on a specialized schema design. This methodology
relies on manipulating the data stored in the relational database to give the
appearance of traditional OLAP's slicing and dicing functionality. In essence,
each action of slicing and dicing is equivalent to adding a "WHERE"
clause in the SQL statement.
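The correspondence between slicing, dicing, and "WHERE" clauses can be shown with Python's built-in sqlite3 module. This is a sketch under assumed names: the `sales` table, its columns, and the sample rows are invented for the example, and a real ROLAP deployment would typically use a star schema with separate dimension tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE sales (product TEXT, region TEXT, year INTEGER, amount REAL)"
)
conn.executemany(
    "INSERT INTO sales VALUES (?, ?, ?, ?)",
    [
        ("apples", "north", 2023, 100.0),
        ("apples", "south", 2023, 80.0),
        ("pears", "north", 2023, 60.0),
        ("apples", "north", 2024, 120.0),
        ("pears", "south", 2024, 90.0),
    ],
)

# Aggregate query template; the WHERE clause is the only part that changes.
base_query = (
    "SELECT product, SUM(amount) FROM sales {where} "
    "GROUP BY product ORDER BY product"
)

# No restriction: totals per product over the whole table.
all_rows = conn.execute(base_query.format(where="")).fetchall()

# Slice: fix a single dimension (year = 2024).
slice_rows = conn.execute(
    base_query.format(where="WHERE year = ?"), (2024,)
).fetchall()

# Dice: restrict several dimensions at once.
dice_rows = conn.execute(
    base_query.format(where="WHERE year = ? AND region = ?"), (2023, "north")
).fetchall()

print(all_rows)    # [('apples', 300.0), ('pears', 150.0)]
print(slice_rows)  # [('apples', 120.0), ('pears', 90.0)]
print(dice_rows)   # [('apples', 100.0), ('pears', 60.0)]
```

The filter values are passed as bound parameters; only the fixed clause text is substituted into the query template.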
Hybrid
There is no clear agreement across the industry as
to what constitutes "Hybrid OLAP" (HOLAP), except that a database will divide
data between relational and specialized storage. For example, for some vendors,
a HOLAP database will use relational tables to hold the larger quantities of detailed
data, and use specialized storage for at least some aspects of the smaller
quantities of more aggregated or less detailed data.