O’Leary—REAL-D: A Schema for Data Warehouses 57
and MDM. For example, in figure 4, the product dimension is a resource. Agents in the REA are
also agents in MDM.
Data warehouses require information that differs from that in the REA/REAL schema. First, the time
period is captured as a dimension rather than as a single time attribute, as in REA/REAL models
(McCarthy 1979, 1980, 1982; Denna et al. 1993; Hollander et al. 1996). Second, a wider range of
information about location is captured, e.g., store, city, region and district. Third, some dimensions are
not homogeneous as they are in REA/REAL models, in that agents are mixed with locations (e.g.,
figure 4). Fourth, in the REA model the economic unit was subordinated to the agent and the concern was
one of control, whereas the primary concern of data warehouses is marketing information. The
remainder of this section addresses each of these differences.
Time Period as a Dimension
In figure 4, an entire dimension defines time periods. The existence of this data as a dimension does
not necessarily increase data entry demands. For example, some of the data in the dimensions will be
generated automatically, such as calendar conversions from day to week to month to quarter to year,
which lets rollups be automatic (Raden 1996a). In addition, special-case time ranges, including promotion
periods and seasons, can be built into the database. As a result, REA-based databases could form
the basis of replicated databases in data warehouses, where the replication automatically expands the
available data to a broader based schema as in figure 4.
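
The automatic generation of the time dimension can be illustrated with a minimal sketch. It assumes one dimension row per calendar day, ISO week numbering, and a key format of YYYYMMDD; the function name, the field names, and the promotion period are hypothetical and are not taken from figure 4.

```python
from datetime import date, timedelta


def build_time_dimension(start, end, special_ranges=()):
    """Return one row per calendar day with derived rollup attributes.

    special_ranges is a sequence of (name, first_day, last_day) tuples for
    special-case periods such as promotions or seasons (hypothetical format).
    """
    rows = []
    day = start
    while day <= end:
        iso_year, iso_week, _ = day.isocalendar()
        rows.append({
            "time_key": day.strftime("%Y%m%d"),    # assumed surrogate key format
            "day": day.isoformat(),
            "week": f"{iso_year}-W{iso_week:02d}",  # ISO week is an assumption
            "month": day.strftime("%Y-%m"),
            "quarter": f"{day.year}-Q{(day.month - 1) // 3 + 1}",
            "year": day.year,
            # Names of every special-case range that contains this day.
            "special": [name for name, first, last in special_ranges
                        if first <= day <= last],
        })
        day += timedelta(days=1)
    return rows


# Hypothetical promotion period, for illustration only.
promotions = [("holiday promotion", date(1996, 11, 25), date(1996, 12, 24))]
time_dim = build_time_dimension(date(1996, 1, 1), date(1996, 12, 31), promotions)
```

Because every attribute is computed from the date alone, none of the rows requires data entry; only the special-case ranges must be supplied.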
Location as a Dimension
In data warehouses, location (e.g., market) is often a dimension. Although Denna et al. (1993) and
Hollander et al. (1996) suggest that if location can be inferred then information about it does not need to be
captured, the size of the data warehouse and the need for rapid query response argue for explicit rather
than inferred information to facilitate rollups, even if the information is redundant or derivable. Location
information can be generated automatically, in the same manner as time information. For example, if store
numbers or register numbers are unique, they can be used to generate location information.
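
A minimal sketch of this derivation follows. It assumes that a store master file maps each unique store number to its city, district, and region; the master data, the transactions, and all names are hypothetical.

```python
# Hypothetical store master file; store numbers are assumed to be unique.
STORE_MASTER = {
    101: {"city": "Pasadena", "district": "Los Angeles", "region": "West"},
    102: {"city": "Burbank", "district": "Los Angeles", "region": "West"},
    201: {"city": "Albany", "district": "Upstate", "region": "East"},
}

# Hypothetical captured transactions; only the store number is recorded.
TRANSACTIONS = [
    {"store_number": 101, "register": 3, "amount": 25.00},
    {"store_number": 201, "register": 1, "amount": 12.50},
    {"store_number": 101, "register": 1, "amount": 8.75},
]


def build_location_dimension(transactions, store_master):
    """Create one explicit location row per store number seen in the data."""
    dimension = {}
    for txn in transactions:
        number = txn["store_number"]
        if number not in dimension:
            # Store the derivable attributes explicitly, even though redundant.
            dimension[number] = {"store_key": number, **store_master[number]}
    return [dimension[key] for key in sorted(dimension)]


location_dim = build_location_dimension(TRANSACTIONS, STORE_MASTER)
```

The resulting rows hold city, district, and region explicitly, so a rollup query does not have to infer them at query time.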
Homogeneity of Dimensions in REA and Data Warehouse Models
A schema table for a dimension (resource, event, agent, location) is defined as totally homogeneous
if that table contains only information directly relating to that dimension (e.g., resources) and not
any other dimensions (e.g., agents). While the REA/REAL literature illustrates resources, agents, and
locations as totally homogeneous, under the data warehouse formulation the dimension tables are not
totally homogeneous according to the same criteria. For example, in figure 6a the dimension for store
key contains information about agents and locations. Why does the data warehouse version not maintain
resource, agent, and location homogeneity? In these examples (e.g., Meredith and Khader 1996),
the concern is with being able to answer marketing queries directly (e.g., about the sales organization
marketing representative, office, district and region). Therefore, the data warehouse schema design
is concerned not with the homogeneity of agent or location information but with cumulating
nonhomogeneity, which, like time, cascades into definable categories. For example, the salesperson is
the lowest level in sales, all the sales personnel in a store have sales that accumulate to the store sales, all
the store sales accumulate to the district sales, all the district sales accumulate to the region sales, etc.
Consequently, a dimension has cumulating nonhomogeneity if for some resource, agent, or location,
data are included for more than one dimension (such as sales personnel or store location).
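
The cascade from salesperson to store to district to region can be sketched as follows. The dimension rows and sales amounts are hypothetical; the point is only that a single dimension table mixing agent attributes (the salesperson) with location attributes (store, district, region) supports a rollup at every level.

```python
from collections import defaultdict

# Hypothetical dimension with cumulating nonhomogeneity: each row carries
# an agent (rep) together with location attributes (store, district, region).
SALES_DIM = [
    {"rep": "Ames", "store": "S1", "district": "D1", "region": "West"},
    {"rep": "Baker", "store": "S1", "district": "D1", "region": "West"},
    {"rep": "Cole", "store": "S2", "district": "D1", "region": "West"},
    {"rep": "Diaz", "store": "S3", "district": "D2", "region": "West"},
]

# Hypothetical sales facts recorded at the lowest level, the salesperson.
SALES_FACTS = [("Ames", 100), ("Baker", 50), ("Cole", 70), ("Diaz", 30), ("Ames", 20)]


def rollup(level):
    """Total sales at one level of the hierarchy: rep, store, district, or region."""
    by_rep = {row["rep"]: row for row in SALES_DIM}
    totals = defaultdict(int)
    for rep, amount in SALES_FACTS:
        totals[by_rep[rep][level]] += amount
    return dict(totals)


store_sales = rollup("store")
district_sales = rollup("district")
region_sales = rollup("region")
```

Each call reads the same dimension rows and changes only the grouping attribute, which is what allows the totals at one level to accumulate into the next.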
Incorporating cumulating nonhomogeneity in the schema minimizes the number of physical joins
that must be made and provides a simpler schema (e.g., Raden 1996a). In the context of drill down for
location and time breakdowns for very large databases, minimizing physical joins can be a critical
design objective. The notion of cumulating nonhomogeneity can be extended to other dimensions related
to other events such as purchasing.
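
The effect on the number of joins can be illustrated with a small sketch using an in-memory SQLite database. The table names, column names, and data are hypothetical; the comparison is between a single dimension table that carries the full location hierarchy and a totally homogeneous alternative that splits store, district, and region into separate tables.

```python
import sqlite3

conn = sqlite3.connect(":memory:")
conn.executescript("""
CREATE TABLE sales_fact (store_key INTEGER, amount INTEGER);

-- Nonhomogeneous dimension: the whole hierarchy sits in one table.
CREATE TABLE store_dim (store_key INTEGER PRIMARY KEY,
                        store TEXT, district TEXT, region TEXT);

-- Homogeneous alternative: one table per level of the hierarchy.
CREATE TABLE store (store_key INTEGER PRIMARY KEY, store TEXT, district_id INTEGER);
CREATE TABLE district (district_id INTEGER PRIMARY KEY, district TEXT, region_id INTEGER);
CREATE TABLE region (region_id INTEGER PRIMARY KEY, region TEXT);

INSERT INTO sales_fact VALUES (1, 100), (2, 70), (3, 30), (1, 20);
INSERT INTO store_dim VALUES (1, 'S1', 'D1', 'West'), (2, 'S2', 'D1', 'West'),
                             (3, 'S3', 'D2', 'East');
INSERT INTO store VALUES (1, 'S1', 10), (2, 'S2', 10), (3, 'S3', 20);
INSERT INTO district VALUES (10, 'D1', 100), (20, 'D2', 200);
INSERT INTO region VALUES (100, 'West'), (200, 'East');
""")

# Region rollup against the single dimension table: one join.
ONE_JOIN = """
SELECT d.region, SUM(f.amount) FROM sales_fact f
JOIN store_dim d ON d.store_key = f.store_key
GROUP BY d.region ORDER BY d.region
"""

# The same rollup against the homogeneous tables: three joins.
THREE_JOINS = """
SELECT r.region, SUM(f.amount) FROM sales_fact f
JOIN store s ON s.store_key = f.store_key
JOIN district d ON d.district_id = s.district_id
JOIN region r ON r.region_id = d.region_id
GROUP BY r.region ORDER BY r.region
"""

denormalized = conn.execute(ONE_JOIN).fetchall()
normalized = conn.execute(THREE_JOINS).fetchall()
```

Both queries return the same region totals; the difference is that the second must traverse two additional tables, a cost that grows with the size of the fact table and the depth of the drill-down.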
On the one hand, employing cumulating nonhomogeneity could be viewed as an implementation
compromise because it intermixes information, e.g., agent and location information (McCarthy and