Virginia T. Dobey
Peter L. Eirich
Keywords: interoperability, physical environment, natural environment, environmental representation, EDM, environmental data model, data model, normalization
ABSTRACT: During the past several years, the concept of an “Environmental Data Model” (EDM) has arisen within the simulation community for the development of environmental representations. An EDM, in the form of a logical data model, describes the environmental data elements found in either a specific environmental data source or an application. This paper cites studies and experiments revealing that EDMs, as presently constituted, and even when supplemented by tools such as equivalence classes and environmental ontologies, are not adequate to permit computer-based analyses to discover the full extent of interoperability that could be expected among the applications. The paper then explores applying data normalization principles to the EDMs, converting them into a set of more rigorously defined, application-neutral logical data models, in order to increase the extent of interoperability relationships that can be derived from the EDMs using automated procedures.
NOTE: This short paper is a précis of 04S-SIW-114, “Environmental Data Models: Necessary but Not Sufficient for Interoperability”, from the 2004 Spring Interoperability Workshop held in Arlington, VA. To reduce length, the footnotes and the majority of the references from the original paper have been eliminated in this web-based version. Please refer to the full paper on the SISO web site: www.sisostds.org/doclib/doclib.cfm?SISO_RID_1005585
1. Interoperability
The general focus of this paper is interoperability with regard to exchanges of descriptions of the environment among computing applications. The specific focus is evaluating whether or not Environmental Data Models (EDMs), as presently constituted, are a proper tool to address and assess such interoperability. This paper also suggests ways in which the extent of practical interoperability among the EDMs might be improved.
2. The Environmental Data Coding Specification
One significant effort developed to achieve interoperable environmental data exchange is the SEDRIS set of technologies and their corresponding suite of international standards. These include the Environmental Data Coding Specification (EDCS), which facilitates the interoperable exchange of environment descriptions and data by specifying a common set of environmental concepts in a dictionary format. The EDCS concepts can be used for describing the semantics of environmental objects and their characteristics, independent of any specific representation of such data.
3. EDMs
An EDM is a Logical Data Model that describes the data contents either of a specific environmental data source, reflecting the way it is organized for transmission (product), or of the environmental elements found within an application, reflecting the manner in which it is implemented. Because the use of an application-independent dictionary of concepts is essential to determine data interoperability, EDMs use the EDCS as their underlying dictionary of concepts. Over forty EDMs have been prepared [1].
By using a common style and notation, EDMs are intended to simplify the analysis of data requirements and data interoperability in specific systems and data products. “EDM operations are facilitated by the Common Data Model Framework (CDMF), a collection of tools that support the maintenance, analysis and comparison of EDMs.”[1]
4. Interoperability analysis of terrain-based EDMs
Reference [1] provides hard, quantitative measures of the interoperability that can be identified across multiple environmental descriptions. One analysis conducted in [1] examined ten terrain-oriented EDMs derived either from military requirements for environmental data or from simulations and environmental data systems in use within DoD. Reference [1] found that very little commonality among the EDMs could be identified using only their “native” form, even though the EDCS provided a common, shared lexicon for all of them. Although the analysts knew that all of the EDMs supported some basic environmental concepts (e.g., roads, rivers), this known commonality was, for the most part, not discoverable by the CDMF toolset used for the analysis. For example, these means did not identify a single feature common to all ten (or even nine of the ten) EDMs, and only 17 of the 1899 features in the analysis could be found in any six of the EDMs. This result is surprising, because capturing application-specific and product-specific concepts with a common, application-independent set of concepts (i.e., the EDCS) would have been expected to yield a much greater degree of identified interoperability.
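The kind of commonality count reported in [1] can be sketched as follows. This is a minimal illustration, not the CDMF toolset: it assumes each EDM is reduced to a set of feature labels, and the EDM names and labels (`ROAD`, `HIGHWAY`, and so on) are hypothetical. It counts how many EDMs contain each label using exact matching only, which shows how known commonality (a road called `ROAD` in one EDM and `HIGHWAY` in another) goes undetected.

```python
from collections import Counter

# Hypothetical toy EDMs: each EDM name maps to the set of feature labels it
# contains. Illustrative only; not the contents of any real EDM.
EDMS = {
    "edm_a": {"ROAD", "RIVER", "BRIDGE"},
    "edm_b": {"ROAD", "STREAM", "BRIDGE"},
    "edm_c": {"HIGHWAY", "RIVER"},
}


def feature_counts(edms):
    """Count, for each feature label, how many EDMs contain it (exact match)."""
    counts = Counter()
    for features in edms.values():
        counts.update(set(features))
    return counts


def commonality_histogram(edms):
    """Map k -> number of features that appear in exactly k EDMs."""
    return dict(Counter(feature_counts(edms).values()))


def features_in_at_least(edms, k):
    """Sorted list of features that appear in k or more EDMs."""
    return sorted(f for f, n in feature_counts(edms).items() if n >= k)


if __name__ == "__main__":
    print(commonality_histogram(EDMS))
    print(features_in_at_least(EDMS, 3))
```

With these toy inputs, no label appears in all three EDMs, even though all three describe roads and two describe rivers under different names.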
The analysis was then repeated after incorporating (a) an EDCS “hierarchical ontology” derived from the concept cross-references found within the text of the EDCS concept definitions, (b) manually identified equivalence classes, and (c) numerical confidence levels for matching [1]. As a result, the identified commonality increased greatly. As reference [1] indicates: “Now, 28 features are common to all EDMs, 37 are common to nine, and only 66 are unique to a single EDM.” This improvement, resulting from the three CDMF enhancements, attests to the potential payoff from employing one or more of these added elements.
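How equivalence classes with confidence levels can enlarge the identified intersection may be sketched as follows, assuming a simple lookup table from native labels to canonical classes. The table, labels, and thresholds are hypothetical, the EDCS hierarchical ontology is not modelled, and [1] does not specify this procedure, so this illustrates the idea only.

```python
from collections import Counter

# Hypothetical equivalence classes: native label -> (canonical class, confidence).
EQUIVALENCE = {
    "HIGHWAY": ("ROAD", 0.9),
    "STREAM": ("RIVER", 0.8),
    "TRAIL": ("ROAD", 0.4),
}

# Hypothetical toy EDMs (illustrative only).
EDMS = {
    "edm_a": {"ROAD", "RIVER"},
    "edm_b": {"HIGHWAY", "STREAM"},
    "edm_c": {"TRAIL", "RIVER"},
}


def canonicalize(features, equivalence, min_confidence):
    """Replace each label by its canonical class when the match confidence
    meets the threshold; labels without a qualifying match are kept as-is."""
    result = set()
    for label in features:
        canonical, confidence = equivalence.get(label, (label, 1.0))
        result.add(canonical if confidence >= min_confidence else label)
    return result


def apply_equivalence(edms, equivalence, min_confidence):
    """Canonicalize every EDM's feature set with the same threshold."""
    return {name: canonicalize(f, equivalence, min_confidence) for name, f in edms.items()}


def count_common(edms, k):
    """Number of labels that appear in at least k EDMs."""
    counts = Counter()
    for features in edms.values():
        counts.update(set(features))
    return sum(1 for n in counts.values() if n >= k)


if __name__ == "__main__":
    print("native, common to all 3:", count_common(EDMS, 3))
    for threshold in (0.5, 0.3):
        mapped = apply_equivalence(EDMS, EQUIVALENCE, threshold)
        print(f"threshold {threshold}, common to all 3:", count_common(mapped, 3))
```

With these inputs, no label is common to all three EDMs in native form; at a 0.5 threshold `RIVER` becomes common to all three, and lowering the threshold to 0.3 also admits the low-confidence `TRAIL` to `ROAD` match. The confidence threshold thus trades added commonality against the risk of false matches.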
However, even this improvement was still not sufficient for practical applications. As reference [1] indicates: “Even using the Equivalence Classes and Ontology to grow the intersection of these EDMs, the resulting REDM is far too small to be useful as a Reference EDM for either M&S or C4I.” Clearly, achieving environmental data interoperability poses significant challenges. A DMSO-sponsored study [2] indicated that the context-specific nature of EDM contents was one reason why the overlap among the environmental feature elements identified in reference [1] was smaller than might have been expected.
The “state-of-practice” technologies for environmental data interoperability, however useful they may be, are evidently not sufficient, when applied to the set of EDMs, for adequately determining the potential for interoperability inherent among the applications the EDMs reflect. However, the work reported in reference [1] does indicate that some combination of incremental technical enhancements can, at least, substantially improve the extent to which this interoperability can be established. Virtually all of the results should improve further if a consistent methodology is implemented for converting the EDMs into application-independent concepts.
5. Normalization Applied to Logical Data Models and EDMs
The classical “logical” relational data model remains the most effective application-independent means for the presentation of data to be interchanged among, or used in common by, multiple applications. The model must be consistent across all application areas, and extensible such that new data can be defined without altering previously defined data. Normalization, a long-established technique in the area of database design, appears to be applicable for improving the usability of Logical Data Models as a basis for determining the potential for data interoperability.
Normalization is the process of reworking an application-independent relational data model into a form in which each entity captures only one concept and each attribute describes a single characteristic of its entity. Normalization ensures that each “fact” is located in only one place in the database, so that data are accessed and updated consistently. Applying the principles of data structure normalization to the application-specific EDM logical models, while by no means a “cure-all” for achieving interoperability among the applications on which the EDMs are based, should nevertheless be an important step toward that goal.
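The “one fact, one place” property can be illustrated with a small decomposition. This is a minimal sketch assuming a hypothetical, performance-oriented building table in which each room row repeats its building's material; the field names are invented for illustration and do not reflect the actual structure of any EDM. Splitting the rows into a building relation and a room relation stores each material once, and rejoining the two recovers the original rows.

```python
# Hypothetical denormalized rows: the building's material (a fact about the
# building) is repeated on every room row.
DENORMALIZED = [
    {"building_id": "B1", "building_material": "CONCRETE", "floor": 1, "room": "R101"},
    {"building_id": "B1", "building_material": "CONCRETE", "floor": 1, "room": "R102"},
    {"building_id": "B1", "building_material": "CONCRETE", "floor": 2, "room": "R201"},
    {"building_id": "B2", "building_material": "WOOD", "floor": 1, "room": "R101"},
]


def normalize(rows):
    """Split rows into a building relation (building_id -> material) and a
    room relation of (building_id, floor, room) tuples."""
    buildings = {}
    rooms = set()
    for r in rows:
        bid, material = r["building_id"], r["building_material"]
        # Each building's material must be stored exactly once; conflicting
        # copies in the input indicate an update anomaly.
        if buildings.setdefault(bid, material) != material:
            raise ValueError(f"inconsistent material for building {bid}")
        rooms.add((bid, r["floor"], r["room"]))
    return buildings, sorted(rooms)


def rejoin(buildings, rooms):
    """Reconstruct denormalized rows by joining rooms to their buildings."""
    return [
        {"building_id": b, "building_material": buildings[b], "floor": fl, "room": rm}
        for b, fl, rm in rooms
    ]


if __name__ == "__main__":
    buildings, rooms = normalize(DENORMALIZED)
    print(buildings)
    print(rooms)
```

In the denormalized form, changing building B1's material means updating three rows consistently; in the normalized form it is a single entry in `buildings`.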
6. Normalization Impacts on EDMs
Regardless of how beneficial normalization might be in principle, if applying it in practice to the existing set of EDMs caused little substantive change in their structure and organization, one could argue that pursuing EDM normalization would offer little ultimate benefit. To determine the extent of the potential benefits of normalization for an EDM, the authors attempted to normalize the August 2003 version of the Ultra-High Resolution Building (UHRB) EDM for the OneSAF Objective System, based on standard information engineering design principles. The authors then compared the original and normalized versions of the model to determine the extent of the changes due to the normalization procedure. Conclusion: normalization caused a significant restructuring of the model.
This result is not surprising, considering that the actual purpose of an EDM has been to document the manner in which the environment is represented within either a model, a simulation, or a producer data source. As such, an EDM may be expected to track closely the data organization and structures used within that application or data source. Such structures are typically application-performance-oriented by design, and are therefore often not normalized.
7. Summary and Implications
The completion to date of over forty EDMs accomplishes a valuable step toward achieving environmental data interoperability by providing documentation where none otherwise existed. By documenting application-specific database content and application design information that has, historically, been unavailable by any other means, the set of EDMs serves a necessary function.
Unfortunately, even though documentation of internal application data, as presented by EDMs, is valid as a step toward data interoperability, the evidence reviewed in this paper indicates that EDMs do not go far enough to put interoperability within reach. To provide a sufficient basis for environmental data interoperability, EDMs must move forward from their current application-specific, aggregated-object character to an application-independent, relational representation of data, one that is constructed at a more fundamental, more “atomic” level of information representation.
References
[1] Dr. Dale D. Miller, Annette Janett, Melissa E. Nakanishi: “Environmental Data Modeling: REDM, DREDM, Ontology and Metrics”, Paper No. 03S-SIW-132 in Proc. Simulation Interoperability Workshop, Orlando, FL, March 2003.
[2] EDCS Special Study (internal working documents), Defense Modeling and Simulation Office (DMSO), August-December 2003. [Study team: Dr. Robert Richbourg (chair), Virginia Dobey, Peter Eirich, Peggy Gravitz (supporting), Annette Janett.]
Author Biographies
VIRGINIA T. DOBEY is a former Navy Commander (Special Duty—Oceanography) who has been involved in environmental software development, testing, and implementation since 1975. Instrumental in initiating the incorporation of environmental data into the Department of Defense data standardization program, she has been involved in U.S. and international data standardization efforts since 1993. Ms. Dobey holds a Master of Science in Meteorology and Oceanography degree from the Naval Postgraduate School.
PETER L. EIRICH is a member of the Senior Professional Staff at The Johns Hopkins University Applied Physics Laboratory (JHU/APL), where he works on projects involving environmental data standards, product data modeling, M&S strategies, and VV&A. He received B.S. and M.S. degrees in Electrical Engineering from MIT in 1970, an Engineer’s Degree in Electrical Engineering from MIT in 1974, and a Master of Science in Management from MIT’s Sloan School of Management in 1974.