U.S. Department of Agriculture

Departmental Management

USDA Seal

USDA Data Repository Working Group Report


JULY 1995

INTRODUCTION

Mission

To build a base of information about data repositories that will help frame alternatives for a USDA data repository and potential transition scenarios.

Scope

The Data Repository Working Group adhered to USDA policy and standards, such as DR 3400-4 "Data Administration Policy" and the USDA Data Elements Standards Manual.

The efforts of the working group were exercised within the guidelines of Federal policy and standards, such as those developed by NIST, OMB, and GSA.

The results of this effort will be factored into the final USDA Data Management Strategic Plan.

Membership

Members of the Data Repository Working Group were selected from the Data Management Sub-Council by the Data Management Sub-Council Co-chairs, Bob Whiting, OIRM and John Antlitz, APHIS. Kathleen Rundle, OIRM was appointed Working Group Leader.

Working Group members are Larry Heffel, FCS; Dan Keller, FS; Jeff Kerby, CFSA; John King, FISVIS; Bob Larson, FS; Ed McLaughlin, OP/MAP; Bill Pettingill, NFC; Alan Rich, FCS; Kathleen Rundle, OIRM/NCC; and Paul Whitmore, InfoShare.

Responsibilities and Authorities

The Working Group has prepared this report which describes the current data management environment within USDA and provides an outline of how a centralized USDA Data Repository can be beneficial to the USDA business community.

Definition of Terms

The following are definitions of data dictionaries and other metadata collection tools used in data administration. Some of these topics are still in a state of flux and mean different things to different vendors.

A Data Dictionary is a tool which lists all data items that are used, their definitions, how and where they are used, and who is responsible for them.

Data Encyclopedias grew out of Data Dictionaries. They contain more than just data because they were developed by CASE tool vendors to comprise modeling information. They usually have graphical modeling tools, frequently including code generators. They tend to be developed by vendors to support their own modeling tools and therefore are not open to other vendors.

The Data Repository grew out of the data encyclopedias and is a "hub" communicator between all design, development, administration and graphic reporting tools. Repositories tend to be independent and open to many vendors. Repositories are more powerful than a Data Dictionary and more open than an Encyclopedia.

A Data Warehouse is a database that contains a snapshot or instance of the actual data where reporting needs can be satisfied. Data Warehouses generally contain extracted and/or summarization of production data for these purposes. So, the data warehouses are files or databases, not metadata information about them.

[TOP]

Current Environment

The Data Management Working Group attempted to contact a Sub-Council member from each agency/area to obtain data history throughout USDA. The representatives were asked to provide a history of data administration and the current state of data administration in their agencies. The following represents the current data administration environment within agencies that responded to the request by the Data Repository Working Group.

Consolidated Farm Service Agency

The Federal Crop Insurance Reform and Department of Agriculture Reorganization Act of 1994, Public Law No. 103-354, signed by the President on October 13, 1994, permitted the Secretary of Agriculture to accomplish the reorganization of USDA. The Consolidated Farm Service Agency (CFSA) was established having responsibility for agricultural price and income support programs, production adjustment programs, and the conservation reserve and agricultural conservation programs formerly performed by the Agricultural Stabilization and Conservation Service (ASCS), supervision of the Federal Crop Insurance Corporation (FCIC), farm-related agricultural credit programs formerly performed by the Farmers Home Administration (FmHA), and such other programs related to farm services as are assigned. Therefore, the CFSA data administration history is a combined background of FCIC and the farm service related programs of ASCS and FmHA.

Farmers Home Administration (FmHA) data administration began in the late 1970's on systems developed in the St. Louis, Missouri and Washington, D.C. offices. Mainframe systems used in 2300 field offices. Standard abbreviations have been used and enforced from the beginning. Separate standards were used to develop systems in St. Louis and Washington. The St. Louis office used standardized elements in the Computer Associates Integrated Data Dictionary (CA-IDD) for mainframe systems using CA-IDMS software. The Washington office developed 3B2 systems using Prelude and Oracle.

Federal Crop Insurance Corporation (FCIC) consisted of applications developed in Kansas City, Missouri, which supported sixteen (16) field offices and also some reinsurance companies. Separate standards were used to develop the CA-IDMS and Focus systems. The CA-IDMS software uses standardized data elements stored in CA-IDD. Standard abbreviations were derived from commercial software.

Agricultural Stabilization and Conservation Service (ASCS) data administration started in the early 1980's. It supports a dispersed computer system in 2800 county offices using identical software and data base structures. The primary software is COBOL running on a System/36 platform. All files use standardized data elements stored in CA-IDD software. Query/36 provides County Offices with a pseudo-relational query capability using IBM's IDDU product. The development attitude in the 1980's was to "crank out the code, data administration is low priority". Data administration functions were typically assigned to lower grade individuals. Randomly constructed business names and definitions, redundant data elements, and inconsistent (6 character) IDDU names caused data redundancy and inefficiency problems that resulted in a changed philosophy toward data administration in late 1990. A business name foundation was adopted to consolidate redundant elements and an online CA-IDD interface was developed to enter business definitions. IDDU names were related to corresponding COBOL names and inconsistent element names wereconsolidated using synonyms to old names. The IDDU names were standardized throughout the systems. When the new names were implemented all existing queries became obsolete. The complaint from the county office employees was, "Why didn't you do it sooner?". A data element glossary report was developed emphasizing business name. Currently there are 13,000 data elements with definitions and domain; 6,000 of these elements have IDDU names.

CFSA is currently working on a legacy normalization project to prepare for the re-engineered Integrated Management Information System (IMIS) and Core Accounting System supporting FISVIS. The purpose of the normalization project is to upgrade legacy files to first normal relational format and to maintain recognition to existing legacy files.

Financial Information System Vision and Strategy (FISVIS)

FISVIS went through an extensive six month effort at building a financial information architecture. The James Martin Information Engineering (IE) methodology was followed and the CASE tool used was Knowledge Ware's Application Developer's Workbench (ADW). A business area analysis further defined details. The Chief Financial Officer (CFO) will declare the Information Architecture as the Departmental standard for anyone dealing with financial data.

A COTS package was purchased in December 1994. It is not feasible (affordable) to require the vendor to change the data elements in the COTS package and therefore FISVIS has mapped the COTS data elements to the Information Architecture entities and attributes. This was followed by the difficult task of searching National Finance Center (NFC) record layouts, tables, etc. in search of legacy data elements that must be incorporated or fed into the new COTS package. Data in existing sources must be converted to map to the information architecture and from the information architecture to the COTS package elements. FISVIS is currently using PARADOX to inventory and map the data elements. They are also in partnership with MAP and INFOSHARE for data standardization.

A repository tool is needed to help administer the FISVIS standards program. The Chief Financial Officer plans to set up a standards administration group in the Washington, D.C. area. The group will administer the Information Architecture and the Financial Management Standards Manual.

Food and Consumer Service

The Food and Consumer Service (FCS) began an aggressive Software Renewal Program (SRP) in the late 1980's. The effort began with a logical data design that produced data structures and definitions for FCS's most important data elements.

The logical data design was then used to drive the SRP function design. FCS has been very pleased with the results of its "data-driven design." The SRP replaced 44 stand-alone applications, developed during the 1970's and 1980's, with four comprehensive, integrated, on-line, menu-driven systems that radically transformed FCS's information architecture to better support the Agency's vast program mission and immense, $40-billion annual fiduciary responsibility.

The SRP, successfully implemented in stages, is now being extended to support EDI and EBT technologies. The extensions are known within FCS as "SRP-X."

The National Data Bank (NDB) system, one of the four systems developed under the SRP, provides a repository of 1,500 of the Agency's most important data elements. The NDB, which contains specially designed data administration software, provides a variety of analysis and management reporting capabilities. Definitions have been developed for all NDB data elements and stored in the CA-IDMS Integrated Data Dictionary (IDD). The definitions are also available via function-key screen display on the LAN version of the NDB.

FCS does not have an extensive common-use data structure for cross-sharing with other agencies within USDA-having more in common, for example, with other welfare organizations within Health & Human Services (HHS). One example, however, where FCS was hampered by a lack of Departmental data standards was the tri-agency PCIMS project where all three agencies had different commodity code structures.

FCS used a CASE product, IDMS Architect, for database design implementation/documentation for the SRP project. FCS has recently adopted James Martin's Information Engineering (IE) methodology and Texas Instrument's Information Engineering Facility (IEF) CASE software tool suite as Agency standards.

Forest Service

The purpose and need for a national Forest Service (FS) repository are well established. The purpose of developing a Forest Service repository has been well thought out during the development of the Forest Service Methodology (March 1992) and through the work of the Repository Team Report (October 1992). This body of work needs to be reviewed, revised to account for changes in technology and understanding, and then implemented.

The Repository Team Report describes a vision for the Repository as a client/server application development environment. The main emphasis lies in coordination of application development, integrating applications data and configuration management, control of application systems and data base metadata.

This vision supports information sharing, business process reengineering and data administration as a product of the application coordination and development work. Including strategic level models is the first step in application coordination and data integration.

Highlights of the vision for the repository include:

Plans are for the repository to provide configuration management for application development, enforce Forest Service Methodology as the standard, enforce standard naming conventions, reduce both data and application redundancy and decrease development life cycle time.

At present, the Washington Office (WO) is supporting a repository through the implementation of FS ATLAS. This instance of Oracle CASE*Dictionary is used primarily for documenting completed applications. There are no attempts being made to manage shared elements between applications or enforce data integration.

Several, but not all, Regions are maintaining instances of Oracle CASE*Dictionary. A few Regions are using CASE*Dictionary as a development platform in support of the FS Methodology, but no attempt is being made to integrate these dictionaries with national data integration efforts.

The Forest Service is a long way from complete implementation of the repository vision. Barriers include performance problems over a WAN, object sharing between applications and general acceptance by the application development community.

A staged approach to implementation will allow Forest Service to do what they can right away while establishing strategic direction for full implementation of the repository vision.

InfoShare

The InfoShare Data Team was formed in March 1993 and consists of 13 representatives from Ft. Collins, St. Louis, Kansas City, and Washington. It is the only surviving InfoShare team from the original InfoShare Program (3/93 -9/94). Since the USDA reorganization, three of the four active InfoShare agencies are now CFSA.

A list of active InfoShare Data Team projects follows:

The InfoShare Data Team is presently using the CA-IDD as an interim repository. Plans are to convert to the R&O Rochade Data Repository to be shared by InfoShare, FISVIS, MAP and other interested USDA organizations in late 1995.

Rochade will be used to coordinate enterprise information all the way from regulations to operational data models that relate to shared objects and programs. It will be used to manage consistency of unnecessary redundant data across different applications.

National Finance Center (NFC)

The Data Base Management Branch (DBMB), Data Administration Section (DAS) is responsible for the maintenance and integrity of three data dictionary environments: CA-IDMS Integrated Data Dictionary (CA-IDD), DB2 DB2DICT, and Oracle Dictionary.

The largest of these dictionaries is the CA-IDD. DAS maintains 33 data dictionaries that comprise the 20 CA-IDMS Central Versions the branch supports. The CA-IDD is a fully functional dictionary where the physical data element names are stored. It also houses all components of an application database: schema, subschema, files, programs,etc., and serves as a repository for the metadata. Current standards are in place for naming various components, such as dialog, programs, and maps. It provides a cross-reference of the data elements between CA-IDMS and FOCUS for use within the FOCUS reporting system. Due to the CA-IDD's somewhat rigid inquiry capabilities, DAB designed an interface inquiry database named DICTVIEW. This database provides data element information in a more readable, organized and user-friendly format. It also provides the metadata on the physical data element names, as well as a cross-reference of schema and record usages, and FOCUS data element names. This application can easily be modified to incorporate any other functions and information currently stored in the CA-IDD.

The DB2 dictionary is three products combined into one (DB2DICT) to comprise a basic data encyclopedia. It is a DB2 Corporate dictionary, On-line Request Form and Object Code (physical design) Generator. The DB2DICT corporate dictionary serves as a front end to the DB2 system catalog. Therefore, entries into the development DB2 system catalog must first adhere to the criteria set forth by the data encyclopedia. For security reasons, the dictionary functions are separated by project leader, management and database administrator responsibilities.

The main function of the corporate dictionary is to maintain column integrity throughout the DB2 environment. It provides column search capabilities based on partial column name, type, application, and many other options. Its editing capabilities enforce column name uniqueness, column types and lengths, labeling and descriptions. It protects tables and columns from non-owner update. It features electronic signature verification. The developer is responsible for ensuring that column names and descriptions are meaningful for all users.

The on-line Request Form function of DB2DICT is used as a document to request the migration of DB2 tables, indexes, views, and foreign keys. The process of approving this form is done through electronic signatures. The signature of the project leader, project manager, and project DBA are required. Through the use of a utility program the on-line form can be printed or stored in partitioned data set. This utility program can also provide a cross check for columns that have been updated but not approved by a DBA.

DB2DICT is also a CASE tool that generates SQL code used to create DB2 tables, columns, indexes, views, foreign keys and descriptions. DB2DICT ensures integrity between related DB2 object relationships, referential integrity, correct syntax, labels and descriptions. The generated SQL creates the DB2 object definitions that will be added to the DB2 system catalog.

The third data dictionary maintained is the Oracle dictionary. The Oracle dictionary stores the metadata for database tables and columns, categorized by application. It currently references only the applications in the UNIX environment. It is accessible to Data Base Administrators through SQL forms, query, entry, update and report generating utilities. Dictionary reports are available to developers through the UNIX Dictionary.

[TOP]

Repository Environment

The goal of Corporate Data Repository technology is to provide a point of access and control, coordination of application development, integration of applications data and configuration management. The model driven repository supports the information management process and indicates where data comes from, how often it's accessed, how it should be translated or transformed, who sees it, from what business processes it is derived, and which critical success factors it supports. Combined with a good user access interface, it is a tool that provides coordinated enterprise information.

The data repository addresses the critical requirements of synchronization, mapping, timing and data transfer which include the following:

[TOP]

Benefits of a Corporate Data Repository

Implementation of corporate data repository technology within USDA would help break down barriers to enterprise information. Responsibility for integrated enterprise information belongs to both the business and technical communities. The role of the repository for the technician is to facilitate the building and maintenance of applications and access to data. For the business, the corporate repository provides a consolidated view of USDA programs. It also provides a road map to navigate the complex networks, models and sometimes confusing software intended to facilitate access to data. The repository supports all phases of development from requirements gathering and model design, to user access, operations and maintenance.

The corporate repository and supporting CASE and reporting tools can also be relied upon to produce documentation and cross-reference reporting. Documentation and reports can be used to identify additional information regarding individual attributes, keys, summary levels, frequencies and historical data requirements.

The repository can be used to document the origin and destination of all data elements in the various data models, as well as the business rules that relate to their validation, calculation, transformation, values, etc. Benefits include:

Improved customer service benefits all USDA customers, from the farmer receiving deficiency or crop insurance payments to the National Forest Service to the concerned taxpayer. USDA employees can provide quicker, accurate, more complete responses to questions, concerns, services or needs of the customer. Consistent terminology, references and representation of data (e.g., name, address, codes,...) result in accurate, consistent answers which ultimately increases customer confidence and satisfaction. The consolidated information will allow "one-stop shopping" to make it easier for one office to address most customer needs. Systems such as the Government Information Locator Service (GILS) and agency Internet home pages will make it easier for USDA employees and customers to access data which supports critical USDA missions. Fraud and abuse of USDA programs will be easier to detect because of quicker access to accurate information concerning eligibility programs and benefit delivery (e.g. food stamps, EBT, etc.). This information will also assist with flagging debarred contractors and allow cross vendor management.

The overall benefits of implementing data repository technology for USDA are realized at all levels from the Secretary level on down. The following paragraphs identify more specific benefits at different levels.

The Secretary, and staff, Senior Executive Service (SES), and Senior Information Resource Management Officials (SIRMO) throughout the organization gain a global view of USDA and its resources. A corporate repository serves as a map to gain organizational information for administrative, financial and program areas. Congressional questions asking "How many, how much and where..." may be answered with the repository serving as a directory to consolidated, integrated, consistent information.

USDA's managers gain more efficiency and effective use of facilities, personnel and equipment resources. Repository technology assists with configuration management and cost benefit analysis. The directory holds the key to programs and applications, tested and being used in all areas of USDA. This saves costly resources by allowing information sharing and reducing redundancy. Program sharing leads to information sharing on resources such as data, processes and objects. Common terms and definitions promote sharing and reduce stovepipe systems. Having quick access to this type of data allows each area to do accurate impact analysis on their programs. It may quickly be determined which areas will be affected by a seemingly "simple" change.

The application developers and project managers within USDA benefit from repository technology that promotes uniformity, reuse of objects, and consistent life cycle development. Their awareness of "available/existing" corporate information, activities and programs saves programmer time and increases skills by strengthening the networks among technicians. Programmers typically save documentation as the last "chore" which is sometimes never completed because of new deadlines. The repository will automatically keep consistent and accurate information about the data, processes and objects. All of these benefits allow project managers to facilitate the planning and analysis phase of new projects.

[TOP]

Business Requirements

In order for USDA to succeed in establishing an effective corporate data repository environment and to realize the maximum potential benefits, some business requirements must be taken into consideration. They are as follows:

[TOP]

Summary

Office of Information Resources Management (OIRM) should complete the Data Management Strategic Plan to define a plan, leadership, and adequate resources to support implementation of the Data Management Framework, of which the repository is only one part.

An analysis should be conducted to determine business functions and technology required to support the Department's business and IRM functions, objectives, and activities. Certain IRM functions, such as data administration, have yet to be fully defined by all USDA agencies. Therefore, the Department repository requirements will evolve as the coordinated data administration functions are more clearly defined.

Prior to establishing a corporate data repository for USDA, a cost benefit analysis must be conducted to determine USDA's return on investment. Establishing, administering and maintaining a corporate repository will require personnel and budget resources which are already constrained. The business value must be considered prior to establishing a program of this magnitude. OIRM should possibly consider some type of phased approach to implementing a corporate repository such as initially requiring only that shared data elements be registered.

Once a decision has been reached to implement a corporate repository, the business analysis should be reviewed and taken into consideration to complete a technical requirements analysis.

[TOP]
Left Hand [Return to the USDA Home Page]
Left Hand [Return to the USDA Departmental Management Home Page]
Left Hand [Return to USDA Data Management Program Home Page]