U.S. Department of AgricultureDepartmental Management |
![]() |
|---|
The efforts of the working group were exercised within the guidelines of Federal policy and standards, such as those developed by NIST, OMB, and GSA.
The results of this effort will be factored into the final USDA Data Management Strategic Plan.
Working Group members are Larry Heffel, FCS; Dan Keller, FS; Jeff Kerby, CFSA; John King, FISVIS; Bob Larson, FS; Ed McLaughlin, OP/MAP; Bill Pettingill, NFC; Alan Rich, FCS; Kathleen Rundle, OIRM/NCC; and Paul Whitmore, InfoShare.
A Data Dictionary is a tool which lists all data items that are used, their definitions, how and where they are used, and who is responsible for them.
Data Encyclopedias grew out of Data Dictionaries. They contain more than just data because they were developed by CASE tool vendors to comprise modeling information. They usually have graphical modeling tools, frequently including code generators. They tend to be developed by vendors to support their own modeling tools and therefore are not open to other vendors.
The Data Repository grew out of the data encyclopedias and is a "hub" communicator between all design, development, administration and graphic reporting tools. Repositories tend to be independent and open to many vendors. Repositories are more powerful than a Data Dictionary and more open than an Encyclopedia.
A Data Warehouse is a database that contains a snapshot or instance of the actual data where reporting needs can be satisfied. Data Warehouses generally contain extracted and/or summarization of production data for these purposes. So, the data warehouses are files or databases, not metadata information about them.
[TOP]Farmers Home Administration (FmHA) data administration began in the late 1970's on systems developed in the St. Louis, Missouri and Washington, D.C. offices. Mainframe systems used in 2300 field offices. Standard abbreviations have been used and enforced from the beginning. Separate standards were used to develop systems in St. Louis and Washington. The St. Louis office used standardized elements in the Computer Associates Integrated Data Dictionary (CA-IDD) for mainframe systems using CA-IDMS software. The Washington office developed 3B2 systems using Prelude and Oracle.
Federal Crop Insurance Corporation (FCIC) consisted of applications developed in Kansas City, Missouri, which supported sixteen (16) field offices and also some reinsurance companies. Separate standards were used to develop the CA-IDMS and Focus systems. The CA-IDMS software uses standardized data elements stored in CA-IDD. Standard abbreviations were derived from commercial software.
Agricultural Stabilization and Conservation Service (ASCS) data administration started in the early 1980's. It supports a dispersed computer system in 2800 county offices using identical software and data base structures. The primary software is COBOL running on a System/36 platform. All files use standardized data elements stored in CA-IDD software. Query/36 provides County Offices with a pseudo-relational query capability using IBM's IDDU product. The development attitude in the 1980's was to "crank out the code, data administration is low priority". Data administration functions were typically assigned to lower grade individuals. Randomly constructed business names and definitions, redundant data elements, and inconsistent (6 character) IDDU names caused data redundancy and inefficiency problems that resulted in a changed philosophy toward data administration in late 1990. A business name foundation was adopted to consolidate redundant elements and an online CA-IDD interface was developed to enter business definitions. IDDU names were related to corresponding COBOL names and inconsistent element names wereconsolidated using synonyms to old names. The IDDU names were standardized throughout the systems. When the new names were implemented all existing queries became obsolete. The complaint from the county office employees was, "Why didn't you do it sooner?". A data element glossary report was developed emphasizing business name. Currently there are 13,000 data elements with definitions and domain; 6,000 of these elements have IDDU names.
CFSA is currently working on a legacy normalization project to prepare for the re-engineered Integrated Management Information System (IMIS) and Core Accounting System supporting FISVIS. The purpose of the normalization project is to upgrade legacy files to first normal relational format and to maintain recognition to existing legacy files.
A COTS package was purchased in December 1994. It is not feasible (affordable) to require the vendor to change the data elements in the COTS package and therefore FISVIS has mapped the COTS data elements to the Information Architecture entities and attributes. This was followed by the difficult task of searching National Finance Center (NFC) record layouts, tables, etc. in search of legacy data elements that must be incorporated or fed into the new COTS package. Data in existing sources must be converted to map to the information architecture and from the information architecture to the COTS package elements. FISVIS is currently using PARADOX to inventory and map the data elements. They are also in partnership with MAP and INFOSHARE for data standardization.
A repository tool is needed to help administer the FISVIS standards program. The Chief Financial Officer plans to set up a standards administration group in the Washington, D.C. area. The group will administer the Information Architecture and the Financial Management Standards Manual.
The logical data design was then used to drive the SRP function design. FCS has been very pleased with the results of its "data-driven design." The SRP replaced 44 stand-alone applications, developed during the 1970's and 1980's, with four comprehensive, integrated, on-line, menu-driven systems that radically transformed FCS's information architecture to better support the Agency's vast program mission and immense, $40-billion annual fiduciary responsibility.
The SRP, successfully implemented in stages, is now being extended to support EDI and EBT technologies. The extensions are known within FCS as "SRP-X."
The National Data Bank (NDB) system, one of the four systems developed under the SRP, provides a repository of 1,500 of the Agency's most important data elements. The NDB, which contains specially designed data administration software, provides a variety of analysis and management reporting capabilities. Definitions have been developed for all NDB data elements and stored in the CA-IDMS Integrated Data Dictionary (IDD). The definitions are also available via function-key screen display on the LAN version of the NDB.
FCS does not have an extensive common-use data structure for cross-sharing with other agencies within USDA-having more in common, for example, with other welfare organizations within Health & Human Services (HHS). One example, however, where FCS was hampered by a lack of Departmental data standards was the tri-agency PCIMS project where all three agencies had different commodity code structures.
FCS used a CASE product, IDMS Architect, for database design implementation/documentation for the SRP project. FCS has recently adopted James Martin's Information Engineering (IE) methodology and Texas Instrument's Information Engineering Facility (IEF) CASE software tool suite as Agency standards.
The Repository Team Report describes a vision for the Repository as a client/server application development environment. The main emphasis lies in coordination of application development, integrating applications data and configuration management, control of application systems and data base metadata.
This vision supports information sharing, business process reengineering and data administration as a product of the application coordination and development work. Including strategic level models is the first step in application coordination and data integration.
Highlights of the vision for the repository include:
At present, the Washington Office (WO) is supporting a repository through the implementation of FS ATLAS. This instance of Oracle CASE*Dictionary is used primarily for documenting completed applications. There are no attempts being made to manage shared elements between applications or enforce data integration.
Several, but not all, Regions are maintaining instances of Oracle CASE*Dictionary. A few Regions are using CASE*Dictionary as a development platform in support of the FS Methodology, but no attempt is being made to integrate these dictionaries with national data integration efforts.
The Forest Service is a long way from complete implementation of the repository vision. Barriers include performance problems over a WAN, object sharing between applications and general acceptance by the application development community.
A staged approach to implementation will allow Forest Service to do what they can right away while establishing strategic direction for full implementation of the repository vision.
A list of active InfoShare Data Team projects follows:
Rochade will be used to coordinate enterprise information all the way from regulations to operational data models that relate to shared objects and programs. It will be used to manage consistency of unnecessary redundant data across different applications.
The largest of these dictionaries is the CA-IDD. DAS maintains 33 data dictionaries that comprise the 20 CA-IDMS Central Versions the branch supports. The CA-IDD is a fully functional dictionary where the physical data element names are stored. It also houses all components of an application database: schema, subschema, files, programs,etc., and serves as a repository for the metadata. Current standards are in place for naming various components, such as dialog, programs, and maps. It provides a cross-reference of the data elements between CA-IDMS and FOCUS for use within the FOCUS reporting system. Due to the CA-IDD's somewhat rigid inquiry capabilities, DAB designed an interface inquiry database named DICTVIEW. This database provides data element information in a more readable, organized and user-friendly format. It also provides the metadata on the physical data element names, as well as a cross-reference of schema and record usages, and FOCUS data element names. This application can easily be modified to incorporate any other functions and information currently stored in the CA-IDD.
The DB2 dictionary is three products combined into one (DB2DICT) to comprise a basic data encyclopedia. It is a DB2 Corporate dictionary, On-line Request Form and Object Code (physical design) Generator. The DB2DICT corporate dictionary serves as a front end to the DB2 system catalog. Therefore, entries into the development DB2 system catalog must first adhere to the criteria set forth by the data encyclopedia. For security reasons, the dictionary functions are separated by project leader, management and database administrator responsibilities.
The main function of the corporate dictionary is to maintain column integrity throughout the DB2 environment. It provides column search capabilities based on partial column name, type, application, and many other options. Its editing capabilities enforce column name uniqueness, column types and lengths, labeling and descriptions. It protects tables and columns from non-owner update. It features electronic signature verification. The developer is responsible for ensuring that column names and descriptions are meaningful for all users.
The on-line Request Form function of DB2DICT is used as a document to request the migration of DB2 tables, indexes, views, and foreign keys. The process of approving this form is done through electronic signatures. The signature of the project leader, project manager, and project DBA are required. Through the use of a utility program the on-line form can be printed or stored in partitioned data set. This utility program can also provide a cross check for columns that have been updated but not approved by a DBA.
DB2DICT is also a CASE tool that generates SQL code used to create DB2 tables, columns, indexes, views, foreign keys and descriptions. DB2DICT ensures integrity between related DB2 object relationships, referential integrity, correct syntax, labels and descriptions. The generated SQL creates the DB2 object definitions that will be added to the DB2 system catalog.
The third data dictionary maintained is the Oracle dictionary. The Oracle dictionary stores the metadata for database tables and columns, categorized by application. It currently references only the applications in the UNIX environment. It is accessible to Data Base Administrators through SQL forms, query, entry, update and report generating utilities. Dictionary reports are available to developers through the UNIX Dictionary. [TOP]
The data repository addresses the critical requirements of synchronization, mapping, timing and data transfer which include the following:
The corporate repository and supporting CASE and reporting tools can also be relied upon to produce documentation and cross-reference reporting. Documentation and reports can be used to identify additional information regarding individual attributes, keys, summary levels, frequencies and historical data requirements.
The repository can be used to document the origin and destination of all data elements in the various data models, as well as the business rules that relate to their validation, calculation, transformation, values, etc. Benefits include:
The overall benefits of implementing data repository technology for USDA are realized at all levels from the Secretary level on down. The following paragraphs identify more specific benefits at different levels.
The Secretary, and staff, Senior Executive Service (SES), and Senior Information Resource Management Officials (SIRMO) throughout the organization gain a global view of USDA and its resources. A corporate repository serves as a map to gain organizational information for administrative, financial and program areas. Congressional questions asking "How many, how much and where..." may be answered with the repository serving as a directory to consolidated, integrated, consistent information.
USDA's managers gain more efficiency and effective use of facilities, personnel and equipment resources. Repository technology assists with configuration management and cost benefit analysis. The directory holds the key to programs and applications, tested and being used in all areas of USDA. This saves costly resources by allowing information sharing and reducing redundancy. Program sharing leads to information sharing on resources such as data, processes and objects. Common terms and definitions promote sharing and reduce stovepipe systems. Having quick access to this type of data allows each area to do accurate impact analysis on their programs. It may quickly be determined which areas will be affected by a seemingly "simple" change.
The application developers and project managers within USDA benefit from repository technology that promotes uniformity, reuse of objects, and consistent life cycle development. Their awareness of "available/existing" corporate information, activities and programs saves programmer time and increases skills by strengthening the networks among technicians. Programmers typically save documentation as the last "chore" which is sometimes never completed because of new deadlines. The repository will automatically keep consistent and accurate information about the data, processes and objects. All of these benefits allow project managers to facilitate the planning and analysis phase of new projects. [TOP]
An analysis should be conducted to determine business functions and technology required to support the Department's business and IRM functions, objectives, and activities. Certain IRM functions, such as data administration, have yet to be fully defined by all USDA agencies. Therefore, the Department repository requirements will evolve as the coordinated data administration functions are more clearly defined.
Prior to establishing a corporate data repository for USDA, a cost benefit analysis must be conducted to determine USDA's return on investment. Establishing, administering and maintaining a corporate repository will require personnel and budget resources which are already constrained. The business value must be considered prior to establishing a program of this magnitude. OIRM should possibly consider some type of phased approach to implementing a corporate repository such as initially requiring only that shared data elements be registered.
Once a decision has been reached to implement a corporate repository, the business analysis should be reviewed and taken into consideration to complete a technical requirements analysis. [TOP]
[Return to the USDA Home Page]
[Return to the USDA Departmental Management Home Page]
[Return to USDA Data Management Program Home Page]