Author:
Friday, January 29th, 2010

Introduction to Metadata

Metadata (also written “meta data” or “meta-data,” and sometimes called metainformation) is “data about data.” Its use is an emerging practice in the fields of librarianship, information science, information technology, and GIS. It can be applied to a vast array of objects, both physical and electronic, such as raw data, books, CDs, DVDs, images, maps, database tables, and web pages. Since the emergence of the Dublin Core metadata set and the internet, the use of metadata has grown considerably in popularity as businesses and other organizations seek to organize rapidly growing volumes of data and information.

Importance of metadata in data warehouse

Good metadata is essential to the effective operation of a data warehouse. It is used in data acquisition/collection, data transformation, and data access. Acquisition metadata maps the translation of information from the operational system to the analytical system. This includes an extract history describing data origins, updates, the algorithms used to summarize data, and the frequency of extractions from operational systems. Transformation metadata includes a history of data transformations, changes in names, and changes in other physical characteristics. Access metadata provides navigation and graphical user interfaces that allow non-technical business users to interact intuitively with the contents of the warehouse. On top of these three types of metadata, a warehouse needs basic operational metadata: procedures for how the data warehouse is used and accessed, procedures for monitoring its growth relative to the available storage space, and authorizations stating who is responsible for, and who has access to, the data in the data warehouse and in the operational system.

Significance in data warehouse

Metadata is your control panel to the data warehouse.  It is data that describes the data warehousing and business intelligence system:

  • Reports
  • Cubes
  • Tables (Records, Segments, Entities, etc.)
  • Columns (Fields, Attributes, Data Elements, etc.)
  • Keys
  • Indexes

Metadata is often used to control the handling of data and describes:

  • Rules
  • Transformations
  • Aggregations
  • Mappings

The power of metadata is that it enables data warehousing personnel to develop and control the system without writing code in languages such as Java, C# or Visual Basic.  This saves time and money both in the initial setup and in ongoing management.
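As a rough illustration of metadata controlling how data is handled, the sketch below drives a transformation entirely from a mapping table. The field names, rule names, and the `transform` function are hypothetical and are not taken from any particular tool; the point is only that changing a mapping means editing metadata, not program logic.

```python
# Hypothetical mapping metadata: each entry names a source field, a target
# column, and the rule to apply. None of these names come from a real product.
MAPPINGS = [
    {"source": "cust_nm", "target": "customer_name", "rule": "trim_upper"},
    {"source": "ord_amt", "target": "order_amount", "rule": "to_float"},
    {"source": "ord_dt", "target": "order_date", "rule": "copy"},
]

# Small library of reusable rules, keyed by the rule names used in MAPPINGS.
RULES = {
    "copy": lambda value: value,
    "trim_upper": lambda value: value.strip().upper(),
    "to_float": lambda value: float(value),
}


def transform(row, mappings=MAPPINGS, rules=RULES):
    """Build a target row by applying each mapping's rule to its source field."""
    return {m["target"]: rules[m["rule"]](row[m["source"]]) for m in mappings}


source_row = {"cust_nm": "  acme ltd ", "ord_amt": "19.90", "ord_dt": "2010-01-29"}
print(transform(source_row))
```

Adding a new column to the load in this sketch would mean adding one entry to `MAPPINGS`, which is the kind of saving the paragraph above describes.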
Data Warehouse Metadata
Data warehousing has specific metadata requirements.  Metadata that describes tables typically includes:

  • Physical Name
  • Logical Name
  • Type: Fact, Dimension, Bridge
  • Role: Legacy, OLTP, Stage
  • DBMS: DB2, Informix, MS SQL Server, Oracle, Sybase
  • Location
  • Definition
  • Notes

Metadata that describes columns within tables typically includes:

  • Physical Name
  • Logical Name
  • Order in Table
  • Datatype
  • Length
  • Decimal Positions
  • Nullable/Required
  • Default Value
  • Edit Rules
  • Definition
  • Notes
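The table and column attributes listed above could be held in a structure along the following lines. This is a minimal sketch: the class names, field names, and the sample `FCT_SALES` table are assumptions made for illustration, not a standard schema.

```python
from dataclasses import dataclass, field
from typing import List, Optional


@dataclass
class ColumnMetadata:
    physical_name: str
    logical_name: str
    order_in_table: int
    datatype: str
    length: Optional[int] = None
    decimal_positions: Optional[int] = None
    nullable: bool = True
    default_value: Optional[str] = None
    edit_rules: str = ""
    definition: str = ""
    notes: str = ""


@dataclass
class TableMetadata:
    physical_name: str
    logical_name: str
    table_type: str  # e.g. "Fact", "Dimension", "Bridge"
    role: str        # e.g. "Legacy", "OLTP", "Stage"
    dbms: str
    location: str
    definition: str = ""
    notes: str = ""
    columns: List[ColumnMetadata] = field(default_factory=list)

    def required_columns(self):
        """Return the physical names of non-nullable columns in table order."""
        ordered = sorted(self.columns, key=lambda c: c.order_in_table)
        return [c.physical_name for c in ordered if not c.nullable]


# Hypothetical sample table used only to exercise the structure.
sales = TableMetadata(
    physical_name="FCT_SALES",
    logical_name="Sales Fact",
    table_type="Fact",
    role="Stage",
    dbms="Oracle",
    location="dw.sales",
    columns=[
        ColumnMetadata("SALE_ID", "Sale Identifier", 1, "NUMBER", length=10, nullable=False),
        ColumnMetadata("SALE_AMT", "Sale Amount", 2, "NUMBER", length=12,
                       decimal_positions=2, nullable=False),
        ColumnMetadata("PROMO_CD", "Promotion Code", 3, "VARCHAR2", length=8),
    ],
)
print(sales.required_columns())
```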

How can Data Warehousing Metadata be managed?
Data warehousing and business intelligence metadata is best managed through a combination of people, process and tools.

The people side requires that people be trained in the importance and use of metadata.  They need to understand how and when to use tools as well as the benefits to be gained through metadata.

The process side incorporates metadata management into the data warehousing and business intelligence life cycle.  As the life cycle progresses, metadata is entered into the appropriate tool and stored in a metadata repository for further use.

Metadata can be managed through individual tools:

  • Metadata manager / repository
  • Metadata extract tools
  • Data modeling
  • ETL
  • BI Reporting

Metadata Manager / Repository
Metadata can be managed through a shared repository that combines information from multiple sources.

The metadata manager can be purchased as a software package or built as a “home-grown” system.  Many organizations start with a spreadsheet containing data definitions and then grow to a more sophisticated approach.
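A very small home-grown repository might look like the sketch below, which merges a spreadsheet of data definitions (represented here as CSV text) with metadata from a second source. The column names, the sample rows, and the `build_repository` function are all assumptions made for illustration.

```python
import csv
import io

# Stand-in for a spreadsheet of data definitions exported as CSV.
SPREADSHEET_CSV = """physical_name,logical_name,definition
CUST_ID,Customer Identifier,Unique key assigned to each customer
CUST_NM,Customer Name,Legal name of the customer
"""

# Stand-in for metadata collected from a second source, such as a data model.
MODEL_METADATA = {
    "CUST_ID": {"datatype": "INTEGER", "nullable": False},
    "CUST_NM": {"datatype": "VARCHAR", "nullable": False},
    "REGION_CD": {"datatype": "CHAR", "nullable": True},
}


def build_repository(csv_text, model_metadata):
    """Merge both sources into one dictionary keyed by physical column name."""
    repository = {name: dict(attrs) for name, attrs in model_metadata.items()}
    for row in csv.DictReader(io.StringIO(csv_text)):
        entry = repository.setdefault(row["physical_name"], {})
        entry["logical_name"] = row["logical_name"]
        entry["definition"] = row["definition"]
    return repository


repository = build_repository(SPREADSHEET_CSV, MODEL_METADATA)
# Columns that exist in the model but have no business definition yet.
undefined = [name for name, entry in repository.items() if "definition" not in entry]
print(undefined)
```

Combining sources this way makes gaps visible: in this sample, `REGION_CD` appears in the model but has no definition in the spreadsheet.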
Extracting Metadata from Input Sources
Metadata can be obtained through a manual process of keying in metadata or through automated processes. Scanners can extract metadata from text such as SQL DDL or COBOL programs. Other tools can directly access metadata through SQL catalogs and other metadata sources.
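As one example of reading metadata directly from a database catalog, the sketch below uses Python's built-in `sqlite3` module and SQLite's `PRAGMA table_info`. SQLite is used only because it needs no setup; other DBMSs expose their catalogs differently. The `customer` table and the `extract_column_metadata` function are hypothetical.

```python
import sqlite3


def extract_column_metadata(conn, table_name):
    """Read column metadata for one table from SQLite's catalog."""
    # PRAGMA table_info returns one row per column:
    # (cid, name, type, notnull, dflt_value, pk)
    rows = conn.execute(f'PRAGMA table_info("{table_name}")').fetchall()
    return [
        {
            "physical_name": name,
            "order_in_table": cid + 1,
            "datatype": col_type,
            "nullable": not notnull,
            "default_value": default,
            "is_key": bool(pk),
        }
        for cid, name, col_type, notnull, default, pk in rows
    ]


# In-memory database with a hypothetical table, so the sketch is self-contained.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE customer ("
    " customer_id INTEGER PRIMARY KEY,"
    " customer_name TEXT NOT NULL,"
    " region TEXT DEFAULT 'UNKNOWN')"
)
for column in extract_column_metadata(conn, "customer"):
    print(column)
```

The resulting dictionaries cover several of the column attributes listed earlier (physical name, order in table, datatype, nullability, default value) and could be loaded into a metadata manager from there.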

Picking the appropriate metadata extract tools is a key part of metadata management.

Many data modeling tools include a metadata extract capability – otherwise known as “reverse engineering”.  Through this tool, database information about tables and columns can be extracted.  The information can then be exported from the data modeling tool to the metadata manager.
