In the course of reading Kenney and Rieger’s “Moving Theory into Practice”, I came upon a recommendation of this site:
http://www.getty.edu/research/conducting_research/standards/introimages/index.html
Written in 2003, it’s an overview of the important concepts and facets of developing and maintaining a digital collection. It is geared toward images rather than documents, but the principles are essentially the same. It first gives a really concise tutorial on the elements of a digital image, including resolution, color, file format, etc. I knew a lot of it, but it was a great refresher with some new stuff for me. The meat of the site, though, was the workflow section. It covered selecting scanners, image capture, metadata schema (as distinct from format), quality control, delivery, security, and long-term management and preservation.
I made notes as I read and here are some highlights of my thoughts as well as direct excerpts from the text…
The National Digital Library Program of the Library of Congress, the California Digital Library, and the Colorado Digitization Program are some examples of groups that have made available their own standards, guidelines, and best practice recommendations for all aspects of imaging projects, and these can be immensely helpful. When considering adopting a standard, it is important to consider how well established vendors support it and the depth of its user base.
“‘Good’ metadata was defined in 2001 by the Digital Library Forum as fulfilling the following criteria: it is appropriate to the materials digitized and their current and likely use; it supports interoperability; it uses standard controlled vocabularies to populate elements where appropriate; it includes a clear statement on the terms of use of the digital object; it supports the long-term management of digital objects; and it is persistent, authoritative, and verifiable.”
Metadata can be divided into three broad types, which may be simply defined as follows: descriptive, which describes content; administrative, which describes context and form and gives data-management information; and structural, which describes the relationships between parts and between digital files or objects.
CBIR, a concept I’ve looked into in another project, was mentioned as being able to retrieve images based on color, iconic shape, or a specified position of elements within the image. It’s ironic to me that the most common way of wanting to search for an image – that is, by its content – is the area most prone to subjective description. But CBIR strives to build that bridge.
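To make the color part of CBIR a bit more concrete for myself, here is a minimal sketch of one classic approach: build a coarse color histogram for each image and score two images by how much their histograms overlap. This is only an illustration under my own assumptions, not how any particular CBIR product works. The function names are made up, and images are stood in for by plain lists of (R, G, B) tuples rather than real files.

```python
from collections import Counter


def color_histogram(pixels, bins_per_channel=4):
    """Return a normalized histogram of coarse color buckets.

    pixels is a list of (r, g, b) tuples with values 0-255.
    bins_per_channel is assumed to divide 256 evenly.
    """
    pixels = list(pixels)
    if not pixels:
        raise ValueError("an image needs at least one pixel")
    step = 256 // bins_per_channel
    counts = Counter((r // step, g // step, b // step) for r, g, b in pixels)
    total = sum(counts.values())
    # Each bucket holds the share of pixels that fell into it.
    return {bucket: n / total for bucket, n in counts.items()}


def histogram_intersection(hist_a, hist_b):
    """Score similarity from 0.0 (no shared colors) to 1.0 (same color mix)."""
    return sum(min(share, hist_b.get(bucket, 0.0)) for bucket, share in hist_a.items())


# Tiny made-up "images" for illustration only.
red_image = [(255, 0, 0)] * 4
blue_image = [(0, 0, 255)] * 4
half_and_half = [(250, 10, 5)] * 2 + [(0, 0, 255)] * 2

score = histogram_intersection(color_histogram(red_image), color_histogram(half_and_half))
```

A score like this says nothing about what the picture is of, which is exactly the gap between color matching and searching by content.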
I’ve struggled in my own mind with the pros and cons of controlled vocabularies and user-generated search terms. I TOTALLY get the controlled vocabulary argument, but I think that’s my inner elitist talking. There’s the popular argument (which appeals to the control-freak in me) of making a standardized system agreed upon and adhered to by all, making cataloging, archiving, and retrieval an exercise in repetition, stability, and predictability. Then there’s the less-spoken reason of Information for Informed People and all that hoo-ha. But I also see the efficacy of the latter choice. I mean, what’s the purpose of a collection if not to be accessible, researchable, and enjoyed and appreciated by as wide an audience as possible, both for the benefit of the audience and for the continued purposefulness of the collection? It’s one thing to enjoy your ivory tower, but without letting down your golden hair, you’re not likely to be able to afford the upkeep for long.
I like the idea of keeping an unedited master as well as an edited one. We apparently have lots of room, so why not? I also caught hold of several technical aspects to be included in technical metadata, compression specifics (if relevant) being one that I hadn’t thought of.
I’m ashamed to admit that I wasn’t aware there was more than one kind of TIFF. Other concepts, like migration schedules, I was happy to learn about with no guilt whatsoever.
Capture resolution, intended display resolution, file format, spi, color schemas, compression details, rights information, content description, acquisition date, digitization date, the traditional cataloging details of a physical item, and more: descriptive, administrative, and technical metadata should all be included. The TIFF format allows this range, though I’d like to look more into PNG. If memory serves, I think it was popular a few years back but not widely adopted despite some clear advantages. And the imperative of interoperability would preclude this if so. Plus, David said that they have gone TIFF already.
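As a way of keeping that list straight, here is a rough sketch of a completeness check for one image’s metadata. The field names and the way I have sorted them into descriptive, administrative, and technical groups are my own assumptions for illustration, not a published element set.

```python
# Hypothetical required fields, grouped by metadata type (my own grouping).
REQUIRED_FIELDS = {
    "descriptive": ["title", "content_description"],
    "administrative": ["rights", "acquisition_date", "digitization_date"],
    "technical": ["file_format", "capture_resolution_spi", "color_space", "compression"],
}


def missing_fields(record):
    """Return {group: [field names]} for required fields that are absent or empty.

    record is a nested dict such as {"technical": {"file_format": "TIFF"}}.
    """
    gaps = {}
    for group, names in REQUIRED_FIELDS.items():
        present = record.get(group, {})
        absent = [name for name in names if present.get(name) in (None, "")]
        if absent:
            gaps[group] = absent
    return gaps


# Made-up sample record with some administrative details left out.
sample = {
    "descriptive": {"title": "Sample plate", "content_description": "Engraving of a telescope"},
    "administrative": {"rights": "Public domain"},
    "technical": {
        "file_format": "TIFF",
        "capture_resolution_spi": 600,
        "color_space": "RGB",
        "compression": "none",
    },
}
gaps = missing_fields(sample)
```

Running a check like this at capture time would flag incomplete records before they pile up.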
I’ve developed several questions as I read this. What kind of legal clearances might be an issue? What DAM software do we use? Have we already settled on a standard or a modification of one? Would image reproduction standards be different for books and images? Are they already at OU? What type of scanner do we use currently? Just how interoperable do we want to be? Remember the story of the re’printing’ of our material on another library’s website…
Standardized file naming: an absolute must.
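A naming standard is easiest to keep when a script builds and checks the names. Here is a minimal sketch, assuming a purely hypothetical convention of collection code, six-digit item number, and four-digit page number (for example demo_000123_0001.tif). The pattern and function names are mine, not anything we have actually adopted.

```python
import re

# Hypothetical convention: <collection>_<item, 6 digits>_<page, 4 digits>.<ext>
FILENAME_PATTERN = re.compile(
    r"(?P<collection>[a-z0-9]+)_(?P<item>\d{6})_(?P<page>\d{4})\.(?P<ext>tif|jpg)"
)


def make_filename(collection, item, page, ext="tif"):
    """Build a conforming file name, or raise ValueError if the parts cannot fit."""
    name = f"{collection}_{item:06d}_{page:04d}.{ext}"
    if FILENAME_PATTERN.fullmatch(name) is None:
        raise ValueError(f"cannot build a conforming name from: {name!r}")
    return name


def parse_filename(name):
    """Split a conforming name into its parts, or return None if it does not conform."""
    match = FILENAME_PATTERN.fullmatch(name)
    if match is None:
        return None
    parts = match.groupdict()
    return {
        "collection": parts["collection"],
        "item": int(parts["item"]),
        "page": int(parts["page"]),
        "ext": parts["ext"],
    }


example_name = make_filename("demo", 123, 1)
```

Anything that parse_filename rejects can be set aside for renaming before it gets into the collection.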
MODS, the Metadata Object Description Schema, is an XML schema designed to both transmit selected data from existing MARC 21 records (so-called because they result from the harmonization of the Canadian and U.S. MARC formats in readiness for the twenty-first century) and enable the creation of original resource description records.
Dublin Core, developed as a core set of semantic elements for categorizing Web-based resources for easier search and retrieval, has become popular in the museum and education communities. The schema is deliberately simple, consisting of fifteen optional, repeatable data elements, designed to coexist with, and map to, other semantically and functionally richer metadata standards. Dublin Core’s simplicity makes it an excellent medium of exchange, and thus a basis for interoperability. The metadata harvesting protocol of the Open Archives Initiative (OAI), known as OAI-PMH, which provides a mechanism for harvesting or gathering XML-formatted metadata from diverse repositories, mandates Dublin Core as its common metadata format. The Dublin Core Metadata Initiative (DCMI) is also developing administrative and collection-level metadata element sets, and user communities have developed element qualifiers relevant to their own fields, which both enrich and complicate the standard.
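To see how simple the fifteen-element set really is, here is a small sketch that builds an unqualified Dublin Core record in the oai_dc wrapper that OAI-PMH uses, with nothing but Python’s standard library. The build_dc_record helper and the sample values are made up for illustration; only the element names and the two namespace URIs come from the standards.

```python
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"
OAI_DC_NS = "http://www.openarchives.org/OAI/2.0/oai_dc/"

# The fifteen elements of simple (unqualified) Dublin Core.
DC_ELEMENTS = {
    "title", "creator", "subject", "description", "publisher",
    "contributor", "date", "type", "format", "identifier",
    "source", "language", "relation", "coverage", "rights",
}

# Use readable prefixes when the record is serialized.
ET.register_namespace("dc", DC_NS)
ET.register_namespace("oai_dc", OAI_DC_NS)


def build_dc_record(fields):
    """Build an oai_dc:dc element from (element_name, value) pairs.

    Pairs are used instead of a dict because every element is optional
    and repeatable, so the same name may appear more than once.
    """
    root = ET.Element(f"{{{OAI_DC_NS}}}dc")
    for name, value in fields:
        if name not in DC_ELEMENTS:
            raise ValueError(f"not a Dublin Core element: {name!r}")
        child = ET.SubElement(root, f"{{{DC_NS}}}{name}")
        child.text = value
    return root


# Made-up sample values for illustration only.
record = build_dc_record([
    ("title", "Sample plate"),
    ("subject", "Astronomy"),
    ("subject", "Telescopes"),
    ("format", "image/tiff"),
    ("rights", "Public domain"),
])
xml_text = ET.tostring(record, encoding="unicode")
```

Note the two subject entries: repeatability is what lets such a small schema stretch to fit messy real-world items.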
The Research Libraries Group’s (RLG) Preservation Metadata Elements are intended to set out the minimum information needed to manage and maintain digital files over the long term and, unlike the schemas described above, capture technical, rather than descriptive, information. This element set may be combined with any descriptive element set to describe an image file.
MPEG-7, or Multimedia Content Description Interface, is an XML-based standard developed by the Moving Picture Experts Group (MPEG) to describe multimedia and audiovisual works and is likely to grow in importance over the next few years. It supports textual indexing (e.g., the use of controlled vocabularies for data elements such as subjects and genres) and nontextual or automatic indexing, such as shape recognition and color histogram searching. It also supports hierarchical or sequential description.
I think checksum quality control is inexpensive, easy to set up, and a reasonably reliable indicator of corrupt files, if present, though I also think that random and scheduled periodic checks of scans are advisable.
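Here is a minimal sketch of what I mean by checksum quality control: record a checksum for every file once, then recompute later and flag anything that differs or has gone missing. SHA-256 is my own choice here, and the helper names are hypothetical; the source does not prescribe a particular algorithm or tool.

```python
import hashlib
from pathlib import Path


def sha256_of(path, chunk_size=65536):
    """Return the SHA-256 hex digest of a file, reading it in chunks."""
    digest = hashlib.sha256()
    with open(path, "rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(folder):
    """Map each file name directly inside folder to its checksum."""
    files = sorted(p for p in Path(folder).iterdir() if p.is_file())
    return {p.name: sha256_of(p) for p in files}


def find_changed(folder, manifest):
    """Return the names that are missing or whose checksum no longer matches."""
    changed = []
    for name, expected in manifest.items():
        path = Path(folder) / name
        if not path.is_file() or sha256_of(path) != expected:
            changed.append(name)
    return changed
```

In practice the manifest would be saved alongside the masters when they are created and find_changed run on a schedule. A mismatch only says that a file changed, not why, which is one more reason to keep the visual spot checks too.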
I appreciate the need to make embedded metadata more accessible to search engine spiders so as to further improve user access. I need to look into more current thought on this.
I agree with the logical conclusion of the preservation strategies they present: that is, to capture high-quality images in standard file formats with sufficient identifying information for future viability, and with LOCKSS. Also, the use of open standards and system-independent formats.
Finally, the OAIS Reference Model is of great interest to me, and I tracked down this site:
http://www.oclc.org/programs/publications/reports/2009-04.pdf
which is a VERY recently published report of a survey done in Oct/Nov of last year about metadata creation workflows. I’m eager to see what the current thought is.