The Travelin' Librarian

Come on people!

Time for another rant.

If you've ever stuck a CD into your computer and had your player program magically tell you the name of the album, the artist, and all the track information, then you're familiar with Gracenote (formerly known as the CD Database, or CDDB). This is a generally useful collection of data about CDs created entirely by volunteers. How is the data gathered? Well, whenever you put a CD into your computer and the program fails to fill in the information, you can fill it in and then submit the data back up to the larger collective. Here's my beef: I'm sick and tired of the number of basic mistakes in the data. The number of mistakes I've run into recently leads me to believe there is no quality control at all. Here's my first example:

Incorrect:

In this case someone thought that the name of the album was "Chronicles (Disc 4)", which was "disc 1 of 1" in a set of one. Sorry folks, but the name of the album is "Chronicles" and this happens to be disc 4 of a 6-disc set. Additionally, this is a book on CD. They don't have composers! (Nor, folks, is the name of the person reading the book considered a composer. The number of times I've seen that...)

Correct:

My other example:

Incorrect:

In this case someone has decided that the author's name is "Cynthia Lennnon", the name of the album is "Lennon 1", and that this is a "Compilation CD". Well, I'd forgive the misspelling of her name (I'm hardly one to be able to complain about that), but it carries through all nine CDs. They got the "disc 1 of 9" correct but still insisted on misnaming the album itself. And finally, compilation CDs are albums with multiple artists, typically a different one for each track. Sorry, this is another book on CD; there is only one artist. (If it were an anthology by multiple authors, then this option should be checked.)

Correct:

Just had to get that off my chest.

Labels: metadata, rant

Microsoft Photo Info

Here's one for the metadata librarians: The Microsoft Photo Info download allows you to "Easily view and change 'metadata' properties in digital photographs from within Windows Explorer."

Labels: metadata, microsoft, photography, windows

IL2006: Tuesday Keynote

Clifford Lynch, Executive Director, Coalition of Networked Information
Challenges of Cyberinfrastructure & Choices for Libraries
9:00-9:45am

Will not be doing a musical performance this morning
Observations about scholarship/teaching/learning are changing & implications of policy changes
What do these changes open up for librarians?
Cyberinfrastructure

in most of the rest of the world you can talk to people about e-science
practice of science has been transformed by

high performance computation

high performance networking
large scale management/org/reuse of data

2002 report, Atkins commission, how is science & engineering in the US changing

what changes need to be made?
"cyberinfrastructure"
data management
data visualization
people!

National Virtual Observatories

People not interested in IP issues w/ astronomy
metadata is free/built-in to observational equipment
enormous sky surveys patched together from many different sources
no longer about getting observational time
algorithms are being written to analyze data instead of needing more observational data
opens up astronomy to school kids
[I read about the democratization of astronomy in The Long Tail last night...]

how do we get data reused and preserved?
how do we assist the scientists to mark this data consistently?
first focused on engineering
all of this technology can also be applied to the humanities and the social sciences
american council of learned societies report coming out soon on this issue

these approaches need to be used in not just the hard sciences

there are controversies about whether these technologies are changing the way humanities are studied
"physics changes one funeral at a time"
questions

human subjects
privacy
intellectual property
access to evidence

Could we digitize all the literature of all the cultures that have ever existed? Images?
Mass digitization projects

Microsoft
Google

European Digital Library

What about the "non-published" stuff? (Museums)

what are the roles and responsibilities of museums of publicly stored materials?
Most stuff is pre-1923 / out of copyright
they're monetizing those items
seems inappropriate to some
"public trust"
digitize materials to make them available to the society at large

Special collections

papers of persons and institutions
important to researchers
collections are changing in character / going digital
Salman Rushdie's papers & e-mail
items are being created in digital form

Problem of scale

study of older times, there's a paucity of evidence
modern times, too much information

What's coming out of this

needs are shifting from getting the tech to work to informatics

organize data
backup data
confidentiality

tend to focus on big projects

large projects
large teams
highly organized
big $

what about the projects with small groups working on small issues

small staff
small $
how we support these people

deal with on a disciplinary basis or institutional basis?

Will end up with a patchwork of solutions to this problem

will be dynamic not static

fashions, interests and budgets wax & wane

Roles of libraries in all this

big research universities & info tech workforce 15yrs ago vs now

then: worked for central IT
now: more than half now in departments, schools, labs, etc. / closer to researchers & teachers

facing demands for data curation

more want to share & reuse data
shifting norms re: information exchange
retiring faculty / what to do with all this data I've accumulated?

institutions finding that there's "value" to the data

data mgt & sharing plan in grant proposals
how will it be preserved
how will it be shared
institutions making sure that these rules are adhered to

data lost in gulf disasters of last year

were there backups?

ACRL report on all this due out soon

who's supposed to be doing the work?

new professional

mythological

"data scientist"

what do these people need to know

general?
disciplinary?

can we do this for each discipline, or with more generalists, or a hybrid?
major workforce issues
scale of the problem is large
we're going to need a lot of people to work on this
are these people librarians?

libraries as institutions

big research libraries

most profoundly changed already
struggling to keep up w/ amount of data via budgets
access issues
main role has been to pay for journals
journals now electronic
access has shifted out of the library
some people therefore believe access to these sources is free
policy choices?

already overstressed, can't deal with it
humanities strategy, hard sciences are on their own
need to move resources away from published lit & into more active engagement with the scholarly process

three very different pathways
different institutions will take different paths
movement into more inter-institutional collaboration
rapid rise of virtual organizations

cross multiple boundaries

other libraries

huge demand for access
will see in many different areas

undergrads
k-12

will affect many libraries

Nature of personal history is changing

issue for any cultural memory organization, not just libraries
scope of those interests is getting broader
rise of amateur observational science

botany
astronomy
biology
geology

libraries of all types need to be mindful of all the changes this type of research is bringing

will force strategic change

Labels: metadata, microsoft

IL2006: Innovative Uses of Web 2.0 Technologies

Karen Coombs, University of Houston
Jason Clark, Montana State University

Karen: Incorporating Web 2.0 into Library Web Sites

What is Web 2.0

Services to collaborate & share
movement toward more dynamic & interactive web

examples

social software
blogs
del.icio.us
wikis
folksonomies
rss
APIs
AJAX

Radical Decentralization

Web site updated and created by many different people
wikis & blogs
library web site allows any staff to update any content

Small Pieces Loosely Joined

Combination of different technologies

wikis
blogs
CMS

Library's CMS made up of modules for different content types

content is reusable throughout the site

any piece of the CMS can be replaced as needed

Perpetual Beta

deploy systems early and make constant improvements
users are part of the development process
deploy new systems to a small group of staff to test and help us refine
gather constant input and make continuous improvements

Remixable Content

APIs allow content to be incorporated into other systems
library web site can incorporate content from external sources
content which is part of the library's site can be used on multiple pages
AJAX to add database link to any page, blog, wiki

User as contributor

allows users to add and update content

class wikis
wiki model for CMS

institutional repositories for scholarly content from faculty, students and staff
library hosts blogs
user tagging and reviews of content in the catalog

Rich User Experience

multimedia, interactivity, GUI-style application experience

video
sound
screencasts

personalization and customization
space for collaboration and interaction

chat
VoIP

Demo of UofH's CMS

Jason: Social Tagging and Folksonomies in Practice

Agenda

examples
define
suggest applications
pros & cons
where can you learn more

Examples

del.icio.us
amazon
flickr
technorati

Definitions

Tagging

assigning descriptive metadata

Tag

The descriptive metadata

Folksonomies

taxonomy created by folks
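
To make the three definitions concrete, here is a minimal sketch of a folksonomy as a data structure. It was not part of Jason's talk; the class name, method names, and the users and items in the usage note are all made up for illustration. Normalizing tags by lowercasing them is an assumption too.

```python
from collections import Counter, defaultdict


class Folksonomy:
    """Toy model: users assign free-form tags to items (hypothetical API)."""

    def __init__(self):
        self.assignments = set()               # (user, item, tag) triples
        self.items_by_tag = defaultdict(set)   # tag -> items carrying it
        self.tag_counts = Counter()            # tag -> number of assignments

    def tag(self, user, item, label):
        """Tagging: one user assigns one piece of descriptive metadata."""
        label = label.strip().lower()          # assumption: case-insensitive tags
        triple = (user, item, label)
        if triple in self.assignments:         # ignore exact repeats
            return
        self.assignments.add(triple)
        self.items_by_tag[label].add(item)
        self.tag_counts[label] += 1

    def items(self, label):
        """Everything any user has filed under this tag."""
        return sorted(self.items_by_tag.get(label.strip().lower(), set()))

    def cloud(self, n=10):
        """The n most-used tags, the raw material for a tag cloud."""
        return self.tag_counts.most_common(n)
```

For example, if "ann" tags book1 as "Cats", "bob" tags book1 as "cats", and "bob" tags book2 as "felines", then items("cats") finds only book1. Nothing connects "cats" to "felines", which is the lack of a controlled vocabulary in miniature.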

Library use cases

find additional access points in library catalogs
assign friendly terms to indexes and databases
create communities of practice around library articles
organize a series of web pages for a library guide
give users opportunities to label library web pages

Library applications

tags.library.upenn.edu
WPOPAC

Social Tagging: Why does it work?

embraces social nature of the web
currency
scales to large datasets
offers a broader discovery model
adaptable
maps and displays simple relationships between items

What's the Hitch?

lack of precision
lack of true hierarchy
vulnerable to "gaming" of the system
lack of a controlled vocabulary
users can be wrong

When to use it?

establish an architecture of participation
organize resources for a company intranet
allow a class to collaborate and build a reference guide
build and refine library controlled vocabulary
anytime there is a browse or search function

Reference list...

ZoomCloud
TagCloud
tagsonomy.com (blog)
FreeTag
unalog

Final thoughts

design matters
scale matters
a new source of data

Labels: beta, del.icio.us, metadata, rss, video, wikis

CIL2006: Exploiting the Value of Structured Metadata

Lorcan Dempsey, OCLC
10:30-11:15am

Structured data, Web 2.0, libraries

Releasing value
We have a lot of classical bibliographical data
Web 2.0

Flat applications

APIs

Rich interaction

Ajax

Data is the new functionality

make the data work harder

Participation

Social services
Mobilizing the edge
Contributing to create additional value

Co-creation
folksonomies

Lightweight service composition

Audience Level Web service

human and machine readable interface that resolves OCLC record number or ISBN to probable audience level
uses type-of-library holdings data in WorldCat to calculate audience levels for books represented in WorldCat
ARL=1.0, Academic=0.66, Public=0.33, School=0
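
The talk did not spell out the formula, so the following is only a guess at how those four weights might combine: a holdings-weighted average, where each holding library counts once at its type's weight. The function name and the holdings counts in the usage note are hypothetical. The weights are the ones from the slide.

```python
# Weights per library type, as given in the talk.
WEIGHTS = {"arl": 1.0, "academic": 0.66, "public": 0.33, "school": 0.0}


def audience_level(holdings):
    """Holdings-weighted mean of the library-type weights.

    `holdings` maps a library type to the number of libraries of that
    type holding the book. Assumption: a plain weighted average; the
    real service may well do something more refined.
    """
    total = sum(holdings.values())
    if total == 0:
        return None  # no holdings, so no basis for a level
    weighted = sum(WEIGHTS[kind] * count for kind, count in holdings.items())
    return weighted / total
```

Under this assumption a book held only by ARL libraries scores 1.0, one held only by school libraries scores 0, and a book split evenly between public and school libraries lands at 0.165.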

Greasemonkey script to expose in Amazon and Open WorldCat

Shows "audience level" result in the "Product Details" of the book's page on Amazon.com
Funny example: "The Bibliography of Canadian Bibliographies"

Hints at level only, not definitive
Examples

The Selfish Gene = 0.6
The World is Flat = 0.5
Theories of the Information Society = 0.71

Ajax - rich interaction

Live Search

Quick searches target with each additional keystroke of search term/phrase
retrieves ordered, FRBR-inspired results
Narrow-by Dewey attributes
Catalog of Phoenix PL

Indexed every three-word combination

Display results as you type

Ranked by holdings

"satisficing engine" (good enough, asap)

Many biblical references, as the Bible has the most holdings

Top categories list on the side based on DDC
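
A rough sketch of the as-you-type idea, under my own assumptions: I read "every three-word combination" as every run of up to three consecutive title words, and match the typed text against the start of those runs. The function names, record IDs, and holdings counts are invented sample data, not anything OCLC showed.

```python
from collections import defaultdict


def build_index(records):
    """Map each run of 1-3 consecutive title words to the records containing it."""
    index = defaultdict(set)
    for rec_id, (title, _holdings) in records.items():
        words = title.lower().split()
        for start in range(len(words)):
            for length in (1, 2, 3):
                phrase = " ".join(words[start:start + length])
                index[phrase].add(rec_id)
    return index


def live_search(index, records, typed, limit=5):
    """Return record IDs matching what has been typed so far, most-held first."""
    typed = " ".join(typed.lower().split())
    if not typed:
        return []
    hits = set()
    # Linear scan keeps the sketch short; a real system would use a
    # sorted key list or a trie for the prefix lookup.
    for phrase, ids in index.items():
        if phrase.startswith(typed):
            hits |= ids
    return sorted(hits, key=lambda rec_id: records[rec_id][1], reverse=True)[:limit]


# Invented sample data: id -> (title, number of holding libraries)
records = {
    "b1": ("The Selfish Gene", 900),
    "b2": ("The World Is Flat", 1200),
    "b3": ("Theories of the Information Society", 300),
}
index = build_index(records)
```

Calling live_search after each keystroke gives the "good enough, asap" feel: typing "the" returns all three sample records ordered by holdings, and continuing to "the w" narrows it to b2.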

LCSH Live

Make data work harder

Fiction Finder

Interface that supports searching and browsing of fiction materials in WorldCat
retrieves ordered, FRBR-inspired results
Faceted browse
New interface available 1st qtr 2006
alphabetical browse by genre
retrieves works, ordered by holdings
click on work, get aggregate details from multiple editions in multiple libraries
Narrow by type, language
sort by different methods (newest, oldest, etc)
Pick edition, link to WorldCat to find copies in libraries
Exposes data such as literary form and setting
links to related works, author, etc.

Audience Level

Participation

Reviews WikiD
Not covered due to time limitations

Labels: metadata

IL05: Wednesday Keynote

Google Print: Making the Virtual World Real

Rich Wiggins, Michigan State University

Cartoon: Why Google must never be bought by Microsoft
The idea: The library of congress metaphor

Schoolgirl in Carthage, TN accessing the contents of LoC (Al Gore)

Other projects

Small group of items, digitize it all
Words & songs of Woody Guthrie
Library of the first ladies
Worthwhile

Extends access to all web users
Preserves fragile content

Why not all of LoC?

LoC numbers

What are you measuring
What resolution
What color depth
What format

LoC books only

20-28 million items
2-7 million unique bound volumes
17-20 terabytes

The idea

Disk is cheap
Digital imaging is getting cheaper
Broadband is relatively cheap
Labor can be relatively cheap

Automation can help

The germ of the idea

Technology is rapidly improving
Flatbed scanner is the wrong tool

Cost

Approx. 0.05 or 0.01 per page/image
$10-12/hour labor, mileage, meals, lodging

Digitize the LoC

Approx. $2.5 billion

OCR

Getting better and faster
Digitize it now, OCR on demand

Storage costs are plummeting

RAID arrays
Under 50 cents per gigabyte
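
Taking the two figures in these notes at face value (17-20 terabytes for the books, under 50 cents per gigabyte), the raw disk cost is easy to bound. This is my own back-of-the-envelope check, not something from the talk; it assumes decimal terabytes and ignores RAID overhead, backups, and everything else that makes real storage cost more.

```python
GB_PER_TB = 1000  # assumption: decimal terabytes


def storage_cost_usd(terabytes, usd_per_gb=0.50):
    """Upper bound on raw disk cost at the quoted ceiling price per gigabyte."""
    return terabytes * GB_PER_TB * usd_per_gb


for tb in (17, 20):
    print(f"{tb} TB -> at most ${storage_cost_usd(tb):,.0f}")
```

So the disks themselves come to roughly ten thousand dollars at most, which is the point of "storage costs are plummeting": storage is a rounding error next to the scanning labor.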

Inventory/cataloging costs

Physical shelf space, $40/item
If it’s worth purchasing, it’s worth digitizing

Barrier: Rights Management

Once digitized, can we deliver it?
The paradox of latent value
Approx. 1/3 of LoC print collection is now in the public domain

Barrier: “The benefit doesn’t justify the cost”

It’s more cost effective to digitize everything than “just the good stuff”

Encourages preservation

Deacidification
Fire, digital is backup

Benefit: access
Benefit: Improved digitizing technology

The “ideal” book scanner

Benefit: Standards

Open XML
Cross document metadata

Benefits: Large-Scale Rights Management

20 million volume collection will force the issue of fair use
Today, Disney defines fair use

Digital library projects: Think Big!

Google project teaches this

Apollo Program Analogy
Google’s vision will be realized by a forward thinking company and not the government
Why trust Google

They’re smart
They’re agile and innovative
They show no fear
They’re worth $100 billion
They won’t do this alone

Google: Catalyst for Digitization or Library Destruction?

Roy Tennant

Roy: More access is better. Easier access is better. There’s more room for players and that’s a good thing. It’s good that Google is digitizing things. There’s room for everyone to be involved.

Cartoon: Google, Devil or Merely Evil?
Scary monster #1: A Copyright Cataclysm

Libraries have long enjoyed “fair use” protection
Google’s attempt to shield themselves under fair use may ruin it for us all

Scary monster #2: Closed Access to Open Material

They’re probably going to fix this problem
Google print copies are locked to certain ones from certain publishers
Public domain books are locked into showing just a certain number of pages
Link to buy book but no link to the library

Scary monster #3: Blind Wholesale Digitization

Large research collections are not weeded by policy
“We keep all kinds of crap”
Outdated material
“no more a good thing than buying books based on color”
Copyright will restrict access to up-to-date, recent material
Users will end up with the old crap since it’s more available
Open Content Alliance is focusing on collections

Scary monster #4: Ads

Most Google profit coming from ads
Needs eyeballs
Ads for antidepressants next to Hamlet
Viagra next to Lolita
They’re responsible to their stockholders, not the public

Scary monster #5: Secrecy

Agreements between Google and libraries have mostly been kept secret
The libraries could not talk to each other
U of Mich revealed after FOIA request
OCA is more open
Rumors indicate UM has best agreement from lib perspective, others have less favorable agreements
But we don’t know, nobody’s talking

Scary monster #6: Longevity

What do Google, Enron, and WorldCom have in common?

Public companies motivated by profit
Two are now gone
Size doesn’t shield you

What do Google and libraries have in common

Both on Earth

Harvard library, 400 years old
Google 7 years old
Who should we trust?

Adam Smith, Product Manager for Google Print and Google Scholar

Welcome all comments, it’s what makes our products better
Better to have the information out there to see how people use and access it
Walk a difficult path to make many parties happy
Want to make the information accessible – at least discoverable
Copyright is an issue
This is just a small piece of the puzzle as ambitious as this project sounds
Welcome other efforts and they’re positive for the community
Publisher program uses a destructive scanning technology
Library version is non-destructive, they created it, and secret
New version of Google privacy policy has just been released

Labels: cartoons, disney, metadata

Social Software & Sites for PLs

Jenny Levine, The Shifted Librarian
Jessamyn West, Librarian.net

Flickr, Tagging, and the F-Word (Jessamyn)

Features

Easy upload
[M: Jessamyn just said "groks"]
easy find
easy share

Tagging

Metadata by me
...by my family & friends
...by anyone
Tagging vs Classification

Can co-exist
Must recognize the differences
it's not a fight

Folksonomy

user created metadata
grassroots community classification of digital assets
flat namespace
not mutually exclusive with other systems
helps with scalability problems
involves the users in the problems
does have the "synonym problem"

del.icio.us (Jenny)

social bookmarking
the bookmarking version of flickr
tagged bookmarks
RSS feeds of tags and users
You can search your bookmarks but others can't search your bookmarks
Use to research new topics

These are the sites people are reading and find important enough to bookmark

Hacks

ToRead
ToRent
ForName (private = for:username)
Download media in iTunes

del.icio.us for your library

LaGrangeParkLibrary (for the ref desk)
Thomas Ford Memorial Library (aaron schmidt, displaying the feed back onto the Web site)

Folksonomy sites

CiteULike (academic)
Last.fm (music)
43 things (what do you want to do, meet others who want to do the same thing)
43 places (where do you want to visit)
Technorati (blogs)
MetaFilter
Yahoo! Search
Yummy! (hosts PDFs)
Amazon.com search inside the book concordance
bookswelike.net
LibraryThing

Labels: apple, del.icio.us, itunes, metadata, rss

See what you get for complaining?

I hear someone complaining that they don't understand much of what I post about so therefore I've just got to post something über techie in response...

The eXtensible Past: The Relevance of the XML Data Format for Access to Historical Datasets and a Strategy for Digital Preservation

Abstract:
This article reports on the X-past project carried out by the Netherlands Historical Data Archive (NHDA). The main goal of the project has been to investigate how the XML data format can improve the durability of and access to historical datasets. The X-past project furthermore investigated whether it would be possible to provide access to historical datasets by means of the "Open Archives Initiative—Protocol for Metadata Harvesting" (OAI-PMH). Within the framework of the X-past project a prototype information system has been developed and a number of users have been asked to report on usability issues concerning this system.
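
For the non-techies: OAI-PMH is just a set of plain HTTP requests, each carrying a "verb" parameter. Here is a small sketch of building a ListRecords request. The verb and the parameter names (metadataPrefix, set, from) and the oai_dc prefix come from the protocol; the base URL in the usage note is a placeholder, not the NHDA's actual endpoint, and the helper's name is my own.

```python
from urllib.parse import urlencode


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None, from_date=None):
    """Build an OAI-PMH ListRecords request URL.

    base_url is the repository's OAI endpoint (hypothetical in the usage
    note). oai_dc is the Dublin Core format every OAI-PMH repository
    must support.
    """
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec      # optional: restrict to one set
    if from_date:
        params["from"] = from_date    # optional: only records changed since this date
    return f"{base_url}?{urlencode(params)}"
```

Calling list_records_url("https://archive.example.org/oai", from_date="2005-01-01") yields a URL a harvester can fetch; the response is XML, which is where the article's interest in XML for historical datasets comes in.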

Thanks Rosario

Labels: metadata

IL04: "technology and collaboration"

I ended up having to sneak out a few minutes early from this presentation as it ended at 11.15 and I'm giving my Data Visualization talk at 11.30. As expected from anything in which Stephen Abram (VP of Innovation for Sirsi and the current president of the Canadian Library Association) speaks, this presentation was a hoot. (Sorry, that's the best word I can come up with for describing that hilarious Canadian.) The presentation was basically an overview of collaboration technologies that can be and are being used in libraries today. His simple truth: "What matters is not what you have but how you use it." Another interesting point: People who tend to be more liberal or open minded want information that challenges their perceptions; more conservative people want information that confirms their perceptions. The new collaborative technologies that are out there include, on the user side: Web conferencing, presence management, real-time translation, real-time speech-to-text, collaboration sites and wikis. On the information provider side there are Web services, RSS feeds, learning objects, digitization, and faceted metadata. Another one of his best points: "Collaboration is an environment, not an end in itself."

The single best thing I learned from Stephen at this conference was "I may have made up the word, but since I've added it to the spell-checker, that makes it official."

Labels: metadata, rss

Day off

I've taken the day off but don't think I'm just sleeping. (Don't I wish) I'm working on the Web design book and I've just finished Chapter 10: Metadata.

Labels: metadata

Web design book update

I wrote half of the metadata chapter on the flight from Baltimore to Atlanta. (Yes, Baltimore to Denver via Atlanta.) I would have tried to finish it on the Atlanta to Denver flight but I just finally zonked out while waiting to take off and didn't wake up for most of the flight. (Some of you know what it takes to get me to sleep on a plane.)

I also got an e-mail late Friday from my editor wanting a date for a first complete draft. He suggested April 1st and I'm tempted to shoot for that. I'll have a better idea if that's reasonable on Monday. (At least two more complete chapters will help appease him ;-)

I've also figured out that I need to rewrite some small bits from chapters I've already turned in. He's cool with that as he's not really going to start the editing process 'till all the chapters are turned in.

Labels: metadata

Web design book update

Five more pages completed. If I keep on this schedule chapter four, Basic XHTML Markup, should be completed by the end of the weekend. (No, I'm not writing the chapters in order.)

Actually, here's how the book stands. ~ means it's partially written, * means I've turned in a full draft.

Chapters

Introduction
Introduction to XHTML
Minimal XHTML Document *
Basic XHTML Markup ~
Hyperlinks *
Lists *
Tables *
Forms
Frames
Metadata
Introduction to CSS *
The Mechanics of CSS *
Text Formatting
The Box Model
CSS & Lists ~
CSS & Links *
CSS & Tables
CSS & Forms
CSS Positioning
CSS Media Types

Appendices

XHTML Doctypes
Transitional vs. Strict DTD *
Moving from HTML to XHTML
XHTML Transitional DTD *
XHTML Strict DTD *
XHTML Frameset DTD *
XHTML Element & Attribute Reference
Character Entity Reference
Directory Structures & Relative Hyperlinks *
CSS Property & Value Support Reference

Labels: metadata


Monday, June 04, 2007

Come on people!

Tuesday, March 13, 2007

Microsoft Photo Info

Tuesday, October 24, 2006

IL2006: Tuesday Keynote

Monday, October 23, 2006

IL2006: Innovative Uses of Web 2.0 Technologies

Thursday, March 23, 2006

CIL2006: Exploiting the Value of Structured Metadata

Wednesday, October 26, 2005

IL05: Wednesday Keynote

Monday, October 24, 2005

Social Software & Sites for PLs

Wednesday, February 16, 2005

See what you get for complaining?

Wednesday, November 17, 2004

IL04: "technology and collaboration"

Monday, April 05, 2004

Day off

Saturday, March 13, 2004

Web design book update

Friday, February 20, 2004

Web design book update
