April 7th, 2013 by Michael Sauers

Anil Dash

I’ve not yet watched the video I’m embedding here, and probably won’t until I get back home, but last night I did read David Weinberger’s live blog of the talk, and given some of the quotes from that post, it’s something I’ll be watching ASAP.

“We have a lot of software that forbids journalism.” He refers to the iOS [iPhone operating system] Terms of Service for app developers that includes text that says, literally: “If you want to criticize a religion, write a book.” You can distribute that book through the Apple bookstore, but Apple doesn’t want you writing apps that criticize religion. Apple enforces an anti-journalism rule, banning an app that shows where drone strikes have been.

A decade ago, metadata was all the rage among the geeks. You could tag, geo-tag, or machine-tag Flickr photos. Flickr is from the old community. That’s why you can still do Creative Commons searches at Flickr. But you can’t on Instagram. They don’t care about metadata. From an end-user point of view, RSS is out of favor. The new companies are not investing in creating metadata to make their work discoverable and shareable.

And this is true for all things that compete with the Web. The ideas locked into apps won’t survive the company’s acquisition, but this is true when we change devices as well. “Content tied to devices dies when those devices become obsolete.” We have “given up on standard formats.” “Those of us who cared about this stuff…have lost,” overall. Very few apps support standard formats, with jpg and html as exceptions. Likes and follows, etc., all use undocumented proprietary formats. The most dramatic shift: we’ve lost the expectation that they would be interoperable. The Web was built out of interoperability. “This went away with almost no public discourse about the implications of it.”

Posted in Internet

June 4th, 2007 by Michael Sauers

Time for another rant.

If you’ve ever stuck a CD into your computer and had your player program magically tell you the name of the album, the artist, and all the track information, then you’re familiar with Gracenote (formerly known as the CD Database, CDDB). This is a generally useful collection of data about CDs created entirely by volunteers. How is the data gathered? Well, whenever you put a CD into your computer and the program fails to fill in the information, you can fill it in and then submit the data back up to the larger collective. Here’s my beef: I’m sick and tired of the number of basic mistakes in the data. The number of mistakes I’ve run into recently leads me to believe there is no quality control at all. Here’s my first example:


In this case someone thought that the name of the album was “Chronicles (Disc 4)” which was “disc 1 of 1” in a set of one. Sorry folks, but the name of the album is “Chronicles” and this happens to be disc 4 of a 6-disc set. Additionally, this is a book on CD. They don’t have composers! (Nor, folks, is the name of the person reading the book considered a composer. The number of times I’ve seen that.)


My other example:


In this case someone has decided that the author’s name is “Cynthia Lennnon”, the name of the album is “Lennon 1” and that this is a “Compilation CD”. Well, I’d forgive the misspelling of her name (I’m hardly one to be able to complain about that), but it carries through all nine CDs. They got the “disc 1 of 9” correct but still insisted on misnaming the album itself. And finally, compilation CDs are albums with multiple artists, typically a different one for each track. Sorry, this is another book on CD; there is only one artist. (If it were an anthology by multiple authors then this option should be checked.)
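For what it’s worth, the “Chronicles (Disc 4)” kind of mistake is easy to catch mechanically. Here is a minimal Python sketch of the sort of check a submission form could run. The function names and the pattern are my own invention for illustration; nothing here reflects how Gracenote actually validates submissions.

```python
import re

# Hypothetical pattern: a trailing, bracketed disc marker such as "(Disc 4)" or "[CD 2]".
DISC_SUFFIX = re.compile(r"\s*[\(\[]\s*(?:disc|cd)\s*(\d+)\s*[\)\]]\s*$", re.IGNORECASE)


def split_disc_from_title(title):
    """Return (album_title, disc_number); disc_number is None when no marker is found."""
    match = DISC_SUFFIX.search(title)
    if match is None:
        return title.strip(), None
    return title[:match.start()].strip(), int(match.group(1))


def looks_suspicious(title, disc_number, disc_total):
    """Flag a title whose disc marker disagrees with the submitted disc fields."""
    _, disc_in_title = split_disc_from_title(title)
    return disc_in_title is not None and (
        disc_in_title != disc_number or disc_in_title > disc_total
    )


if __name__ == "__main__":
    print(split_disc_from_title("Chronicles (Disc 4)"))
    print(looks_suspicious("Chronicles (Disc 4)", 1, 1))
```

A check like this would not catch “Lennon 1” (there is no disc marker to find), so it is only a partial answer to the quality-control problem.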


Just had to get that off my chest.


March 13th, 2007 by Michael Sauers

Here’s one for the metadata librarians: The Microsoft Photo Info download allows you to “Easily view and change ‘metadata’ properties in digital photographs from within Windows Explorer.”


October 24th, 2006 by Michael Sauers

Clifford Lynch, Executive Director, Coalition for Networked Information
Challenges of Cyberinfrastructure & Choices for Libraries

  • Will not be doing a musical performance this morning
  • Observations about how scholarship/teaching/learning are changing & the implications of policy changes
  • What do these changes open up for librarians?
  • Cyberinfrastructure
    • in most of the rest of the world, you can talk to people about “e-science”
    • practice of science has been transformed by
      • high performance computation
      • high performance networking
      • large-scale management/org/reuse of data
    • 2002 report, Atkins commission, how is science & engineering in the US changing
      • what changes need to be made?
      • “cyberinfrastructure”
      • data management
      • data visualization
      • people!
    • National Virtual Observatories
      • People not interested in IP issues w/ astronomy
      • metadata is free/built in to observational equipment
      • enormous sky surveys patched together from many different sources
      • no longer about getting observational time
      • algorithms are being written to analyze data instead of needing more observational data
      • opens up astronomy to school kids
      • [I read about the democratization of astronomy in The Long Tail last night…]
    • how do we get data reused and preserved?
    • how do we assist the scientists to mark this data consistently?
    • first focused on engineering
    • all of this technology can also be applied to the humanities and the social sciences
    • american council of learned societies report coming out soon on this issue
      • these approaches need to be used in not just the hard sciences
    • there are controversies about whether these technologies are changing the way the humanities are studied
    • “physics changes one funeral at a time”
    • questions
      • human subjects
      • privacy
      • intellectual property
      • access to evidence
    • Could we digitize all the literature of all the cultures that have ever existed? Images?
    • Mass digitization projects
      • Microsoft
      • Google
      • European Digital Library
    • What about the “non-published” stuff? (Museums)
      • what are the roles and responsibilities of museums for publicly stored materials?
      • Most stuff is pre-1923 / out of copyright
      • they’re monetizing those items
      • seems inappropriate to some
      • “public trust”
      • digitize materials to make them available to the society at large
    • Special collections
      • papers of persons and institutions
      • important to researchers
      • collections are changing in character / going digital
      • Salman Rushdie’s papers & e-mail
      • items are being created in digital form
    • Problem of scale
      • study of older times, there’s a paucity of evidence
      • modern times, too much information
  • What’s coming out of this
    • needs are shifting from getting the tech to work to informatics
      • organize data
      • backup data
      • confidentiality
    • tend to focus on big projects
      • large projects
      • large teams
      • highly organized
      • big $
    • what about the projects with small groups working on small issues
      • small staff
      • small $
      • how we support these people
    • deal with this on a disciplinary basis or an institutional basis?
      • Will end up with a patchwork of solutions to this problem
      • will be dynamic not static
      • fashions, interests and budgets wax & wane
  • Roles of libraries in all this
    • big research universities & info tech workforce 15yrs ago vs now
      • then: worked for central IT
      • now: more than half now in departments, schools, labs, etc. / closer to researchers & teachers
    • facing demands for data curation
      • more want to share & reuse data
      • shifting norms re: information exchange
      • retiring faculty / what to do with all this data I’ve accumulated?
    • institutions finding that there’s “value” to the data
      • data mgt & sharing plan in grant proposals
      • how will it be preserved
      • how will it be shared
      • institutions making sure that these rules are adhered to
      • data lost in gulf disasters of last year
        • were there backups?
    • ACRL report on all this due out soon
  • who’s supposed to be doing the work?
    • new professional
      • mythological
      • “data scientist”
    • what do these people need to know
      • general?
      • disciplinary?
    • can we do this for each discipline, or with more generalists, or a hybrid?
    • major workforce issues
    • scale of the problem is large
    • we’re going to need a lot of people to work on this
    • are these people librarians?
  • libraries as institutions
    • big research libraries
      • most profoundly changed already
      • struggling to keep up w/ amount of data via budgets
      • access issues
      • main role has been to pay for journals
      • journals now electronic
      • access has shifted out of the library
      • some people therefore believe access to these sources is free
      • policy choices?
        • already overstressed, can’t deal with it
        • humanities strategy, hard sciences are on their own
        • need to move resource away from published lit & into more active engagement with the scholarly process
      • three very different pathways
      • different institutions will take different paths
      • movement into more inter-institutional collaboration
      • rapid rise of virtual organizations
        • cross multiple boundaries
    • other libraries
      • huge demand for access
      • will see in many different areas
        • undergrads
        • k-12
      • will affect many libraries
  • Nature of personal history is changing
    • issue for any cultural memory organization, not just libraries
    • scope of those interests is getting broader
    • rise of amateur observational science
      • botany
      • astronomy
      • biology
      • geology
    • libraries of all types need to be mindful of all the changes this type of research is bringing
      • will force strategic change


October 23rd, 2006 by Michael Sauers

Karen Coombs, University of Houston
Jason Clark, Montana State University

Karen: Incorporating Web 2.0 into Library Web Sites

  • What is Web 2.0
    • Services to collaborate & share
    • movement toward a more dynamic & interactive web
  • examples
    • social software
    • blogs
    • del.icio.us
    • wikis
    • folksonomies
    • rss
    • APIs
    • AJAX
  • Radical Decentralization
    • Web site updated and created by many different people
    • wikis & blogs
    • library web site allows any staff to update any content
  • Small Pieces Loosely Joined
    • Combination of different technologies
      • wikis
      • blogs
      • CMS
    • Library’s CMS made up of modules for different content types
      • content is reusable throughout the site
    • any piece of the CMS can be replaced as needed
  • Perpetual Beta
    • deploy systems early and make constant improvements
    • users are part of the development process
    • deploy new systems to a small group of staff to test and help us refine
    • gather constant input and make continuous improvements
  • Remixable Content
    • APIs allow content to be incorporated into other systems
    • library web site can incorporate content from external sources
    • content which is part of the library’s site can be used on multiple pages
    • AJAX to add database link to any page, blog, wiki
  • User as contributor
    • allows users to add and update content
      • class wikis
      • wiki model for CMS
    • institutional repositories for scholarly content from faculty, students and staff
    • library hosts blogs
    • user tagging and review content in catalog
  • Rich User Experience
    • multimedia, interactivity, GUI-style application experience
      • video
      • sound
      • screencasts
    • personalization and customization
    • space for collaboration and interaction
      • chat
      • VoIP
  • Demo of UofH’s CMS

Jason: Social Tagging and Folksonomies in Practice

  • Agenda
    • examples
    • define
    • suggest applications
    • pros & cons
    • where can you learn more
  • Examples
    • del.icio.us
    • amazon
    • flickr
    • technorati
  • Definitions
    • Tagging
      • assigning descriptive metadata
    • Tag
      • The descriptive metadata
    • Folksonomies
      • taxonomy created by folks
  • Library use cases
    • find additional access points in library catalogs
    • assign friendly terms to indexes and databases
    • create communities of practice around library articles
    • organize a series of web pages for a library guide
    • give users opportunities to label library web pages
    • Library applications
      • tags.library.upenn.edu
      • WPOPAC
  • Social Tagging: Why does it work?
    • embraces the social nature of the web
    • currency
    • scales to large datasets
    • offers a broader discovery model
    • adaptable
    • maps and displays simple relationships between items
  • What’s the Hitch?
    • lack of precision
    • lack of true hierarchy
    • vulnerable to “gaming” of the system
    • lack of a controlled vocabulary
    • users can be wrong
  • When to use it?
    • establish an architecture of participation
    • organize resources for a company intranet
    • allow a class to collaborate and build a reference guide
    • build and refine library controlled vocabulary
    • anytime there is a browse or search function
  • Reference list…
    • ZoomCloud
    • TagCloud
    • tagsonomy.com (blog)
    • FreeTag
    • unalog
  • Final thoughts
    • design matters
    • scale matters
    • a new source of data
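To make Jason’s definitions a bit more concrete, here is a minimal Python sketch of a folksonomy as a data structure: a flat pile of (user, item, tag) assignments that gets counted and indexed. The function name and sample data are hypothetical, and the lowercase normalization is my own assumption rather than anything from the talk.

```python
from collections import Counter, defaultdict


def build_folksonomy(assignments):
    """Build tag counts and a tag-to-items index from (user, item, tag) triples."""
    tag_counts = Counter()
    items_by_tag = defaultdict(set)
    for _user, item, tag in assignments:
        normalized = tag.strip().lower()  # assumption: tags are case-insensitive
        if not normalized:
            continue  # skip empty tags
        tag_counts[normalized] += 1
        items_by_tag[normalized].add(item)
    return tag_counts, items_by_tag


if __name__ == "__main__":
    sample = [
        ("ann", "page1", "Genealogy"),
        ("bob", "page1", "genealogy "),
        ("bob", "page2", "maps"),
    ]
    counts, index = build_folksonomy(sample)
    # The counts are what a tag cloud would size its terms by.
    print(counts.most_common())
    print(sorted(index["genealogy"]))
```

Note what the sketch leaves out: there is no hierarchy and no controlled vocabulary, which is exactly the list of “hitches” above.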

Posted in video

March 23rd, 2006 by Michael Sauers
Lorcan Dempsey, OCLC

Structured data, Web 2.0, libraries
  • Releasing value
  • We have a lot of classical bibliographical data
  • Web 2.0
    • Flat applications
      • APIs
    • Rich interaction
      • Ajax
    • Data is the new functionality
      • make the data work harder
    • Participation
      • Social services
      • Mobilizing the edge
      • Contributing to create additional value
      • Co-creation
      • folksonomies
  • Lightweight service composition
    • Audience Level Web service
      • human and machine readable interface that resolves OCLC record number or ISBN to probable audience level
      • uses type-of-library holdings data in WorldCat to calculate audience levels for books represented in WorldCat
      • ARL=1.0, Academic=0.66, Public=0.33, School=0
    • Greasemonkey script to expose in Amazon and Open World Cat
      • Shows “audience level” result in the “Product Details” section of the book’s page on Amazon.com
      • Funny example: “The Bibliography of Canadian Bibliographies”
    • Hints at level only, not definitive
    • Examples
      • The Selfish Gene = 0.6
      • The World is Flat = 0.5
      • Theories of the Information Society = 0.71
  • Ajax – rich interaction
    • Live Search
      • Search is re-run with each additional keystroke of the search term/phrase
      • retrieves ordered, FRBR-inspired results
      • Narrow-by Dewey attributes
      • Catalog of Phoenix PL
        • Indexed every three-word combination
        • Display results as you type
        • Ranked by holdings
        • “satisficing engine” (good enough, asap)
        • Many biblical references, as the Bible has the most holdings
        • Top categories list on the side based on DDC
      • LCSH Live
  • Make data work harder
    • Fiction Finder
      • Interface that supports searching and browsing of fiction materials in WorldCat
      • retrieves ordered, FRBR-inspired results
      • Faceted browse
      • New interface available 1st qtr 2006
      • alphabetical browse by genre
      • retrieves works, ordered by holdings
      • click on work, get aggregate details from multiple editions in multiple libraries
      • Narrow by type, language
      • sort by different methods (newest, oldest, etc)
      • Pick edition, link to WorldCat to find copies in libraries
      • Exposes data such as literary form and setting
      • links to related works, author, etc.
    • Audience Level
  • Participation
    • Reviews WikiD
    • Not covered due to time limitations
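The audience-level weights in the notes above (ARL=1.0, Academic=0.66, Public=0.33, School=0) suggest some kind of holdings-weighted score. Here is a minimal Python sketch, assuming the score is a simple weighted average of holdings by library type. That averaging rule is my assumption; the notes only give the weights, and OCLC’s actual calculation may differ. The function name and dictionary keys are hypothetical.

```python
# Weights as recorded in the session notes.
WEIGHTS = {"arl": 1.0, "academic": 0.66, "public": 0.33, "school": 0.0}


def audience_level(holdings):
    """Return a holdings-weighted average in [0, 1], or None with no holdings.

    `holdings` maps a library type (a key of WEIGHTS) to the number of
    libraries of that type holding the title. The weighted average is an
    assumed formula, not a documented one.
    """
    total = sum(holdings.get(kind, 0) for kind in WEIGHTS)
    if total == 0:
        return None
    weighted = sum(WEIGHTS[kind] * holdings.get(kind, 0) for kind in WEIGHTS)
    return round(weighted / total, 2)


if __name__ == "__main__":
    # Hypothetical holdings counts, for illustration only.
    print(audience_level({"arl": 40, "academic": 120, "public": 300, "school": 15}))
```

Under this assumption a title held only by ARL libraries scores 1.0 and one held only by school libraries scores 0, which fits the “hints at level only” caveat.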

October 26th, 2005 by Michael Sauers

Google Print: Making the Virtual World Real

Rich Wiggins, Michigan State University

  • Cartoon: Why Google must never be bought by Microsoft
  • The idea: The library of congress metaphor
    • Schoolgirl in Carthage, TN accessing the contents of LoC (Al Gore)
  • Other projects
    • Small group of items, digitize it all
    • Words & songs of Woody Guthrie
    • Library of the first ladies
    • Worthwhile
      • Extends access to all web users
      • Preserves fragile content
    • Why not all of LoC?
  • LoC numbers
    • What are you measuring
    • What resolution
    • What color depth
    • What format
  • LoC books only
    • 20-28 million items
    • 2-7 million unique bound volumes
    • 17-20 terabytes
  • The idea
    • Disk is cheap
    • Digital imaging is getting cheaper
    • Broadband is relatively cheap
    • Labor can be relatively cheap
      • Automation can help
  • The germ of the idea
    • Technology is rapidly improving
    • Flatbed scanner is the wrong tool
  • Cost
    • Approx. $0.05 or $0.01 per page/image
    • $10-12/hour labor, mileage, meals, lodging
  • Digitize the LoC
    • Approx. $2.5 billion
  • OCR
    • Getting better and faster
    • Digitize it now, OCR on demand
  • Storage costs are plummeting
    • RAID arrays
    • Under 50 cents per gigabyte
  • Inventory/cataloging costs
    • Physical shelf space, $40/item
    • If it’s worth purchasing, it’s worth digitizing
  • Barrier: Rights Management
    • Once digitized, can we deliver it?
    • The paradox of latent value
    • Approx. 1/3 of LoC print collection is now in the public domain
  • Barrier: “The benefit doesn’t justify the cost”
    • It’s more cost effective to digitize everything than “just the good stuff”
  • Encourages preservation
    • Deacidification
    • Fire, digital is backup
  • Benefit: access
  • Benefit: Improved digitizing technology
    • The “ideal” book scanner
  • Benefit: Standards
    • Open XML
    • Cross document metadata
  • Benefits: Large-Scale Rights Management
    • 20 million volume collection will force the issue of fair use
    • Today, Disney defines fair use
  • Digital library projects: Think Big!
    • Google project teaches this
  • Apollo Program Analogy
  • Google’s vision will be realized by a forward thinking company and not the government
  • Why trust Google
    • They’re smart
    • They’re agile and innovative
    • They show no fear
    • They’re worth $100 billion
    • They won’t do this alone
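As a quick sanity check on the storage numbers in the notes above (17-20 terabytes for the LoC books, disk at under 50 cents per gigabyte), here is the arithmetic as a small Python sketch. I’m assuming decimal terabytes (1 TB = 1,000 GB) and using 50 cents as an upper bound; the function name is made up for illustration.

```python
GB_PER_TB = 1000  # assumption: decimal terabytes


def storage_cost_usd(terabytes, usd_per_gb=0.50):
    """Upper-bound disk cost in dollars for the given number of terabytes."""
    return terabytes * GB_PER_TB * usd_per_gb


if __name__ == "__main__":
    low = storage_cost_usd(17)
    high = storage_cost_usd(20)
    print(f"${low:,.0f} to ${high:,.0f}")  # prints "$8,500 to $10,000"
```

In other words, raw disk comes to roughly ten thousand dollars at most, which is a rounding error next to the approx. $2.5 billion digitization estimate.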

Google: Catalyst for Digitization or Library Destruction?

Roy Tennant

Roy: More access is better. Easier access is better. There’s more room for players and that’s a good thing. It’s good that Google is digitizing things. There’s room for everyone to be involved.

  • Cartoon: Google, Devil or Merely Evil?
  • Scary monster #1: A Copyright Cataclysm
    • Libraries have long enjoyed “fair use” protection
    • Google’s attempt to shield themselves under fair use may ruin it for us all
  • Scary monster #2: Closed Access to Open Material
    • They’re probably going to fix this problem
    • Google print copies are locked to certain ones from certain publishers
    • Public domain books are locked into showing just a certain number of pages
    • Link to buy book but no link to the library
  • Scary monster #3: Blind Wholesale Digitization
    • Large research collections are not weeded by policy
    • “We keep all kinds of crap”
    • Outdated material
    • “no more a good thing than buying books based on color”
    • Copyright will restrict access to up-to-date, recent material
    • Users will end up with the old crap since it’s more available
    • Open Content Alliance is focusing on collections
  • Scary monster #4: Ads
    • Most Google profit coming from ads
    • Needs eyeballs
    • Ads for antidepressants next to Hamlet
    • Viagra next to Lolita
    • They’re responsible to their stockholders, not the public
  • Scary monster #5: Secrecy
    • Agreements between Google and libraries have mostly been kept secret
    • The libraries could not talk to each other
    • U of Mich revealed after FOIA request
    • OCA is more open
    • Rumors indicate UM has best agreement from lib perspective, others have less favorable agreements
    • But we don’t know, nobody’s talking
  • Scary monster #6: Longevity
    • Google, Enron, WorldCom in common?
      • Public companies motivated by profit
      • Two are now gone
      • Size does not shield you
    • What do Google and libraries have in common
      • Both on Earth
    • Harvard library, 400 years old
    • Google 7 years old
    • Who should we trust?

Adam Smith, Product Manager for Google Print and Google Scholar

  • Welcome all comments, it’s what makes our products better
  • Better to have the information out there to see how people use and access it
  • Walk a difficult path to make many parties happy
  • Want to make the information accessible – at least discoverable
  • Copyright is an issue
  • This is just a small piece of the puzzle as ambitious as this project sounds
  • Welcome other efforts and they’re positive for the community
  • Publisher program uses a destructive scanning technology
  • Library version is non-destructive; they created it, and it’s secret
  • New version of Google privacy policy has just been released


October 24th, 2005 by Michael Sauers

Jenny Levine, The Shifted Librarian
Jessamyn West, Librarian.net

Flickr, Tagging, and the F-Word (Jessamyn)

  • Features
    • Easy upload
    • [M: Jessamyn just said “groks”]
    • easy find
    • easy share
  • Tagging
    • Metadata by me
    • …by my family & friends
    • …by anyone
    • Tagging vs Classification
      • Can co-exist
      • Must recognise the differences
      • it’s not a fight
  • Folksonomy
    • user created metadata
    • grassroots community classification of digital assets
    • flat namespace
    • not mutually exclusive with other systems
    • helps with scalability problems
    • involves the users in the problems
    • does have the “synonym problem”

del.icio.us (Jenny)

  • social bookmarking
  • the bookmarking version of flickr
  • tagged bookmarks
  • RSS feeds of tags and users
  • You can search your bookmarks but others can’t search your bookmarks
  • Use to research new topics
    • These are the sites people are reading and consider important enough to bookmark
  • Hacks
    • ToRead
    • ToRent
    • ForName (private = for:username)
    • Download media in iTunes
  • del.icio.us for your library
    • LaGrangeParkLibrary (for the ref desk)
    • Thomas Ford Memorial Library (aaron schmidt, displaying the feed back onto the Web site)
  • Folksonomy sites
    • CiteULike (academic)
    • Last.fm (music)
    • 43 things (what do you want to do, meet others who want to do the same thing)
    • 43 places (where do you want to visit)
    • Technorati (blogs)
    • MetaFilter
    • Yahoo! Search
    • Yummy! (hosts PDFs)
    • Amazon.com search inside the book concordance
    • bookswelike.net
    • LibraryThing


February 16th, 2005 by Michael Sauers

I hear someone complaining that they don’t understand much of what I post about so therefore I’ve just got to post something über techie in response…

The eXtensible Past: The Relevance of the XML Data Format for Access to Historical Datasets and a Strategy for Digital Preservation

This article reports on the X-past project carried out by the Netherlands Historical Data Archive (NHDA). The main goal of the project has been to investigate how the XML data format can improve the durability of and access to historical datasets. The X-past project furthermore investigated whether it would be possible to provide access to historical datasets by means of the “Open Archives Initiative—Protocol for Metadata Harvesting” (OAI-PMH). Within the framework of the X-past project a prototype information system has been developed and a number of users have been asked to report on usability issues concerning this system.
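For the curious, here is roughly what harvesting over OAI-PMH looks like, as a minimal Python sketch. The `verb` and `metadataPrefix` parameters, the `ListRecords` verb, and the `oai_dc` format come from the OAI-PMH protocol itself; the base URL, the sample record, and the function names are hypothetical and have nothing to do with the actual X-past prototype.

```python
from urllib.parse import urlencode
import xml.etree.ElementTree as ET

DC_NS = "http://purl.org/dc/elements/1.1/"  # Dublin Core element namespace


def list_records_url(base_url, metadata_prefix="oai_dc", set_spec=None):
    """Build an OAI-PMH ListRecords request URL for a repository base URL."""
    params = {"verb": "ListRecords", "metadataPrefix": metadata_prefix}
    if set_spec:
        params["set"] = set_spec
    return base_url + "?" + urlencode(params)


def extract_titles(xml_text):
    """Pull every Dublin Core title out of an OAI-PMH response document."""
    root = ET.fromstring(xml_text)
    return [element.text for element in root.iter("{%s}title" % DC_NS)]


# A made-up, heavily trimmed response, for illustration only.
SAMPLE_RESPONSE = """<OAI-PMH xmlns="http://www.openarchives.org/OAI/2.0/">
  <ListRecords>
    <record>
      <header><identifier>oai:example.org:dataset-1</identifier></header>
      <metadata>
        <oai_dc:dc xmlns:oai_dc="http://www.openarchives.org/OAI/2.0/oai_dc/"
                   xmlns:dc="http://purl.org/dc/elements/1.1/">
          <dc:title>Hypothetical historical dataset</dc:title>
        </oai_dc:dc>
      </metadata>
    </record>
  </ListRecords>
</OAI-PMH>"""

if __name__ == "__main__":
    print(list_records_url("https://example.org/oai"))
    print(extract_titles(SAMPLE_RESPONSE))
```

A real harvester would also need to follow resumption tokens for large result sets, which this sketch skips.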

Thanks Rosario


November 17th, 2004 by Michael Sauers

IL04: “technology and collaboration”

I ended up having to sneak out a few minutes early from this presentation as it ended at 11.15 and I’m giving my Data Visualization talk at 11.30. As expected from anything in which Stephen Abram (VP of Innovation for Sirsi and the current president of the Canadian Library Association) speaks, this presentation was a hoot. (Sorry, that’s the best word I can come up with for describing that hilarious Canadian.) The presentation was basically an overview of collaboration technologies that can be, and are being, used in libraries today. His simple truth: “What matters is not what you have but how you use it.” Another interesting point: people who tend to be more liberal or open minded want information that challenges their perceptions, while more conservative people want information that confirms their perceptions. The new collaborative technologies that are out there include, on the user side: Web conferencing, presence management, real-time translation, real-time speech-to-text, collaboration sites and wikis. On the information provider side there are Web services, RSS feeds, learning objects, digitization, and faceted metadata. Another one of his best points: “Collaboration is an environment, not an end in itself.”

The single best thing I learned from Stephen at this conference was “I may have made up the word, but since I’ve added it to the spell-checker, that makes it official.”
