The importance of stable URIs on the library website

By Alex Dolski on July 10, 2009 3:31 PM
Our move to a CMS provides an ideal opportunity to examine issues of resource identifier (URI) stability, which virtually all of the locally-hosted resources on our website currently lack.

What is a URI?


You probably already know what a URL is, and a URI is similar. Whereas speaking of a URL emphasizes the location of a resource, speaking of a URI emphasizes the resource itself. In practice, these may be the same thing. It turns out that on the web, the one sure-fire way of uniquely identifying something is by its URL. So, that URL happens to be its URI.

But resources can move, and their URLs can change. A resource's URL is unique at any given moment, but it may not stay the same over time. Unfortunately, when a resource's URL changes, its URI usually changes with it. So when a newspaper (for example) has an article with a URL like this:


And then changes its CMS, causing the URL to change to:


Both the URL and URI change. That is, the identifier changes even though the resource that it represents didn't change - only its location did. A whole lot of links break as a result.

It doesn't have to be this way!

Implications of a CMS on linking

A CMS will eliminate internal linking issues (i.e. linking from page to page within the library website) by transparently assigning every resource a unique ID, which it uses to construct links dynamically. Our resources will still have URLs, but the move itself will change them, causing some disruption (link breakage). This is likely to affect:

  • Browser bookmarks
  • Search engine rankings
  • Internal hyperlinks (from one of our pages to another of our pages)
  • Hyperlinks from other sites to pages on our site
  • Citations pointing to resources on our site

One of the ways to avoid this disruption (other than not going ahead with the move) is to use HTTP redirects, which send a "301 Moved Permanently" response telling the web browser that a resource has moved and to what URL. The problem with building redirects for every resource is that:

  • It's hard to compile a list of exactly what we have to redirect to where
  • It's a pain to write a whole bunch of redirect scripts and/or set up redirect maps on the web server
  • It's messy to merge an old URI scheme in with a new URI scheme
  • Since we are continuing to maintain the old resource locations, they never fully "go away;" we have to continue to maintain the redirect table over time, tracking everything that has ever moved
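To make the maintenance burden concrete, here is a minimal sketch of a redirect map in Python. The paths in REDIRECT_MAP and the resolve function are hypothetical, made up for illustration; they are not actual pages on our site or part of any particular web server's configuration.

```python
from http import HTTPStatus

# Hypothetical map of retired paths to their new locations.
# Every resource that has ever moved needs an entry here, forever.
REDIRECT_MAP = {
    "/old/hours.html": "/about/hours",
    "/old/guides/index.html": "/research/guides",
}


def resolve(path):
    """Return (status code, response headers) for a requested path."""
    target = REDIRECT_MAP.get(path)
    if target is None:
        # No entry for this path: the old link is simply broken.
        return HTTPStatus.NOT_FOUND.value, {}
    # 301 tells browsers and crawlers that the move is permanent.
    return HTTPStatus.MOVED_PERMANENTLY.value, {"Location": target}
```

Each entry in the map is something a person has to discover, write down, and keep correct, which is exactly the overhead described in the list above.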

Making URIs more stable

One way of stabilizing URIs is to hide resource implementation details. What does that mean? Here is an example of the URI of a resource on the library website:


This URI is bad because it is revealing: it exposes the fact that the resource it identifies is served from an HTML file in a particular directory on our web server. It locates the resource, but does not provide a very stable identifier for it. What if we wanted to rename the undergrad.html file, or move it, or change the encoding of its content (for example, HTML to XML), or make it a server-side script (for example, PHP with a .php extension)? The resource would have to get a new URL, and every link to the old one would break.

For another example, here is a page within the Architecture Studies Library's Las Vegas Architects and Buildings Database version 2 (LVABD2):

Let's break that down. From left to right, we have:

  • UNLV library web server domain name
  • ASL website subdirectory
  • LVABD2 subfolder (called "archdb2" since it replaces an earlier version which was called "archdb")
  • An "index.php" script which is used internally by the LVABD2 web application
  • A "projects/view" parameter which tells the LVABD2 application that we want to view...
  • ...an entity in the database with ID 251
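The same breakdown can be done mechanically. The sketch below uses Python's standard urllib.parse module; the URL in it is an assumption, pieced together from the components listed above and from the "index.php/projects/view/251" scheme, and the break_down function is a hypothetical helper.

```python
from urllib.parse import urlsplit

# Assumed URL, rebuilt from the components described in the text.
ASSUMED_URL = "http://www.library.unlv.edu/arch/archdb2/index.php/projects/view/251"


def break_down(url):
    """Split a URL into its host name and its path segments, left to right."""
    parts = urlsplit(url)
    segments = parts.path.strip("/").split("/")
    return parts.netloc, segments


host, segments = break_down(ASSUMED_URL)
print(host)      # the web server domain name
print(segments)  # subdirectories, script name, parameters, and ID, in order
```

Every element after the host name is a detail of how the application happens to be built today, not a property of the resource itself.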

Now, this is a totally functional URL which, although not very good, is not the worst URL ever. But it requires us to ask what would happen in the future if we were to change the specific implementation of the LVABD2 application to one that did not use the same "index.php/projects/view/251" parameter scheme. In fact, we had to deal with this very issue last summer when we deployed LVABD2 to replace LVABD1. If you access the URL of the previous version:

http://www.library.unlv.edu/arch/archdb/

You'll notice that it loads a page that links to the home page of the new version. This is a file that we had to create. But it turns out that only this page redirects; none of the project view pages etc. within the old application redirect to their new equivalents. They are, in fact, all gone. It turns out that by upgrading to LVABD2 and its new URI scheme, we broke the links for everyone anywhere on the Internet who was linking to any of the resources within LVABD1 - even though the availability of the resources never changed at all. Whoops! (By the way, this is sort of my fault.)

When links break all over the place, people get annoyed. This causes us to avoid changing resource locations and implementations, which leads to our website falling behind the curve and becoming difficult to manage and use. Improvements in web technology frequently lead to gains in efficiency, productivity, ease of use, ease of management, and available features. In order to benefit from these improvements, websites have to change by adopting this new technology. But the URLs of their resources don't have to change, as long as those resources are kept separate from the technology used to provide access to them.

So, getting back to the LVABD example:


What happens if we remove the implementation details of the LVABD web app from this URI? How about this hypothetical improved URI:


Now, the great thing about this URI is that it's totally abstracted away from the implementation of the resource it represents. There is not really a "projects" subdirectory within the "arch" directory. There may not even be an "arch" directory at all. As a patron, why should I care what directories there are, or that the LVABD2 application is written in CakePHP or Rails or TurboGears or whatever? I don't, and in any case, it's none of my business! I'm only requesting the content corresponding to the URI that I provide. In fact, in this example, LVABD hasn't changed at all; it's still there in the same location. We have simply configured the web server to automatically serve the LVABD URL equivalent whenever it receives a request for a URL matching the pattern of the "improved" version. The user requests a resource; we figure out how to deliver it. The URI can remain stable for as long as we have a web server and electricity. It can be persistent.
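Here is a rough sketch of that mapping in Python. In practice this would more likely live in the web server's rewrite configuration; the public pattern "/arch/projects/<id>" and the to_internal function are assumptions for illustration, and the internal template follows the "index.php/projects/view" scheme described earlier.

```python
import re

# Hypothetical stable, public pattern: /arch/projects/<numeric id>
STABLE_PATTERN = re.compile(r"/arch/projects/(\d+)")

# Assumed internal location, based on the LVABD2 parameter scheme.
INTERNAL_TEMPLATE = "/arch/archdb2/index.php/projects/view/{id}"


def to_internal(path):
    """Map a stable public path to the application's current internal path.

    Returns None when the path does not match the stable pattern.
    """
    match = STABLE_PATTERN.fullmatch(path)
    if match is None:
        return None
    return INTERNAL_TEMPLATE.format(id=match.group(1))
```

If the application is later rewritten, only INTERNAL_TEMPLATE has to change. The public URI that patrons bookmark and cite stays exactly the same.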

In summary

  • We should think of all of the information on the library website in terms of resources, of which a web page is only one possible manifestation. (Others might include XML/JSON output for an iPhone app; vCard output for the staff directory; iCalendar output for the library calendar; etc.)
  • The CMS is going to break a great many links on the library website. If we take the proper precautions, this will only have to happen once. If not, we will continue to get bitten again and again by broken links.
  • This is not the CMS's fault; it's our fault for not abstracting away resource implementations from their URLs and designing a stable URI scheme a long time ago.
  • In the future, we should think about mapping our resources to stable URIs and consider planning a site-wide URI scheme into which our resources can fit. A CMS will greatly simplify this process, automating most of what would otherwise be painstaking manual work.

Comments

Submitted by pfinley on
This is really helpful - thanks for an excellent post!
Submitted by khadeer (not verified) on
nice information