Friday, November 23, 2012

Reading Notes - Week of November 26, 2012

Schrier, Robert A. "Digital Librarianship & Social Media: The Digital Library as a Conversation Facilitator." D-Lib Magazine 17, no. 7/8 (2011).


  • librarians have become aware that digital collections are often underutilized by their intended user base
  • one of the best ways to promote digital library collections is through social media, but many librarians do not use it correctly
  • five principles to integrate a social networking plan into a digital library context
    • Listen
      • find out where the conversations are and what is being said
      • identify the people central to the conversation and engage with them
      • platforms for listening - Google Alerts, RSS feeds, Twitter, etc.
      • understand language and cultural norms
    • Participation
      • social networking allows digital librarians to put a human face to their collections
      • establish trust with users
    • Transparency
      • transparency reinforces positive relationships with users
    • Policy
      • libraries should consider developing a social media policy
    • Planning
      • libraries should plan social networking ahead of time
Allan, Charles. "Using a Wiki to Manage a Library Instruction Program: Sharing Knowledge to Better Serve Patrons." C&RL News 68, no. 2 (2007).

  • creating wikis is easy
  • free wikis are not very complex, but can be effective
  • the use of wikis in the workplace is just beginning to catch on, and the deployment of wikis in libraries is starting
  • wikis in libraries are used to manage a variety of information
Arch, Xan. "Creating the Academic Library Folksonomy: Put Social Tagging to Work at Your Institution." C&RL News 68, no. 2 (2007).

  • social tagging allows an individual to create bookmarks (tags) for websites and save them online
  • websites allow these bookmarks to be shared and new resources to be discovered
  • this should be brought into libraries
I watched Jimmy Wales' TED Talk on the birth of Wikipedia.

Friday, November 16, 2012

Reading Notes - Week of November 19, 2012

Hawking, David. "Web Search Engines: Part 1." Computer 39, no. 6 (2006): 86-88.

  • large search engines operate out of geographically distributed data centers for redundancy
  • there are hundreds of thousands of servers at these centers
  • within each data center groups of servers can be dedicated to specific functions, such as web crawling
  • large scale replication is necessary
  • the simplest crawling algorithm uses queues of URLs and a mechanism to determine whether it has seen the URL before
  • crawling algorithms must address speed, politeness, excluded content, duplicate content, continuous crawling and spam rejection
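Hawking's description of the simplest crawler, a queue of URLs plus a check for URLs already seen, can be sketched as a toy Python loop. This is only an illustration, not his actual implementation; the `fetch_links` callback and the tiny in-memory "web" are stand-ins for real HTTP fetching and link extraction.

```python
from collections import deque

def crawl(seed_urls, fetch_links, max_pages=100):
    """Toy frontier-based crawler: a FIFO queue of URLs plus a
    'seen' set so no URL is fetched twice. fetch_links(url) stands
    in for downloading a page and extracting its links."""
    frontier = deque(seed_urls)
    seen = set(seed_urls)
    crawled = []
    while frontier and len(crawled) < max_pages:
        url = frontier.popleft()
        crawled.append(url)
        for link in fetch_links(url):
            if link not in seen:   # skip URLs already queued or crawled
                seen.add(link)
                frontier.append(link)
    return crawled

# Tiny in-memory "web" standing in for real HTTP fetches
web = {"a": ["b", "c"], "b": ["a", "c"], "c": ["d"], "d": []}
order = crawl(["a"], lambda u: web[u])
print(order)  # ['a', 'b', 'c', 'd']
```

A real crawler layers the other concerns from the list (politeness delays, robots.txt exclusions, duplicate and spam detection) on top of this basic loop.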
Hawking, David. "Web Search Engines: Part 2." Computer 39, no. 8 (2006): 88-90.

  • search engines use an inverted file to rapidly identify indexing terms
  • an inverted file is a concatenation of the posting lists for each term
  • indexers create inverted files in two phases
    • scanning - indexer scans the text of each input document
    • inversion - indexer sorts the files into term number order
  • real indexers have to deal with scaling, term lookup, compression, searching phrases, anchor texts, link popularity scores, and query-independent scoring
  • query processing algorithms
    • the query processor looks up each query term and locates its posting list
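The two indexing phases (scanning, then inversion) and the posting-list lookup can be illustrated with a toy Python sketch. The documents are made-up examples; a real indexer adds the scaling, compression, and ranking concerns listed above.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Phase 1 (scanning): read each document and emit (term, doc_id)
    postings. Phase 2 (inversion): group postings by term so each term
    maps to its posting list of sorted document ids."""
    postings = []
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            postings.append((term, doc_id))
    index = defaultdict(set)
    for term, doc_id in postings:
        index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def query(index, terms):
    """AND query: look up each term's posting list and intersect them."""
    lists = [set(index.get(t, [])) for t in terms.lower().split()]
    return sorted(set.intersection(*lists)) if lists else []

docs = ["web search engines", "web crawling", "search quality"]
idx = build_inverted_index(docs)
print(idx["web"])                # [0, 1]
print(query(idx, "web search"))  # [0]
```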
Shreeves, Sarah, Thomas G. Habing, Kat Hagedorn, and Jeffery A. Young. "Current Developments and Future Trends for the OAI Protocol for Metadata Harvesting." Library Trends 53, no. 4 (2005): 576-589.

  • the OAI Protocol for Metadata Harvesting (OAI-PMH) is a tool developed to facilitate interoperability between different collections of metadata based on common standards
  • the OAI world is divided into data providers or repositories and service providers or harvesters
  • OAI requires data providers to expose metadata in at least unqualified Dublin Core
  • the Protocol can provide access to parts of the "invisible Web" that are not easily accessible to search engines
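As a concrete illustration, an OAI-PMH harvest is just an HTTP GET request whose `verb` parameter names the operation. The sketch below builds such a request URL; the repository base URL is hypothetical, and `oai_dc` is the unqualified Dublin Core format every data provider must support.

```python
from urllib.parse import urlencode

def oai_request_url(base_url, verb, **kwargs):
    """Build an OAI-PMH request URL: harvesters issue simple HTTP GET
    requests whose 'verb' parameter names the operation."""
    params = {"verb": verb, **kwargs}
    return base_url + "?" + urlencode(params)

# Hypothetical repository base URL for illustration only
url = oai_request_url("https://example.org/oai", "ListRecords",
                      metadataPrefix="oai_dc")
print(url)  # https://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
```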
Bergman, Michael K. "The Deep Web: Surfing Hidden Value." Journal of Electronic Publishing 7, no. 1 (2001).

  • deep web sources store their content in searchable databases that only produce results dynamically in response to a direct request
  • deep web is much larger than the "surface" web


Thursday, November 8, 2012

Muddiest Point - Week of November 5th, 2012

I do not have a muddiest point for this week...although that might change once I really begin to use XML.

Also, I apologize for being a bit tardy with this post. I forgot that yesterday was Wednesday!

Thursday, November 1, 2012

Reading Notes - Week of November 5, 2012

Martin Bryan. An Introduction to the Extensible Markup Language (XML)

  • What is XML?
    • XML is a subset of the Standard Generalized Markup Language (SGML) designed to aid the interchange of structured documents over the Internet
    • XML files always clearly mark where the start and end of each of the logical parts (called elements) of an interchanged document occurs
    • it restricts the use of SGML constructs to ensure that fall back options are available when access to certain components of the document is not currently possible over the Internet
    • through a document type definition (DTD), XML allows users to ensure that each component of a document occurs in a valid place within the interchanged data stream
      • XML does not require the presence of a DTD
    • XML allows users to link multiple files together to form compound documents, identify when illustrations are to be incorporated into text files, provide processing control information to support programs, and add editorial comments to files
    • XML is not designed to be a standard way of coding text
  • The Components of XML
    • based on the concept of documents and comprised of entities
    • each entity can contain one or more elements
    • each element has certain attributes which describe how it should be processed
  • How is XML Used?
    • users need to know how the markup tags are delimited from normal text and the order in which the various elements should be used
    • Systems that understand XML can provide users with lists of the elements that are valid at each point in the document, and will automatically add the required delimiters to the name to produce a markup tag
    • When a system does not understand XML, users can enter the XML tags manually for later validation. 
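A minimal sketch of these components, using a made-up document and Python's standard-library XML parser: every element's start and end is explicitly marked, and elements may carry attributes.

```python
import xml.etree.ElementTree as ET

# A tiny example XML document (invented for illustration): the start
# and end of each logical part (element) is clearly marked, and
# elements can carry attributes describing how to process them.
doc = """<article id="a1">
  <title>Intro to XML</title>
  <body lang="en">Structured documents over the Internet.</body>
</article>"""

root = ET.fromstring(doc)
print(root.tag, root.attrib)          # article {'id': 'a1'}
print(root.find("title").text)        # Intro to XML
print(root.find("body").get("lang"))  # en
```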
Uche Ogbuji. A survey of XML standards: Part 1. January 2004.

There are many different standards for XML. Be aware of this when using XML, or reviewing software that uses XML.

I have reviewed Extending Your Markup: An XML Tutorial by Andre Bergholz and the W3Schools XML Schema Tutorial, and will refer to them when working with XML in this coming week.

Muddiest Point - Week of October 29, 2012

I have no muddiest point for this week.

Friday, October 26, 2012

Reading Notes - Week of October 29, 2012

W3Schools Cascading Style Sheet Tutorial: www.w3schools.com/css/

  • CSS stands for Cascading Style Sheets
  • styles define how to display HTML elements
  • HTML was never intended to contain tags for formatting a document; rather it is intended to define the content of a document
    • when tags for formatting began to be added to HTML, CSS was created so that all formatting could be removed from an HTML document and stored in a separate CSS file

I have reviewed the rest of the W3Schools CSS tutorial.

I have reviewed the CSS Tutorial, starting with HTML and CSS

Hakon Lie and Bert Bos, Cascading Style Sheets, Designing for the Web. 2nd ed. Addison Wesley, 1999.

  • a rule is a statement about one stylistic aspect of one or more elements
  • a style sheet is a set of one or more rules that apply to an HTML document
  • a rule consists of a selector and a declaration
    • selector - link between HTML document and the style; specifies what elements are affected by the declaration
    • declaration - part of the rule that sets forth what the effect will be
  • a declaration has two parts
    • part before the colon = property
      • quality or characteristic that something possesses
    • part after the colon = value
      • precise specification of the property
  • a selector may have more than one declaration
  • selectors that share the same declarations can be grouped together in one rule
  • the style sheet must be "glued" to the document in order to affect the HTML
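Since this post is about CSS itself, the anatomy above can be shown directly in a small illustrative stylesheet (the selectors and values are made up):

```css
/* rule = selector { declaration; declaration; } */
h1 {
  color: navy;      /* property "color", value "navy" */
  font-size: 24px;  /* a selector may have several declarations */
}

/* selectors sharing the same declarations can be grouped */
h2, h3 { color: navy; }
```

The "glue" in the last bullet is typically a `<link rel="stylesheet" href="...">` element in the HTML document's head.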


Wednesday, October 24, 2012

Friday, October 19, 2012

Reading Notes - Week of October 22, 2012

I have reviewed the W3Schools HTML tutorial and the HTML cheat-sheet.

F.E. Pratter. "Introduction to HTML." From Web Development With SAS By Example. 3rd Edition.

  • HTML = hypertext markup language 
  • a webpage is an ASCII text file with markup tags inserted to display and format the text
Doug Goans, Guy Leach, Teri M. Vogel. "Beyond HTML: Developing and Re-imagining Library Web Guides in a Content Management System." Library Hi Tech 24, no. 1 (2006): 29-53.

  • content management = the process of collecting, managing, and publishing content
  • in content management systems, content is separated from layout and design, so users do not have to use HTML
  • CMS allows users to control how content is distributed and presented
  • there are many different types of CM systems

Friday, October 12, 2012

Notes - Week of October 15, 2012

Tyson, Jeff. "How Internet Infrastructure Works."

Every computer that is connected to the internet is part of a network, which is provided by an internet service provider (ISP). Large communication companies have points of presence (POP), which is a place where local users can access the company's network. There is no overall controlling network; there are several high-level networks connecting through network access points. Networks depend on network access points, backbones, and routers to communicate. Every machine on the internet has a unique identifying number called an IP address. The domain name system was created so IP addresses would not have to be memorized. All machines on the internet are either servers or clients. Servers have static IP addresses that rarely change.
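One way to see that an IP address is "a unique identifying number": an IPv4 address is literally a 32-bit integer written in dotted form, and DNS exists so people can use names instead of these numbers. A short Python sketch (192.0.2.1 is an address reserved for documentation examples):

```python
import ipaddress

# An IPv4 address is a 32-bit integer in dotted decimal notation;
# DNS maps memorable names onto numbers like this one.
addr = ipaddress.IPv4Address("192.0.2.1")
print(int(addr))                          # 3221225985
print(ipaddress.IPv4Address(3221225985))  # 192.0.2.1
```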

"Dismantling Integrated Library Systems."

  • the introduction of the web forced change onto integrated library systems
  • older software was updated to meet customer demand
  • creating an entirely new ILS is unrealistic
  • integration with the web is key

Sergey Brin and Larry Page. "The Genesis of Google."

  • data has to move all over the world quickly, and it is very difficult to do so without lag
  • searching is tricky since search engines are not "intelligent"



Muddiest Point - Week of October 8/9, 2012

I'm still a bit confused by the concept of network architecture. Is it the relationship between devices in a network?  

Wednesday, October 3, 2012

Muddiest Point - Week of October 1, 2012

I do not have a muddiest point for this week, but I do find metadata quite fascinating!

Wednesday, September 26, 2012

Muddiest Point - Week of September 24, 2012

I am confused about what a key/primary key/foreign key is, and what the role they play in a database is.

Friday, September 21, 2012

Reading Notes - Week of September 24, 2012


Wikipedia Entry: “Database”

  • A database is an organized collection of data
  • The data is usually organized to model relevant aspects of reality in a way that supports processes requiring the information
  • Term database system implies that the data is managed to some level of quality
  • Well-known database management systems include Oracle, IBM DB2, Microsoft SQL Server, Microsoft Access, PostgreSQL, MySQL, and SQLite
  • A way to classify databases involves the type of their contents (ex. Bibliographic, document-text) or by their application area (ex. Accounting, banking)
  • Relational model – applications should search for data by content, rather than by following links
  • Relational database systems are the current dominant system
  • General purpose DBMS aim to satisfy as many applications as possible, but they are not always the best solution
  • Major database usage requirements
    • Functional requirements
      • Defining the structure of the data
      • Manipulating the data
      • Protecting the data
      • Describing processes that use the data
    • Operational requirements
      • Availability
      • Performance
      • Isolation between users
      • Recovery from failure and disaster
      • Backup and restore
      • Data independence
  • Current data models
    • Relational model
    • Entity-relationship model
    • Object model
    • Object-relational model
    • XML as a database model
  • Database design is done before building the database so that it meets the needs of end-users within the given application that the database is intended to support


Wikipedia Entry: “Entity-Relationship Model”
  • The entity-relationship model is an abstract way to describe a database
  • Starts in a relational database with data stored in tables
  • Data in the tables point to data in other tables
  • Two levels of the ER Model
    • Conceptual data model
    • Logical data model
  • An entity can be defined as a thing which is recognized as being capable of an independent existence and which can be uniquely identified
Phlonx, "Database Normalization Process"

Database normalization relies on three normal forms: first normal form (no repeating elements or groups of elements), second normal form (no partial dependencies on a concatenated key), and third normal form (no dependencies on non-key attributes).
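The first step, removing repeating groups, can be sketched with a toy Python example (the book data is invented): a row holding a list of authors is split into one-row-per-author entries in a second table, linked by the key.

```python
# Unnormalized: the "authors" column holds a repeating group.
unnormalized = [
    {"book_id": 1, "title": "Intro to DBs", "authors": ["Lee", "Park"]},
    {"book_id": 2, "title": "SQL Basics",   "authors": ["Lee"]},
]

# Split into two "tables": books, and one row per (book, author) pair.
books = [{"book_id": r["book_id"], "title": r["title"]} for r in unnormalized]
book_authors = [
    {"book_id": r["book_id"], "author": a}
    for r in unnormalized
    for a in r["authors"]
]

print(len(books), len(book_authors))  # 2 3
```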

Wednesday, September 19, 2012

Muddiest Point - Week of September 17, 2012

I have heard that DNG is one of the best formats to use for photos since it results in almost no data loss. If this is true, why is the TIFF format still preferred for archival use?

Thursday, September 13, 2012

Reading Notes - Week of September 17, 2012

"Data Compression" from Wikipedia

  • Data compression involves encoding information using fewer bits than the original representation
  • Two types of compression
    • Lossless
      • Reduces bits by identifying and eliminating statistical redundancy
    • Lossy
      • Reduces bits by identifying marginally important information and removing it
  • Formally known as source coding
  • Helps reduce resource usage (ex. Data storage space, transmission capacity)
  • Theoretical background
    • Lossless – Information Theory
    • Lossy – rate-distortion theory
"Data Compression Basics"

  • Part 1: Lossless Data Compression
    • Fundamental idea behind data compression is to take a given representation of information and replace it with a different representation that takes up less space, from which the original data can later be recovered
    • If the recovered information is guaranteed to be exactly identical to the original, then the compression method is described as “lossless”
    • A simple lossless compression algorithm is “run-length encoding” (RLE)
      • Replaces long runs of characters with a single character and the length of the run
    • Lempel-Ziv compressor family
    • Entropy coding
      • Assigns codes to blocks of data in a way that the length of the code is inversely proportional to the statistical probability of the block of data
    • Prediction and error coding
  • Part 2: Lossy Compression of Stills and Audio
    • important to distinguish data from information
    • fundamental idea behind lossy compression is preserving meaning rather than preserving data
    • by allowing for some deviation from the source data when encoding patterns, lossy compression greatly reduces the amount of data required to describe the “meaning” of the source media
    • lossy compression is ideally applied to information that is meant to be interpreted by a reasonably sophisticated “meaning processor”(human, image recognition software, etc.) that looks at a representation or rendering of the data rather than the data itself
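The run-length encoding idea from Part 1 can be sketched in a few lines of Python; because the method is lossless, decoding the encoded pairs recovers the original string exactly.

```python
from itertools import groupby

def rle_encode(s):
    """Replace each run of identical characters with (char, run length)."""
    return [(ch, len(list(run))) for ch, run in groupby(s)]

def rle_decode(pairs):
    """Expand each (char, count) pair back into the original run."""
    return "".join(ch * n for ch, n in pairs)

encoded = rle_encode("AAAABBBCCD")
print(encoded)              # [('A', 4), ('B', 3), ('C', 2), ('D', 1)]
print(rle_decode(encoded))  # AAAABBBCCD
```

RLE only pays off when the input actually contains long runs, which is why it suits simple images more than ordinary text.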
Edward A. Galloway. “Imaging Pittsburgh: Creating a Shared Gateway to Digital Image Collections of the Pittsburgh Region.”

  • The main focus of the project was to create a single Web gateway for the public to access thousands of visual images held in the collections of the Pitt Archives Service Center, CMOA, and the Historical Society of Western PA
  • The content partners were responsible for selections of collections/images, describing/cataloging images, digitization, and delivering images/metadata to DRL
  • DRL was responsible for providing access to the image collections via DLXS middleware
  • Characteristics of the Web gateway
    • Conduct keyword searches across all image collections
    • Browse images
    • Read about the collections and their contents
    • Explore images by time/place/theme
    • Order image reproductions
  • Communication challenges
  • Selection challenges
  • Metadata challenges
  • Project-wide vs local needs
  • Workflow challenges
  • Website development challenges

I was not able to access "Youtube and Libraries: It Could be a Beautiful Relationship" by Paula L. Webb.