Friday, November 23, 2012

Reading Notes - Week of November 26, 2012

Schrier, Robert A. "Digital Librarianship & Social Media: The Digital Library as a Conversation Facilitator." D-Lib Magazine 17, no. 7/8 (2011).


  • librarians have become aware that digital collections are often underutilized by their intended user base
  • one of the best ways to promote digital library collections is through social media, but many librarians do not use it correctly
  • five principles to integrate a social networking plan into a digital library context
    • Listen
      • find out where the conversations are and what is being said
      • identify the people central to the conversation and engage with them
      • platforms for listening - Google Alerts, RSS feeds, Twitter, etc.
      • understand language and cultural norms
    • Participation
      • social networking allows digital librarians to put a human face to their collections
      • establish trust with users
    • Transparency
      • transparency reinforces positive relationships with users
    • Policy
      • libraries should consider developing a social media policy
    • Planning
      • libraries should plan social networking ahead of time
Allan, Charles. "Using a Wiki to Manage a Library Instruction Program: Sharing Knowledge to Better Serve Patrons." C&RL News 68, no. 2 (2007).

  • creating wikis is easy
  • free wikis are not very complex, but can be effective
  • the use of wikis in the workplace is just beginning to catch on, and the deployment of wikis in libraries is starting
  • wikis in libraries are used to manage a variety of information
Arch, Xan. "Creating the Academic Library Folksonomy: Put Social Tagging to Work at Your Institution." C&RL News 68, no. 2 (2007).

  • social tagging allows an individual to create bookmarks (tags) for websites and save them online
  • websites allow these bookmarks to be shared and new resources to be discovered
  • this should be brought into libraries
I watched Jimmy Wales' TED Talk on the birth of Wikipedia.

Friday, November 16, 2012

Reading Notes - Week of November 19, 2012

Hawking, David. "Web Search Engines: Part 1." Computer 39, no. 6 (2006): 86-88.

  • large search engines operate out of geographically distributed data centers for redundancy
  • there are hundreds of thousands of servers at these centers
  • within each data center groups of servers can be dedicated to specific functions, such as web crawling
  • large scale replication is necessary
  • the simplest crawling algorithm uses queues of URLs and a mechanism to determine whether it has seen the URL before
  • crawling algorithms must address speed, politeness, excluded content, duplicate content, continuous crawling and spam rejection
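Hawking's description of the simplest crawler, a queue of URLs plus a check for URLs already seen, can be sketched as a toy Python loop. This is only an illustration, not his actual implementation; the `fetch_links` callback and the tiny in-memory "web" are stand-ins for real HTTP fetching and link extraction.

```python
from collections import deque

def crawl(seed_urls, fetch_links, max_pages=100):
    """Toy frontier-based crawler: a FIFO queue of URLs plus a
    'seen' set so no URL is fetched twice. fetch_links(url) stands
    in for downloading a page and extracting its links."""
    frontier = deque(seed_urls)
    seen = set(seed_urls)
    crawled = []
    while frontier and len(crawled) < max_pages:
        url = frontier.popleft()
        crawled.append(url)
        for link in fetch_links(url):
            if link not in seen:   # skip URLs already queued or crawled
                seen.add(link)
                frontier.append(link)
    return crawled

# Tiny in-memory "web" standing in for real HTTP fetches
web = {"a": ["b", "c"], "b": ["a", "c"], "c": ["d"], "d": []}
order = crawl(["a"], lambda u: web[u])
print(order)  # ['a', 'b', 'c', 'd']
```

A real crawler layers the other concerns from the list (politeness delays, robots.txt exclusions, duplicate and spam detection) on top of this basic loop.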
Hawking, David. "Web Search Engines: Part 2." Computer 39, no. 8 (2006): 88-90.

  • search engines use an inverted file to rapidly identify indexing terms
  • an inverted file is a concatenation of the posting lists for each term
  • indexers create inverted files in two phases
    • scanning - indexer scans the text of each input document
    • inversion - indexer sorts the files into term number order
  • real indexers have to deal with scaling, term lookup, compression, searching phrases, anchor texts, link popularity scores, and query-independent scoring
  • query processing algorithms
    • the query processor looks up each query term and locates its posting list
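The two indexing phases (scanning, then inversion) and the posting-list lookup can be illustrated with a toy Python sketch. The documents are made-up examples; a real indexer adds the scaling, compression, and ranking concerns listed above.

```python
from collections import defaultdict

def build_inverted_index(docs):
    """Phase 1 (scanning): read each document and emit (term, doc_id)
    postings. Phase 2 (inversion): group postings by term so each term
    maps to its posting list of sorted document ids."""
    postings = []
    for doc_id, text in enumerate(docs):
        for term in text.lower().split():
            postings.append((term, doc_id))
    index = defaultdict(set)
    for term, doc_id in postings:
        index[term].add(doc_id)
    return {term: sorted(ids) for term, ids in index.items()}

def query(index, terms):
    """AND query: look up each term's posting list and intersect them."""
    lists = [set(index.get(t, [])) for t in terms.lower().split()]
    return sorted(set.intersection(*lists)) if lists else []

docs = ["web search engines", "web crawling", "search quality"]
idx = build_inverted_index(docs)
print(idx["web"])                # [0, 1]
print(query(idx, "web search"))  # [0]
```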
Shreeves, Sarah, Thomas G. Habing, Kat Hagedorn, and Jeffery A. Young. "Current Developments and Future Trends for the OAI Protocol for Metadata Harvesting." Library Trends 53, no. 4 (2005): 576-589.

  • the OAI Protocol for Metadata Harvesting (OAI-PMH) is a tool developed to facilitate interoperability between different collections of metadata based on common standards
  • the OAI world is divided into data providers or repositories and service providers or harvesters
  • OAI requires data providers to expose metadata in at least unqualified Dublin Core
  • the Protocol can provide access to parts of the "invisible Web" that are not easily accessible to search engines
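As a concrete illustration, an OAI-PMH harvest is just an HTTP GET request whose `verb` parameter names the operation. The sketch below builds such a request URL; the repository base URL is hypothetical, and `oai_dc` is the unqualified Dublin Core format every data provider must support.

```python
from urllib.parse import urlencode

def oai_request_url(base_url, verb, **kwargs):
    """Build an OAI-PMH request URL: harvesters issue simple HTTP GET
    requests whose 'verb' parameter names the operation."""
    params = {"verb": verb, **kwargs}
    return base_url + "?" + urlencode(params)

# Hypothetical repository base URL for illustration only
url = oai_request_url("https://example.org/oai", "ListRecords",
                      metadataPrefix="oai_dc")
print(url)  # https://example.org/oai?verb=ListRecords&metadataPrefix=oai_dc
```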
Bergman, Michael K. "The Deep Web: Surfing Hidden Value." Journal of Electronic Publishing 7, no. 1 (2001).

  • deep web sources store their content in searchable databases that only produce results dynamically in response to a direct request
  • deep web is much larger than the "surface" web


Thursday, November 8, 2012

Muddiest Point - Week of November 5th, 2012

I do not have a muddiest point for this week...although that might change once I really begin to use XML.

Also, I apologize for being a bit tardy with this post. I forgot that yesterday was Wednesday!

Thursday, November 1, 2012

Reading Notes - Week of November 5, 2012

Martin Bryan. An Introduction to the Extensible Markup Language (XML)

  • What is XML?
    • XML is a subset of the Standard Generalized Markup Language (SGML) designed to aid the interchange of structured documents over the Internet
    • XML files always clearly mark where the start and end of each of the logical parts (called elements) of an interchanged document occurs
    • it restricts the use of SGML constructs to ensure that fall back options are available when access to certain components of the document is not currently possible over the Internet
    • through a document type definition (DTD), XML allows users to ensure that each component of a document occurs in a valid place within the interchanged data stream
      • XML does not require the presence of a DTD
    • XML allows users to link multiple files together to form compound documents, identify when illustrations are to be incorporated into text files, provide processing control information to support programs, and add editorial comments to files
    • XML is not designed to be a standard way of coding text
  • The Components of XML
    • based on the concept of documents and comprised of entities
    • each entity can contain one or more elements
    • each element has certain attributes which describe how it should be processed
  • How is XML Used?
    • users need to know how the markup tags are delimited from normal text and the order in which the various elements should be used
    • Systems that understand XML can provide users with lists of the elements that are valid at each point in the document, and will automatically add the required delimiters to the name to produce a markup tag
    • When a system does not understand XML, users can enter the XML tags manually for later validation. 
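A minimal sketch of these components, using a made-up document and Python's standard-library XML parser: every element's start and end is explicitly marked, and elements may carry attributes.

```python
import xml.etree.ElementTree as ET

# A tiny example XML document (invented for illustration): the start
# and end of each logical part (element) is clearly marked, and
# elements can carry attributes describing how to process them.
doc = """<article id="a1">
  <title>Intro to XML</title>
  <body lang="en">Structured documents over the Internet.</body>
</article>"""

root = ET.fromstring(doc)
print(root.tag, root.attrib)          # article {'id': 'a1'}
print(root.find("title").text)        # Intro to XML
print(root.find("body").get("lang"))  # en
```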
Uche Ogbuji. A survey of XML standards: Part 1. January 2004.

There are many different standards for XML. Be aware of this when using XML, or reviewing software that uses XML.

I have reviewed Extending Your Markup: An XML Tutorial by Andre Bergholz and the W3Schools XML Schema Tutorial, and will refer to them when working with XML in this coming week.

Muddiest Point - Week of October 29, 2012

I have no muddiest point for this week.

Friday, October 26, 2012

Reading Notes - Week of October 29, 2012

W3Schools Cascading Style Sheet Tutorial: www.w3schools.com/css/

  • CSS stands for Cascading Style Sheets
  • styles define how to display HTML elements
  • HTML was never intended to contain tags for formatting a document; rather it is intended to define the content of a document
    • when tags for formatting began to be added to HTML, CSS was created so that all formatting could be removed from an HTML document and stored in a separate CSS file

I have reviewed the rest of the W3Schools CSS tutorial.

I have reviewed the CSS Tutorial, starting with HTML and CSS

Hakon Lie and Bert Bos, Cascading Style Sheets, Designing for the Web. 2nd ed. Addison Wesley, 1999.

  • a rule is a statement about one stylistic aspect of one or more elements
  • a style sheet is a set of one or more rules that apply to an HTML document
  • a rule consists of a selector and a declaration
    • selector - link between HTML document and the style; specifies what elements are affected by the declaration
    • declaration - part of the rule that sets forth what the effect will be
  • a declaration has two parts
    • part before the colon = property
      • quality or characteristic that something possesses
    • part after the colon = value
      • precise specification of the property
  • a selector may have more than one declaration
  • selectors that share the same declarations can be grouped together in one rule
  • the style sheet must be "glued" to the document in order to affect the HTML
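Since this post is about CSS itself, the anatomy above can be shown directly in a small illustrative stylesheet (the selectors and values are made up):

```css
/* rule = selector { declaration; declaration; } */
h1 {
  color: navy;      /* property "color", value "navy" */
  font-size: 24px;  /* a selector may have several declarations */
}

/* selectors sharing the same declarations can be grouped */
h2, h3 { color: navy; }
```

The "glue" in the last bullet is typically a `<link rel="stylesheet" href="...">` element in the HTML document's head.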


Wednesday, October 24, 2012

Friday, October 19, 2012

Reading Notes - Week of October 22, 2012

I have reviewed the W3Schools HTML tutorial and the HTML cheat-sheet.

F.E. Pratter. "Introduction to HTML." From Web Development With SAS By Example. 3rd Edition.

  • HTML = hypertext markup language 
  • a webpage is an ASCII text file with markup tags inserted to display and format the text
Doug Goans, Guy Leach, Teri M. Vogel. "Beyond HTML: Developing and Re-imagining Library Web Guides in a Content Management System." Library Hi Tech 24, no. 1 (2006): 29-53.

  • content management = the process of collecting, managing, and publishing content
  • in content management systems, content is separated from layout and design, so users do not have to use HTML
  • CMS allows users to control how content is distributed and presented
  • there are many different types of CM systems

Friday, October 12, 2012

Notes - Week of October 15, 2012

Tyson, Jeff. "How Internet Infrastructure Works."

Every computer that is connected to the internet is part of a network, which is provided by an internet service provider (ISP). Large communication companies have points of presence (POP), which is a place where local users can access the company's network. There is no overall controlling network; there are several high-level networks connecting through network access points. Networks depend on network access points, backbones, and routers to communicate. Every machine on the internet has a unique identifying number called an IP address. The domain name system was created so IP addresses would not have to be memorized. All machines on the internet are either servers or clients. Servers have static IP addresses that rarely change.
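One way to see that an IP address is "a unique identifying number": an IPv4 address is literally a 32-bit integer written in dotted form, and DNS exists so people can use names instead of these numbers. A short Python sketch (192.0.2.1 is an address reserved for documentation examples):

```python
import ipaddress

# An IPv4 address is a 32-bit integer in dotted decimal notation;
# DNS maps memorable names onto numbers like this one.
addr = ipaddress.IPv4Address("192.0.2.1")
print(int(addr))                          # 3221225985
print(ipaddress.IPv4Address(3221225985))  # 192.0.2.1
```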

"Dismantling Integrated Library Systems."

  • the introduction of the web forced change onto integrated library systems
  • older software was updated to meet customer demand
  • creating an entirely new ILS is unrealistic
  • integration with the web is key

Sergey Brin and Larry Page. "The Genesis of Google."

  • data has to move all over the world quickly, and it is very difficult to do so without lag
  • searching is tricky since search engines are not "intelligent"



Muddiest Point - Week of October 8/9, 2012

I'm still a bit confused by the concept of network architecture. Is it the relationship between devices in a network?  

Wednesday, October 3, 2012

Muddiest Point - Week of October 1, 2012

I do not have a muddiest point for this week, but I do find metadata quite fascinating!

Wednesday, September 26, 2012

Muddiest Point - Week of September 24, 2012

I am confused about what a key/primary key/foreign key is, and what the role they play in a database is.

Friday, September 21, 2012

Reading Notes - Week of September 24, 2012


Wikipedia Entry: “Database”

  • A database is an organized collection of data
  • The data is usually organized to model relevant aspects of reality in a way that supports processes requiring the information
  • Term database system implies that the data is managed to some level of quality
  • Well-known database management systems include Oracle, IBM DB2, Microsoft SQL Server, Microsoft Access, PostgreSQL, MySQL, and SQLite
  • A way to classify databases involves the type of their contents (ex. Bibliographic, document-text) or by their application area (ex. Accounting, banking)
  • Relational model – applications should search for data by content, rather than by following links
  • Relational database systems are the current dominant system
  • General purpose DBMS aim to satisfy as many applications as possible, but they are not always the best solution
  • Major database usage requirements
    • Functional requirements
      • Defining the structure of the data
      • Manipulating the data
      • Protecting the data
      • Describing processes that use the data
    • Operational requirements
      • Availability
      • Performance
      • Isolation between users
      • Recovery from failure and disaster
      • Backup and restore
      • Data independence
  • Current data models
    • Relational model
    • Entity-relationship model
    • Object model
    • Object-relational model
    • XML as a database model
  • Database design is done before building the database so that it meets the needs of end-users within the given application that the database is intended to support


Wikipedia Entry: “Entity-Relationship Model”
  • The entity-relationship model is an abstract way to describe a database
  • Starts in a relational database with data stored in tables
  • Data in the tables point to data in other tables
  • Two levels of the ER Model
    • Conceptual data model
    • Logical data model
  • An entity can be defined as a thing which is recognized as being capable of an independent existence and which can be uniquely identified
Phlonx, "Database Normalization Process"

Database normalization relies on three normal forms: first normal form (no repeating elements or groups of elements), second normal form (no partial dependencies on a concatenated key), and third normal form (no dependencies on non-key attributes).
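The first step, removing repeating groups, can be sketched with a toy Python example (the book data is invented): a row holding a list of authors is split into one-row-per-author entries in a second table, linked by the key.

```python
# Unnormalized: the "authors" column holds a repeating group.
unnormalized = [
    {"book_id": 1, "title": "Intro to DBs", "authors": ["Lee", "Park"]},
    {"book_id": 2, "title": "SQL Basics",   "authors": ["Lee"]},
]

# Split into two "tables": books, and one row per (book, author) pair.
books = [{"book_id": r["book_id"], "title": r["title"]} for r in unnormalized]
book_authors = [
    {"book_id": r["book_id"], "author": a}
    for r in unnormalized
    for a in r["authors"]
]

print(len(books), len(book_authors))  # 2 3
```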

Wednesday, September 19, 2012

Muddiest Point - Week of September 17, 2012

I have heard that DNG is one of the best formats to use for photos since it results in almost no data loss. If this is true, why is the TIFF format still preferred for archival use?

Thursday, September 13, 2012

Reading Notes - Week of September 17, 2012

"Data Compression" from Wikipedia

  • Data compression involves encoding information using fewer bits than the original representation
  • Two types of compression
    • Lossless
      • Reduces bits by identifying and eliminating statistical redundancy
    • Lossy
      • Reduces bits by identifying marginally important information and removing it
  • Formally known as source coding
  • Helps reduce resource usage (ex. Data storage space, transmission capacity)
  • Theoretical background
    • Lossless – Information Theory
    • Lossy – rate-distortion theory
"Data Compression Basics"

  • Part 1: Lossless Data Compression
    • Fundamental idea behind data compression is to take a given representation of information and replace it with a different representation that takes up less space, from which the original data can later be recovered
    • If the recovered information is guaranteed to be exactly identical to the original, then the compression method is described as “lossless”
    • A simple lossless compression algorithm is “run-length encoding” (RLE)
      • Replaces long runs of characters with a single character and the length of the run
    • Lempel-Ziv compressor family
    • Entropy coding
      • Assigns codes to blocks of data in a way that the length of the code is inversely proportional to the statistical probability of the block of data
    • Prediction and error coding
  • Part 2: Lossy Compression of Stills and Audio
    • important to distinguish data from information
    • fundamental idea behind lossy compression is preserving meaning rather than preserving data
    • by allowing for some deviation from the source data when encoding patterns, lossy compression greatly reduces the amount of data required to describe the “meaning” of the source media
    • lossy compression is ideally applied to information that is meant to be interpreted by a reasonably sophisticated “meaning processor”(human, image recognition software, etc.) that looks at a representation or rendering of the data rather than the data itself
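The run-length encoding idea from Part 1 can be sketched in a few lines of Python; because the method is lossless, decoding the encoded pairs recovers the original string exactly.

```python
from itertools import groupby

def rle_encode(s):
    """Replace each run of identical characters with (char, run length)."""
    return [(ch, len(list(run))) for ch, run in groupby(s)]

def rle_decode(pairs):
    """Expand each (char, count) pair back into the original run."""
    return "".join(ch * n for ch, n in pairs)

encoded = rle_encode("AAAABBBCCD")
print(encoded)              # [('A', 4), ('B', 3), ('C', 2), ('D', 1)]
print(rle_decode(encoded))  # AAAABBBCCD
```

RLE only pays off when the input actually contains long runs, which is why it suits simple images more than ordinary text.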
Edward A. Galloway. “Imaging Pittsburgh: Creating a Shared Gateway to Digital Image Collections of the Pittsburgh Region.”

  • The main focus of the project was to create a single Web gateway for the public to access thousands of visual images held in the collections of the Pitt Archives Service Center, CMOA, and the Historical Society of Western PA
  • The content partners were responsible for selections of collections/images, describing/cataloging images, digitization, and delivering images/metadata to DRL
  • DRL was responsible for providing access to the image collections via DLXS middleware
  • Characteristics of the Web gateway
    • Conduct keyword searches across all image collections
    • Browse images
    • Read about the collections and their contents
    • Explore images by time/place/theme
    • Order image reproductions
  • Communication challenges
  • Selection challenges
  • Metadata challenges
  • Project-wide vs local needs
  • Workflow challenges
  • Website development challenges

I was not able to access "Youtube and Libraries: It Could be a Beautiful Relationship" by Paula L. Webb.