Tuesday: Workshops

All workshops will be held on Tuesday, June 3, in the John P. Robarts Library, University of Toronto, 130 St. George St.

9:00-12:00
  • W1: Data Management & Curation: Lessons from Government, Academia, and Research
    • moderators: Michele Hayslett, Stefan Kramer
    • presenters: see below
    • location: Robert H. Blackburn Room: Robarts Library, 4th floor, Room 4035
  • W2: Teaching an introductory workshop in digital preservation
    • presenters: Laurence Horton & Alexia Katsanidou
    • location: Reference Conference Room: Robarts Library, 4th floor, Room 4022
  • W3: Data Visualization and R
    • presenter: Ryan Womack
    • location: Map & Data Library Lab: Robarts Library, 5th floor, Room 5053
  • W4: openICPSR: Why, When, What and How?
    • presenters: Amy Pienta & Jared Lyle
    • location: Teaching Studio: Robarts Library, 4th floor, Room 4034
  • W5: Using open source tools for creation and sharing of DDI-Lifecycle metadata
    • presenters: Jannik Jensen, Anne Sofie Fink, Martin Jensby
    • location: Electronic Classroom: Robarts Library, 4th floor, Room 4033

12:30-13:30 lunch (on your own)

13:30-16:30
  • W6: Advanced SDA Usage for Data Librarians
    • presenters: Tom Piazza, Jon Stiles
    • location: Electronic Classroom: Robarts Library, 4th floor, Room 4033
  • W7: Reshaping Data with R
    • presenter: Christine Murray
    • location: Map & Data Library Lab: Robarts Library, 5th floor, Room 5053
  • W8: Introduction to Terra Populus: Integrated Data on Population and Environment
    • presenter: Tracy Kugler
    • location: Reference Conference Room: Robarts Library, 4th floor, Room 4022
    • (Please note: participants must bring their own laptops.)
  • W9: Introduction to QGIS
    • presenter: Nicole O. Scholtz
    • location: Teaching Studio: Robarts Library, 4th floor, Room 4034

W1: Data Management & Curation: Lessons from Government, Academia, and Research

  • time: 9:00-12:00
  • location: Robert H. Blackburn Room: Robarts Library, 4th floor, Room 4035
  • moderators: Michele Hayslett / University of North Carolina at Chapel Hill; Stefan Kramer / American University
  • presenters:
    • Dan Gillman / U.S. Bureau of Labor Statistics
    • Marcel Hebing / DIW Berlin
    • Chuck Humphrey / University of Alberta
    • Steven McEachern / Australian Data Archive
    • Barry Radler / Institute on Aging, University of Wisconsin-Madison
    • Robin Rice / EDINA and Data Library at the University of Edinburgh
    • Kathleen Shearer / Confederation of Open Access Repositories and Research Data Canada
  • presentation: 2014_Workshops_W1_Hayslett.zip

abstract: The management, publication, and preservation of datasets have become increasingly important issues for universities, research institutions, and government agencies. While the reasons and mandates for these activities, and the kinds of datasets collected, differ among these types of institutions, other aspects of data management throughout the research lifecycle concern all of them, including (but not limited to): the discoverability of their data; the choice of metadata standard(s) and the creation of metadata; providing visualization of and interaction with data; selection and migration of data formats for long-term preservation; policy development; and storage requirements. Yet these types of institutions tend to follow different paths in data management and curation, choosing different infrastructures, metadata standards, and platforms. Are these different approaches inevitably rooted in the differences between these types of organizations, their missions, and their cultures? Or are there lessons they could learn from each other to improve their own practice? The purpose of this workshop is to explore that question. Presentations will come first; participants will then form breakout groups around specific aspects such as platform choice, policy development, and metadata creation. At the end, the groups will reconvene to share the results of their discussions.

W2: Teaching an introductory workshop in digital preservation

  • time: 9:00-12:00
  • location: Reference Conference Room: Robarts Library, 4th floor, Room 4022
  • presenters: Laurence Horton / The London School of Economics and Political Science; Alexia Katsanidou / GESIS – Leibniz Institute for the Social Sciences
  • presentation: 2014_Workshops_W2_Horton.zip

abstract: GESIS’s Archive and Data Management Training Center provides introductory-level, two-day training events on “First steps towards digital preservation.” This workshop gives an overview of that training, introducing participants to the design and target audience of our events and showcasing our digital preservation support and training.

Adopting a “train the trainers” approach, the workshop is aimed at those interested in conducting organizational-level training: archivists, librarians, repository or research data center staff, and anyone responsible for planning the curation and preservation of digital assets, regardless of disciplinary background. Intended as a primer, the workshop requires no previous experience and introduces participants to the “organizational dimension” of digital preservation.

Participants will have the chance to try our training materials and exercises on:

  • What is digital preservation and why do we need it?
  • Introduction to the OAIS Reference Model
  • Preserving information for a designated community
  • Acquisition policies and selection criteria
  • Sustainable digital preservation and cost models
  • Licensing for preservation and re-use
  • Trusted digital repositories

Benefits:

  • an understanding of how introductory workshops on digital preservation are conceived and structured
  • familiarity with the workshop content and an overview of its materials and exercises
  • the ability to use the materials to design their own training workshops

W3: Data Visualization and R

  • time: 9:00-12:00
  • location: Map & Data Library Lab: Robarts Library, 5th floor, Room 5053
  • presenter: Ryan Womack / Rutgers University Libraries
  • presentation: 2014_Workshops_W3_Womack.pdf

abstract: This workshop will focus on principles and techniques for the visualization of data, with equal emphasis on theory and implementation. Drawing on classic works by Cleveland (Visualizing Data), Tufte (The Visual Display of Quantitative Information), and Wilkinson (The Grammar of Graphics), a range of best practices for visualization will be illustrated. Recently developed techniques for large-scale, 3D, and interactive visualization will also be discussed. This discussion will be based on works such as Graphics of Large Datasets: Visualizing a Million (Unwin, Theus, and Hofmann), the Handbook of Data Visualization (Chen, Härdle, and Unwin), and Trends in Interactive Visualization: A State of the Art Survey (Liere, Adriaansen, and Zudilova-Seinstra). For each of these approaches, methods for creating similar graphics in the open-source R statistical language will be demonstrated, using packages such as ggplot2, lattice, and rggobi. Interactive visualization packages such as shiny and healthvis will also be explored. Prior familiarity with R is helpful but not required.

W4: openICPSR: Why, When, What and How?

  • time: 9:00-11:00
  • location: Teaching Studio: Robarts Library, 4th floor, Room 4034
  • presenters: Amy Pienta & Jared Lyle / ICPSR

abstract: Agencies such as NSF and NIH require data management plans as part of research proposals, and the Office of Science and Technology Policy (OSTP) is requiring federal agencies to develop plans to increase public access to the results of federally funded scientific research. ICPSR has been sharing and archiving social and behavioral research data for over 50 years, using several types of data-sharing models. This session will provide a closer examination of three data management models in use at ICPSR:

  • Fee for access model – pooled funding for data curation and preservation for access by the pooling members
  • Agency-funded model – agency or foundation funded model providing free public access
  • Fee for deposit model – fee for deposit of data to provide free public access

Finally, this session will provide a hands-on demonstration of openICPSR, a research data-sharing service. openICPSR data are: widely and immediately accessible at no cost to data users; safely stored by a trusted repository dedicated to long-term data stewardship; and protected against breaches of confidentiality and privacy. This session will demonstrate this self-deposit system and discuss how researchers and institutions can take advantage of this new means of archiving data to comply with federal data-sharing and preservation standards.

W5: Using open source tools for creation and sharing of DDI-Lifecycle metadata

  • time: 9:00-12:00
  • location: Electronic Classroom: Robarts Library, 4th floor, Room 4033
  • presenters: Jannik Jensen, Anne Sofie Fink, & Martin Jensby / Danish National Archives

abstract: The workshop introduces participants to the range of support offered by the DdiEditor, complemented by an indexing platform, for creating and sharing DDI-Lifecycle metadata. The introduction is followed by a hands-on session in which participants work with their own datasets using the DdiEditor and the Indexing platform.

The DdiEditor supports the curation of datasets, providing data managers, archivists, and librarians with functionality for:

  • Importing data, question text, and response categories
  • Merging metadata into a DDI-L document
  • Creating, updating, and deleting variables, questions, concepts, codes, categories, filters, universes, and instruments
  • Describing question flow and filtering (instrumentation)

The Indexing platform supplies metadata for advanced search services and further web indexing. The platform offers:

  • Metadata exposed as interactive landing pages, with index optimization for Google via schema.org
  • Production of user-friendly codebooks (including frequency graphics and links to explore)
  • DDI URN resolution
  • API access to search functionality, returning XML or JSON, for external search engines and/or portals

A search service will be set up specifically for the workshop, allowing participants to view their datasets as landing pages and codebooks and to search all metadata uploaded during the workshop.

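W6: Advanced SDA Usage for Data Librarians

  • time: 13:30-16:30
  • location: Electronic Classroom: Robarts Library, 4th floor, Room 4033
  • presenters: Tom Piazza, Jon Stiles
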
abstract: Data librarians are often called on to help users generate customized summaries of variables contained in large public datasets. Major data archives such as IPUMS and ICPSR now make many such datasets available for online analysis in SDA (Survey Documentation and Analysis). Although users can easily generate simple tables in SDA, more complex analyses often require other analytic procedures and the recoding of variables into more usable categories.

The purpose of this workshop is to give data librarians greater facility with the SDA programs for recoding variables and generating new variables, so that they can produce the customized summaries that are often requested. Generating subsets of data for input into other analysis systems (such as Stata, SAS, and SPSS) will also be covered. Workshop participants will practice these procedures using U.S. Census data and some of the international datasets available in the IPUMS archive.

Basic familiarity with SDA is assumed, but no special expertise in SDA is required.

W7: Reshaping Data with R

  • time: 13:30-15:30
  • location: Map & Data Library Lab: Robarts Library, 5th floor, Room 5053
  • presenter: Christine Murray / University of Pennsylvania

abstract: As researchers draw data from many sources to analyze with a variety of tools, the data may need substantial manipulation, particularly when they are wide (all variables in one row per subject) but need to be long (a separate row per observation), or vice versa. This workshop focuses on converting data between wide and long formats using R, open-source software for statistical computing. Participants will learn to distinguish between the two formats, understand the advantages and uses of each, and practice transposing datasets between them using the “reshape” package for R and other useful commands. Along the way, participants will gain a basic familiarity with data manipulation in R and learn how to document their work in a reproducible way. The goal is to enable attendees to advise users on the best format for a given task and to help them reformat their data as needed.

Prior familiarity with R is helpful but not required.

W8: Introduction to Terra Populus: Integrated Data on Population and Environment

  • time: 13:30-16:30
  • location: Reference Conference Room: Robarts Library, 4th floor, Room 4022
  • presenter: Tracy Kugler / TerraPop
  • presentation: 2014_Workshops_W8_Kugler.zip

abstract: In this half-day workshop, Tracy Kugler will demonstrate the new Terra Populus data access system. Building on the Minnesota Population Center’s (MPC’s) past experience with demographic data infrastructure projects such as IPUMS and NHGIS, Terra Populus seeks to lower the barriers to conducting interdisciplinary human-environment research by making data from different domains easily interoperable. It incorporates a variety of data types, including census microdata, census summary data, and raster data describing land cover, land use, and climate. The data access system allows users to create customized data extracts that blend variables from all data types and to receive the output in their preferred format. In this workshop, attendees will learn about the content and data processing capabilities of Terra Populus and how to obtain and use the data. Attendees will create extracts, download data over the internet, and analyze it in a statistical, spreadsheet, or GIS software package. (Please note: participants must bring their own laptops.)

W9: Introduction to QGIS

  • time: 13:30-16:30
  • location: Teaching Studio: Robarts Library, 4th floor, Room 4034
  • presenter: Nicole O. Scholtz / University of Michigan
  • presentation: 2014_Workshops_W9_Scholtz.pdf

abstract: Open-source geographic information system (GIS) tools are maturing to the point of being viable for widespread use. QGIS is a free, cross-platform, extensible desktop GIS that interoperates well with other GIS tools. Data services that support desktop GIS software packages increasingly support QGIS as well, and ideas for promoting the use and support of open-source desktop GIS will be discussed.

In this workshop, participants will gain hands-on experience with QGIS. Exercises will include working with vector and raster data, doing very basic spatial analysis, and producing maps. Participants will leave with resources for learning more about QGIS. No previous GIS experience is required.