2020 RDAP Summit Program

Wednesday, 11 March 2020

The RDAP 2020 Summit will be held at the Santa Fe Convention Center in Santa Fe, NM.

  • 8:30-9 a.m. – Breakfast/registration
  • 9-9:15 a.m. – Welcome
  • 9:15-10 a.m. – Opening keynote
  • 10 a.m.-11 a.m. – Panel 1 – Partnerships
    • Connecting people and data via a collaborative data repository
      • This presentation will give an overview of a project to build an online, open-access, searchable, central repository for atmospheric chemistry chamber data. This project, called ICARUS (Index of Chamber Atmospheric Research in the United States), is an NSF-funded partnership between nine organizations to build and populate a data repository to support atmospheric chemistry research. The University of California, Davis, is the lead institution and is responsible for coordinating the partners and establishing the guidelines for homogenizing the data via a community standard. Atmospheric chamber data are routinely used as empirical inputs and constraints in atmospheric models, including those used to study global climate change. The ICARUS data are highly complex in terms of the data types, the relationships between components of the data and metadata, and the broad range of contributors. As part of the grant structure, at the end of the 3-year project, the database will be transitioned to the National Center for Atmospheric Research (NCAR) for long-term curation, preservation, and access. This presentation will discuss our work at NCAR to effectively ingest, manage, and deliver these complex data. The focus of the presentation will be on lessons learned related to: 1) the coordination work necessary to couple the scientific and data management teams, 2) the organizational work necessary to fit the ICARUS data within our established metadata frameworks, and 3) the technical work necessary to structure the ICARUS data within our data repository infrastructure.
      • Matthew Mayernik, National Center for Atmospheric Research; Tran Nguyen, University of California, Davis; Eric Nienhouse and Steve Worley, National Center for Atmospheric Research
    • The Digital Object LifeCycle Project (DOLCe): Connecting libraries and advanced computing
      • We will present the Digital Object LifeCycle (DOLCe) project, an extension of the ongoing partnership between the University of Texas Libraries (UTL) and the Texas Advanced Computing Center (TACC) (http://guides.lib.utexas.edu/research-data-services). UTL and TACC support university researchers in working with data on their own desktops and beyond. UTL promotes FAIR guiding principles (https://www.force11.org/group/fairgroup/fairprinciples) in scientific data publication and stewards research data within a consortial data repository (https://data.tdl.org). Data literacy and inclusion are central to the workshops and consultation services UTL provides for the campus community, across disciplines and skill levels (http://guides.lib.utexas.edu/data-and-donuts). TACC is investing in lowering the barrier for access to high performance computing (HPC) resources for researchers whose disciplinary training doesn’t include advanced computation. TACC offers introductory training during UTL workshops, and is building online portals with graphical interfaces that allow traditionally underserved communities of researchers to easily access HPC systems. The DOLCe project aims to connect TACC systems and portal software with the data repository that UTL manages. This will create a pipeline for researchers using TACC to move data from processing and analysis to publication, without requiring sophisticated technical expertise. An ongoing research project within Planet Texas 2050 (https://bridgingbarriers.utexas.edu/planet-texas-2050/) exemplifies our collaboration, and is serving as a use case to help build the publication pipeline between UTL and TACC systems. This project was born out of a UTL workshop introducing GIS. An attendee reached out to UTL to co-investigate the use of GIS and TACC resources in support of her project. 
        Together we have helped her use QGIS software within a TACC portal to analyze data, and the DOLCe project will use her dataset as a test case for publishing through the Texas Data Repository (TDR).
      • Jessica Trelogan, University of Texas at Austin, and Anna Dabrowski, Texas Advanced Computing Center
    • An interdisciplinary approach to research data and computation: stories from three institutions
      • Academic libraries have historically served as a hub for access to technology. Libraries commonly house an internal information technology (IT) team to serve unique needs in acquisition, resource management, and discovery; but these internal teams are not designed to address the technical needs of researchers. As a result, there is often a separation between services available through the library and services provided by central or research IT organizations. As research needs increasingly grow into heavy data and computational spaces, these two campus communities have an opportunity to collaborate on holistic services for students and researchers. Collaboration between libraries and IT teams combines expertise in reference interviews, workflow documentation, and researcher collaboration with expertise in cyberinfrastructure and data security. Among other benefits, this provides more opportunities for training and consulting, better enabling researchers in the discovery, management, and security of data generation, publication, and use. Such benefits are amplified as these collaborations are integrated into academic organizational models to build effective teams, services, and policy through shared governance. This panel brings together a variety of perspectives about how libraries and IT organizations are collaborating, and can collaborate, on data services. We will present a series of case studies from Arizona State University, the University of California, Berkeley, and the University of Colorado Boulder, whose libraries and IT departments are collaborating in innovative ways. The panel will feature ongoing efforts around how libraries and research computing groups can work together to provide infrastructure and best practices to support the full research lifecycle.
      • Anna Sackmann and Amy Neeser, University of California, Berkeley; Jonathan Anderson, University of Colorado, Boulder; Matthew Harp and Jeremy Kurtz, Arizona State University
  • 11-11:15 a.m. – Break
  • 11:15 a.m.-12:15 p.m. – Posters
  • 12:15-1:30 p.m. – Lunch
  • 1:30-2:30 p.m. – Panel 2 – Data Visualization
    • Connecting Non-scientists with Modern Data Techniques through the Data Visualization Lab at Xavier University of Louisiana*
      • The Data Visualization Lab in the University Library at Xavier University of Louisiana strives to introduce students and faculty to data science and visualization techniques, emphasizing current best practices and modern software. We launched in Fall of 2019, and have since established a program offering weekly workshop series, consultations, and open hours staffed by trained students. Xavier University of Louisiana is a primarily undergraduate liberal arts HBCU in New Orleans, LA and has the distinctions of being the only Catholic and historically Black university, as well as the undergraduate institution producing the most African American doctors. While the liberal arts, natural sciences, and pre-medical programs flourish at Xavier, Data and Computer Science departments are comparatively small. Both students and faculty report an interest in these fields, yet often report lacking the understanding of data science or general computer literacy to apply these skills in their work. The programming we offer bridges this gap between liberal arts and data: our workshops include coding (Python and R), spreadsheets, Tableau, databases and GIS, all targeting individuals with no prior experience. We encourage students and faculty to think like data scientists, consider data management best practices, and recognize the potential of modern software. After attending these workshops, students and faculty often begin to use our other services for their individual needs. This talk will focus on our successes in connecting Xavier’s experienced data scientists with students and faculty across the university. Beyond the services described, we also support Xavier’s new Digital Humanities minor, mentoring students and designing workshops for their needs.
        By focusing on a holistic view of data science – management, structuring, cleaning, analysis and visualization – we strive to debunk the myth that data-driven research can only be accomplished by career professionals.
      • Alex Saltzman, Xavier University of Louisiana
    • Connecting Communities through a Passion for Data Visualization in Libraries: The Visualizing the Future Symposia
      • The Visualizing the Future (VTF) Symposia is an IMLS National Forum Grant-funded community of praxis focused on envisioning the future of data visualization services and instruction, made up of librarians and information professionals from institutions across the United States. This group aims to advance data visualization instruction and move beyond hands-on, technology-based tutorials toward a nuanced, critical understanding of visualization as a research product and form of expression. Before meeting at the first VTF Symposium in August 2019, fellows started individual projects covering a range of topics, which critically engage the practice of data visualization. These topics include how data visualization can be used to tell stories, activate empathy, and drive social change, the ethical obligations for creators of data visualizations, and the best way to impart these best practices to future generations, among others. At the August 2019 Symposium, VTF fellows discussed individual project outcomes in the context of the group’s mission and set goals for deliverables to be created by the symposia to advance data visualization services and instruction. These deliverables include a repository of example data visualizations, datasets, and other materials for use in developing instruction, resources to help new librarians get started with data visualization services, and a manifesto summarizing the role of libraries in data visualization and connecting themes of fellows’ individual projects. In this panel, members of the VTF community will discuss: 
        • The individual projects that fellows conducted leading up to the symposia and how they relate to the mission of the group and serve the data visualization community as a whole
        • The role of librarians and libraries in data visualization instruction
        • The role of data visualization and instruction in our communities and in communication of research and data to the general public
        • Current research and upcoming deliverables of the VTF Symposia
      • Jo Klein, University of North Carolina at Greensboro; Tess Grynoch, University of Massachusetts Medical School; and Alisa Rode, Barnard College
  • 2:30-2:45 p.m. – Break
  • 2:45-3:45 p.m. – Lightning Talks
    • Dataset Search: a lightweight, community-built tool to support research data discovery at small and mid-sized institutions
      • The resources required for access, storage, and preservation of research data can be overwhelming at small and mid-sized institutions. This lightning talk reports on a suggested alternative. With funding from the Institute of Museum and Library Services (IMLS), our team built a prototype for a lightweight, open source Dataset Search that promotes discovery of research datasets that are hosted by third-party data repositories. Our Dataset Search complements Google Dataset Search, SHARE, DataMed, NYU Data Catalog, and other research data indexes, adding to the conversation a three-pronged focus. First, our tool promotes discovery for institution-specific research datasets, and includes analytics dashboards that allow institutions to showcase research data as a scholarly product and a driver of institutional reputation. Second, Dataset Search provides enhanced descriptive metadata for individual datasets, produced through topic mining of scholarly profiles like ORCID. Third, Dataset Search metadata is optimized for discovery by commercial search engines. The Phase 1 prototype code will be published on GitHub in December 2019, including automated setup and instructions for adjusting the tool for local implementation. For Phase 2, starting in January 2020, we will join the Data Discovery Collaboration Project (formerly Data Catalog Collaboration Project) to support community partnerships. We will also partner with a local research center focused on Indigenous and rural health to pilot a feature that will allow the center’s partners to discover restricted datasets relating to their communities. Dataset Search is a contribution to community-driven, community-owned infrastructure for discovery of academic institutional research data.
      • Sara Mannheimer, Jason A. Clark, James Espeland, Jakob Schultz, and Kyle Hagerman, Montana State University
    • Open Data Toolkit for Diversity, Equity, Inclusion and Access
      • Although there has been a recent surge in the visibility of conversation, projects, and emerging resources concerning data and ethics broadly defined, many scholars are still undersupported in making decisions regarding their research data that best exemplify diversity, equity, inclusion and access considerations while also prioritizing making data “as open as possible, as closed as necessary.” In partnership with the National Center for Institutional Diversity (NCID), our team at the U-M Library is conducting an exploratory mixed-methods study examining current practices and challenges on this topic for diversity scholars. (Diversity scholars, broadly, use their scholarship to further our understanding of various social issues related to topics including identity, culture, power, and inequality.) The end goal of this project is the production and distribution of a toolkit of existing and new resources that will meet needs identified through the study. (The form and content of this toolkit will be determined by the results of our research.) In this project, we have a unique opportunity to combine the Library’s research data management expertise with NCID’s Diversity Scholars Network (comprising over 1,000 scholars). Hearing from this group of researchers will allow us to better understand interdisciplinary practices and needs among researchers who are already explicitly engaged in thinking about the effect their work has on society, and who may or may not be considering their data and data sharing within this framework. Through this project we hope to further the conversation on approaches to data management ethics that transcend regulatory compliance; help distribute and make visible applied knowledge and existing resources (such as already-developed protocols coming out of the indigenous data sovereignty movement); and create an end product grounded in researcher needs.
      • Rachel Woodbrook, Laura Sanchez-Parkinson, Karen Downing, Jake Carlson, Chanese Forte, and Elyse Thulin, University of Michigan
    • Digging into Data: Quantifying curation quality at two institutional repositories
      • The institutional repositories at Colorado State University and the University of Cincinnati have been accepting data since 2015 and 2014 respectively. We wanted to assess the quality of data curation over time at both institutions and began by replicating Koshoffer et al.’s comparison of metadata completeness for datasets at four institutions (2018). However, we wanted to go beyond a simple replication and glean as much information from the data as we could. We have been constructing a more comprehensive database about the datasets including data file formats, researcher discipline, date of deposit, and number of past submissions by the depositor. We are also in the process of coding for Transparency and Actionability (Van Tuyl & Whitmire, 2016). We will report on the following research questions: 1) What factors are related to the quality of data curation? 2) Are these factors the same at both institutions? 3) Are predictors of quality consistent over time? UC has an unmediated self-submission repository, but CSU mediates its data submissions, so we expect to see an effect of the mediation process. We will also discuss the methodological challenges of answering these questions: What tools and expertise were needed to obtain analyzable data? How can we strengthen connections between data librarians and the tech community supporting platforms like Samvera and DSpace? References: Koshoffer, A, Neeser, AE, Newman, L, Johnston, LR (2018) Giving Datasets Context: A Comparison Study of Institutional Repositories that Apply Varying Degrees of Curation. International Journal of Digital Curation, 13(1), 15-34. https://doi.org/10.2218/ijdc.v13i1.632 Van Tuyl, S, Whitmire, AL (2016) Water, Water, Everywhere: Defining and Assessing Data Sharing in Academia. PLoS ONE 11(2): e0147942. https://doi.org/10.1371/journal.pone.0147942
      • Mara Sedlins and Helen Baer, Colorado State University; and Amy Koshoffer, University of Cincinnati
    • Building Institutional and Individual Capacity around Ethical Management of Research Data at the University of Calgary
      • In April 2020, the University of Calgary will host a research data management (RDM) conference to build capacity around both practical data management skills and principles of ethical data management. The conference is a collaboration between the University of Calgary’s Office of the Vice-President (Research), Research Services (including the Research Ethics Board) and Libraries and Cultural Resources. Programming will have two stages: 1. a series of skills workshops facilitated by campus experts on topics such as writing a data management plan, secure computing services for RDM, tools for working with data, and special considerations when working with Indigenous populations; 2. a one-day symposium focused on ethical data management. The symposium will feature presentations from several Canadian scholars and leading experts on the topics of ethics and equity of responsible data management; data management and ethics in the scholarship of teaching and learning; and ethical research data management in the context of research by and with Indigenous Peoples, including First Nations, Inuit and Métis communities. The impetus for this conference has been the expected release of an RDM policy by Canada’s Tri Agencies: the Canadian Institutes of Health Research (CIHR), the Natural Sciences and Engineering Research Council (NSERC), and the Social Sciences and Humanities Research Council (SSHRC). This conference, funded in part by a SSHRC Connections Grant, allows us to develop partnerships among different university units that all have some responsibility for RDM, while also building the capacity of researchers to manage data responsibly and ethically.
        In keeping with the University of Calgary’s Indigenous Strategy, ii’ taa’poh’to’p, and with the principles of Ownership, Control, Access, and Possession (OCAP) for data collection with First Nations populations, we have made ethical management of research data involving Indigenous communities a cornerstone of our program.
      • Heather Ganshorn and Penny Pexman, University of Calgary
    • Developing a model for DOI Services
      • Data Services at Johns Hopkins University maintains the institutional membership in DataCite and generates Digital Object Identifiers (DOIs) for all deposits to the JHU Data Archive. In response to increasing demand from campus researchers, Data Services developed and implemented a cost-recovery model for DOI Services outside of the archive. The lightning talk will present the process for developing the service model, report on the status of implementation, and describe plans for assessing the new service.
      • Mara Blake, Johns Hopkins University
    • Who Pays for ELNs on Campus?
      • At some institutions, electronic lab notebooks (ELNs) are paid for by the IT unit. At other institutions, it is the office of the VP for research. Generally, the library takes on the administrative duties. So, what happens when there is a turnover of the personnel in the office of the VP for research, the new site license agreement is coming due, and the VP is not keen on funding the ELN? I’ll discuss the options in five minutes or less.
      • Daureen Nesdill, University of Utah
    • How can we do better? Digital information organizations and approaches to equity, diversity, and inclusion work
      • The Digital Scholarship Section (DSS) of the Association of College and Research Libraries (ACRL), created in 2017, is interested in embedding inclusion and diversity in the professional definition, organizational structures, and ways of working in Digital Scholarship, including research data management, from the outset. The DSS is committed to supporting community building through inclusive action and regularly seeks out ways to advocate for anti-oppressive practices through ongoing dialogue and trainings. As a starting point, the DSS created an Equity, Diversity, and Inclusion (EDI) committee dedicated to developing EDI-informed processes for all section members. As its first significant intervention, the EDI committee crafted a Community Agreement to help establish the values under which the section wants to operate and to give members an opportunity to share their experiences. The Community Agreement articulates specific and inclusive ways of engaging with each other, provides mechanisms for members to provide feedback (both about the community agreement and about the organization), articulates how feedback will be handled, lists possible consequences for violating our agreement, and encourages members to engage in allyship and be active bystanders. The committee was inspired by many others who have done this work, including Moving Beyond Race 101 (ACRL 2019), RDAP, DLF, Impact Seat and others. This presentation will highlight the aims and purpose of the project, the process taken to create the document, the challenges encountered, and subsequent EDI efforts on the horizon for the committee. The presenters hope to engage others in conversation about their experiences with codes of conduct and organizational approaches to building inclusivity and equity. 
        Additionally, presenters hope this presentation, a kind of roadshow on the section’s efforts, will inspire other individuals and organizations, including RDAP, to do the vital work of creating spaces where everyone feels included and heard.
      • Heather James, Marquette University; Teresa Schultz, University of Nevada, Reno; Pamella Lach, San Diego State University; Kristen Mapes, University of Michigan; Arianne Hartsell-Gundy, Duke University; Jennifer Hootman, University of Kentucky; Talea Anderson, Washington State University; Amy Gay, Binghamton University; Stephanie Pierce, University of Arkansas; and Elisandro Cabada, University of Illinois at Urbana-Champaign
    • Communication toolkit for data sharing: facilitation for a conflict free sharing experience
      • In large projects, agreeing on how to share data is not simple. Different groups of researchers generate different datasets. Often, these datasets need to be shared among the members of the larger project, especially during the synthesis part of the project at the very end. This data sharing and interdisciplinary research can actually be one of the most valuable contributions of large projects. Conflict can arise when different researchers have different expectations about the terms of the data sharing. Who owns the data, when to share it, how to modify it, what scholarly outcomes are expected, how to publish the data, and how to give credit are some of the most common problems. To alleviate this problem we are working on a “Communication toolkit for data sharing”. The goal is to create content (slides, agreement template, and documentation) that research groups can use to clarify their expectations regarding data sharing at the beginning of the project. The work would happen in a workshop setting, facilitated by a person (who could be a data specialist) who would be in charge of running the meeting and ensuring that everybody’s voices are heard and that the needs of all (including researchers at different points in their careers and students) are taken into account. We are making available the template of our “Data Sharing Plan” document. This template will be a central part of the Toolkit, and can be used on its own as a list of items to consider when discussing internal data sharing. It is structured in 4 sections: project data management, roles and responsibilities, acknowledgement of data use, and workflows.
      • Clara Llebot, Oregon State University
    • Collections as/are Data[sets]: Connecting Research Data Management to Digital Collections
      • Research data management work and digital collections work follow similar workflows, principles, and practices, but are often situated outside of each other within cultural heritage organizations. And while “Collections as Data” and related initiatives have fostered opening digital collections data for computational analysis, data management standards and practices are not always considered in these settings. What if we were intentional about connecting digital collections work and data management practices? In this session, we’ll consider why this union makes sense, how to apply emerging metadata standards like the Schema.org Dataset type and the RO-Crate (Research Object Crate) initiative to digital collections, and the benefits to organizations in finding ways to connect this work. We’ll focus the discussion by grounding our work in two implementations: an API for a digital collection encoding cultural heritage objects as components in a dataset, and a Progressive Web App using a structured datastore based on dataset encoding principles to power the search and browse of items. With these case studies as a guide, our goal will be to demonstrate why data management principles and practices complement digital library work and how digital library principles can inform research data management.
      • Jason Clark, Montana State University
  • 3:45-4:45 p.m. – Panel 3 – Consortia
    • Creating Consortiums: NYDCLC, NESCLiC, NMCIT, GPN – library carpentries, data literacy, data skills
      • Libraries everywhere recognize the need for skilling up our teams with data tools. This is due to increased use of data tools in librarianship as well as to the evolution of research and scholarship toward data-driven processes; yet providing those skills at an institutional level can be challenging. Our panel represents several consortiums that address this need at a community level, providing professional development and a community of practice in our field and in service to our patrons. Panelists represent regional library partnerships pursuing a Carpentries-focused solution. The Carpentries’ mission, “to build global capacity in essential data and computational skills for conducting efficient, open, and reproducible research,” aligns with our needs for capacity building, and their pedagogical model supports a train-the-trainer approach for scaling. The New England Software Carpentry Library Consortium (NESCLiC) and New York Data Carpentries Library Consortium (NYDCLC) bring together libraries to share membership with the Carpentries, allowing the regions to offer low-cost workshops on various data skills. The New Mexico Cyberinfrastructure Training (NMCIT) program leverages Carpentry workshops and instructor trainings to support a statewide human infrastructure around data science skills and workforce development. The Great Plains Network (GPN) is both an internet and people connector. While there are readily available network solutions, creating human relationships is just as important, and the GPN uses Carpentries to do that. These presenters represent library consortiums in different stages of the implementation process: getting started, growing, and fully enacted. In this panel discussion, representatives from each regional partnership will present on a key part of the process, addressing both challenges and successes, providing practical insights into cooperative skills-building trainings.
        The panel will leave time for discussion of questions from the audience and facilitator.
      • Adrienne Canino, University of Rochester; Julie Goldman, Harvard University; Wendy Kozlowski, Cornell University; Jonathan Wheeler, University of New Mexico; and Kate Adams, Great Plains Network

Thursday, 12 March 2020

  • 8:30-9 a.m. – Breakfast
  • 9-9:05 a.m. – Opening announcements
  • 9:05-10:35 a.m. – RDAP business meeting
  • 10:35-11 a.m. – Break
  • 10:50-11:50 a.m. – Panel 4 – Data Connections 
    • We’re All in This Together: Building a Network of Data Support Professionals in the Classroom
      • In order to meet the increasing requirements for FAIR research data from legislators, funders, and universities, the Swedish National Data Service (SND) initiated a national research data system in 2018. Central to this system is that researchers have access to trained research data support staff. To ensure such access, universities have been encouraged to establish Data Access Units (DAUs) and SND has created a national network through which DAU staff can draw on common experience, exchange ideas and training material, and access SND’s domain specialists and RDM advisors. At the heart of creating the DAU network is a professional training course, “Research Data: Access, Management, and Collaboration”, developed by SND in collaboration with the Swedish School of Library and Information Science at the University of Borås. The course not only provides participants with a shared frame of reference regarding DMPs, FAIRness, and research legislation, but also aims to strengthen the DAU network by reinforcing collegial ties and building a national community of data professionals, including SND staff. The power inequality and instructor/student hierarchy inherent in traditional classroom lectures were considered counter-productive for these aims, however. This presentation will demonstrate how learning space and pedagogical approach can be used to challenge classroom hierarchies and promote community-building. Educational research has shown that Active Learning Classrooms and cooperative learning change the instructor’s role from a “sage on a stage” to a “guide on the side”, turn participants into knowledgeable subjects that actively contribute to the learning process, and have positive effects on social cohesion and learning outcomes. Course evaluations from the three times that the “Research Data” course has run will be used to provide the participants’ opinions of the course design and its social effects. 
        Results are of value for anyone who wants to build a professional community through teaching.
      • Stefan Ekman, Swedish National Data Service
    • The Data Disconnect: How Changing Industry Data Sharing Policies Impact Business Research and Pedagogy
      • Business represents the most popular undergraduate major in the United States and is a field that heavily relies on data for both research and instruction. This reliance makes business an interesting case study in how data access informs research practice and pedagogy. Though business schools possess uniquely close ties to industry, there is growing concern among business faculty over increasingly restrictive data sharing policies in the private sector. In this case, the commodification and changing profit models of commercial data are straining connections rather than facilitating them. This negatively impacts how both faculty and students conduct research and, in the latter case, learn fundamental research and data literacy skills. Yet these very skills are of immense value to industry, and, of course, to the students themselves, both for their careers and in their daily lives. To understand the implications of these data issues in business education and research, Ithaka S+R recently collaborated with 14 academic libraries to study the teaching practices and needs of business instructors. This presentation will discuss the project’s key data and research related findings, including the changing relationship between business schools and the private sector, as well as the various workarounds faculty have implemented as industry data sharing practices have changed. It will further interrogate how these changing practices intersect with data literacy and research pedagogy in business schools and how they can lead to inequitable outcomes for disadvantaged students. Finally, it will identify critical library services capable of ameliorating this problem. Given the popularity of the business major, improvements in data and research pedagogy in this field have the potential to make an outsize impact on students’ proficiency in these skills more generally.
      • Kurtis Tanaka, Ithaka S+R
    • Making Campus Connections through Endangered Data Week Events that Highlight Ethical Issues in Research with Vulnerable Populations
      • We live in a world where both an overabundance and an absence of data exist, and members of vulnerable populations (such as undocumented immigrants, prisoners, and transgender folks) can be at risk from either end of this spectrum. Too much data can dangerously reveal someone’s status or identity, and an absence of data can undercut the creation of infrastructure required to support a group’s needs. To address this coexistence, the Transgender Library Resources group at UMN Libraries commemorated Endangered Data Week (EDW) 2019 by launching its new “Library Resources for Transgender Topics” guide and collaborating with several other University departments to organize two events:
        • 1. A panel featuring three scholars from diverse perspectives who shared their experiences with conducting research with vulnerable populations.
        • 2. A workshop that introduced participants to the process of submitting federal and state public records requests and strategies for using data received in response to these requests in scholarly publications and journalism.
      • This presentation will detail an implementation plan for hosting an EDW panel and workshop, share the steps taken to create a thoughtful and tested LibGuide, and highlight the ways in which the Libraries connected with other groups on campus over a shared interest and area of expertise. Finally, evaluation data of the events and guide will be used to direct others in their endeavors to address difficult topics, engage university communities in accessible investigations of data ethics, and build inclusive resources for their users.
      • Wanda Marsolek, Rachel Mattson, Alicia Kubas, Shanda Hunt, Shannon Farrell, and Katie Wilson, University of Minnesota
  • 12-1:15 p.m. – Lunch
  • 1:20-2:20 p.m. – Panel 5 – Data privacy
    • Screening for human subject disclosure risk during data curation and RDM service connections*
      • Johns Hopkins University Libraries Data Services, as a central research data management (RDM) function, has been operating the JHU Data Archive since 2012. Like other public access institutional data repositories, we share the challenge of archiving data from human subject research. Relying on our policy of receiving de-identified datasets is no guarantee that either researchers or our curators know what qualifies as acceptably low risk of privacy disclosure for submitted collections. Protecting identifiers is also a growing concern among our university compliance offices, for whom disclosure risk management is outside their traditional purview. Our internal requirements for disclosure screening have led to new and mutually beneficial service connections for RDM support for compliance offices, researchers, and data managers, particularly for medical research. This presentation will briefly outline those consulting and training relationships, emphasizing the challenges of developing a policy for acceptable risk thresholds for public access data, and for building expertise among research groups for meeting those standards. From this context, the presentation’s main focus is on the techniques we apply to screen submitted de-identified data for remaining risk as part of our curation process. This includes requesting subject consent forms indicating conditions for release, reviewing codebooks for higher-risk variables, and rudimentary risk calculations for data tables. We then communicate with depositors about risk levels and feasibility of adjustments they could make, or assist them with restricted repository options. We will briefly discuss software and other ideas for increasing efficiency, and welcome discussion with the RDAP audience, including about whether data curators can adequately screen data to meet the strict standards of HIPAA and privacy researchers for fully de-identified data suitable for public release. Despite these challenges, we recommend RDM services explore whether building expertise in human subject data protection can lead to vital new relationships among researchers and compliance administration.
      • David Fearon, Johns Hopkins University
    • Connecting Students to Their Data: Data Doubles and the Student Voice in Library Learning Analytics
      • Despite a wealth of research and hype around learning analytics (LA) in higher education and the broad adoption of LA in academic libraries (Perry et al., 2018), the voice of the most vulnerable population, students, is missing. LA data currently represents a superficial connection between the student and the library, instead of a meaningful connection where students have a say in how their data is captured and used. Further, research on library-conducted LA has revealed limited efficacy and concerning issues with data handling practices (Briney, 2019; Robertshaw & Asher, 2019). Our research has demonstrated that students are frequently unaware of the data that their institutions gather about them, imagining it only as individual points such as grades, physical use of the library, or books checked out. Despite the disparity between this initial naivete about the topic and the breadth to which their institutions capture this data, students nevertheless present extremely nuanced considerations of appropriate use, access, retention, data capture and interest in engaging with their own data. The impact of this research on the data management community relates to acknowledgement of academic hierarchies, the disparity of surveillance faced by our students, and expectations around the management of student data by institutional actors. In this presentation we will review: the current state of library learning analytics; results from what we have learned about student perceptions thus far; next stages in our research on understanding student preferences and expectations; and opportunities for intervening locally to bring awareness to the student body. By truly connecting students with their data, we hope there is opportunity for authentic student-library engagement and improved library data management practices overall.
      • Kristin Briney, California Institute of Technology, and Abigail Goben, University of Illinois at Chicago
    • Policy as Practice: The Whys and Hows of a Library Privacy Policy Content Analysis
      • This presentation examines the impetus for conducting a profession-wide environmental scan of public-facing library privacy policies, and details a qualitative methodology for doing so. Privacy is a cornerstone value in librarianship, intended to protect patrons’ intellectual freedom. However, as patrons interact with the online systems libraries use to provide services, they create a digital footprint of identifiable data. Literature has indicated that library policies lag behind the current digital environment in which collection of patron data is ubiquitous. Further, policies often omit details about how patrons’ data is used, particularly in cases that go beyond the scope of providing a service. Far from being a neutral activity, stewardship of this data is a librarian’s professional responsibility, including disclosing when data is collected and what it is used for. Privacy policies are one way in which libraries can establish a level of transparency with patrons about their privacy priorities and practices. This presentation details a qualitative approach to examining the text of existing privacy policies from Association of Research Libraries institutions in the United States, using ALA’s Library Privacy Guidelines and related documents as a guide. The presenter discusses employing content analysis to examine these policies with an eye towards updating existing policies, or creating new ones. This effort aims to create connections between library staff and patrons by working to establish transparency and trust in an environment where the collection and use of patron data is often opaque.
      • Greta Valentine, University of Kansas
  • 2:20-2:35 p.m. – Break
  • 2:35-3:35 p.m. – Closing keynote
    • DataONE, Robert Sandusky, Sustainability and Governance Working Group
  • 3:40-3:50 p.m. – Closing announcements

Friday, 13 March 2020

RDAP Workshops

  • 9-11 a.m. – Workshop 1 
    • Connecting with Coding for Basic Data Processing
      • Coding allows researchers to rapidly and consistently accomplish tedious tasks, removing human error and exponentially increasing the scale of material that can be processed. In this workshop, participants will learn how to use the command line (e.g., Bash shell) and a simple Python script to scrape content from a web site and clean up the textual data by removing HTML tags. Participants will use their own laptops for step-by-step hands-on exercises, but will not need to install any software. Instead they will use a browser-based environment, PythonAnywhere, for the Python segment. Other examples of scenarios in which coding is particularly applicable will also be discussed, along with considerations for learning Python. This workshop is based on materials available from the Digging Deeper, Reaching Further: Libraries Empowering Users to Mine the HathiTrust Digital Library Resources curriculum, and is aimed at absolute beginners. Attendees who are familiar with the command line or know Python are invited to contact the instructors about floating for the workshop. (This workshop was also offered at the Southeast Data Librarians Symposium in New Orleans in October 2019 with slightly different content.)
      • Learning Outcomes: 
        • Understand the power of coding for automating repetitive tasks
        • Learn to use several basic command line commands for web scraping
        • Learn how to be polite in scraping
        • Obtain resources for learning more about Python
      • Michele Hayslett and Barrie Hayes, University Libraries at University of North Carolina at Chapel Hill
  • 9 a.m.-noon – Workshop 2
    • Moving the ESIP Data Management Training Clearinghouse to a Training & Education Gateway for Research Data Skills Instructors and Learners
      • Karl Benedict (Director of Research Data Services in the University Libraries at the University of New Mexico) and Nancy Hoebelheinrich (Principal of Knowledge Motifs LLC and Editor of the Data Management Training Clearinghouse, or DMTC) will introduce views of the pre-publication and post-publication annotation options that are planned for user assessments of educational resources included in the DMTC. Workshop participants will be invited to test the utility and usability of educational resources that they bring to the workshop or that can be downloaded from the DMTC. Discussion will facilitate the development of a community of practice for instructors, including draft guidelines for creating FAIR (Findable, Accessible, Interoperable, Reusable) educational resources. Of particular focus in the guidelines (also incorporated in user assessment) is the addition of accessibility features suggested by the W3C whenever possible (https://www.w3.org/wiki/WebSchemas/Accessibility). Workshop leaders will solicit feedback from participants on these efforts, but also ask the audience to participate in brainstorming suggestions on other educational services that would be useful for a Research Data Skills Training & Education Gateway.
      • Learning Outcomes:
        • Apprise workshop participants of the expanded range and depth of educational and training resources in the DMTC, and of changes to the DMTC platform and other enhancements added by virtue of the IMLS NLG.
        • Enable the workshop participants to work with the Clearinghouse as contributors to and reviewers of the growing collection of training materials, both before and after inclusion in the Clearinghouse.
        • Engage with the data management, access, and preservation communities attending RDAP to present and discuss prospective guidelines on creating FAIR educational and training resources and on developing a Community of Practice for Research Data Skills instructors. 
        • Recruit community members for participation in the project as members of the reviewer and editorial teams, and as participants in upcoming usability testing of Clearinghouse UI enhancements.
      • Nancy Hoebelheinrich, Knowledge Motifs LLC, and Karl Benedict, University Libraries at the University of New Mexico
  • 1-3 p.m. – Workshop 3
    • Mind the gap: Connecting campus-wide expertise to address disparities in RDM education
      • There is often an expectation for new, incoming graduate students to already possess data management skills. In reality, there is a disconnect with students’ existing skill sets, especially for first-generation graduate students, who are historically not familiar with this aspect of the research lifecycle. The University of Minnesota Libraries recognized the gap and developed the Research Data Management Bootcamp to create core RDM competencies for graduate students. However, providing research data management training that is relevant and engaging for graduate students across disciplines is a challenge for any institution. To meet this challenge, the University of Minnesota Libraries have partnered with other units on campus (e.g., Graduate School, Supercomputing Institute) to help with funding, planning, promotion, and instruction. We iteratively re-design the bootcamp each semester based on student feedback to help meet the challenges of providing meaningful training to this broad group of students. The bootcamp has also allowed the Libraries to foster collaboration internally in bringing together diverse skill sets, expertise, and experience from library staff. Beyond bootcamps, another approach we have taken is to identify quantitative graduate-level courses that illustrate a potential need for research data management support, and to partner with instructors to offer focused course-integrated workshops in collaboration with subject liaisons. This workshop will provide a look into the content and behind-the-scenes logistics of running a research data management bootcamp, and suggestions for developing and running systematic methods class outreach. Attendees are invited to bring challenges and questions they may have related to similar programming and RDM outreach methods.
      • Learning Outcomes:
        • Attendees will learn how to share the foundations of data management with graduate students from various disciplines
        • Attendees will be provided with suggestions on how and with whom to partner on their campuses to make more of an impact and increase their reach with data management education
        • Attendees will receive recommendations for data management-related topics, activities, breakout sessions, and workshops to fit the needs of researchers from various disciplines
      • Katie Wilson, Shannon Farrell, and Wanda Marsolek, University of Minnesota
  • 1-4 p.m. – Workshop 4
    • Supporting Open Science Data Curation, Preservation, and Access by Libraries
      • Openness in research can lead to greater reproducibility, an accelerated pace of discovery, and decreased redundancy of effort. In addition, open research ensures equitable access to knowledge and the ability for any community to assess, interrogate, and build upon prior work. In order for research to succeed, openness and reproducibility are required. In turn, this requires open infrastructure and distributed access; but few institutions can provide all of these services alone. Providing a trustworthy network for perpetual availability of research data is critical to ensuring reproducibility, transparency, and ongoing inquiry. There is increased attention on the importance of open research and data sharing, leading to a proliferation of platforms to store data, materials, etc. These platforms exist in a fragmented environment that lacks technical integration and coordination with local library expertise and services, hampering curation and long-term stewardship. For example, the open-source OSF enables researchers to directly create and manage research projects and integrates with other tools researchers use (Google Drive, Dropbox, Box, etc.), but lacks the ability to archive that material locally at a researcher’s institution. Long-term stewardship and preservation require multiple copies of data archived in different locations, and creating archives seamlessly would be ideal. The Center for Open Science (COS) and Internet Archive (IA) propose to address the preservation and stewardship challenges by providing open, cooperative infrastructure to ensure long-term access and connection to research data, and by supporting and promoting adoption of open science practices to enhance research reproducibility as well as data sharing and reuse. In this workshop, participants will gain skills to implement reproducible research practices using OSF and other tools, and then curate and preserve research artifacts on Internet Archive. We will further demonstrate the possibilities to preserve such content in local institutional repositories or other distributed networks, using community-developed standards and protocols.
      • Learning Outcomes:
        • Expanded awareness of issues relating to the reproducibility of research
        • Improved ability of individuals, lab groups, and institutions to manage research workflows, collaborate with team members, and increase the visibility of their research by use of open tools.
        • Enable the workshop participants to work with the Open and Reproducible Research Practices training curriculum as contributors and reviewers of the materials, providing meaningful feedback used to improve the content. Identify gaps for iterating towards a shared resource of materials that support data sharing, discovery, reuse, preservation and reproducible research practices.
        • Ability to create preservation packages of content created on OSF, to be stored on Internet Archive or another repository.
        • Recruit participants for a preservation network and for the development of methods for sharing archived data and testing replication
        • Discussion on how preservation data could be leveraged for data mining and computational use cases.
      • Sara Bowman, Center for Open Science