RDAP’s Response to the Federal Data Strategy – by Amy Koshoffer

In this post, RDAP Treasurer Amy Koshoffer, who facilitated the association’s response to the Federal Data Strategy request for comment, presents the RDAP response submitted in July 2018.

The Federal Data Strategy will shape how the government manages, provides access to, and preserves federal data. As information and data professionals, our community is uniquely qualified both to provide guidance on data quality, access, and preservation of federal data and to highlight considerations for a successful long-term strategy. In early July 2018, the RDAP Executive Board invited community members to attend two town hall meetings and to author, collaboratively and online, a collective RDAP response to the Federal Data Strategy. As RDAP is in its formative stages, the board felt that it was important to respond to the call from the Federal Government under the name of RDAP to reflect the steps we are taking to grow and become an independent association of information and data professionals defined by a common set of values. The process is still ongoing, and we suggest that members consider responding to follow-up calls for feedback on the practices currently proposed. The result of the RDAP community’s efforts so far follows:

Federal Data Strategy – RDAP response (submitted 2018-07-27):

The Research Data Access and Preservation Association (RDAP) appreciates the opportunity to offer feedback on the Federal Data Strategy. As a community of data stewards, researchers, and librarians, we promote and support best practices in data stewardship, especially with regard to activities related to access and long-term preservation of research data.  As large amounts of federal data and research data are generated using taxpayer dollars, we consider it a priority to plan for the long-term preservation and free access to data maintained in thousands of formal and informal federal repositories, from large operational centers maintaining information about weather or the census to customized systems serving a particular discipline like biotechnology or agricultural research.  In order to meet the research, innovation, and growth potential of the nation, we advocate for increased resources as part of the long-term Federal Data Strategy to support the essential expertise, infrastructure, and long-term commitment to open, free, and ethical access to data.

We encourage agencies generating data to implement current best practices (such as those recommended by DataONE, the Digital Curation Centre, and the ESIP Federation) across the data lifecycle, from data collection and analysis to data preservation, and to do so not only for public-facing data but for all federal data. Well-curated data are critical building blocks with far-reaching economic, business, research, and educational impacts. Essential components of these building blocks are 1) open, sustained access to the data and 2) data-generation transparency. We encourage the Federal Data Strategy architects to clearly define their commitment to federal data sustainability and transparency for the future good of economic, business, research, and educational pursuits. We encourage federal agencies to commit to education programs and resources for all data users in order to promote the highest standards for data generation and reuse.

We recognize that the government has finite resources. However, data have the potential to be a valuable national asset. Time and resources are spent collecting data, yet uneven and unclear data management practices can render these potentially valuable data unusable or lead to their loss. We applaud the standards that federal agencies such as NOAA and NASA achieve. We would like to see best practices highlighted across agencies, which would help facilitate collaborative and cross-sector data work, recognizing that funding and resources must be put in place to support these efforts. We applaud the Federal Data Strategy architects’ effort to address problems in data work and encourage federal agencies to articulate how data can be maintained for the long term, in nonproprietary formats, with rich and comprehensive documentation or metadata, and protected against loss. We call on the federal government to support the work of these agencies to meet their data stewardship goals, recognizing their domain expertise and the value they contribute.

The RDAP community encourages the Federal Data Strategy architects to clearly and strongly integrate the FAIR data principles of Findability, Accessibility, Interoperability, and Reusability (https://www.force11.org/group/fairgroup/fairprinciples) throughout the ten principles of the Federal Data Strategy.

Data Stewardship: Good data stewardship includes adherence to FAIR principles by ensuring Findability through encouraging use of standards for metadata, data formats, and data citations. We suggest that the commitment to information and operating procedure standards (See for example, ISO 14721 – the OAIS Reference Model – and ISO 16363 – Audit and Certification of Trustworthy Digital Repositories) should be more explicitly stated in the principles as an additional bullet point under Stewardship.  The federal government should follow its efforts under M-13-13 (“Open Data Policy—Managing Information as an Asset”, https://project-open-data.cio.gov/policy-memo/) and continue to act as a model in order to cultivate a national community of practice for curating and stewarding data. Standards should be defined and stressed throughout the data lifecycle; citations should be consistent, machine-readable data citations that include DOIs; and any dissemination system should support version control.  This is critical for research, policy, and other applications that are developed based on federal data sources. We encourage agencies to develop detailed policies about retention and venues for preservation in order to detail the chain of custody for federal data.

Data Quality: The FAIR principles align with Data Quality throughout the data lifecycle; in particular, consideration for Accessibility and Interoperability relies on well-structured and machine-readable metadata, persistent access, and standardization of data formats.  We would suggest including language that emphasizes standards at the point of data collection to ensure data quality that will result in added value.

We find the phrase “new data only when necessary,” found in Principle 6: Create Value, limiting in scope. Often the best use of data is the one not considered or anticipated. For example, nineteenth-century naval logs of weather observations are now being used to help reconstruct the climate of the recent past and improve predictions of future climate. We suggest that the authors of the strategy take a more long-term view for the benefit of both current research and innovation and future endeavors.

Continuous Improvement: We believe that the principles under Continuous Improvement must also embrace this long-term approach and incorporate the FAIR principle of Reusability. As emphasized under Stewardship and Quality, standards, documentation such as readme files and data dictionaries, and comprehensive metadata facilitate data reuse, as does the use of nonproprietary data formats and version control. We suggest that agencies create improved documentation, such as guidelines and checklists for data collection and processing, and adopt it across agencies. We also suggest that agencies consider improving website design, especially for very complex and dense sites that deliver data, such as the Census of Agriculture. This would simplify access for researchers, local government officials, and business owners who wish to use government data. This commitment to improvement should be explicitly stated in the principles.

As an engaged community of information professionals, RDAP supports the openness and transparency of the research lifecycle.  As an extension of these core beliefs, we support openness and transparency related to the creation, distribution, access, and preservation of federally funded and/or collected data.  Additionally, RDAP supports the need for ongoing cross-sector partnerships, free and open access to data, and funding for ongoing data management and curation for the benefit of commerce, the research community, and the public.

Respectfully authored and submitted on behalf of the RDAP by community members

  • Jonathan Petters, Virginia Tech University Libraries, Data Management Consultant
  • Erica Mehan Johns, Cornell University, Research Data & Environmental Sci Librarian
  • Megan O’Donnell, Iowa State University, Data Services Librarian
  • Helen Tibbo, University of North Carolina School of Information and Library Science, Alumni Distinguished Professor
  • Jon Wheeler, University of New Mexico, Data Curation Specialist
  • Karen Coghlan, The National Network of Libraries of Medicine, New England Region, Education & Outreach Coordinator
  • Margaret Janz, University of Pennsylvania, Scholarly Communications and Data Curation Librarian
  • Amy Koshoffer, University of Cincinnati, Science Informationist
  • Lynn Yarmey, Research Data Alliance, Director of US Community Development
  • Amy Neeser, University of California Berkeley, Research Data Management Program Manager
