Published October 12, 2020 | Version v1
Working paper Open

Units of Measure for Humans and Machines: Making Units Clear for Machine Learning and Beyond

  • 1. NIST and CODATA DRUM TG
  • 2. University of Florida and CODATA DRUM TG
  • 3. University of Southampton and CODATA DRUM TG
  • 4. CODATA
  • 5. University of Edinburgh, ISC Governing Board and CODATA Past President
  • 6. CSIRO and CODATA Executive Committee
  • 7. University of Canterbury, IUPAC and CODATA Executive Committee
  • 8. Leiden University, GO FAIR and CODATA President
  • 9. ISC

Description

This document is a manifesto and call to action produced by the DRUM (Digital Representation of Units of Measure) Task Group as part of its efforts to mobilise representatives from International Scientific Unions and Associations to engage with this fundamentally important issue.

Why are Units Important?

The major challenges that confront human societies are global in reach and complex in nature. They do not have simple, single-discipline solutions. These intrinsically interdisciplinary and multidisciplinary challenges require trans-sectoral cooperation between academic, commercial and governmental agencies.

For such collaboration to succeed, the essential tools for scientific exchange of information must be fit for purpose. Quality data is essential, and to be understandable and usable, the data must meet internationally agreed, community-endorsed conventions or standards, a key element being the clear representation of units.

Although the “collaboration imperative” is recognised by the major research funders and international organisations, they often fail to appreciate all the details that are essential to enable the required cooperation. Funding to encourage collaboration, highlighting relevance, pathways to impact, and facilitating international exchanges are all vital, but together they are a house of cards if the foundations for scientific exchange of information are not up to the challenge.

Providing quality data is essential, but collaborative work will fail unless all those who need to use the data (and the associated information and knowledge) can actually understand it, and this requires international community agreements.

Units of measure are a key part of such agreements. Much of the global output of data lacks clear and unambiguous definitions of the units used. While the units of measure of quantitative values might be conventional and thus unstated for the original application, they are often obscure outside the originating community or discipline. This is particularly problematic when data are analysed at scale using machines. Confusion abounds. Huge efforts are needed when mining data from a neighbouring discipline.

The aim of creating and sharing data has to be to enable problems to be solved, problems that need the active collaboration of other disciplines and sectors; data without well curated, understood, and digitally communicated units is a hindrance rather than an advantage.

Now that we can exchange vast quantities of data in minimal time, there is an even greater need for clarity on all the details, including units. Computers consume numbers, but analysis needs quantities, and quantities are scaled. If the scale is different to that needed by the user, then conversion (re-scaling) is required. It is essential that the major funders of research support the international standards process, especially current efforts on digital representation of units of measure.
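The distinction between a bare number and a scaled quantity can be illustrated with a minimal sketch. It assumes a hypothetical `Quantity` type and a hypothetical table of length conversion factors, `TO_METRE`; neither comes from the DRUM Task Group's work, and a real system would rely on an agreed, machine-readable units vocabulary rather than a hand-written table.

```python
from dataclasses import dataclass

# Hypothetical conversion factors to a common base unit (the metre).
# Illustration only: a real system would draw on an agreed units standard.
TO_METRE = {"m": 1.0, "km": 1000.0, "cm": 0.01, "mm": 0.001}


@dataclass(frozen=True)
class Quantity:
    """A numeric value that always travels with its unit."""
    value: float
    unit: str

    def to(self, target: str) -> "Quantity":
        """Re-scale this quantity to another length unit."""
        for u in (self.unit, target):
            if u not in TO_METRE:
                # Refuse to guess: an unknown unit is an error, not a default.
                raise ValueError(f"unknown unit: {u!r}")
        metres = self.value * TO_METRE[self.unit]
        return Quantity(metres / TO_METRE[target], target)


bare = 2.5                     # a number: metres? kilometres? nothing says
length = Quantity(2.5, "km")   # a quantity: the scale is explicit
print(length.to("m"))          # Quantity(value=2500.0, unit='m')
```

The design choice worth noting is that an unrecognised unit raises an error instead of being silently treated as the base unit; silent defaults are exactly how ambiguity spreads when data crosses disciplinary boundaries.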

Work on data standards is essential. Without it, the value of much of the huge investment in research capability is lost or, at best, major archaeology is required to retrieve meaningful information from the data, something that can easily be avoided with the appropriate investment. Now!