Information Integrity through Metadata: Publishing Data with Context
This post is part of a series on the importance of information integrity. Click here to read the introductory post.
Data becomes more powerful when it has context. When we can say that data was transmitted by a particular person at a particular time, the data is no longer just information – it is evidence of an action or transaction. When data is wrapped up in this contextual information (metadata), we have a record of an event or a transaction. These records (data + metadata) are a basic requirement for accountability, letting us see who has done what and when.
How can we make data more evidential, more consistently?
In June 2015, Thomson Reuters Foundation reported on a new mobile phone app ‘enabling civilians in conflict-torn countries to capture and share verifiable footage of war crimes’. Reuters reported:
Mobile phone footage of human rights abuses, mainly shared on social media in recent years, is often fake, impossible to verify or lacking the information necessary to be used as evidence in court, said the International Bar Association… The “EyeWitness to Atrocities” app records the user’s location, date and time, and nearby Wi-Fi networks to verify that the footage has not been edited or manipulated, before sending it to a database monitored by a team of legal experts.
The implications of tying data to metadata are clear – the metadata makes the footage into a reliable record. It documents the location, date and time – in other words, the context – of the creation or capture of the data.
How can we set up similar controls for other sorts of data, given that data is collected from diverse sources and, more significantly, through diverse methods? It can be actively collected by governments, for instance through census-taking, or via sensors relaying data through telemetry (for example, the UK’s Environment Agency Realtime Flood Data). It can be supplied by citizens through civic technologies or e-government platforms. It can be extracted or aggregated from existing datasets, management information systems, or centuries-old paper records.
The diversity of information gathering practices makes standardising metadata capture challenging, but there are two things we can do:
- Agree minimum metadata requirements. These should document the context of data creation or capture. EyeWitness to Atrocities does this with location, date and time. Australia has defined three core metadata elements (identifier, creator and date created) in its minimum set for government agencies. In any jurisdiction, what are the basic metadata elements needed to give data enough context to make it evidential?
- Use or develop tools for particular types of data or data capture processes. EyeWitness to Atrocities is one example of a tool for documenting the context and provenance of data. Where else could such tools be useful?
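To make the idea of a minimum metadata set concrete, here is a minimal sketch in Python. It assumes the three elements mentioned above (identifier, creator and date created) as the required set. The field names, the `Record` class and the `make_record` helper are illustrative assumptions, not the schema of any actual standard or tool.

```python
from dataclasses import dataclass

# Assumed minimum set, loosely based on the three elements named above.
# The exact field names are hypothetical.
REQUIRED_ELEMENTS = ("identifier", "creator", "date_created")


@dataclass(frozen=True)
class Record:
    """A record in the sense used here: data plus its metadata."""
    data: bytes
    metadata: dict


def missing_elements(metadata: dict) -> list:
    """Return the required elements that are absent or empty."""
    return [name for name in REQUIRED_ELEMENTS if not metadata.get(name)]


def make_record(data: bytes, metadata: dict) -> Record:
    """Wrap data in metadata, refusing to create a record without context."""
    missing = missing_elements(metadata)
    if missing:
        raise ValueError("missing metadata: " + ", ".join(missing))
    # Copy the metadata so later changes to the caller's dict do not
    # silently alter the record.
    return Record(data=data, metadata=dict(metadata))
```

A capture tool built along these lines would reject data at the point of collection if its context is incomplete, rather than leaving the gap to be discovered later.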
Once captured, data may be released. At this point, maintaining links between data and metadata becomes even more challenging. Datasets are, by their nature, fissiparous – inclined to break up into their constituent parts. That’s what makes them so valuable (we can repurpose them), but it’s also what makes them so open to misuse.
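One common technique for keeping a checkable link between data and its metadata is to store a cryptographic fingerprint of the data inside the metadata. The sketch below uses SHA-256 from Python's standard library. The `sha256` field name and the helper functions are assumptions made for illustration, and nothing here describes how EyeWitness to Atrocities or any particular portal actually works.

```python
import hashlib


def fingerprint(data: bytes) -> str:
    """Return the SHA-256 digest of the data as a hex string."""
    return hashlib.sha256(data).hexdigest()


def attach_fingerprint(metadata: dict, data: bytes) -> dict:
    """Return a copy of the metadata that also records the data's digest."""
    linked = dict(metadata)  # leave the caller's metadata untouched
    linked["sha256"] = fingerprint(data)  # hypothetical field name
    return linked


def matches(metadata: dict, data: bytes) -> bool:
    """Check whether the data is byte-for-byte what the metadata describes."""
    return metadata.get("sha256") == fingerprint(data)
```

A fingerprint of this kind only confirms that a copy is identical to the original. Once a dataset is split, filtered or aggregated, the derived data no longer matches, which is one reason the link is so hard to maintain after release.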
If we can’t link metadata to data through data’s various permutations, we need to foster a culture of critical data literacy, in which suppliers publish what metadata they have, developers reference their data sources, and consumers know to question the validity of data on the basis of the completeness and robustness of its metadata.
This culture is beginning to take hold. There’s been a discernible improvement in the quantity and quality of metadata being published on open data portals over the last five years. Data analysts and data journalists continue to interrogate government data. But the importance of metadata needs to be flagged with end users if they’re to navigate the available data and use it as the basis of their engagement with government and civil society. Metadata needs to be a cornerstone of data literacy education and training.