Things not Datasets

This project sets out to create a Wikidata-like site that extracts data about different objects from multiple datasets (example below)


Currently, data about different objects (e.g. people, companies, power plants) is spread across various datasets, often with several IDs referring to the same object. This project sets out to create a new dictionary schema that acts as a central node within a data catalogue, linking datasets together based on the objects they describe. The dictionary can also specify which attributes should be extracted for each object. Its components are deliberately simple: a CSV lookup table that handles 1-to-many ID mappings, and an extension to the specification that expresses which datasets different identifiers refer to.
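As a rough sketch of how such a lookup table could be handled, the snippet below groups secondary IDs under each primary ID so that 1-to-many mappings survive the load. The column names (osuked_id, gppd_idnr, esail_id) and the row values are illustrative assumptions, not the project's actual dictionary contents:

```python
import csv
import io

# Illustrative lookup table: 1-to-many mappings are expressed by repeating
# the primary id (osuked_id) across several rows. All values are made up.
lookup_csv = """osuked_id,gppd_idnr,esail_id
10027,GBR1000377,DRAXX
10027,GBR1000378,DRAXX
10030,GBR1000381,EGGPS
"""

def load_lookup(text):
    """Group each secondary id column under its primary id as a set of values."""
    mapping = {}
    for row in csv.DictReader(io.StringIO(text)):
        entry = mapping.setdefault(row["osuked_id"], {})
        for col, val in row.items():
            if col != "osuked_id" and val:
                entry.setdefault(col, set()).add(val)
    return mapping

mapping = load_lookup(lookup_csv)
```

Using sets for the secondary IDs makes the 1-to-many case explicit: a primary ID maps to however many external identifiers the datasets happen to use for it.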

The core change to the schema is the use of foreignKeys to link to external datasets that use the IDs specified in the dictionary; the attributes entry then lists the columns that should be extracted from each dataset.

"foreignKeys": [
  {
    "fields": "osuked_id",
    "reference": {
      "package": "https://raw.githubusercontent.com/OSUKED/Dictionary-Datasets/main/datasets/plant-locations/datapackage.json",
      "resource": "plant-locations",
      "fields": "osuked_id",
      "attributes": ["longitude", "latitude"]
    }
    }
  }
]

So far, a basic dictionary schema has been created, and Python code has been developed to link and extract data about the objects described in the dictionary (represented in an intermediate JSON file). This data is then used to populate a website that describes the different objects in a Wikidata-like format.
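The linking-and-extraction step can be sketched as follows: walk the foreignKeys for an object, find matching rows in each referenced resource, and pull out the listed attributes. This is an assumption about how such code might look, not the project's actual implementation; the resource rows and coordinate values are invented for illustration:

```python
import json

# A foreignKeys entry shaped like the schema example, with the package URL
# omitted for brevity; the resource rows below are illustrative stand-ins
# for data that would really be loaded from the referenced datapackage.
foreign_keys = [
    {
        "fields": "osuked_id",
        "reference": {
            "resource": "plant-locations",
            "fields": "osuked_id",
            "attributes": ["longitude", "latitude"],
        },
    }
]

resources = {
    "plant-locations": [
        {"osuked_id": 10027, "longitude": -0.626, "latitude": 53.737},
    ]
}

def extract_attributes(object_id, foreign_keys, resources):
    """Follow each foreignKey and collect the listed attributes for one object."""
    extracted = {}
    for fk in foreign_keys:
        ref = fk["reference"]
        for row in resources[ref["resource"]]:
            if row[ref["fields"]] == object_id:
                for attr in ref["attributes"]:
                    extracted[attr] = row[attr]
    return extracted

# The extracted attributes can then be serialised into the intermediate JSON
print(json.dumps(extract_attributes(10027, foreign_keys, resources)))
```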

Next Steps:

  1. Add a datasets page to the site that renders all of the datasets linked to the dictionary (ideally integrated with, or built on, Livemark)
  2. Add a dictionary page that provides a high-level overview of the dictionary
  3. Enable multiple dictionaries to be linked together
  4. Develop "attribute recipes": a mechanism for combining attributes extracted from different datasets into new data (e.g. combining power plant output and carbon emissions to calculate carbon intensity)
  5. Include units information, enabling automated generation of derived attributes
  6. Handle "special data", e.g. enabling spatial data to be extracted and shown on a map (ideally tapping into the Livemark plugin system)

The concept is discussed further in this video.



Frictionless Hackathon