Sub-project: category assignment
Related GitHub issues:
- https://github.com/OpenDataScotland/the_od_bods/issues/35 (Create table of expected data categories for each council)
- https://github.com/OpenDataScotland/the_od_bods/issues/25 (Tidy up inconsistent dataset tags)
- https://github.com/OpenDataScotland/the_od_bods/issues/45 (create collection of keywords relevant to categories)
- https://github.com/OpenDataScotland/the_od_bods/issues/36 (Automate table of expected data categories for each council)
Related Slack threads:
- https://opendatascotland.slack.com/archives/C02HEHDL8AY/p1649706946076149 (Frida de Sigley)
- https://opendatascotland.slack.com/archives/C02HEHDL8AY/p1648653154748209 (Jens Rasmussen)
- https://opendatascotland.slack.com/archives/C02HEHDL8AY/p1646175429897549 (Ash McClenaghan)
Other related links:
- https://www.nature.com/articles/s42256-020-00287-7.pdf (An open source machine learning framework for efficient and transparent systematic reviews)
In JKAN, the categories shown on the main page come from the categories key of each dataset. We are currently dumping all dataset tags into that key, but we want to replace them with one of the 14 categories defined in jkan/_data/categories.yml.
- What is the current distribution of datasets across categories?
- What is the desired distribution across categories? Not everything should go into "uncategorised": it should be the catch-all, but is a share of around 20% reasonable?
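
To make the two questions above concrete, here is a minimal sketch of how tags could be mapped to a single category and how the resulting distribution could be measured. It is only an illustration: the keyword sets, the category names "Transportation" and "Education", and the function names `assign_category` and `category_distribution` are all hypothetical, and the real category list would come from jkan/_data/categories.yml.

```python
from collections import Counter

# Hypothetical keyword sets for illustration only; the real categories
# are the 14 entries in jkan/_data/categories.yml.
CATEGORY_KEYWORDS = {
    "Transportation": {"bus", "parking", "road", "cycling"},
    "Education": {"school", "education", "nursery"},
}
FALLBACK = "uncategorised"


def assign_category(tags):
    """Return the first category whose keywords overlap the words in the tags."""
    words = {word for tag in tags for word in tag.lower().split()}
    for category, keywords in CATEGORY_KEYWORDS.items():
        if words & keywords:
            return category
    return FALLBACK


def category_distribution(datasets):
    """Map each assigned category to its share of datasets (0.0 to 1.0).

    `datasets` is a list of tag lists, one list per dataset.
    """
    counts = Counter(assign_category(tags) for tags in datasets)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {category: n / total for category, n in counts.items()}
```

Running `category_distribution` over the tag lists of all datasets and reading off the "uncategorised" share would give a number to compare against a target such as 20%. Taking the first matching category is a simplification; a dataset whose tags match several categories may need a tie-breaking rule.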