Jo Bates and Robert Villa have both been successful in applying to the AHRC's "Digital Transformations in the Arts and Humanities" Big Data projects call.
Jo Bates - The Secret Life of a Weather Datum
The Secret Life of a Weather Datum is a 15-month research project that will explore the socio-cultural values and practices shaping, and being shaped by, the production, collation, distribution and re-use of weather data produced by the UK’s Met Office. To achieve this aim, the project will follow the ‘journey’ of a single weather datum from its production into three cases of re-use: climate science, weather risk markets and citizen science projects. These cases will draw on interviews, observations, digital ethnography and policy research.
The final outcome of the project will be an interactive website and multimedia research data archive that will allow members of the public to explore this journey in more detail, thus contributing to the public understanding of science.
The project is being developed in collaboration with Dr Yuwei Lin of the University for the Creative Arts, and the infrastructure for the interactive website will be produced by developers at Madlab in Manchester.
Robert Villa - Understanding the Annotation Process: Annotation for Big Data
Big data, by definition, involves large, rapidly changing, heterogeneous collections of material and new techniques for processing that data. As digital technologies become ubiquitous, data collections in many fields, including the humanities, art and culture, are growing ever larger. Many of the technologies associated with “big data”, used to make sense of these large collections, are based on machine learning and related methods.
Such techniques typically depend on learning: a smaller, manually created training set is produced and fed to an automatic technique, after which the trained technique can be applied to the full data set. While much scientific effort has gone into improving machine learning and automatic methods of retrieval, far less work has examined how training sets themselves are created. The creation of training sets is, ultimately, a human endeavour.
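To make this workflow concrete, the sketch below illustrates the basic pattern under some assumptions of our own: the data is text, and Python with scikit-learn is used as the tooling (neither is specified by the project). A small, hand-labelled training set is used to train a classifier, which is then applied to the rest of the collection.

```python
# Illustrative sketch only: a tiny, hand-labelled training set is used to
# train a classifier, which is then applied to a larger unlabelled
# collection. The example texts, labels and use of scikit-learn are
# assumptions for the illustration, not part of the project.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression

# Small, manually created training set (judged by human assessors).
train_texts = [
    "storm damage reported along the coast",
    "heavy rain and flooding expected overnight",
    "new gallery exhibition opens in the city",
    "archive of early film footage digitised",
]
train_labels = ["weather", "weather", "culture", "culture"]

# The much larger collection that cannot be labelled by hand.
full_collection = [
    "gale force winds forecast for the weekend",
    "museum announces a season of silent cinema",
]

# Train on the small labelled sample...
vectoriser = TfidfVectorizer()
classifier = LogisticRegression()
classifier.fit(vectoriser.fit_transform(train_texts), train_labels)

# ...then apply the trained model to the full data set.
predictions = classifier.predict(vectoriser.transform(full_collection))
print(list(zip(full_collection, predictions)))
```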
The project, “Understanding the Annotation Process: Annotation for Big Data”, aims to investigate this less studied area: how can we create, efficiently and with the least effort, training sets from which automatic techniques can learn? The project has three main research questions:
- How do human assessors judge and assess text documents, images, and videos?
- What are the main factors which affect assessor performance (e.g. accuracy, speed, etc.)?
- Which material is easiest for human assessors to judge, and which gives the best "bang for the buck" when used as input to a machine learning system?
Without training sets, or learning data, automatic machine learning techniques cannot operate, making the creation of training sets a vital component for analysing and understanding big data collections. By learning more about the human process by which people judge material for the purposes of training set creation, we aim to create “best practice” guidelines which can be used by other researchers who require training or evaluation data.
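One way the third research question above is often approached in practice is to let the model itself suggest which unlabelled items assessors should judge next, so that each judgement contributes as much as possible. The sketch below shows a simple least-confidence selection strategy, reusing the classifier and vectoriser from the previous example; it is an illustration of the general idea, not the project's own method.

```python
# Illustrative sketch of "bang for the buck" selection (uncertainty
# sampling): ask human assessors to judge the items the current model is
# least confident about. Assumes the `classifier` and `vectoriser` trained
# in the previous sketch; this is not the project's own method.
import numpy as np

def least_confident_first(classifier, vectoriser, unlabelled_texts, k=3):
    """Return the k unlabelled items the classifier is least sure about."""
    probabilities = classifier.predict_proba(vectoriser.transform(unlabelled_texts))
    confidence = probabilities.max(axis=1)   # probability of the top predicted class
    order = np.argsort(confidence)           # least confident items first
    return [unlabelled_texts[i] for i in order[:k]]

# Items returned here would be passed to human assessors for judgement,
# and their labels added to the training set before retraining.
# to_annotate = least_confident_first(classifier, vectoriser, full_collection)
```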
The project is being undertaken in collaboration with Dr Martin Halvey at Glasgow Caledonian University, along with Dr Jeremy Pickens (Catalyst Repository Systems), the National Fairground Archive, University of Sheffield, and the British Universities Film & Video Council.