A short summary of the project proposal can be read below:
The last few years have seen a rapid increase in available data, and digitization has become pervasive. This has led to a data deluge that leaves many unable to cope with such large amounts of messy data. Moreover, because of the large number of content producers and the variety of formats, data is not always easy for machines to process, owing to its uneven quality and the presence of bias. In the current data-driven economy, organizations that can effectively analyze data at scale and use it as decision-support infrastructure at the executive level will gain a key competitive advantage. To deal with the current data deluge, in the BetterCrowd project I will define and evaluate Human Computation methods to improve both the effectiveness and the efficiency of currently available hybrid Human-Machine systems.
Human Computation (HC) is a game-changing paradigm that systematically exploits human intelligence at scale to improve purely machine-based data management systems (see, for example, CrowdDB). This is often achieved by means of Crowdsourcing, that is, outsourcing certain tasks from the machine to a crowd of human individuals who perform short tasks (also known as Human Intelligence Tasks, or HITs) that are simple for humans but still difficult for machines (e.g., understanding the content of a picture or detecting sarcasm in text). Involving humans in the computation process is a fundamental scientific challenge that requires obtaining the best from human abilities and effectively embedding them into traditional computational systems.
The challenges involved in the use of HC concern both its efficiency (i.e., humans are naturally slower than machines at processing information) and its effectiveness (i.e., while machines compute deterministically, human behavior may be unpredictable and possibly malicious). The objective of the BetterCrowd project is to improve Human Computation quality and scalability for Big Data processing.
The project will address the fundamental scientific challenge of understanding how humans and machines can better interact and collaborate on computational problems. The findings will make it possible to deal with more complex Big Data analytics problems. While most current Big Data solutions focus on volume, this project aims to improve HC so that it can be applied to the analysis of heterogeneous data of variable quality. The research goals of the project therefore include improving the efficiency and effectiveness of current HC techniques, making it possible to deal with data of high volume and velocity. The project is composed of two main parts. In the first part, I will look at how to improve crowdsourcing effectiveness by proposing novel techniques to detect malicious workers on crowdsourcing platforms. In the second part, I will make HC techniques scale so that they can be applied to larger volumes of data, focusing on scheduling tasks to the crowd.