Friday, 22 February 2019

SheffDataForGood: using your computer skills to do good deeds

Data Science knowledge provides you with a set of skills that is very much in demand by large and small organisations, and can land you a lucrative job. However, it does not need to be all about the money. There are also opportunities to use these skills to help other people. One such opportunity is provided by the group Sheffield Data For Good. They recently organised a data hack to help Roundabout, a Sheffield charity that tackles youth homelessness, and have already scheduled a second event for the 9th of March.

I was at the first data hack on the 26th of January. Around 25 people, a mix of academics, data professionals and charity workers, came to the event. Amy Evans, data and finance coordinator from Roundabout, was at the event to explain the data that we were going to work on and what Roundabout was trying to understand about their data. I cannot provide details about the data due to data protection regulations, but what struck me the most was the atmosphere and the energy on display by everyone present, and I was not the only one who thought so. Another participant, Stuart Bolton, said about the event: “A really great day and I was bowled over by the enthusiasm and energy of everyone, as much as by the technical skills and insights we developed.”


One surprise of the day was finding Sarah Miller among the organisers. Sarah is a recent graduate from the MSc Data Science programme and currently works as a BI developer for Jet2. She said the following about her experience with Data for Good: “Helping to organise the data hack allowed me to use the skills I had been learning on the MSc Data Science course, in a real world situation. The hack and working with Data 4 Good has been such a wonderful experience, the people are welcoming, supportive, and inspiring to work with, it’s great to be part of Sheffield’s growing data community.”

I found the experience great. We were all using different tools and techniques, which taught me many useful things. We did three one-hour coding sprints, coming back after each one to discuss what we had found and what we could do in the next round. My research background is in chemistry and computer science, so this was the first time I had worked with personal data. It was a big challenge, and I will not complain as much about chemical data going forward!


Roundabout was very grateful for all the insights we provided. Amy said: “The Data Hack was such a valuable day for us. It helped us to see where we can make some changes to our datasets going forward to help us to demonstrate the impact we are having with young people in South Yorkshire, as well as confirming to us that our strengths lie in helping young people to maintain tenancies and maximise their income. We look forward to continuing to work with Data For Good in the future.”

Written by Dr Antonio de la Vega de Leon, Lecturer in Chemoinformatics

Monday, 11 February 2019

Fairness, accountability and transparency in Machine Learning? Jo Bates reports back from ACM FAT* in Atlanta, USA


A couple of weeks ago I travelled to Atlanta, USA to attend ACM FAT* – an interdisciplinary conference that addresses issues of Fairness, Accountability and Transparency in Machine Learning. Officially, I was there on the hunt for potential papers and authors to invite to submit their work to Online Information Review. However, the FAT* field is also closely related to my research interests around the politics of data and algorithms, and my teaching on the Information School’s MSc Data Science. I was keen to check out what was happening in the FAT* community, and feed my findings back into my teaching and into two new projects I am working on in this field: CYCAT, and the supervision of a new PhD student – Ruth Beresford – whose research will investigate algorithmic bias in collaboration with the Department for Work and Pensions.

I was privileged to hear a number of great papers – the best of which engaged critically with issues of social context and justice. My two favourite papers, which I highly recommend to anyone interested in these topics, were:

Fairness and Abstraction in Sociotechnical Systems (Selbst et al, 2019) was a great paper that spoke to a sense of unease I have felt recently about the increasing amount of technical work attempting to solve the ‘algorithmic bias problem’. The paper – written by an interdisciplinary team of social and computer scientists - not only speaks to such concerns in an engaging and insightful way, but also offers a strong analytical framework that illuminates five “traps” that such technically-driven work often falls into:
  • Framing: The authors begin by critiquing the ‘algorithmic framing’ common in data science. In such an abstraction, the focus of the data/computer scientist is simply on evaluating, for example, whether the model has high accuracy. They point out that such a framing is ineffective for addressing issues of bias and fairness. Expanding this algorithmic framing to a ‘data frame’, which also involves directly interrogating the data inputs and outputs for issues of bias and fairness, can address some of these issues, but also has its limitations. Instead, they advocate that data scientists adopt a socio-technical framing which explicitly recognises that any ML model is part of a socio-technical system, so that all the decisions made by humans and human institutions are moved inside the abstraction boundary. I couldn’t agree more!
  • Formalism: This is an important ‘trap’ for computer scientists and mathematicians to be aware of. It relates to the failure to account for the complexity of social concepts such as fairness and bias. The meaning of such concepts is contextual and contestable – they cannot be reduced to mathematical formalisms!
  • The Ripple Effect: This ‘trap’ points to the lack of awareness that when technical solutions are embedded into existing social systems, they can impact upon the behaviours and values of those in the social system – often in unexpected ways.
  • Portability: This ‘trap’ relates to the problems inherent in repurposing algorithmic solutions from one social context to another – and the resulting risks of inaccuracy, misleading results, and potential for harm.
  • Solutionism: And, finally, the simple observation that technologists often fail to recognise that the best solution to a problem may not involve any technology!
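
The difference between the ‘algorithmic framing’ and the ‘data frame’ described in the first trap can be illustrated with a small sketch. This is my own illustration, not code from the paper: the groups, labels and predictions below are invented toy data, and the helper names are hypothetical. It shows how a single accuracy figure can look acceptable while hiding very different performance for different groups.

```python
from collections import defaultdict

# Invented toy records for illustration only: (group, true_label, predicted_label).
records = [
    ("A", 1, 1), ("A", 0, 0), ("A", 1, 1), ("A", 0, 0), ("A", 1, 1), ("A", 0, 0),
    ("B", 1, 0), ("B", 0, 1), ("B", 1, 1), ("B", 0, 0),
]


def accuracy(rows):
    """Fraction of rows where the prediction matches the true label."""
    rows = list(rows)
    return sum(1 for _, truth, pred in rows if truth == pred) / len(rows)


def accuracy_by_group(rows):
    """Accuracy computed separately for each group (hypothetical helper)."""
    grouped = defaultdict(list)
    for row in rows:
        grouped[row[0]].append(row)
    return {group: accuracy(members) for group, members in grouped.items()}


# 'Algorithmic framing': one headline number for the whole dataset.
overall = accuracy(records)

# 'Data frame': the same predictions, broken down by group.
per_group = accuracy_by_group(records)

print(f"overall accuracy: {overall:.2f}")
for group, value in sorted(per_group.items()):
    print(f"group {group}: {value:.2f}")
```

Even this breakdown stays inside the ‘data frame’. It says nothing about how people would actually use the predictions, which is exactly the gap the socio-technical framing is meant to address.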

Putting some of these ideas into practice was my second favourite paper – which also won the prize for best technical and interdisciplinary paper - Disparate Interactions: An Algorithm-in-the-Loop Analysis of Fairness in Risk Assessments (Green and Chen, 2019).

With a focus on risk assessments being used in the US criminal justice system, the authors argue that, given that risk assessment tools do not actually make decisions but are used to inform judges’ decisions, it is important to understand how people actually interpret and use the outputs of these tools. Their study is based on an experiment involving Amazon Mechanical Turk workers rather than actual judges; however, their findings are concerning. They found that their participants under-performed the risk assessment tool even when presented with the prediction of the tool; that they were unable to effectively evaluate the accuracy of their own decisions or those of the tool; and, most concerning, that they exhibited biased interaction with the tool’s prediction, whereby use of risk assessments in decision making led to participants making higher risk predictions for black defendants and lower risk predictions for white defendants. Clearly, these findings need examining ‘in the wild’, but they are troubling, and they evidence the importance of a socio-technical framing as called for by Selbst et al.

While there were some excellent papers presented at FAT*, there were also a good few that fell into some of the ‘traps’ of abstraction identified in Selbst et al.’s paper – and this resulted in some interesting commentary about the nature and direction of the field. For example, important questions were raised by Stanford PhD student Pratyusha Ria Kalluri, who drew upon Catherine D'Ignazio and Lauren Klein’s forthcoming book Data Feminism to question the language of fairness, accountability and transparency, and how it relates to notions of justice. Her comments received a lot of support from attendees and online.


Pratyusha’s observations reflect many of the concerns that I, and others in the Critical Data Studies space, have about what it means to work across disciplinary boundaries in this field, and about the politics of engaging in such work with people who may have very different agendas, assumptions, and understandings about what is at stake. It can sometimes be difficult to know how best to navigate these tensions – but it feels like 2019 could be an important moment for shaping the direction of the field.