Machine learning (ML) is increasingly being used to make important decisions in science, social sciences and engineering, with the potential to profoundly impact people’s lives. It is important to ensure that the probabilistic ML output is useful for the stated purposes of the users.
Probabilistic machine learning methods are becoming increasingly powerful tools for data analysis. However, math is only part of the puzzle in determining their accuracy and effectiveness.
To address this problem, a team of researchers has developed a classification system known as a “taxonomy of trust.”
The taxonomy identifies the points in the data analysis process where trust can break down. Analysts decide what data to collect and choose which models, or mathematical representations, best capture the real-world problem or question they are trying to answer. They select algorithms to fit the model and use code to run those algorithms. Each of these steps poses its own challenges to building trust.
Some components can be checked for accuracy in measurable ways. “Does my code contain bugs?”, for example, is a question that can be tested against objective standards. Other problems are more subjective and lack clear-cut answers: analysts face many possible strategies for collecting data and for deciding whether a model accurately represents the real world.
The team aims to highlight both the issues that have already been thoroughly investigated and those that require additional attention.
MIT computer scientist Tamara Broderick said: “What I like about making this taxonomy is that it really shows what people focus on. Much research naturally focuses on this level of ‘do my algorithms solve a particular mathematical problem?’ partly because it is very objective, even if it is a difficult problem. I think it’s very hard to answer ‘is it reasonable to mathematize an important applied problem in a certain way?’ because somehow it gets into a more difficult space; it’s not just a math problem anymore.”
The researchers’ categorization of how trust can break down may seem abstract, but it is rooted in real-world application. Meager, a co-author of the paper, studied whether microfinance can benefit a community. The project served as a case study of where trust can fail and how to reduce that risk.
Measuring the impact of microfinance involves several judgment calls. Analysts must first define the real-world problem they hope to solve, then decide what counts as a positive outcome, such as the average financial profit per business in communities where a microfinance program is implemented.
They must also put the available data into context and assess whether specific case studies reflect broader trends. In rural Mexico, for example, owning goats can count as an investment.
Analysts should define what they consider a positive outcome when evaluating the benefits of microfinance. For example, in economics, measuring the average financial profit per company in communities where a microfinance program has been implemented is standard practice. However, reporting an average can imply a net positive effect, even if only a few people benefit rather than the entire community.
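The gap between an average and the share of people who benefit can be shown with a small calculation. The sketch below uses made-up profit changes for ten hypothetical businesses; the numbers and the function name `summarize_profit_changes` are illustrative assumptions, not data or code from the study.

```python
from statistics import mean


def summarize_profit_changes(changes):
    """Return the average change and the share of businesses that gained."""
    average = mean(changes)
    share_gaining = sum(1 for c in changes if c > 0) / len(changes)
    return average, share_gaining


# Hypothetical profit changes for ten businesses (made-up numbers):
# one large gain, two small losses, seven with no change.
changes = [0, 0, 0, -5, 0, 0, -5, 0, 0, 210]

average, share_gaining = summarize_profit_changes(changes)
print(f"average change per business: {average:.1f}")
print(f"share of businesses that gained: {share_gaining:.0%}")
```

In this invented example the average change is positive even though only one business in ten gained, which is the kind of mismatch between a reported summary and the outcome of interest that the paragraph above describes.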
One of the researchers said, “It is difficult to measure an individual’s quality of life. People measure things like, ‘What is the small company’s business profit?’ or ‘What is the consumption level of a household?’ There is a chance of a discrepancy between what you ultimately find really important and what you measure. Before we get to the mathematical level, what data and assumptions do we rely on?”
The researcher said, “What you wanted was for many people to benefit from it. It sounds simple. Why haven’t we measured what we care about? But I think it’s common for practitioners to use standard machine learning tools for many reasons. And these tools can report a proxy that doesn’t always match the quantity of interest.”
The researcher added, “Someone may be hesitant to try a non-standard method because they are less confident that they will use it correctly. Or peer review may favor certain known methods, even if a researcher would like to use non-standard methods. There are many reasons, sociological. But this can be a confidence issue.”
While transforming a real-world problem into a model can be a big, amorphous problem, checking the code that executes an algorithm can feel “prosaic.” Yet it is another often-overlooked area where trust can be built.
In some cases, checking a coding pipeline that executes an algorithm can be considered outside the scope of an analyst, especially when standard software packages are available.
Testing whether code is reproducible is one way to find bugs. However, depending on the field, sharing code alongside published work is not always required or the norm. As models become more complex over time, recreating the code from scratch becomes harder, and so does replicating a model.
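One simple form of reproducibility check is to run the same analysis twice with the same random seed and confirm that the results are identical. The sketch below is a minimal illustration under stated assumptions: the bootstrap routine, the function name `bootstrap_mean_interval`, and the data are hypothetical and are not taken from the paper.

```python
import random
from statistics import mean


def bootstrap_mean_interval(data, n_resamples=1000, seed=0):
    """Approximate 95% bootstrap interval for the mean (illustrative only)."""
    rng = random.Random(seed)  # a local generator, so the seed fully controls the run
    means = sorted(
        mean(rng.choices(data, k=len(data))) for _ in range(n_resamples)
    )
    lower = means[int(0.025 * n_resamples)]
    upper = means[int(0.975 * n_resamples) - 1]
    return lower, upper


# Hypothetical measurements (made-up numbers).
data = [2.1, 3.4, 1.9, 5.0, 4.2, 3.3, 2.8, 4.7]

first = bootstrap_mean_interval(data, seed=42)
second = bootstrap_mean_interval(data, seed=42)

# If the analysis is reproducible, two runs with the same seed agree exactly.
runs_match = first == second
print("reproducible:", runs_match)
```

A check like this only shows that the code gives the same answer twice; it does not show that the answer is right. Passing the seed explicitly, rather than relying on global random state, is the design choice that lets someone else rerun the analysis and compare results.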
The researcher said, “Let’s start with every journal asking you to release your code. Maybe it’s not quite double-checked and everything isn’t absolutely perfect, but let’s start there,” as a step towards building trust.
In short, practitioners reach for standard machine learning tools for a variety of reasons, and checking the code that runs an algorithm is an often-overlooked way to strengthen trust. Broderick and Gelman collaborated on an analysis that forecast the 2020 U.S. presidential election using real-time state and national polls.
The team published daily updates in The Economist magazine and made their code available online for anyone to download and run. While there is no single solution to creating a perfect model, the researchers recognize that analysts can build confidence by testing code for reproducibility and sharing code alongside published work.
Broderick said, “I don’t think we expect all of these things to be perfect, but I think we can expect them to be or get better.”
- Broderick, T., Zheng, et al. Towards a taxonomy of trust for probabilistic machine learning. Science Advances. DOI: 10.1126/sciadv.abn3999