Why artificial intelligence is good, but only as good as the data fed into it

The world sits on the cusp of significant changes and benefits from the use of AI.
Photo by Hitesh Choudhary on Unsplash

Artificial intelligence (AI) is already proving its potential to deliver significant benefits to Australian businesses and the wider economy.

Organisations are increasingly investing in AI because they see its potential. In the 2021 federal budget, the Australian government committed to investing more than $120 million in AI over the next four to six years through programs including the development of the National Artificial Intelligence Centre ($53.8 million over four years) and the establishment of the Next Generation AI Graduates Program ($24.7 million over six years). The government has also committed $33.7 million over four years to support projects that develop AI-based solutions to national challenges, and $12 million over five years to catalyse AI opportunities by co-funding up to 36 competitive grants for AI solutions that address local or regional problems.

However, despite the increased investment in and use of AI across industries and businesses, there are lingering concerns over the technology’s capacity to deliver on expectations. According to our recent 2021 Digital Readiness Survey, more than 86 per cent of Australian and New Zealand-based organisations reported an increase in their use of AI compared with two years ago, but only 25 per cent said their confidence in AI had significantly increased.

The organisations surveyed voiced concerns about potential barriers to AI delivering on expectations, including the complexity of AI projects, the availability of employees with the required skills, and a lack of internal expertise to develop AI. However, another major consideration is that AI models are only as good as the data fed into them. Without access to clean, high-quality data, AI may fail to produce the expected results.

Starting fresh with AI

AI is incredibly powerful at turning data into insights. This power is applied to data regardless of quality or biases, which means that any unintentional biases found in the data will only be emphasised by the AI algorithms. This makes the quality of data the number one predictor of how successful an AI project will be. Data quality must be exceptional for an AI system to work even reasonably well because even with only slightly polluted data, AI may produce poor results. Getting high-quality data can be more challenging than it seems. The skills required to identify and use clean data and understand what constitutes clean data may differ depending on industry and use case.

For example, an IT team may use AI technologies to help identify network outages before they occur. If the average IT system has an uptime of 99 per cent, then it’s the data that pertains to the other one per cent of the time that is important for training purposes. So, the data set that is fed into the model must be tuned to identify that target one per cent. Getting the right data into the model can be challenging, but it's essential.
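One way to picture the uptime example is as a class-imbalance problem: if only about one per cent of records describe an outage, a model can look accurate while ignoring them. The sketch below is a minimal illustration, not the method described in this article. It assumes downsampling of the "normal" records as the balancing technique, and the `rebalance` function, the `outage` and `latency_ms` fields, and the synthetic telemetry are all hypothetical.

```python
import random


def rebalance(samples, label_key="outage", ratio=1.0, seed=0):
    """Downsample the majority class so rare outage records are not swamped.

    `ratio` is the number of normal records kept per outage record.
    All names here are hypothetical and chosen for illustration only.
    """
    rare = [s for s in samples if s[label_key]]
    common = [s for s in samples if not s[label_key]]
    rng = random.Random(seed)  # fixed seed so the sketch is repeatable
    keep = min(len(common), int(len(rare) * ratio))
    return rare + rng.sample(common, keep)


# Synthetic telemetry: 1 in every 100 records is an outage (made-up values).
samples = [
    {"latency_ms": 900 if i % 100 == 0 else 20, "outage": i % 100 == 0}
    for i in range(1000)
]

balanced = rebalance(samples)
outages = sum(1 for s in balanced if s["outage"])
print(len(samples), len(balanced), outages)
```

Downsampling discards data, so in practice a team might instead weight the rare class more heavily or collect more outage examples; which approach fits depends on the use case.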

This creates an urgent need for professionals working with AI technologies to understand what clean, relevant data looks like for their particular needs, and how to identify it.

On a fundamental level, clean data should not include any personally identifiable information or other sensitive data. This isn’t just important to protect people’s privacy; some of this data could skew the model and result in biased decisions. The type of data that can pollute an AI model can include demographic data, names, years of experience, and known anomalies.

Therefore, to help ensure data is clean and appropriate for use, parameters that are not relevant to the final classification should be removed. For example, within hospitals and healthcare, data that combines patient identification information with records of certain diseases may result in a false correlation between demographic information and disease patterns. To avoid this, data scientists must make the parameters of the project clear and ensure only relevant data is included in the model.
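Removing irrelevant or identifying parameters can be as simple as filtering fields before the data reaches the model. The following is a minimal sketch under stated assumptions: the field names in `SENSITIVE_FIELDS`, the `strip_fields` helper, and the sample records are hypothetical, and which fields count as sensitive or irrelevant depends on the project.

```python
# Hypothetical list of fields to exclude; a real project would define its own.
SENSITIVE_FIELDS = {"patient_id", "name", "postcode"}


def strip_fields(record, drop=SENSITIVE_FIELDS):
    """Return a copy of the record without the excluded fields."""
    return {key: value for key, value in record.items() if key not in drop}


# Made-up records for illustration only.
records = [
    {"patient_id": "A1", "name": "Sam", "postcode": "2000",
     "blood_pressure": 130, "diagnosis": "hypertension"},
    {"patient_id": "B2", "name": "Alex", "postcode": "3000",
     "blood_pressure": 115, "diagnosis": "none"},
]

cleaned = [strip_fields(r) for r in records]
print(cleaned)
```

Dropping named fields does not guarantee anonymity, because remaining fields can sometimes act as proxies for the removed ones, so this step is a starting point and not a complete safeguard.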

Clean data clears the path for AI

With such significant investment in AI technologies, it’s essential that the tools being used and produced provide a strong return on investment. The skills, experience, and know-how of the IT professionals and data scientists involved in AI projects can determine how successful those projects will be and, therefore, whether they'll deliver that return. Ensuring the data used for AI models is clean and relevant is a key skill, and it is required before an AI project starts.

IT professionals and data scientists must be able to not only identify and provide clean data but also understand how to identify results that have been skewed by biased data. They can then retrain the model using more appropriate data, leading to ever-improving results from AI projects.
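One simple signal that results may have been skewed is a large difference in how often the model returns a positive prediction for different groups. The sketch below is illustrative only and assumes that comparing positive-prediction rates is a suitable check for the project; the `positive_rate_by_group` helper, the `region` and `predicted` fields, and the sample rows are hypothetical.

```python
from collections import defaultdict


def positive_rate_by_group(rows, group_key, pred_key="predicted"):
    """Return the share of positive predictions for each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for row in rows:
        totals[row[group_key]] += 1
        if row[pred_key]:
            positives[row[group_key]] += 1
    return {group: positives[group] / totals[group] for group in totals}


# Made-up model outputs for illustration only.
rows = (
    [{"region": "north", "predicted": True}] * 3
    + [{"region": "north", "predicted": False}]
    + [{"region": "south", "predicted": True}]
    + [{"region": "south", "predicted": False}] * 3
)

rates = positive_rate_by_group(rows, "region")
gap = max(rates.values()) - min(rates.values())
print(rates, gap)
```

A large gap is not proof of bias, since groups can differ for legitimate reasons, but it can flag a model whose training data deserves a closer look before retraining.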

The world sits on the cusp of significant changes and benefits from the use of AI. It’s up to the data scientists and IT professionals driving these projects to ensure that outcomes are unbiased and clear-eyed.

Ramprakash Ramamoorthy is director of research at ManageEngine.

Copyright © BIT (Business IT). All rights reserved.
