DotData's AI Builds Machine Learning Models All by Itself

Illustration of a laptop with a network on top of it, and the logo for the Python programming language coming out of it.
Illustration: Shutterstock

By: Tekla S. Perry

Demand for data scientists and engineers has, for the past couple of years, been off the charts. The number of openings for machine learning and data engineers posted on recruiting web sites continues to grow by double digits annually, and those working in the field have been commanding ever-higher salaries.

Joining the ranks of these desperately sought-after techies takes serious coding chops: expertise in Python, definitely, along with familiarity with other languages. That combination of job openings for data engineers and the dominance of Python means Python regularly makes the charts of most in-demand coding languages.

So anyone contemplating a future in data science or machine learning needs to build up software engineering skills, right?

Wrong, says Ryohei Fujimaki, founder and CEO of dotData. Fujimaki has, for nearly a decade, been working to use AI to automate much of the job of the data scientist.

We can, he says, “eliminate the skill barrier. Traditionally, the job of building a machine learning model can only be done by people who know SQL and Python and statistics. Our system automates the entire process, enabling less experienced people to implement machine learning projects.”

DotData—which is currently offering its tools as a cloud-based service—came out of NEC. Fujimaki, then a research fellow at the company, started thinking about automating machine learning in 2011 as a way to make the 100 or so data scientists on his research team more productive. He got sidetracked for a few years, focused on commercializing an algorithm designed to make machine learning transparent, but in 2015 returned to the machine learning project.

“A typical use case for machine learning in the business world is prediction,” he said, “predicting demand of a product to optimize inventory, or predicting the failure of a sensor in a factory to allow preventive maintenance, or scoring a list of possible customers.”

“The first step in developing a machine learning model for prediction is feature engineering—looking at historical patterns and coming up with hypotheses,” he says. Feature engineering generally requires a team of people with a multitude of skill sets—data scientists, SQL experts, analysts, and domain experts. Typically, only after this team comes up with a set of hypotheses does machine learning step in, combining all those hypotheses to figure out how to best weigh them to come up with accurate predictions.

In dotData’s system, AI takes over that first step, coming up with and testing its own hypotheses from a set of historical data.

So, he says, “you don’t need domain experts or data scientists, and as a subproduct AI can explore many more hypotheses than human experts—millions instead of hundreds in a limited time window.”
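To make the idea of machine-generated hypotheses concrete, here is a minimal sketch in Python. It is not dotData’s method, which the article does not describe in detail. It assumes a made-up purchase history and churn label, treats each combination of an aggregate and a time window as one candidate feature, and ranks the candidates by a simple correlation with the label. Every name and number in it is hypothetical.

```python
from itertools import product
from statistics import mean

# Hypothetical toy data, invented for illustration: purchase amounts per
# customer (oldest first) and whether the customer later churned (1) or not (0).
transactions = {
    "a": [120.0, 80.0, 95.0],
    "b": [15.0],
    "c": [200.0, 210.0, 190.0, 205.0],
    "d": [30.0, 10.0],
    "e": [75.0, 60.0, 90.0],
    "f": [5.0, 12.0],
}
churned = {"a": 0, "b": 1, "c": 0, "d": 1, "e": 0, "f": 1}

# Building blocks for candidate features. Each (aggregate, window) pair is one
# "hypothesis", e.g. "the mean of the last two purchases predicts churn".
AGGREGATES = {"count": len, "sum": sum, "mean": mean, "max": max, "min": min}
WINDOWS = {"last_1": 1, "last_2": 2, "all": None}


def candidate_features():
    """Yield (name, function) for every aggregate/window combination."""
    for (agg_name, agg), (win_name, n) in product(AGGREGATES.items(), WINDOWS.items()):
        def feature(history, agg=agg, n=n):
            window = history if n is None else history[-n:]
            return agg(window)
        yield f"{agg_name}_{win_name}", feature


def correlation(xs, ys):
    """Pearson correlation; returns 0.0 when either input is constant."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    vx = sum((x - mx) ** 2 for x in xs)
    vy = sum((y - my) ** 2 for y in ys)
    if vx == 0 or vy == 0:
        return 0.0
    return cov / (vx * vy) ** 0.5


def rank_features():
    """Score every candidate by |correlation| with the label, best first."""
    customers = sorted(transactions)
    labels = [churned[c] for c in customers]
    scored = []
    for name, feature in candidate_features():
        values = [feature(transactions[c]) for c in customers]
        scored.append((abs(correlation(values, labels)), name))
    return sorted(scored, reverse=True)


ranking = rank_features()
for score, name in ranking[:5]:
    print(f"{name}: {score:.2f}")
```

Five aggregates and three windows give only 15 candidates here. Adding more tables, filters, and time windows multiplies the count quickly, which is how an automated search could plausibly reach far more hypotheses than a human team would write by hand.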

In 2016, Fujimaki’s group at NEC let Japan’s Sumitomo Mitsui Banking Corp. (SMBC) test a prototype against a team using traditional data science tools. “Their team took three months, our process took a day, and our results were better,” he says. NEC spun off the group in early 2018, remaining as a shareholder. Right now dotData has about 70 employees, about 70 percent of whom are engineers and data scientists, and a few dozen customers, Fujimaki says.

“In the near future,” Fujimaki says, “80 percent of machine learning projects can be fully automated. That will free up the most skilled, computer-science-PhD-type of data scientists, to focus on the other 20 percent.”

Demand for data scientists overall won’t drop from what it is today, Fujimaki predicts, though the double-digit growth may slow. The job, however, will become more focused. “Data scientists today are expected to be superman, good at too many things—statistics, and machine learning, and software engineering.”

And a new role is likely to emerge, he predicts. “Call it the business data scientist, or the citizen data scientist. They aren’t machine learning people, they are more business oriented. They know what predictions they need, and how to use those predictions in their business. It will be useful for them to have basic knowledge of statistics, and to understand data structures, but they won’t need deep mathematical understanding or knowledge of programming languages.

“We can’t eliminate the skill barrier, but we can significantly lower it. And there will be many more potential people who will be able to do this.”

This article originally appeared in IEEE Spectrum on 5 March 2020.