Ilya Sutskever Warns A.I. Is Running Out of Data—Here’s What Will Happen Next

"We've achieved peak data and there will be no more," said the OpenAI co-founder.

Dec 16, 2024 - 21:24

Ilya Sutskever, an OpenAI co-founder and the company’s former chief scientist, has played a key role in ushering in some of A.I.’s most pivotal breakthroughs. Underpinning those developments were troves of text, images and videos scraped from the internet to train A.I. models and enhance their capabilities. That data will soon run out, according to the researcher.

“Data is the fossil fuel of A.I.,” said Sutskever while speaking at the Conference on Neural Information Processing Systems (NeurIPS) in Vancouver on Dec. 13. “We’ve achieved peak data and there will be no more.” This means that pre-training, the process of feeding models massive amounts of information, “will unquestionably end,” added the researcher, who noted that A.I. developers are already looking into alternatives such as synthetic data or models that improve their responses by taking longer to think about potential answers.

Sutskever, 38, first made a name for himself in 2012 when he helped develop the convolutional neural network architecture AlexNet. He also helped establish OpenAI in 2015 and oversaw the ChatGPT-maker’s research efforts before departing earlier this year to launch his own startup, Safe Superintelligence. The startup recently raised $1 billion from investors like Andreessen Horowitz and Sequoia Capital.

What’s next for A.I.?

While compute power and algorithms, two key ingredients of A.I. model training, have continued to improve, data simply cannot keep pace, Sutskever said. “We have but one internet.”

Data-hungry A.I. developers, who have already sucked up massive amounts of information from the internet, are starting to hit roadblocks from website owners. Between 2023 and 2024, 5 percent of all data and 25 percent of data from the highest-quality sources were restricted across major A.I. datasets, according to a study from the Data Provenance Initiative.

As their data wells run dry, A.I. leaders are desperately searching for pre-training replacements. Synthetic data, or data generated by A.I. models themselves, has been suggested as a solution by the likes of OpenAI CEO Sam Altman. Altman has also pointed to the reasoning capabilities of the company’s new o1 model, which thinks through various responses before answering queries, as a roadmap for improving A.I. capabilities in the future.

As they gain enhanced reasoning capabilities, A.I. systems will become more “agentic,” said Sutskever, echoing the belief of other tech leaders that autonomous A.I. agents are the field’s next big focus. Through his startup, Sutskever himself is currently focused on achieving a safe form of “superintelligence,” a type of A.I. that can think, reason and surpass human intelligence.

A.I. that learns to reason and think on its own will inevitably lead to less predictable behavior from models. Such behavior can already be seen in chess A.I. models, said Sutskever, which “are unpredictable to the best human chess players.”

Future A.I. systems “will understand things from limited data, they will not get confused,” said Sutskever. “I’m not saying how, by the way, and I’m not saying when—I’m saying that it will.”

