Currently, the development of large language models (LLMs) relies mainly on deploying advanced and expensive graphics processing units (GPUs) from the likes of Nvidia and Advanced Micro Devices in data centers for projects that handle large volumes of raw data, giving well-funded Big Tech companies and start-ups a major advantage, according to Yang.

Entrance to the Hong Kong Polytechnic University's Hung Hom campus, where artificial intelligence scientist Yang Hongxia is a professor in the Department of Computing. Photo: Sun Yong
Yang said she and her colleagues are proposing a “model-over-model” approach to LLM development, a distributed paradigm in which developers train small models across thousands of specific domains, such as code generation, advanced data analysis and specialized AI agents.
These smaller models are then evolved into larger, more comprehensive LLMs, also called foundation models. Yang noted that this approach could reduce the computational requirements at each stage of LLM development.
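For readers who think in code, a minimal, purely illustrative Python sketch of that distributed setup follows: one small model is trained per domain and kept independent until a later combination step. Every name in it (DOMAINS, train_domain_model, the toy "weights") is a hypothetical stand-in; the article does not disclose Yang's actual implementation.

```python
# Illustrative sketch only: one small specialist model per domain,
# trained independently, to be combined later. Names are invented.
import random

DOMAINS = ["code_generation", "data_analysis", "ai_agents"]  # example domains

def train_domain_model(domain: str, seed: int) -> dict:
    """Stand-in for training a small (<= ~13B-parameter) model on one domain.
    Here a 'model' is just a dict holding a few toy weights."""
    rng = random.Random(seed)
    return {"domain": domain, "weights": [rng.uniform(-1, 1) for _ in range(4)]}

# Train many small specialists independently; in practice this would run
# across modest GPU clusters rather than one giant one.
specialists = [train_domain_model(d, seed=i) for i, d in enumerate(DOMAINS)]
for m in specialists:
    print(m["domain"], [round(w, 2) for w in m["weights"]])
```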
Domain-specific models, typically limited to 13 billion parameters (a machine-learning term for the variables an AI system adjusts during training, which determine how prompts map to desired outputs), can achieve performance on par with or better than OpenAI's latest GPT-4 model while using far fewer GPUs, around 64 to 128 cards.
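As a rough plausibility check (our own back-of-envelope arithmetic, not a figure from Yang), the training state of a 13-billion-parameter model under a common mixed-precision Adam recipe is small enough to shard across 64 cards:

```python
# Back-of-envelope memory arithmetic for a 13B-parameter model, assuming a
# typical mixed-precision Adam setup (fp16 weights and gradients, plus fp32
# master weights, momentum and variance). Real training also needs memory
# for activations, so actual requirements are higher.
params = 13e9
bytes_per_param = 2 + 2 + 4 + 4 + 4   # fp16 weights + grads, fp32 master + Adam states
total_gb = params * bytes_per_param / 1e9
print(f"total training state: ~{total_gb:.0f} GB")            # ~208 GB
print(f"sharded over 64 GPUs: ~{total_gb / 64:.1f} GB each")  # ~3.3 GB per card
```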
This paradigm makes LLM development more accessible to university labs and small and medium-sized enterprises, Yang said. Evolutionary algorithms then evolve these domain-specific models, eventually building a comprehensive foundation model, she said.
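The article does not say which evolutionary algorithm the team uses, but the following self-contained toy, with an invented fitness function and a simple linear merge, shows the general shape of such a search: candidate foundation models are combinations of the domain specialists, and selection plus mutation keeps the fittest combinations.

```python
# Toy evolutionary merge of domain specialists into one candidate model.
# The specialists, the linear merge and the fitness target are all invented
# for illustration; this is not Yang's published method.
import random

specialists = [
    {"domain": "code_generation", "weights": [0.2, -0.5, 0.9, 0.1]},
    {"domain": "data_analysis",   "weights": [0.7, 0.3, -0.2, 0.8]},
    {"domain": "ai_agents",       "weights": [-0.1, 0.6, 0.4, -0.3]},
]

def merge(coeffs):
    """Weighted sum of the specialists' toy weight vectors."""
    n = len(specialists[0]["weights"])
    return [sum(c * s["weights"][i] for c, s in zip(coeffs, specialists))
            for i in range(n)]

def fitness(weights):
    """Hypothetical score: merged weights should approach an all-ones target."""
    return -sum((w - 1.0) ** 2 for w in weights)

rng = random.Random(0)
population = [[rng.random() for _ in specialists] for _ in range(8)]

for generation in range(30):
    scored = sorted(population, key=lambda c: fitness(merge(c)), reverse=True)
    survivors = scored[:4]                                  # selection
    children = [[c + rng.gauss(0, 0.1) for c in rng.choice(survivors)]
                for _ in range(4)]                          # mutation
    population = survivors + children

best = max(population, key=lambda c: fitness(merge(c)))
print("best merge coefficients:", [round(c, 2) for c in best])
```

In a real system the "weights" would be billions of parameters and the fitness would come from benchmark evaluations, but the select-and-mutate loop is the same basic mechanism.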
The successful launch of such an LLM development effort in Hong Kong would be a major win as the city strives to become an innovation and technology hub.
Yang Hongxia, a leading artificial intelligence expert, previously worked on AI models at ByteDance, the Chinese owner of TikTok, and at the DAMO Academy, the research arm of Alibaba Group Holding. Photo: Polytechnic University

Hong Kong's dynamic atmosphere and access to AI talent and resources make it an ideal place to conduct research into this new development paradigm, Yang said. She added that PolyU president Teng Jin-guang shared this vision.
Yang said her team has already verified that it can assemble small AI models that outperform state-of-the-art LLMs in certain fields.
“There is a growing consensus in the industry that, with high-quality, domain-specific data and continuous pre-training, outperforming GPT-4/4V is well achievable,” she said. GPT-4V, a multimodal version of GPT-4 that analyzes user-provided image inputs, is the latest capability OpenAI has made broadly available.
Yang said the next step is to build a more comprehensive infrastructure platform to attract more talent to the AI community, with several releases planned for the end of this year or early next year.
“In the future, a few large cloud-based models will dominate, but smaller, vertical models will also thrive,” she said.
A PhD graduate of Duke University in North Carolina, Yang has published over 100 papers in top conferences and journals and holds more than 50 patents in the U.S. and mainland China. She played a key role in the development of Alibaba's 10 trillion-parameter M6 multimodal AI model.