HONG KONG/LONDON - China and the United States are the unquestioned top dogs in the global artificial intelligence (AI) arena, and the scrap between them grows fiercer by the day. Seemingly innocuous forces are, however, quietly tipping the scales. The idiosyncrasies of the Chinese language, together with China’s edge in data and in unlimited, stable power resources, are slowly but steadily forging the sword for the country’s eventual triumph in the AI melee.
These three elements - the first an heirloom of cultural wisdom, the other two a fertile bed for the growth of AI - are undoubtedly mighty weapons for China to wield in the future AI contest.
China’s data hoard: unearthing cultural gems
China has the world’s largest cohort of internet users, and the data traffic they generate each day is a river in full spate, much like the vast streams whose hydropower makes up almost one-fifth (and counting) of total national energy generation. Together these rich sources - endless data and boundless power - present a veritable cornucopia to nourish AI development. The richness and complexity of Chinese information provide a huge space for AI learning unrivaled in its length and breadth.
Chinese, an ancient language with profound cultural connotations and a unique evolutionary trajectory, also holds signal advantages in natural language processing:
“Text normalization is a method for standardizing text to prepare it for the tokenization, vectorization and classification steps. With [English], the first step would be to convert all text to lowercase. Because Chinese characters are not capitalized to begin with, there’s no need for that data cleaning step. Next comes stemming or lemmatization. Compared to English, there is also no concept of a stem in Chinese. Therefor
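As a rough illustration of the lowercasing step described in the passage quoted above, the short Python sketch below applies that step to one English and one Chinese sentence. The helper names `lowercase_step` and `has_cased_letters` and both sample sentences are assumptions made for this example, not anything taken from the quoted source. The sketch relies only on Python's built-in string methods.

```python
def lowercase_step(text: str) -> str:
    """Hypothetical helper: the lowercasing step of text normalization."""
    return text.lower()


def has_cased_letters(text: str) -> bool:
    """Return True if any character is an uppercase or lowercase letter."""
    return any(ch.isupper() or ch.islower() for ch in text)


# Sample sentences are illustrative assumptions, not taken from the article.
english = "China and the United States Lead in AI"
chinese = "中国和美国在人工智能领域领先"

print(lowercase_step(english))             # the English sentence, lowercased
print(has_cased_letters(chinese))          # whether the Chinese text has any cased letters
print(lowercase_step(chinese) == chinese)  # whether lowercasing left it unchanged
```

Under these assumptions the English sentence comes back in lowercase, while the Chinese sentence contains no cased letters and passes through the step unchanged, which is the point the quoted passage makes about skipping that cleaning step.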
The content herein is subject to copyright by The Yuan. All rights reserved. The content of the services is owned or licensed to The Yuan. Such content from The Yuan may be shared and reprinted but must clearly identify The Yuan as its original source. Content from a third-party copyright holder identified in the copyright notice contained in such third party’s content appearing in The Yuan must likewise be clearly labeled as such.