A self-taught computer has become one of the world's top players of the fiendishly complex board game Go, without any input from human experts.
DeepMind, Google’s artificial intelligence subsidiary in London, announced the milestone in AI less than two years after the highly publicised unveiling of AlphaGo, the first machine to beat human champions at the ancient Asian game. Details are published in the scientific journal Nature.
Previous versions of AlphaGo learned initially by analysing thousands of games between excellent human players to discover winning moves. The new development, called AlphaGo Zero, dispenses with this human expertise and starts just by knowing the rules and objective of the game.
“It learns to play simply by playing games against itself, starting from completely random play,” said Demis Hassabis, DeepMind chief executive. “In doing so, it quickly surpassed human level of play and defeated the previously published version of AlphaGo by 100 games to zero.”
His colleague David Silver, AlphaGo project leader, added: “By not using human data in any fashion, we can create knowledge by itself from a blank slate.” Within a few days, the computer had not only learned Go from scratch but surpassed thousands of years of accumulated human wisdom about the game.
The team developed a new form of “reinforcement learning” to create AlphaGo Zero, combining search-based simulations of future moves with a neural network that decides which moves give the highest probability of winning. The network is constantly updated over millions of training games, producing a slightly superior system each time.
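The self-play loop DeepMind describes can be sketched in miniature. The hypothetical Python example below is not DeepMind's code: tic-tac-toe stands in for Go, a simple value table stands in for the deep neural network, and a one-move lookahead stands in for the full tree search. The overall shape, however, is the one the article describes: the program plays games against itself starting from essentially random play, records the outcomes, and nudges its evaluations after every game so that each iteration is slightly stronger than the last.

```python
# Minimal, hypothetical sketch of an AlphaGo Zero-style self-play loop.
# A value table over tic-tac-toe states stands in for the neural network;
# a one-ply lookahead stands in for the search over future moves.
import random
from collections import defaultdict

LINES = [(0, 1, 2), (3, 4, 5), (6, 7, 8), (0, 3, 6),
         (1, 4, 7), (2, 5, 8), (0, 4, 8), (2, 4, 6)]

def winner(board):
    for a, b, c in LINES:
        if board[a] != "." and board[a] == board[b] == board[c]:
            return board[a]
    return None

def legal_moves(board):
    return [i for i, cell in enumerate(board) if cell == "."]

value = defaultdict(float)   # stand-in for the learned win-probability estimate
visits = defaultdict(int)

def choose_move(board, player):
    """Score each candidate move with the learned values, plus a little noise
    so that self-play keeps exploring instead of repeating one game forever."""
    best_move, best_score = None, -float("inf")
    for move in legal_moves(board):
        nxt = board[:move] + player + board[move + 1:]
        score = value[(nxt, player)] + random.uniform(0, 0.3)
        if score > best_score:
            best_move, best_score = move, score
    return best_move

def self_play_game():
    """Play one game against itself; return the visited states and the result."""
    board, player, history = "." * 9, "X", []
    while legal_moves(board) and not winner(board):
        move = choose_move(board, player)
        board = board[:move] + player + board[move + 1:]
        history.append((board, player))
        player = "O" if player == "X" else "X"
    return history, winner(board)

def train(n_games=5000):
    for _ in range(n_games):
        history, win = self_play_game()
        for state, player in history:
            # Wins count +1, losses -1, draws 0, from the mover's point of view.
            outcome = 0.0 if win is None else (1.0 if win == player else -1.0)
            visits[(state, player)] += 1
            # Running average: each game nudges the estimate slightly, which is
            # the "slightly superior system each time" idea in the article.
            value[(state, player)] += (outcome - value[(state, player)]) / visits[(state, player)]

train()
print("learned value of a centre opening for X:",
      round(value[("....X....", "X")], 3))
```

In the real system a deep network replaces the table, a full search over simulated continuations proposes the moves, and the update cycle runs over millions of training games rather than a few thousand.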
Although Go is in one sense extremely complex, with far more potential moves than there are atoms in the universe, in another sense it is simple because it is a “game of perfect information” — chance plays no part, as with cards or dice, and the state of play is defined entirely by the position of stones on the board.
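To make "perfect information" concrete, here is a small hypothetical sketch (again not DeepMind's code) of a Go position as a data structure: everything either player could ever need to know is the arrangement of stones and whose turn it is, so a program can simulate any continuation exactly from the current state, with no hidden cards or dice rolls to guess about. Capture and ko rules are omitted to keep the sketch short.

```python
# Illustrative sketch: the complete, fully visible state of a Go game.
from dataclasses import dataclass, field
from typing import List

EMPTY, BLACK, WHITE = 0, 1, 2

@dataclass
class GoState:
    """Everything that defines the state of play: stones plus whose turn it is."""
    board: List[List[int]] = field(
        default_factory=lambda: [[EMPTY] * 19 for _ in range(19)]
    )
    to_play: int = BLACK

    def play(self, row: int, col: int) -> "GoState":
        """Return the state after the current player places a stone.
        (Captures and ko are omitted in this sketch.)"""
        nxt = GoState([r[:] for r in self.board],
                      WHITE if self.to_play == BLACK else BLACK)
        nxt.board[row][col] = self.to_play
        return nxt

# Nothing is hidden, so two identical GoState objects describe exactly the same
# situation -- which is what makes exhaustive simulation of future moves possible.
state = GoState().play(3, 3).play(15, 15)
print("next to play:", "black" if state.to_play == BLACK else "white")
```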
The game involves surrounding more territory than your opponent. This aspect of Go makes it particularly susceptible to the computer simulations on which AlphaGo depends. DeepMind is now examining real-life problems that can be structured in a similar way, to apply the technology.
Mr Hassabis identified predicting the shape of protein molecules — an important issue in drug discovery — as a promising candidate. Other likely scientific applications include designing new materials and climate modelling.