Self-taught AlphaGo Zero easily defeats the previous AlphaGo

28 November 2017

A self-taught computer has become the world’s best player of Go, the fiendishly complex board game, without any input from human experts.

DeepMind, Google’s artificial intelligence subsidiary in London, announced the milestone in AI less than two years after the highly publicised unveiling of AlphaGo, the first machine to beat human champions at the ancient Asian game. Details are published in the scientific journal Nature.

Previous versions of AlphaGo learned initially by analysing thousands of games between excellent human players to discover winning moves. The new development, called AlphaGo Zero, dispenses with this human expertise and starts just by knowing the rules and objective of the game.

“It learns to play simply by playing games against itself, starting from completely random play,” said Demis Hassabis, DeepMind chief executive. “In doing so, it quickly surpassed human level of play and defeated the previously published version of AlphaGo by 100 games to zero.”

His colleague David Silver, AlphaGo project leader, added: “By not using human data in any fashion, we can create knowledge by itself from a blank slate.” Within a few days, the computer had not only learned Go from scratch but surpassed thousands of years of accumulated human wisdom about the game.

The team developed a new form of “reinforcement learning” to create AlphaGo Zero, combining search-based simulations of future moves with a neural network that decides which moves give the highest probability of winning. The network is constantly updated over millions of training games, producing a slightly superior system each time.
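
The self-play idea described above can be illustrated on a much smaller game. The following is a minimal sketch, not DeepMind's method: it assumes a toy subtraction game (take 1 to 3 stones from a pile; whoever takes the last stone wins), replaces the neural network with a plain table of win estimates, and replaces the search with a one-move lookahead. All names and parameters here (`self_play_train`, `greedy_move`, the learning rate, the exploration rate) are hypothetical choices made for illustration.

```python
import random


def legal_moves(stones):
    """Moves available in the toy game: take 1, 2 or 3 stones."""
    return [m for m in (1, 2, 3) if m <= stones]


def greedy_move(values, stones):
    """One-move lookahead: leave the opponent in the position that the
    table currently rates as worst for the player to move."""
    return min(legal_moves(stones), key=lambda m: values[stones - m])


def choose_move(values, stones, explore, rng):
    """Mostly greedy, but sometimes random so that every position gets tried."""
    if rng.random() < explore:
        return rng.choice(legal_moves(stones))
    return greedy_move(values, stones)


def self_play_train(start=10, games=20000, lr=0.02, explore=0.2, seed=0):
    """Learn win estimates purely from games the program plays against itself.

    values[s] estimates the chance that the player to move wins with s stones
    left. Every estimate starts at 0.5 (no knowledge), except the terminal
    position, which follows from the rules: no stones left means the player
    to move has already lost.
    """
    rng = random.Random(seed)
    values = [0.5] * (start + 1)
    values[0] = 0.0
    for _ in range(games):
        stones, visited = start, []
        while stones > 0:
            visited.append(stones)
            stones -= choose_move(values, stones, explore, rng)
        # The player who moved last took the final stone and won.
        # Walk backwards through the game, alternating winner and loser,
        # and nudge each visited position towards the observed result.
        outcome = 1.0
        for s in reversed(visited):
            values[s] += lr * (outcome - values[s])
            outcome = 1.0 - outcome
    return values
```

Calling `self_play_train()` and then `greedy_move(values, 10)` should, under these assumptions, recover the known strategy for this toy game of leaving the opponent a multiple of four stones, learned from game outcomes alone. The table is updated a little after every game, which mirrors in miniature the article's description of a system that improves slightly with each round of training.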

Although Go is in one sense extremely complex, with far more possible board positions than there are atoms in the universe, in another sense it is simple because it is a “game of perfect information”: chance plays no part, unlike in cards or dice, and the state of play is defined entirely by the position of stones on the board.
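
The size comparison can be checked with rough arithmetic. The sketch below assumes the standard 19×19 board, counts every assignment of empty, black or white to each point (an upper bound that includes illegal positions), and uses the commonly cited order-of-magnitude estimate of 10^80 atoms in the observable universe. Both figures are assumptions that the article itself does not state.

```python
# Assumption: standard Go board of 19 x 19 = 361 points.
BOARD_POINTS = 19 * 19

# Each point is empty, black or white, so 3**361 bounds the number of
# board configurations from above (many of these are not legal positions).
config_upper_bound = 3 ** BOARD_POINTS

# Assumption: rough order-of-magnitude estimate for atoms in the
# observable universe.
ATOMS_ESTIMATE = 10 ** 80

# Python integers have arbitrary precision, so the comparison is exact.
bound_digits = len(str(config_upper_bound))
ratio_digits = len(str(config_upper_bound // ATOMS_ESTIMATE))

print(f"upper bound on configurations has {bound_digits} decimal digits")
print(f"it exceeds the atom estimate by a factor with {ratio_digits} digits")
```

Even allowing for the many illegal configurations included in this bound, the gap is so large that the article's comparison holds comfortably.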

The game involves surrounding more territory than your opponent. Because Go is a game of perfect information, it is particularly well suited to the computer simulations on which AlphaGo depends. DeepMind is now examining real-life problems that can be structured in a similar way, to apply the technology.

Mr Hassabis identified predicting the shape of protein molecules — an important issue in drug discovery — as a promising candidate. Other likely scientific applications include designing new materials and climate modelling.
