About the Official Parallel Corpus


The United Nations has released its official Parallel Corpus, made up of manually translated documents from 1990 to 2014 in each of the UN’s six official languages: Arabic, English, Spanish, French, Russian and Chinese.

This official release by the United Nations marks the first time that such high-quality parallel corpora have been available in the public domain in Arabic and Russian. Progress in natural language research is driven by the availability of data. This is particularly true in statistical machine translation (SMT), which thrives on large quantities of parallel text: original documents paired with their translations into one or more other languages. Researchers have typically relied on parallel text published by multinational institutions such as the European Union, or by governments of multilingual jurisdictions such as Canada or Hong Kong.