PAN Localization: Bahasa Indonesia Language Resources and Translation System

Automatic machine translation has the potential to make online content accessible in local Asian languages. However, the statistical method, the technology used for automatic machine translation, has not been well tested on Asian languages. The statistical method involves "training" a system on a large amount of text in the source language paired with its sentence-by-sentence translation in the target language (a parallel corpus). From this parallel corpus, the system "learns" to align portions of text in the two languages and applies what it has learned to translate new text.

This grant will allow an Indonesian team of researchers to develop 100,000 words of parallel text from a core English corpus, the Penn Treebank, distributed by the Linguistic Data Consortium at the University of Pennsylvania.

In addition to producing a working prototype for English-to-Bahasa-Indonesia machine translation, the project is expected to improve on existing work in the area of machine translation for Asian languages and develop expertise that will be transferred to the whole Pan Asia Networking Localization network (PANL10n).

Project no.

105009

Project status

Closed

Start date

Monday, March 31, 2008

End date

Wednesday, February 3, 2010

Duration

20 months

IDRC officer

Ng Lee Hoon, Maria

Total funding

CAD$ 55,400

Country

Far East Asia, Indonesia, Central Asia, South Asia

Project leader

Sarmad Hussain

Institution

National University of Computer and Emerging Sciences

Institution country

Pakistan

Website

http://www.nu.edu.pk