OPUS - an open source parallel corpus

OPUS is an attempt to collect translated texts from the web, to convert and align the entire collection, to add linguistic annotation, and to provide the community with a publicly available parallel corpus. OPUS is based on open source products and is itself delivered as an open source package. The current corpus was compiled automatically with several tools; no manual corrections have been made.

The OPUS collection is growing! Check this page from time to time to see new data arriving ...
Contributions are very welcome! Please contact j.tiedemann@rug.nl

News

Search & Browse Tools

Downloads & Samples

Publications

Jörg Tiedemann, to appear
News from OPUS - A Collection of Multilingual Parallel Corpora with Tools and Interfaces [pdf]
To appear in N. Nicolov and K. Bontcheva and G. Angelova and R. Mitkov (eds.) Recent Advances in Natural Language Processing (vol V), John Benjamins, Amsterdam/Philadelphia
Jörg Tiedemann, Lars Nygaard, 2004
The OPUS corpus - parallel & free. [pdf]
In Proceedings of the Fourth International Conference on Language Resources and Evaluation (LREC'04). Lisbon, Portugal, May 26-28.
Jörg Tiedemann, 2003
OPUS - an open source parallel corpus. [pdf]
In Proceedings of the 13th Nordic Conference on Computational Linguistics, University of Iceland, Reykjavik, 2003.
Jörg Tiedemann, 2007
Building a Multilingual Parallel Subtitle Corpus. [pdf]
In Proceedings of CLIN 17, Leuven, Belgium, 2007.
Jörg Tiedemann, 2007
Improved Sentence Alignment for Movie Subtitles. [pdf]
In Proceedings of RANLP '07, Borovets, Bulgaria, 2007.