METU-Sabanci Turkish Treebank

METU-Sabanci Turkish Treebank is a morphologically and syntactically annotated treebank corpus of 7262 grammatical sentences. The sentences are taken from the METU Turkish Corpus. The proportions of the different genres in the treebank were kept similar to those in the METU Turkish Corpus. The structure of METU-Sabanci Turkish Treebank is based on XML. The distribution of the treebank also includes a user guide, a display program and related publications.

Turkish is an agglutinative language with free word order, so a dependency scheme was chosen to represent its structure. Dependency links are drawn from words to inflectional groups of words.
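
As a rough illustration of what a link from a word to an inflectional group could look like as a data structure, here is a minimal Python sketch. The class names (Word, Link), the field names, the simplified morphological tags and the MODIFIER label are all assumptions made for this example and are not taken from the treebank's actual format; see the user guide for the real representation. The example phrase "küçük odadayım" ("I am in the small room") is used because the adjective attaches to the noun part of the second word rather than to the word as a whole.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Word:
    form: str
    igs: List[str]  # inflectional groups of the word, in order (tags simplified)


@dataclass
class Link:
    dependent: int  # 1-based index of the dependent word
    head_word: int  # 1-based index of the head word
    head_ig: int    # 1-based index of the inflectional group inside the head word
    relation: str   # relation label (hypothetical value below)


# Hypothetical, simplified analysis; the tags are illustrative only.
words = [
    Word("küçük", ["küçük+Adj"]),
    Word("odadayım", ["oda+Noun+Loc", "+Verb+A1sg"]),
]

# The adjective depends on the first inflectional group (the noun) of word 2.
links = [Link(dependent=1, head_word=2, head_ig=1, relation="MODIFIER")]


def head_ig(link: Link, sentence: List[Word]) -> str:
    """Return the inflectional group that the link points to."""
    return sentence[link.head_word - 1].igs[link.head_ig - 1]


for link in links:
    print(words[link.dependent - 1].form, "->", head_ig(link, words), link.relation)
```

Pointing the link at an inflectional group rather than at the whole word is what lets the sketch record that the adjective modifies the noun "oda" and not the derived verb.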

The treebank is stored as XML. Paragraphs, sentences and words are marked with <Set>, <S> and <W> tags, respectively. Each tag carries attributes that hold information such as the number of sentences, the number of words, morphological analyses and dependency relations (for detailed information, see the user guide).
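
The following Python sketch shows one way such a file might be read with the standard library. Only the tag names <Set>, <S> and <W> come from the description above. The attribute names in the sample (sentences, No, IX, REL), their values, and the assumption that the word form is the text content of <W> are placeholders for illustration; the user guide defines the actual attributes. The function therefore returns whatever attributes it finds instead of relying on specific names.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample; attribute names and values are placeholders.
SAMPLE = """<Set sentences="1">
  <S No="1">
    <W IX="1" REL="[2,1,(MODIFIER)]">küçük</W>
    <W IX="2" REL="[]">odadayım</W>
  </S>
</Set>"""


def read_sentences(xml_text):
    """Return a list of sentences; each sentence is a list of (form, attributes) pairs."""
    root = ET.fromstring(xml_text)
    sentences = []
    for s in root.iter("S"):
        # Keep every attribute as-is so the code does not depend on their names.
        words = [((w.text or "").strip(), dict(w.attrib)) for w in s.iter("W")]
        sentences.append(words)
    return sentences


sentences = read_sentences(SAMPLE)
for form, attrs in sentences[0]:
    print(form, attrs)
```

For a file on disk, ET.parse(path).getroot() could replace ET.fromstring, provided the file is well-formed XML.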

The complete METU-Sabanci Turkish Treebank is available to researchers around the world free of charge, for research purposes only. To obtain the treebank, fill in the METU-Sabanci Turkish Treebank user agreement form (an English version is available), sign it, scan it and e-mail it to corpus@ii.metu.edu.tr. If you are unable to scan the form, you may instead fax the signed form to +90 312 210 3745 and, at the same time, send a notice to corpus@ii.metu.edu.tr. We prefer the first method and will be able to reply faster in that case.