Computational Lexical Semantics at ESSLLI 09


3_Case study

During the course, you will gain some practical experience with a case study based on Task 4 of the SemEval 2007 competition. You will:

  • manually annotate data items from the task;
  • perform Machine Learning experiments using features kindly provided by Roser Morante (Hendricx et al. 2007).

1. Annotation.

  • read the guidelines.
  • download the file with the data to annotate.
  • do the annotation, either in a spreadsheet (Excel, OpenOffice, or CSV format) or in a plain text file, making sure that you use one line per label, e.g. as follows:
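
Whatever layout you choose, a short script can confirm that a plain-text annotation file really has one label per line. The sketch below is illustrative only: the allowed labels (`true` and `false`) and the function name `check_labels` are assumptions, so replace them with whatever the guidelines prescribe.

```python
from typing import Iterable, List, Tuple

# Assumed label set; replace with the labels the guidelines prescribe.
ALLOWED_LABELS = {"true", "false"}


def check_labels(lines: Iterable[str]) -> List[Tuple[int, str]]:
    """Return (line number, content) for each line that is not exactly one allowed label."""
    problems = []
    for number, raw in enumerate(lines, start=1):
        value = raw.strip()
        if value not in ALLOWED_LABELS:
            problems.append((number, value))
    return problems


# Made-up sample: line 3 is empty and line 4 holds two labels.
sample = ["true", "false", "", "true false", "false"]
print(check_labels(sample))  # → [(3, ''), (4, 'true false')]
```

To check a real file, pass an open file object instead of the list, for example `check_labels(open("my_annotation.txt"))`, where the file name is a placeholder.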


2. Machine Learning experiments.

This time it is up to the computer to “annotate”, that is, to distinguish between positive and negative examples of the Content-Container relation. We will be using Weka, a toolkit for Machine Learning experiments. To use it, please:

0. Download and decompress the datasets for the Content-Container relation, kindly provided by Roser Morante (reference below). There is a README in the tgz file.

1. Install Weka from its download page. Note that:
– we will be using version 3.6 (“Stable GUI version”);
– if your computer does not have Java 1.5, you can install the Weka package that includes it.

2. Start it.

3. Open the ‘Explorer’. From this interface:
– open the file content-container-train-v1.arff;
– try a learning algorithm on it under ‘Classify’, for instance ‘J48’ in the ‘trees’ group.
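
To judge the accuracy that a classifier reports, it helps to know how well one would do by always guessing the most frequent class. The following is a minimal sketch that reads the class labels from ARFF text and computes that majority baseline. It rests on assumptions not confirmed by the dataset's README: rows are dense and comma-separated, values contain no quoted commas, and the class is the last attribute. The function names and the toy data are made up for illustration.

```python
from collections import Counter
from typing import Iterable, List, Tuple


def read_arff_labels(lines: Iterable[str]) -> Tuple[List[str], List[str]]:
    """Return (attribute names, class labels) from dense ARFF text.

    Assumes comma-separated rows without quoted commas and the class
    as the last attribute (assumptions, not taken from the README).
    """
    attributes: List[str] = []
    labels: List[str] = []
    in_data = False
    for raw in lines:
        line = raw.strip()
        if not line or line.startswith("%"):  # skip blanks and ARFF comments
            continue
        lowered = line.lower()
        if not in_data:
            if lowered.startswith("@attribute"):
                # "@attribute <name> <type>": keep only the name
                attributes.append(line.split(None, 2)[1])
            elif lowered.startswith("@data"):
                in_data = True
            continue
        # Data row: the class label is assumed to be the last value.
        labels.append(line.rsplit(",", 1)[-1].strip())
    return attributes, labels


def majority_baseline(labels: List[str]) -> Tuple[str, float]:
    """Return the most frequent label and the accuracy of always guessing it."""
    if not labels:
        raise ValueError("no data rows found")
    label, count = Counter(labels).most_common(1)[0]
    return label, count / len(labels)


# Tiny made-up ARFF sample, NOT the real content-container data.
SAMPLE = """\
% toy example
@relation toy
@attribute length numeric
@attribute has_prep {yes,no}
@attribute class {true,false}
@data
3,yes,true
5,no,false
2,yes,true
4,no,true
"""

attributes, labels = read_arff_labels(SAMPLE.splitlines())
label, accuracy = majority_baseline(labels)
print(f"{len(attributes)} attributes, {len(labels)} instances")
print(f"majority class: {label} ({accuracy:.2f})")
```

On the real data, pass an open file instead of `SAMPLE.splitlines()`, for example `read_arff_labels(open("content-container-train-v1.arff"))`, provided the file matches the assumptions above. Within Weka itself, the ‘ZeroR’ classifier under ‘rules’ gives the same kind of majority-class baseline.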


Girju, Roxana, Preslav Nakov, Vivi Nastase, Stan Szpakowicz, Peter Turney and Deniz Yuret (2009). Classification of semantic relations between nominals. Language Resources and Evaluation 43(2), 105-121. [Overview of the whole SemEval task: how the dataset was built, what kinds of systems participated, etc.]

Hendricx, I., R. Morante, C. Sporleder, and A. Van den Bosch (2007). Machine learning of semantic relations with shallow features and almost no data. Proceedings of the 4th International Workshop on Semantic Evaluations (SemEval), Prague, Czech Republic, pp. 187-190. [Paper by the authors of the dataset that we will be using.]
