Corpus Homework Assignment

In thematic groups of 3-4, outline an example corpus tagset that fits the common theme of your group (i.e. envisage a corpus that would be useful to all members, or at least one to which every member contributes an idea), and then use it to create a corpus document. You will find an example corpus document on the FTP.

Follow these steps:

  • establish the common theme of your hypothetical corpus (e.g. learner corpus, corpus of metaphors, parallel translation corpus, conversation analysis/pragmatics corpus, contrastive analysis corpus, error analysis corpus, etc.);
  • decide what the tags of your mark-up will be;
    • propose some attributes for the tags;
    • propose some values for the attributes;
  • consider what kind of metadata to include in your document;
  • find a short text and create an example corpus document in XML format (any dialect) using your annotation scheme;
  • note any and all problems that you encounter (in order to discuss them later in class);
  • submit your document.
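The steps above can be sketched in Python with the standard-library `xml.etree.ElementTree` module. This is only an illustration under assumed names: the learner-corpus theme, the tags `corpusDocument`, `header`, `body`, `s`, `w` and `error`, the attributes `n`, `type` and `correction`, and the values in `ERROR_TYPES` are all hypothetical, not a required scheme. The sketch shows how metadata goes in the header, while tags, attributes and attribute values annotate the text in the body.

```python
import xml.etree.ElementTree as ET

# Hypothetical tagset for a learner corpus: every element and attribute
# name below is an illustrative assumption, not a required scheme.
ERROR_TYPES = {"spelling", "grammar", "lexis"}  # proposed values for the "type" attribute


def build_corpus_document(metadata, sentences):
    """Return a corpus document (as a string) with an explicit <header> and <body>.

    metadata  -- dict of header fields, e.g. {"title": ..., "language": ...}
    sentences -- list of sentences; each sentence is a list of tokens, where a
                 token is a plain string or a (wrong, correction, error_type) tuple
    """
    root = ET.Element("corpusDocument")

    # Metadata lives in the header, kept separate from the annotated text.
    header = ET.SubElement(root, "header")
    for field, value in metadata.items():
        ET.SubElement(header, field).text = value

    # The body holds numbered sentences made of word and error elements.
    body = ET.SubElement(root, "body")
    for n, tokens in enumerate(sentences, start=1):
        s = ET.SubElement(body, "s", n=str(n))
        for token in tokens:
            if isinstance(token, tuple):
                wrong, correction, error_type = token
                # Reject attribute values that are not part of the proposed scheme.
                if error_type not in ERROR_TYPES:
                    raise ValueError(f"unknown error type: {error_type}")
                err = ET.SubElement(s, "error", type=error_type, correction=correction)
                err.text = wrong
            else:
                ET.SubElement(s, "w").text = token
    return ET.tostring(root, encoding="unicode")


# Usage with a hypothetical one-sentence learner text containing a spelling error.
print(build_corpus_document(
    {"title": "Sample learner essay", "language": "en", "level": "B1"},
    [["I", ("recieved", "received", "spelling"), "your", "letter", "."]],
))
```

Keeping the permitted attribute values in one place (here `ERROR_TYPES`) makes typos in the annotation fail early, which in turn keeps the tagset consistent enough to query.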

Formal considerations:

  • the final deadline for this assignment is 11/12/2017;
  • only .xml documents will be accepted (you may use any editor for this purpose, including, but not limited to, the free Notepad++ or the paid oXygen XML);
  • your document must contain an explicit header and explicit body;
  • this is a teamwork exercise; hence, any group with fewer than 3 or more than 4 members will receive a slight penalty (-10%) for every participant above or below that range; the only exception to this rule is when uneven numbers leave too few people to complete a group;
  • corpus-based/corpus-driven and corpus-informed studies will naturally differ in proportions, so be sure to indicate which kind of study your hypothetical corpus is intended for;
  • when testing your annotation scheme, I will try to run queries using your tagset (as appropriate to your group's theme).
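Before submitting, it may help to check the requirements that can be tested mechanically: that the file is well-formed XML and that it has an explicit header and body. The sketch below, using Python's standard `xml.etree.ElementTree`, does that and then runs a simple query of the kind a tagset should support. The sample document, its tag and attribute names, and the `studyType` header field are hypothetical; `ElementTree` supports only a limited subset of XPath, so queries in a dedicated corpus tool may look different.

```python
import xml.etree.ElementTree as ET

# A hypothetical submission; tag and attribute names are illustrative assumptions.
SAMPLE = """<?xml version="1.0" encoding="UTF-8"?>
<corpusDocument>
  <header>
    <title>Sample learner essay</title>
    <studyType>corpus-based</studyType>
  </header>
  <body>
    <s n="1"><w>I</w><error type="spelling" correction="received">recieved</error><w>your</w><w>letter</w></s>
    <s n="2"><w>It</w><error type="grammar" correction="was">were</error><w>long</w></s>
  </body>
</corpusDocument>"""


def check_structure(xml_text):
    """Parse the document (ET.ParseError is raised if it is not well-formed)
    and confirm that <header> and <body> appear directly under the root."""
    root = ET.fromstring(xml_text)
    missing = [part for part in ("header", "body") if root.find(part) is None]
    if missing:
        raise ValueError(f"missing required element(s): {', '.join(missing)}")
    return root


def query_errors(root, error_type=None):
    """Return (sentence number, original form, correction) for each <error>,
    optionally filtered with an ElementTree [@attr='value'] predicate."""
    step = "error" if error_type is None else f"error[@type='{error_type}']"
    return [
        (s.get("n"), err.text, err.get("correction"))
        for s in root.iterfind("body/s")
        for err in s.iterfind(step)
    ]


root = check_structure(SAMPLE)
print(query_errors(root, "spelling"))  # [('1', 'recieved', 'received')]
```

Recording the study type in the header, as in the sample, is one way to make the kind of study the corpus is intended for explicit inside the document itself.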