Type: lecturenote

Lecture Notes:

Ethics

  • Key Ethical Questions:
    • Consent: Should we always ask permission?
    • Privacy: How do we protect anonymity?
    • Representation: Whose voices are included/excluded?
    • Benefit: Who gains from this research?
    • Cultural Resources: What about indigenous language/discourse?
  • Risk Assessment (principle: requirement):
    • Lawfulness: Processed lawfully, fairly, and in a transparent manner
    • Purpose Limitation: Collected for specified, explicit, and legitimate purposes
    • Data Minimization: Adequate, relevant, and limited to what is necessary
    • Accuracy: Accurate and, where necessary, kept up to date
    • Storage Limitation: Retained only for as long as necessary
    • Confidentiality: Processed in an appropriate manner to maintain security
    • Accountability: Supported by the further principle of accountability to customers and employees

Corpus

  • A corpus is simply a collection of texts stored digitally.
    • Plus metadata (information about the texts):
      • Who wrote/translated it
      • When it was created
      • What language it’s in
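
A minimal sketch of the corpus idea above, assuming each text is stored alongside the three metadata fields just listed. The `Document` class, the `filter_by_language` helper, and the sample texts are hypothetical, invented only for illustration:

```python
from dataclasses import dataclass


@dataclass
class Document:
    """One corpus entry: a text plus its metadata (hypothetical structure)."""
    text: str       # the text itself
    author: str     # who wrote or translated it
    year: int       # when it was created
    language: str   # what language it is in


# A tiny made-up corpus; texts and metadata are invented for illustration.
corpus = [
    Document("Strong tea is best in the morning.", "A. Example", 2019, "en"),
    Document("Starker Tee am Morgen.", "B. Beispiel", 2021, "de"),
    Document("Heavy rain fell all night.", "A. Example", 2022, "en"),
]


def filter_by_language(docs, language):
    """Return only the documents whose metadata matches the given language."""
    return [d for d in docs if d.language == language]


english_docs = filter_by_language(corpus, "en")
print(len(english_docs), "English documents")
```

Keeping metadata next to each text is what makes it possible to select sub-corpora (by author, date, or language) before any analysis is run.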

How is it used?

  • Symbolic Methods:
    • Formal logic & AI
    • Handcrafted parsers
    • Ontologies (WordNet)
    • Mostly disappeared
    • Still used in semantics
  • Statistical Learning:
    • Probability theory & statistics
    • Frequency analysis
    • Collocation detection
    • Machine learning
    • Neural networks
    • Dominates the field
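
Frequency analysis, listed among the statistical methods above, can be sketched in a few lines. This is an illustrative example only: the `word_frequencies` helper, its simple regular-expression tokenizer, and the sample sentence are assumptions, not part of the lecture material.

```python
import re
from collections import Counter


def word_frequencies(text):
    """Lowercase the text, extract word tokens, and count each token."""
    # Assumption: a word is a run of ASCII letters or apostrophes.
    tokens = re.findall(r"[a-z']+", text.lower())
    return Counter(tokens)


sample = "The rain fell. The heavy rain fell all night."
freqs = word_frequencies(sample)

# Show the most frequent words first.
for word, count in freqs.most_common(3):
    print(word, count)
```

A real corpus study would normally use a proper tokenizer and far more text, but the counting step itself stays this simple.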

Collocation

  • Refers to words that frequently occur together in natural language. These are word combinations that appear more often than we would expect by chance.
    • Strong tea (natural) ↔ Powerful tea (unnatural)
    • Make a decision (natural) ↔ Do a decision (unnatural)
    • Heavy rain (natural) ↔ Strong rain (unnatural)
  • Usefulness:
    1. Natural language use: Native speakers use certain word combinations automatically.
    2. Language learning: Helps learners sound more natural.
    3. Translation: Different languages have different collocational patterns.
    4. Meaning: Words take on specific meanings when paired together.
  • Collocation analysis: How do we know if words appear together by chance or as meaningful patterns?
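
One common way to approach the chance-versus-pattern question is pointwise mutual information (PMI), which compares how often two words actually occur side by side with how often they would if they were independent. The sketch below is a minimal illustration, not the method prescribed by the lecture; the `pmi_scores` function and the toy token sequence are assumptions made up for this example.

```python
import math
from collections import Counter


def pmi_scores(tokens):
    """Compute PMI for every adjacent word pair (bigram) in a token list.

    PMI(x, y) = log2( P(x, y) / (P(x) * P(y)) )
    A score above 0 means the pair co-occurs more often than chance predicts.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(zip(tokens, tokens[1:]))
    n_unigrams = len(tokens)
    n_bigrams = len(tokens) - 1

    scores = {}
    for (x, y), count in bigrams.items():
        p_xy = count / n_bigrams          # observed probability of the pair
        p_x = unigrams[x] / n_unigrams    # probability of the first word
        p_y = unigrams[y] / n_unigrams    # probability of the second word
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Toy data, invented for illustration only.
tokens = ("strong tea and strong tea and heavy rain and strong tea "
          "and heavy rain and tea").split()
scores = pmi_scores(tokens)

# List the pairs from strongest to weakest association.
for pair, score in sorted(scores.items(), key=lambda item: -item[1]):
    print(pair, round(score, 2))
```

In this toy sample, "heavy rain" and "strong tea" score above zero, while "and tea" scores below zero. On real data PMI tends to overrate very rare pairs, so it is usually combined with a minimum frequency threshold.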