
Language as Data 1

Online Lecture Notes:

  • Overview of “Data”
    • Etymology and Definition
      • Derived from the Latin word datum (something given).
      • Defined by the Oxford English Dictionary as collective information, often numerical, used for analysis, reference, or computation.
      • In computing: quantities, characters, or symbols processed collectively.
    • Public Perception of Data
      • Often linked to concepts like big data, AI, and digitization.
      • Popular images depict data as abstract, impersonal, and intimidating.
      • However, these representations often exaggerate the power of data systems.
  • Algorithms and Their Relationship to Data
    • Definition:
      • A set of rules or procedures for calculation or problem-solving.
      • Algorithms are implemented in programming languages (e.g., Python, Java) and executed as software components.
    • Interaction with Data:
      • Algorithms process input data to generate output, forming the backbone of software systems.
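To make the definition above concrete, here is a minimal sketch of an algorithm as a fixed set of rules that turns input data into output. Euclid's algorithm is used purely as an illustration; it is an assumed example and not one taken from the lecture.

```python
def gcd(a: int, b: int) -> int:
    """Euclid's algorithm: apply one rule repeatedly until the remainder is zero."""
    while b != 0:
        # Rule: replace (a, b) with (b, a mod b).
        a, b = b, a % b
    return a


# Input data: two integers. Output: their greatest common divisor.
print(gcd(48, 18))  # 6
```

The same procedure could be written in Java or any other programming language; the rules stay the same, only the notation changes.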
  • Contextualizing Data in Social and Technical Discourses
    • Two Approaches to Conceptualizing Data:
      1. Traditional View:
        • Data as “given” and objective, forming the basis for knowledge and decision-making.
        • Criticism: Data collection always involves selection and contextualization, making it a subjective process.
      2. Critical Perspective:
        • Data as a “reinterpretation” or formalization of complex phenomena.
        • ISO/IEC defines data as a formalized representation suitable for communication and processing.
    • Implications:
      • Data is not neutral but shaped by social, cultural, and institutional contexts.
      • Collecting and interpreting data is a constructive process, influenced by perspectives and interests.
  • Critical Data Studies
    • Key Points:
      • Data reflects and reinforces power structures (e.g., corporate control, algorithmic biases).
      • Researchers emphasize that data is “cooked,” not “raw,” shaped by selection and recontextualization.
      • Examples of biased data systems:
        • Gender and racial biases in facial recognition.
        • Misguided word associations in language models (e.g., “Man is to Computer Programmer as Woman is to Homemaker”).
        • Algorithms optimized for efficiency, often ignoring individual and societal impacts.
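The word-association example above can be sketched with vector arithmetic, under strong assumptions. The two-dimensional vectors below are hypothetical and hand-made so that they already carry the stereotype; they are not taken from any real language model. The sketch only shows that the analogy procedure returns whatever association is built into its data.

```python
import math

# Hypothetical toy vectors, invented for illustration only.
# Axis 0: gender association, axis 1: "is an occupation".
vectors = {
    "man": (1.0, 0.0),
    "woman": (-1.0, 0.0),
    "programmer": (1.0, 1.0),
    "homemaker": (-1.0, 1.0),
    "teacher": (0.0, 1.0),
}


def cosine(u, v):
    """Cosine similarity between two non-zero vectors."""
    dot = sum(a * b for a, b in zip(u, v))
    return dot / (math.hypot(*u) * math.hypot(*v))


def analogy(a, b, c):
    """Return the word d that best completes 'a is to b as c is to d'."""
    # Target vector: b - a + c, computed component by component.
    target = tuple(
        vb - va + vc for va, vb, vc in zip(vectors[a], vectors[b], vectors[c])
    )
    # The three query words themselves are excluded from the candidates.
    candidates = [w for w in vectors if w not in {a, b, c}]
    return max(candidates, key=lambda w: cosine(vectors[w], target))


print(analogy("man", "programmer", "woman"))  # homemaker
```

The procedure itself is neutral arithmetic; the biased answer comes entirely from how the vectors were constructed, which is the point made about “cooked” data.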
  • Applications and Challenges
    • Data in Journalism and Politics
      • Data journalism analyzes structured information to uncover newsworthy stories (e.g., emergency services coverage, election analytics).
      • Data-driven campaigns, such as Barack Obama’s election strategy, illustrate the growing influence of big data in politics.
      • The misuse of data (e.g., disinformation) highlights its role as a form of power.
    • Ethical Concerns:
      • The need for iterative improvements and debugging of data systems.
      • Questions about fairness, transparency, and accountability in algorithmic decision-making.
      • Challenges in identifying and mitigating unintended harms caused by data systems.
  • Key Quotes and Insights
    • “Data is always reinterpreted information.”
    • Algorithms often prioritize efficiency and profitability over individual well-being and societal health.
    • “Data are never raw but always cooked”: this emphasizes the constructed nature of data.
  • Closing Thoughts

Lecture Notes

  • Two paths of conceptualisation of “data” as a term:
    1. Data is something given that can be captured and further processed. Data does not have a meaning by itself.
      • Critique: Data is always selected, framed from a particular perspective, and contextualised when it is collected.
    2. Data is something that can be obtained by reduction, selection or formalization from units with a higher degree of complexity. Data is always cooked, and is never raw.
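The second path (data obtained by reduction, selection and formalization) can be sketched with a small text example. The sentence, the function name `word_counts` and the stopword list are all assumptions made for illustration; the point is only that two defensible preprocessing choices yield two different data sets from the same text.

```python
import re
from collections import Counter

# Hypothetical example sentence, not taken from the lecture.
text = "The data are never raw. The data are always cooked!"


def word_counts(text, lowercase=True, stopwords=frozenset()):
    """Reduce a text to word frequencies; every parameter is a selection decision."""
    # Formalization: keep only alphabetic strings, drop punctuation.
    tokens = re.findall(r"[A-Za-z]+", text)
    if lowercase:
        tokens = [t.lower() for t in tokens]
    # Selection: stopwords are compared after the optional lowercasing step.
    return Counter(t for t in tokens if t not in stopwords)


full = word_counts(text)
reduced = word_counts(text, stopwords=frozenset({"the", "are"}))

print(sum(full.values()))     # 10 tokens counted
print(sum(reduced.values()))  # 6 tokens counted
```

Neither result is the “raw” text: punctuation, capitalization and word order are already gone in both, and the second additionally discards words judged uninformative.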
  • IT perspective:
    • A reinterpretable representation of information in a formalized manner, suitable for communication, interpretation, or processing. Examples of data include a sequence of bits, a table of numbers, the characters on a page, the recording of sounds made by a person speaking, or a moon rock specimen.
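What “reinterpretable representation” can mean in practice is sketched below: the same four bytes are read as text, as one integer, as a list of numbers and as a bit sequence. The byte values are chosen for illustration and the variable names are assumptions.

```python
# One formalized representation: four bytes.
raw = bytes([100, 97, 116, 97])

as_text = raw.decode("ascii")                # read as ASCII characters
as_int = int.from_bytes(raw, "big")          # read as one big-endian integer
as_numbers = list(raw)                       # read as a table of numbers
as_bits = "".join(f"{b:08b}" for b in raw)   # read as a sequence of bits

print(as_text)     # data
print(as_numbers)  # [100, 97, 116, 97]
print(as_bits)     # 01100100011000010111010001100001
print(as_int == 0x64617461)  # True
```

Which reading counts as the information depends on the interpretation applied to the bytes, not on the bytes themselves.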
  • In Digital Discourse Analysis:
    • A datum is a phenomenon interpreted as a sign, which in the course of a research process is extracted from a given context and recontextualized.
  • Critical Data Studies
    • “Data as a form of power”
      • Controlling data
      • Manipulating data
      • Gathering data