From web page to mega-corpus: the CNN transcripts

Chapter Summary

This paper focuses on the technical and methodological issues involved in using data available on the internet as a basis for quantitative analyses of Present-day English. For this purpose, I concentrate on the creation of a specialized corpus of spoken data and outline the steps necessary to convert a large number of publicly available CNN transcripts into a format which is compatible with standard corpus tools. As an illustration of potential uses of such data, the second part of my paper then presents a sample analysis of the intensifier so. The paper concludes with a brief discussion of the advantages and limitations of this type of internet-derived data for corpus linguistic analysis.
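The summary does not spell out the conversion steps, so the following is only a minimal sketch of what such a pipeline might look like, not the chapter's actual procedure. It assumes that a downloaded transcript page marks each speaker turn with an upper-case name followed by a colon, and it writes each turn as a simple `<u who="...">` element; that markup, the function names, and the sample text are all invented for illustration. The last function merely counts tokens of so as candidate hits; separating intensifier uses from other functions of so would still require manual or part-of-speech-based filtering.

```python
import re
from html import unescape
from xml.sax.saxutils import escape, quoteattr

TAG_RE = re.compile(r"<[^>]+>")
# Assumed convention: a turn starts with an upper-case speaker label and a colon.
TURN_RE = re.compile(r"^([A-Z][A-Z .,'\-]+?):\s*(.*)$")


def html_to_turns(html):
    """Strip HTML markup and split the text into (speaker, utterance) pairs."""
    # Treat <br> and </p> as line breaks before removing the remaining tags.
    text = re.sub(r"(?i)<br\s*/?>|</p>", "\n", html)
    text = unescape(TAG_RE.sub("", text))
    turns = []
    for line in text.splitlines():
        line = line.strip()
        if not line:
            continue
        match = TURN_RE.match(line)
        if match:
            turns.append((match.group(1), match.group(2)))
        elif turns:
            # A line without a speaker label continues the previous turn.
            speaker, utterance = turns[-1]
            turns[-1] = (speaker, (utterance + " " + line).strip())
    return turns


def turns_to_xml(turns):
    """Serialise turns as one <u who="..."> element per line (hypothetical markup)."""
    return "\n".join(
        f"<u who={quoteattr(speaker)}>{escape(utterance)}</u>"
        for speaker, utterance in turns
    )


def count_so(turns):
    """Count tokens of 'so' as candidate hits; no disambiguation is attempted."""
    return sum(
        len(re.findall(r"\bso\b", utterance, flags=re.IGNORECASE))
        for _, utterance in turns
    )


if __name__ == "__main__":
    # Invented sample page, not taken from any real transcript.
    sample = (
        "<p>ANCHOR: Thank you so much.</p>"
        "<p>GUEST: It was so good<br>to be here.</p>"
    )
    turns = html_to_turns(sample)
    print(turns_to_xml(turns))
    print(count_so(turns))
```

Keeping the speaker label as an attribute rather than as running text means that word counts and concordance searches operate on the utterances alone.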



In: Corpus Linguistics and the Web