Language scientists and developers need corpora, but licensing them (no, not the linguists) can be costly. Either the corpora are too expensive, or they come with heavy restrictions on making versions or using them for commercial purposes, or they add a heavy administrative overhead for gaining permission from all parties involved. In an effort to make it easier to build corpora from existing web resources, Björn Lindström in Uppsala has come up with ‘Creative Commons for Corpus Construction’, which collects web pages and parses their metadata to check whether they carry an appropriate Creative Commons license. He’s found that the amount of material on the web licensed under Creative Commons licenses is “more than enough to build a large corpus”. The next step is doing something interesting with the corpus.
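
To give a rough idea of what checking a page's metadata for a Creative Commons license could look like, here is a minimal Python sketch. It is not Lindström's tool, and how that tool actually works is not described here. The sketch assumes that a page declares its license with the common `rel="license"` convention on an `<a>` or `<link>` tag pointing at a creativecommons.org license URL. The names `LicenseLinkParser` and `creative_commons_licenses` are made up for illustration.

```python
from html.parser import HTMLParser
from urllib.parse import urlparse


class LicenseLinkParser(HTMLParser):
    """Collect href values of <a> and <link> tags marked rel="license"."""

    def __init__(self):
        super().__init__()
        self.license_urls = []

    def handle_starttag(self, tag, attrs):
        if tag not in ("a", "link"):
            return
        attr = dict(attrs)
        # rel may hold several space-separated values, or be missing entirely
        rel_values = (attr.get("rel") or "").lower().split()
        href = attr.get("href")
        if "license" in rel_values and href:
            self.license_urls.append(href)


def creative_commons_licenses(html):
    """Return the declared license URLs that point at creativecommons.org.

    Assumption: a Creative Commons license is recognised by its host name
    and a path starting with /licenses/ or /publicdomain/.
    """
    parser = LicenseLinkParser()
    parser.feed(html)
    found = []
    for url in parser.license_urls:
        parts = urlparse(url)
        host = (parts.hostname or "").lower()
        is_cc_host = host == "creativecommons.org" or host.endswith(".creativecommons.org")
        if is_cc_host and parts.path.startswith(("/licenses/", "/publicdomain/")):
            found.append(url)
    return found


# Hypothetical sample page, used only to exercise the function
sample_page = (
    '<html><head>'
    '<link rel="license" href="https://creativecommons.org/licenses/by-sa/4.0/">'
    '</head><body><a href="https://example.com/">unrelated link</a></body></html>'
)
print(creative_commons_licenses(sample_page))
```

A real corpus builder would also have to decide which licenses count as “appropriate” (for example, whether non-commercial or no-derivatives variants are acceptable), and would have to cope with pages that state their license only in running text rather than in markup.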