corpus-linguistics

Community

Analyze language through corpora, fast.

Author: xjtulyc
Version: 1.0.0

System Documentation

What problem does it solve?

This Skill turns raw text corpora into measurable linguistic insights, such as word frequencies, keywords, collocations, and concordance lines, so you can analyze language use quantitatively instead of by hand.

Core Features & Use Cases

  • Corpus preprocessing with NLTK and spaCy: tokenization, optional lemmatization, part-of-speech (POS) tagging, and named-entity extraction with spaCy pipelines.
  • Descriptive frequency analysis: unigram/bigram/trigram counts, type-token ratio, hapax legomena, and vocabulary statistics.
  • Keyword and association discovery: log-likelihood keyword analysis against a reference corpus, plus collocation scoring (PMI, log-likelihood, t-score) to surface meaningful word pairings.
  • KWIC concordance (Key Word In Context): produce context windows around a chosen keyword to inspect usage patterns.
  • Distributional semantics (word2vec): train word2vec models and compute cosine similarity to compare semantic neighborhoods.
  • Diachronic-ready workflows: structure outputs (per year/decade/partition) to track how lexical patterns and associations change over time.
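The descriptive statistics and collocation scores listed above can be sketched without any dependencies. This is a minimal illustration, not the skill's own code: it assumes the text is already tokenized into a list of lowercase strings, and the function names (`ngrams`, `descriptive_stats`, `collocation_scores`) are hypothetical. PMI and t-score use the common formulation with expected pair frequency E = f(w1)·f(w2)/N; NLTK's `nltk.collocations.BigramCollocationFinder` offers comparable scoring.

```python
import math
from collections import Counter


def ngrams(tokens, n):
    """Return the n-grams of a token list as tuples, in text order."""
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def descriptive_stats(tokens):
    """Token/type counts, type-token ratio (TTR) and hapax legomena."""
    freq = Counter(tokens)
    return {
        "tokens": len(tokens),
        "types": len(freq),
        "ttr": len(freq) / len(tokens) if tokens else 0.0,
        # Hapax legomena: word types that occur exactly once.
        "hapax": sorted(w for w, c in freq.items() if c == 1),
    }


def collocation_scores(tokens, min_count=2):
    """Score adjacent word pairs by PMI and t-score.

    Expected frequency E = f(w1) * f(w2) / N, with N the token count;
    PMI = log2(O / E) and t = (O - E) / sqrt(O) for observed pair count O.
    """
    n = len(tokens)
    unigrams = Counter(tokens)
    scores = {}
    for (w1, w2), observed in Counter(ngrams(tokens, 2)).items():
        if observed < min_count:  # skip rare pairs, whose PMI is unreliable
            continue
        expected = unigrams[w1] * unigrams[w2] / n
        scores[(w1, w2)] = {
            "freq": observed,
            "pmi": math.log2(observed / expected),
            "t": (observed - expected) / math.sqrt(observed),
        }
    return scores


tokens = "the cat sat on the mat the cat ran".split()
stats = descriptive_stats(tokens)
print(stats["types"], stats["tokens"], stats["hapax"])  # → 6 9 ['mat', 'on', 'ran', 'sat']
print(collocation_scores(tokens))
```

The `min_count` threshold matters because PMI rewards rare pairs: a pair seen once between two hapaxes gets a very high score from almost no evidence, which is why t-score is often reported alongside it.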

Quick Start

Use the corpus-linguistics skill to preprocess your texts, compute keywords against a reference corpus, and then generate KWIC concordance lines for a target term you care about.
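The Quick Start flow (keywords against a reference corpus, then KWIC lines for a term) can be sketched as follows. This is a hedged illustration, not the skill's implementation: it assumes both corpora are already tokenized into non-empty lists of lowercase strings, the function names (`log_likelihood`, `keywords`, `kwic`) are hypothetical, and the keyness measure is the widely used two-cell log-likelihood (G²) in the style of Rayson and Garside. NLTK's `nltk.Text.concordance` provides a ready-made KWIC display.

```python
import math
from collections import Counter


def log_likelihood(a, b, c, d):
    """Two-cell log-likelihood (G2) for a word seen a times in a study
    corpus of c tokens and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in the study corpus
    e2 = d * (a + b) / (c + d)  # expected frequency in the reference corpus
    g2 = 0.0
    if a > 0:  # zero cells contribute nothing (0 * log 0 is taken as 0)
        g2 += a * math.log(a / e1)
    if b > 0:
        g2 += b * math.log(b / e2)
    return 2 * g2


def keywords(study, reference, top=10):
    """Rank words overused in the study corpus relative to the reference."""
    fs, fr = Counter(study), Counter(reference)
    c, d = len(study), len(reference)  # assumed non-empty
    scored = []
    for w, a in fs.items():
        b = fr.get(w, 0)
        # Keep positive keywords only: higher relative frequency in the study corpus.
        if a / c > b / d:
            scored.append((w, log_likelihood(a, b, c, d)))
    scored.sort(key=lambda item: item[1], reverse=True)
    return scored[:top]


def kwic(tokens, node, window=3):
    """Return (left context, node, right context) for every hit of node."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == node:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


study = "climate climate climate risk the".split()
reference = "the the risk market policy".split()
print(keywords(study, reference))
print(kwic("the cat sat on the mat".split(), "the", window=2))
# → [('', 'the', 'cat sat'), ('sat on', 'the', 'mat')]
```

In practice, keywords are usually filtered by a significance cut-off (for example G² > 3.84, which corresponds to p < 0.05 with one degree of freedom) before the strongest ones are inspected in KWIC.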

Dependency Matrix

Required Modules

None required

Components

Standard package

💻 Claude Code Installation

Recommended: let Claude install it automatically. Copy and paste the text below into Claude Code.

Please help me install this Skill:
Name: corpus-linguistics
Download link: https://github.com/xjtulyc/awesome-rosetta-skills/archive/main.zip#corpus-linguistics

Please download this .zip file, extract it, and install it in the .claude/skills/ directory.
