corpus-linguistics
Community
Analyze language through corpora, fast.
Education & Research
#keyword extraction #spacy #corpus analysis #collocation mining #kwic concordance #word2vec #diachronic trends
Author: xjtulyc
Version: 1.0.0
Installs: 0
System Documentation
What problem does it solve?
This Skill turns raw text corpora into measurable linguistic insights such as frequencies, keywords, collocations, and concordance lines, so you can analyze language use quantitatively instead of by hand.
Core Features & Use Cases
- Corpus preprocessing with NLTK + spaCy: tokenization, optional lemmatization, POS tagging support, and named-entity extraction using spaCy pipelines.
- Descriptive frequency analysis: unigram/bigram/trigram counts, type-token ratio, hapax legomena, and vocabulary statistics.
- Keyword and association discovery: log-likelihood keyword analysis against a reference corpus, plus collocation scoring (PMI / log-likelihood / t-score) to surface meaningful word pairings.
- KWIC concordance (Key Word In Context): produce context windows around a chosen keyword to inspect usage patterns.
- Distributional semantics (word2vec): train word2vec models and compute cosine similarity to compare semantic neighborhoods.
- Diachronic-ready workflows: structure outputs (per year/decade/partition) to track how lexical patterns and associations change over time.
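To make the descriptive and collocation measures listed above concrete, here is a minimal sketch in plain Python. It is not the Skill's own implementation: the regex tokenizer is a naive stand-in for the NLTK/spaCy preprocessing, and the function names (`tokenize`, `ngrams`, `describe`, `pmi`) and the `min_count` threshold are hypothetical, chosen only for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic runs (naive stand-in for NLTK/spaCy)."""
    return re.findall(r"[a-z']+", text.lower())


def ngrams(tokens, n):
    """Return the list of adjacent n-grams as tuples."""
    return list(zip(*(tokens[i:] for i in range(n))))


def describe(tokens):
    """Token/type counts, type-token ratio, and hapax legomena."""
    counts = Counter(tokens)
    return {
        "tokens": len(tokens),
        "types": len(counts),
        "ttr": len(counts) / len(tokens) if tokens else 0.0,
        "hapax": sorted(w for w, c in counts.items() if c == 1),
    }


def pmi(tokens, min_count=2):
    """Pointwise mutual information for adjacent bigrams.

    PMI(x, y) = log2(P(x, y) / (P(x) * P(y))); bigrams seen fewer than
    min_count times are skipped because PMI overrates rare pairs.
    """
    unigrams = Counter(tokens)
    bigrams = Counter(ngrams(tokens, 2))
    n_uni = len(tokens)
    n_bi = sum(bigrams.values())
    scores = {}
    for (x, y), c in bigrams.items():
        if c < min_count:
            continue
        p_xy = c / n_bi
        p_x = unigrams[x] / n_uni
        p_y = unigrams[y] / n_uni
        scores[(x, y)] = math.log2(p_xy / (p_x * p_y))
    return scores


# Toy input, invented for this example.
tokens = tokenize("the cat sat on the mat and the cat slept")
stats = describe(tokens)
scores = pmi(tokens)
```

On this toy sentence only the bigram ("the", "cat") clears the `min_count` threshold. On a real corpus you would typically raise the threshold and compare the PMI ranking with log-likelihood or t-score, since each measure favors a different frequency band.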
Quick Start
Use the corpus-linguistics skill to preprocess your texts, compute keywords against a reference corpus, and generate KWIC lines for a target term you care about.
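As a rough picture of what the keyword and KWIC steps compute, the sketch below uses only the Python standard library. It assumes the common two-corpus log-likelihood (G2) formulation and a token-based context window; the function names, the `top` and `window` parameters, and the sample sentences are hypothetical and do not reflect the Skill's actual interface.

```python
import math
import re
from collections import Counter


def tokenize(text):
    """Lowercase and keep alphabetic runs (naive stand-in for NLTK/spaCy)."""
    return re.findall(r"[a-z']+", text.lower())


def log_likelihood(a, b, c, d):
    """G2 for a word seen a times in a target corpus of c tokens
    and b times in a reference corpus of d tokens."""
    e1 = c * (a + b) / (c + d)  # expected frequency in target
    e2 = d * (a + b) / (c + d)  # expected frequency in reference
    g2 = 0.0
    if a:
        g2 += a * math.log(a / e1)
    if b:
        g2 += b * math.log(b / e2)
    return 2 * g2


def keywords(target, reference, top=5):
    """Rank words that are relatively more frequent in target than in reference.

    Assumes both token lists are non-empty.
    """
    t, r = Counter(target), Counter(reference)
    c, d = len(target), len(reference)
    scored = []
    for word, a in t.items():
        b = r.get(word, 0)
        if a / c > b / d:  # keep positive keywords only
            scored.append((word, log_likelihood(a, b, c, d)))
    return sorted(scored, key=lambda pair: pair[1], reverse=True)[:top]


def kwic(tokens, keyword, window=3):
    """Return (left context, keyword, right context) for every hit."""
    lines = []
    for i, tok in enumerate(tokens):
        if tok == keyword:
            left = " ".join(tokens[max(0, i - window):i])
            right = " ".join(tokens[i + 1:i + 1 + window])
            lines.append((left, tok, right))
    return lines


# Toy corpora, invented for this example.
target = tokenize("the corpus shows that corpus methods scale and corpus tools help")
reference = tokenize("the weather was fine and the train was late")
top_keywords = keywords(target, reference)
concordance = kwic(target, "corpus", window=2)
```

Here "corpus" ranks first because it occurs three times in the target and never in the reference, and `concordance` holds one tuple per occurrence. Printing the tuples with the left context right-aligned gives the familiar KWIC layout.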
Dependency Matrix
Required Modules
None required
Components
Standard package
💻 Claude Code Installation
Recommended: let Claude install it automatically. Copy the text below and paste it into Claude Code.
Please help me install this Skill: Name: corpus-linguistics Download link: https://github.com/xjtulyc/awesome-rosetta-skills/archive/main.zip#corpus-linguistics Please download this .zip file, extract it, and install it in the .claude/skills/ directory.