Coverage by position
Word count at each position
Position view
Letters at position 1 (from start)
Letter profile
Distribution for A (from start)
Reliability
Top position-specific letters
Entropy by position
Lower entropy = more predictable letter distribution
Z-score outliers
Top deviations from expected frequency
Near-zero occurrences
Letters nearly absent at specific positions
Vowel density by position
Percentage of vowels (a, e, i, o, u) at each position
Top-3 concentration by position
How much the top 3 letters dominate each position
Method and definitions
Let \(i \in \{1,\dots,31\}\) index a position in a word and \(\ell \in \{a,\dots,z\}\) denote a letter. Let \(n_i\) be the number of words long enough to have a letter at position \(i\), and let \(c_{i,\ell}\) be the number of those words whose letter at position \(i\) is \(\ell\).
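As an illustration of how \(n_i\) and \(c_{i,\ell}\) could be tallied, here is a minimal Python sketch. The function name `position_counts`, the constant `MAX_POSITIONS`, and the toy word list are hypothetical. The sketch also assumes that only words made purely of the letters a to z are kept and that positions are counted from the start of the word. The page does not state how its word list was filtered.

```python
from collections import Counter

MAX_POSITIONS = 31  # range of i used in the text


def position_counts(words, max_positions=MAX_POSITIONS):
    """Return (n, c) as 0-based lists: n[i - 1] is n_i, c[i - 1][letter] is c_{i,letter}."""
    n = [0] * max_positions
    c = [Counter() for _ in range(max_positions)]
    for word in words:
        w = word.lower()
        # Assumption: skip anything that is not purely a-z (digits, hyphens, accents).
        if not (w.isascii() and w.isalpha()):
            continue
        # Letters beyond max_positions are ignored.
        for idx, letter in enumerate(w[:max_positions]):
            n[idx] += 1
            c[idx][letter] += 1
    return n, c


# Toy corpus, for illustration only.
n, c = position_counts(["cat", "car", "dog", "a"])
print(n[:4])       # [4, 3, 3, 0]
print(dict(c[0]))  # {'c': 2, 'd': 1, 'a': 1}
```

With this filtering, the counts at each position sum to \(n_i\), so the probabilities defined below sum to 1 at every position that at least one word reaches.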
count: \(c_{i,\ell}\)
probability: \(p_{i,\ell} = P(\ell \mid i) = \dfrac{c_{i,\ell}}{n_i}\)
baseline: overall letter frequency across all positions: \[ p_{\ell} = \frac{\sum_i c_{i,\ell}}{\sum_i n_i} \]
deviation: difference from baseline in percentage points: \[ \Delta_{i,\ell} = (p_{i,\ell} - p_{\ell}) \times 100 \]
lift: relative rate compared to baseline: \[ L_{i,\ell} = \frac{p_{i,\ell}}{p_{\ell}} \]
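entropy: Shannon entropy at position \(i\): \[ H_i = -\sum_{\ell} p_{i,\ell}\log_2(p_{i,\ell}) \] with the convention \(0\log_2 0 = 0\) for letters that never occur at position \(i\). The maximum is \(\log_2(26)\approx 4.70\) bits (uniform over 26 letters). Lower entropy means a more predictable letter.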
z-score: under a simple binomial null \(c_{i,\ell}\sim \mathrm{Binomial}(n_i,p_{\ell})\): \[ E_{i,\ell}=n_ip_{\ell},\quad \sigma_{i,\ell}=\sqrt{n_ip_{\ell}(1-p_{\ell})},\quad z_{i,\ell}=\frac{c_{i,\ell}-E_{i,\ell}}{\sigma_{i,\ell}} \] Roughly, \(|z|>3\) corresponds to two-tailed \(p\approx 0.0027\) under a normal approximation. The approximation is unreliable when the expected count \(E_{i,\ell}\) is small, as at late positions that few words reach.
near-zero: letter almost absent at a position relative to its baseline, i.e. a lift below 0.1: \[ L_{i,\ell} = \frac{p_{i,\ell}}{p_{\ell}} < 0.1 \]
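vowel density: share of vowels among the letters at position \(i\): \[ V_i=\sum_{\ell\in\{\mathrm{a},\mathrm{e},\mathrm{i},\mathrm{o},\mathrm{u}\}} p_{i,\ell} \] Here the upright \(\mathrm{i}\) is the vowel, not the position index \(i\).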
top-3 concentration: combined probability of the 3 most common letters at position \(i\): \[ C_i=\sum_{k=1}^{3} p_{i,\ell_k} \] where \(\ell_1,\ell_2,\ell_3\) are the three letters with the largest \(p_{i,\ell}\) at that position.
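The definitions above can be sketched in a few lines of Python. This is a sketch under stated assumptions, not the page's actual implementation. The function name `position_metrics`, the output keys, and the toy counts are hypothetical. The inputs are assumed to be 0-based lists, with `n[i - 1]` holding \(n_i\) and `c[i - 1]` a mapping from letter to \(c_{i,\ell}\).

```python
import math

VOWELS = "aeiou"


def position_metrics(n, c):
    """Compute the per-position metrics defined above from counts n and c."""
    total = sum(n)
    letters = sorted({letter for c_i in c for letter in c_i})
    # Baseline p_l: overall frequency of each letter across all positions.
    baseline = {l: sum(c_i.get(l, 0) for c_i in c) / total for l in letters}

    rows = []
    for i, (n_i, c_i) in enumerate(zip(n, c), start=1):
        if n_i == 0:
            continue  # no word reaches this position, so nothing is defined
        p = {l: c_i.get(l, 0) / n_i for l in letters}
        per_letter = {}
        for l in letters:
            p_l = baseline[l]
            sigma = math.sqrt(n_i * p_l * (1 - p_l))
            lift = p[l] / p_l
            per_letter[l] = {
                "deviation_pp": (p[l] - p_l) * 100,
                "lift": lift,
                # Assumption: report z = 0 when sigma is 0 (single-letter corpus).
                "z": (c_i.get(l, 0) - n_i * p_l) / sigma if sigma > 0 else 0.0,
                "near_zero": lift < 0.1,
            }
        rows.append({
            "position": i,
            # Terms with p = 0 are skipped, matching the 0 * log2(0) = 0 convention.
            "entropy": sum(-q * math.log2(q) for q in p.values() if q > 0),
            "vowel_density": sum(p.get(v, 0.0) for v in VOWELS),
            "top3": sum(sorted(p.values(), reverse=True)[:3]),
            "letters": per_letter,
        })
    return rows


# Toy counts for the four words "cat", "car", "dog", "a" (illustration only).
n = [4, 3, 3]
c = [{"c": 2, "d": 1, "a": 1}, {"a": 2, "o": 1}, {"t": 1, "r": 1, "g": 1}]
rows = position_metrics(n, c)
print(rows[0]["entropy"])        # 1.5
print(rows[0]["vowel_density"])  # 0.25
```

In the toy data, position 1 holds c twice and d and a once each, so its entropy is \(0.5\cdot 1 + 2\cdot 0.25\cdot 2 = 1.5\) bits. With counts this small the z-scores carry no statistical weight and only show the arithmetic.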