Statistics on Countable Alphabets

Jialin Zhang

When

November 10, 2023 | 2:30 p.m. – 3:30 p.m.
Refreshments provided

Where

University Hall 4002

Speaker

Jialin Zhang

Abstract

  1. Entropy estimation from Turing’s perspective is described. Given an iid sample from a countable alphabet under a probability distribution, Turing’s formula (introduced by Good (1953), hence also known as the Good-Turing formula) is a mind-bending non-parametric estimator of the total probability associated with letters of the alphabet that are NOT represented in the sample. Some interesting facts and thoughts about entropy estimators are introduced.
  2. Turing’s formula brought about a new characterization of probability distributions on general countable alphabets, which provides a new way to do statistics on alphabets, where the usual statistical concepts associated with random variables (on the real line) no longer exist. The new perspective, in turn, inspires some thoughts on the characterization of probability distributions when the underlying sample space is unclear. An application to authorship attribution is provided.
  3. Inference regarding tail behavior remains a persistent challenge due to the rarity of tail observations. Additionally, it is not always feasible to assume a global form for a distribution function. Approaching this issue from an entropic standpoint, a method grounded in an entropic basis and domains of attraction for countable alphabets is introduced, complete with its R implementation. An application involving the log-returns of Amazon stock is presented at the end.
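
For readers unfamiliar with Turing’s formula from item 1, the following is a minimal Python sketch of its usual textbook form: the estimated total probability of unseen letters is N1 / n, where N1 is the number of distinct letters observed exactly once and n is the sample size. The function name `turing_missing_mass` and the toy sample are illustrative assumptions; this is not the speaker's code, and it is unrelated to the R implementation mentioned in item 3.

```python
from collections import Counter


def turing_missing_mass(sample):
    """Estimate the total probability of letters absent from the sample.

    Uses Turing's formula N1 / n, where N1 is the number of distinct
    letters seen exactly once and n is the sample size.
    (Hypothetical helper name, for illustration only.)
    """
    n = len(sample)
    if n == 0:
        raise ValueError("sample must be non-empty")
    counts = Counter(sample)
    # N1: how many distinct letters occur exactly once (singletons).
    n1 = sum(1 for c in counts.values() if c == 1)
    return n1 / n


# Toy sample (an assumption): the letters of "abracadabra".
# Counts are a=5, b=2, r=2, c=1, d=1, so N1 = 2 and n = 11.
sample = list("abracadabra")
estimate = turing_missing_mass(sample)
print(estimate)
```

In this toy sample two letters are singletons out of eleven observations, so the estimate is 2/11. The estimator needs no knowledge of the alphabet's size, which is what makes it usable on countable alphabets.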