====== Suggested Notation for Machine Learning ======

===== Data & Domains =====

^ Symbol ^ Meaning ^
| $\mathbf{x}$ | Input instance (usually $\in \mathbb{R}^d$) |
| $\mathbf{y}$ | Output / label (usually $\in \mathbb{R}^{d_\text{o}}$) |
| $\mathbf{z}$ | Example pair $(\mathbf{x}, \mathbf{y})$ |
| $d$ | Input dimension |
| $d_{\text{o}}$ | Output dimension |
| $n$ | Number of samples |
| $\mathcal{X}$ | Instance domain (set) |
| $\mathcal{Y}$ | Label domain (set) |
| $\mathcal{Z}$ | Example domain ($\mathcal{X}\times\mathcal{Y}$) |
| $\mathcal{D}$ | Distribution over $\mathcal{Z}$ |
| $S$ | Dataset sample $\{(\mathbf{x}_i,\mathbf{y}_i)\}_{i=1}^n$ drawn from $\mathcal{D}$ |

===== Functions & Models =====

^ Symbol ^ Meaning ^
| $\mathcal{H}$ | Hypothesis space |
| $f_{\mathbf{\theta}}$ | Hypothesis function (model) $f_{\mathbf{\theta}}: \mathcal{X}\to\mathcal{Y}$ |
| $\mathbf{\theta}$ | Set of model parameters |
| $f^*$ | Target function (ground truth) |
| $\sigma$ | Activation function (e.g., ReLU, sigmoid) |
| $\ell$ | Loss function $\ell(f_{\mathbf{\theta}}(\mathbf{x}), \mathbf{y})$ |

===== Training & Complexity =====

^ Symbol ^ Meaning ^
| $L_S(\mathbf{\theta})$ | **Empirical risk** (training loss) on set $S$ |
| $L_\mathcal{D}(\mathbf{\theta})$ | **Population risk** (expected loss) |
| $\eta$ | Learning rate |
| $B$ | Mini-batch (a subset of $S$) |
| $|B|$ | Batch size |
| $\text{GD}$ | Gradient Descent |
| $\text{SGD}$ | Stochastic Gradient Descent |
| $\text{VCdim}(\mathcal{H})$ | VC-dimension of hypothesis class |
| $\text{Rad}_S(\mathcal{H})$ | Rademacher complexity on $S$ |

===== Neural Network Specifics =====

^ Symbol ^ Meaning ^
| $m$ | Number of neurons in a hidden layer |
| $L$ | Total number of layers (excluding input) |
| $\mathbf{w}_j, \mathbf{b}_j$ | Weights and bias of neuron $j$ |
| $\mathbf{W}^{[l]}$ | Weight matrix for layer $l$ |
| $\mathbf{b}^{[l]}$ | Bias vector for layer $l$ |
| $f^{[l]}$ | Output of layer $l$ |
| $\circ$ | Entry-wise operation (e.g., Hadamard product; $\sigma\circ$ applies $\sigma$ entry-wise) |
| $*$ | Convolution operation |

===== Key Formula Reference =====

**Empirical Risk:**
$$ L_S(\mathbf{\theta})=\frac{1}{n}\sum^n_{i=1}\ell(f_{\mathbf{\theta}}(\mathbf{x}_i),\mathbf{y}_i) $$

**2-Layer Network:**
$$ f_{\mathbf{\theta}}(\mathbf{x})=\sum^m_{j=1}a_j\sigma(\mathbf{w}_j\cdot\mathbf{x}+b_j) $$

**General Deep Network (Recursive):**
$$ f^{[l]}_{\mathbf{\theta}}(\mathbf{x})=\sigma\circ\left(\mathbf{W}^{[l-1]}f^{[l-1]}_{\mathbf{\theta}}(\mathbf{x})+\mathbf{b}^{[l-1]}\right), \qquad f^{[0]}_{\mathbf{\theta}}(\mathbf{x})=\mathbf{x} $$

//Credit: Adapted from [[https://github.com/mazhengcn/suggested-notation-for-machine-learning|Suggested Notation for Machine Learning]]//
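The key formulas above can be sketched directly in NumPy. This is a minimal illustration, not a reference implementation: it assumes a squared loss for $\ell$ and ReLU for $\sigma$, and the function names (`two_layer_net`, `deep_net`, `empirical_risk`) are chosen here for clarity.

```python
import numpy as np

def relu(x):
    """Entry-wise activation sigma (ReLU chosen for illustration)."""
    return np.maximum(0.0, x)

def two_layer_net(x, a, W, b):
    """f_theta(x) = sum_j a_j * sigma(w_j . x + b_j).

    W has shape (m, d) with row j equal to w_j; a and b have shape (m,)."""
    return a @ relu(W @ x + b)

def deep_net(x, weights, biases):
    """Recursive deep network: f^[l] = sigma o (W^[l-1] f^[l-1] + b^[l-1]),
    with base case f^[0](x) = x."""
    f = x
    for W, b in zip(weights, biases):
        f = relu(W @ f + b)
    return f

def empirical_risk(model, X, Y):
    """L_S(theta) = (1/n) * sum_i loss(model(x_i), y_i), squared loss here."""
    n = len(X)
    return sum(np.sum((model(x) - y) ** 2) for x, y in zip(X, Y)) / n
```

Note how `deep_net` mirrors the recursive definition: the loop variable `f` plays the role of $f^{[l-1]}_{\mathbf{\theta}}(\mathbf{x})$ at each step.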
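Tying the training symbols together, a single SGD update moves $\mathbf{\theta}$ against the average gradient of $\ell$ over a mini-batch $B$, scaled by the learning rate $\eta$. The sketch below assumes a user-supplied per-example gradient function (`grad_fn` is a name introduced here for illustration):

```python
def sgd_step(theta, grad_fn, batch, eta):
    """One SGD update: theta <- theta - eta * (1/|B|) * sum over z in B
    of grad_theta loss(f_theta(x), y), where z = (x, y).

    grad_fn(theta, z) must return the gradient of the loss on example z."""
    g = sum(grad_fn(theta, z) for z in batch) / len(batch)
    return theta - eta * g
```

With $B = S$ this reduces to a full gradient-descent (GD) step on $L_S(\mathbf{\theta})$; sampling smaller batches gives the stochastic variant.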