Math List
A collection of commonly used formulas, to avoid duplicated work, with explanations and the areas they come from.
Machine Learning
- training data $(x, y)$
- neural network model $f_{\theta}$
- loss function $\ell(f_{\theta}(x), y)$
Membership Inference Attacks
Given a target model $f$ and a target sample $x$, a membership inference attack (MIA) can be defined as a mapping
$\mathcal{A}: x, f \rightarrow \{0,1\}$
A calibrated membership score, using a reference model $g$ alongside the target model $h$, can be calculated as:
$s^{\mathrm{cal}}(h,g,(x,y)) = s(h,(x,y)) - s(g,(x,y))$
In the standard MIA game, a challenger $C$ trains a model $f_{\theta}$ on a dataset $D_{\mathrm{train}}$ (a subset of a broader, universal dataset $D$) using a training algorithm $\mathcal{T}$. The adversary $\mathcal{A}$ then attempts to determine whether a specific data point $(x, y)$ from $D$ was included in $D_{\mathrm{train}}$.
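A minimal sketch of difficulty calibration, assuming a loss-based score $s$ (the log-probability of the true label) and hypothetical per-class probability vectors from the target model $h$ and reference model $g$; other choices of $s$ exist.

```python
import numpy as np

def score(probs, y):
    # Loss-based membership score: log-probability of the true label
    # (higher = more member-like). An assumption; other scores exist.
    return np.log(probs[y] + 1e-12)

def calibrated_score(target_probs, reference_probs, y):
    # s_cal(h, g, (x, y)) = s(h, (x, y)) - s(g, (x, y))
    return score(target_probs, y) - score(reference_probs, y)

# Toy example: the target model h is far more confident on the true
# label y = 0 than the reference model g, so the calibrated score is
# positive, i.e. the sample looks like a training member.
h_probs = np.array([0.95, 0.03, 0.02])
g_probs = np.array([0.50, 0.30, 0.20])
print(calibrated_score(h_probs, g_probs, y=0) > 0)  # True
```

Subtracting the reference score corrects for sample difficulty: a hard sample gets a low score from both models, so only the gap signals membership.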
Definition of memorization:
$\mathrm{mem}(A,S,i):= \mathbb{E}_{f\leftarrow A(S)}\left[\mathcal{M}\left(f(x_i),y_i\right)\right]-\mathbb{E}_{f\leftarrow A(S\setminus i)}\left[\mathcal{M}\left(f(x_i),y_i\right)\right]$
where $A$ is the (possibly randomized) learning algorithm, $S$ the training set, and $\mathcal{M}$ a performance metric such as prediction correctness.
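The score can be estimated directly from the definition. A minimal sketch, assuming a deterministic 1-nearest-neighbour classifier as the learning algorithm $A$ (so the expectations collapse to single models) and 0/1 correctness as the metric $\mathcal{M}$:

```python
import numpy as np

def one_nn_predict(train_x, train_y, x):
    # A(S): a 1-nearest-neighbour "model" that predicts the label of
    # the closest training point (a stand-in for a trained network).
    dists = np.linalg.norm(train_x - x, axis=1)
    return train_y[np.argmin(dists)]

def memorization(xs, ys, i):
    # mem(A, S, i) = M(A(S)(x_i), y_i) - M(A(S \ i)(x_i), y_i)
    with_i = one_nn_predict(xs, ys, xs[i]) == ys[i]
    mask = np.arange(len(xs)) != i
    without_i = one_nn_predict(xs[mask], ys[mask], xs[i]) == ys[i]
    return float(with_i) - float(without_i)

# An isolated point with an unusual label is only classified correctly
# when it is in the training set: it is fully memorized (score 1.0).
xs = np.array([[0.0], [0.1], [10.0]])
ys = np.array([0, 0, 1])
print(memorization(xs, ys, 2))  # 1.0
print(memorization(xs, ys, 0))  # 0.0: correct with or without x_0
```

For a randomized $A$, the two terms would be averaged over repeated training runs instead of computed once.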
Datasets
For a dataset $\mathcal{T}=\{(x_i,y_i)\}_{i=1}^m$, $\boldsymbol{x}_i\in\mathbb{R}^d$, where $d$ is the dimension of the input data and $y_i$ is the label.
We also define the loss between prediction and ground truth, as the population risk over the distribution $\mathcal{D}$ and the empirical risk over the training set $\mathcal{T}$:
$\mathcal{R}_{\mathcal{D}}(\boldsymbol{\theta})=\mathbb{E}_{(\boldsymbol{x},y)\sim\mathcal{D}}\left[\ell\left(f_{\boldsymbol{\theta}}\left(\boldsymbol{x}\right),y\right)\right]$
$\mathcal{R}_{\mathcal{T}}(\boldsymbol{\theta})=\mathbb{E}_{(\boldsymbol{x},y)\sim\mathcal{T}}\left[\ell\left(f_{\boldsymbol{\theta}}\left(\boldsymbol{x}\right),y\right)\right]=\frac1m\sum_{i=1}^m\ell\left(f_{\boldsymbol{\theta}}\left(\boldsymbol{x}_i\right),y_i\right)$
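A minimal sketch of the empirical risk $\mathcal{R}_{\mathcal{T}}(\boldsymbol{\theta})$, assuming a hypothetical linear-softmax model for $f_{\boldsymbol{\theta}}$ and cross-entropy for $\ell$:

```python
import numpy as np

def softmax(z):
    z = z - z.max()          # subtract max for numerical stability
    e = np.exp(z)
    return e / e.sum()

def cross_entropy(probs, y):
    return -np.log(probs[y] + 1e-12)

def empirical_risk(theta, xs, ys):
    # R_T(theta) = (1/m) * sum_i l(f_theta(x_i), y_i)
    losses = [cross_entropy(softmax(theta @ x), y) for x, y in zip(xs, ys)]
    return float(np.mean(losses))

rng = np.random.default_rng(0)
xs = rng.normal(size=(8, 3))        # m = 8 samples, d = 3
ys = rng.integers(0, 2, size=8)     # binary labels
theta = np.zeros((2, 3))            # zero weights -> uniform predictions
print(empirical_risk(theta, xs, ys))  # log(2) ~ 0.693
```

With zero weights every prediction is uniform over two classes, so the risk equals $\log 2$, a useful sanity check for a fresh model.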
Gradient Update
$\theta^{(k+1)}=\theta^{(k)}-\eta\boldsymbol{g}_{\mathcal{T}}^{(k)}$
where $\boldsymbol{g}_{\mathcal{T}}^{(k)}$ is the gradient of $\mathcal{R}_{\mathcal{T}}$ at $\theta^{(k)}$ and $\eta$ is the learning rate.
Formalizing Dataset Distillation
$\mathcal{S}=\{(s_j,y_j)\}_{j=1}^n$, where typically $n \ll m$.
$\mathbb{E}_{\boldsymbol{\theta}^{(0)}\sim\mathbf{P}}\left[\ell\left(f_{\mathrm{alg}(\mathcal{T})}\left(\boldsymbol{x}\right),y\right)\right]\simeq\mathbb{E}_{\boldsymbol{\theta}^{(0)}\sim\mathbf{P}}\left[\ell\left(f_{\mathrm{alg}(\mathcal{S})}\left(\boldsymbol{x}\right),y\right)\right]$
where $\mathbf{P}$ is the distribution of initializations $\boldsymbol{\theta}^{(0)}$ and $f_{\mathrm{alg}(\cdot)}$ denotes the model obtained by training on the given set.
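Putting the gradient update and the distillation criterion together, a minimal toy sketch: least-squares linear regression (an assumption; the notes fix no model), where training on a small set $\mathcal{S}$ by gradient descent reaches roughly the same loss on the real data as training on the full set $\mathcal{T}$. Here $\mathcal{S}$ is just a tiny subsample; real distillation would optimize the synthetic points $s_j$.

```python
import numpy as np

def train(xs, ys, steps=1000, eta=0.1):
    # Full-batch gradient descent on the squared loss:
    # theta^{(k+1)} = theta^{(k)} - eta * g^{(k)}
    theta = np.zeros(xs.shape[1])
    for _ in range(steps):
        grad = 2 * xs.T @ (xs @ theta - ys) / len(xs)  # g^{(k)}
        theta = theta - eta * grad
    return theta

def risk(theta, xs, ys):
    # Empirical squared-error risk of theta on (xs, ys).
    return float(np.mean((xs @ theta - ys) ** 2))

rng = np.random.default_rng(0)
true_theta = np.array([1.0, -2.0])
X = rng.normal(size=(100, 2))                    # real set T, m = 100
y = X @ true_theta + 0.01 * rng.normal(size=100)

S, yS = X[:4], y[:4]                             # tiny stand-in for S, n = 4

theta_T = train(X, y)
theta_S = train(S, yS)
print(risk(theta_T, X, y), risk(theta_S, X, y))  # both small and close
```

The comparison `risk(theta_S, X, y)` against `risk(theta_T, X, y)` is exactly the distillation criterion above, evaluated for one model family and one initialization instead of in expectation.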