中科院模式识别第三次(第五节)_作业_答案_更多

来源：年旅网

第5章：线性判别函数

第一部份：计算与证明

1．有四个来自于两个类别的二维空间中的样本，其中第一类的两个样本为(1,4)T和(2,3)T，

第二类的两个样本为(4,1)T和(3,2)T。那个地址，上标T表示向量转置。假设初始的权向量a=(0,1)T，且梯度更新步长数g(y)=aTy的权向量。解：

第一对样本进行标准化处置。将第二类样本更改成(4,1)T和(3,2)T. 然后计算错分样本集：

g(y1) = (0,1)(1,4)T = 4 > 0 (正确) g(y2) = (0,1)(2,3)T = 3 > 0 (正确) g(y3) = (0,1)(-4,-1)T = -1 < 0 (错分) g(y4) = (0,1)(-3,-2)T = -2 < 0 (错分) 因此错分样本集为Y={(-4,-1)T , (-3,-2)T }.

接着，对错分样本集求和：(-4,-1)T+(-3,-2)T = (-7,-3)T

第一次修正权向量a，以完成一次梯度下降更新：a=(0,1)T+ (-7,-3)T=(-7,-2)T 再次计算错分样本集：

g(y1) = (-7,-2)(1,4)T = -15 < 0 (错分) g(y2) = (-7,-2)(2,3)T = -20 < 0 (错分) g(y3) = (-7,-2)(-4,-1)T = 30 > 0 (正确) g(y4) = (-7,-2)(-3,-2)T = 25 > 0 (正确)

k固定为1。试利用批处置感知器算法求解线性判别函

因此错分样本集为Y={(1,4)T , (2,3)T }.

接着，对错分样本集求和：(1,4)T+(2,3)T = (3,7)T

第二次修正权向量a，以完成二次梯度下降更新：a=(-7,-2)T+ (3,7)T=(-4,5)T 再次计算错分样本集：

g(y1) = (-4,5)(1,4)T = 16 > 0 (正确) g(y2) = (-4,5)(2,3)T = 7 > 0 (正确) g(y3) = (-4,5)(-4,-1)T = 11 > 0 (正确) g(y4) = (-4,5)(-3,-2)T = 2 > 0 (正确)

现在，全数样本均被正确分类，算法终止，所得权向量a=(-4,5)T。

2．在线性感知算法中，试证明引入正余量b以后的解区(aTyib)位于原先的解区当中

(aTyi>0)，且与原解区边界之间的距离为b/||yi||。

证明：设a*知足aTyib,那么它必然也知足aTyi>0，因此引入余量后的解区位于原先的解区aTyi>0当中。

注意，aTyib的解区的边界为aTyi=b,而aTyi>0的解区边界为aTyi=0。aTyi=b与aTyi=0两个边界之间的距离为b/||yi||。（因为aTyi=0过坐标原点，相关于坐标原点到aTyi=b的距离。）

3．试证明感知器准那么函数正比于被错分样本到决策面的距离之和。

证明：感知器准那么函数为： J(a)T(ay) yY 决策面方程为aTy=0。当y为错分样本时，有aTy0。现在，错分样本到决策面的距离为aTy/||a||。所有样本到决策面的距离之和为

r(aTy)ayY

结论得证。

4．关于多类分类情形，考虑one-vs-all技术，即构建 c 个线性判别函数：

gTi(x)wixwi0,i1,2,...,c，现在的决策规那么为：对 j i, 若是 gi(x) > gj(x), x 那么被分类二维空间内的模式分类器，其判别函数为

g1(x) = x1 + x2

g2(x) = x1 + x2 1

g3(x) =

试画出决策面，指出为何现在不存在分类不确信性区域。解：依照上述决策规那么，属于第一类

1的区域应知足：

g1(x) > g2(x) 且g1(x) > g3(x)

因此

1的决策界面为：

g1(x) g2(x) = 2x1 + 1 = 0。 g1(x) g3(x) = x1 + 2x2 = 0。

一样地，属于第二类

2的区域应知足：

g2(x) > g1(x) 且g2(x) > g3(x)

因此

2的决策界面为：

g2(x) g1(x) = 2x1 1 = 0。 g2(x) g3(x) = x1 + 2x2 1 = 0。

属于第三类

3的区域应知足：

g3(x) > g1(x) 且g3(x) > g2(x)

i 类。现有三个

因此2的决策界面为：

g3(x) g1(x) = x1 2x2 = 0。 g2(x) g3(x) = x1 2x2 + 1 = 0。

以下图给出了决策边界：

g1(x) g2(x) =2x1 + 1 = 0 g2(x) g3(x) 2类判别区域 g3(x) g1(x)

=x1 2x2 = 0

=x + 2x 1 = 12

3类判别区域

1类判别区域由于三个决策边界交于一点，因此，不存在不确信性区域。这是因为直线g1(x)g2(x)=0与直线g1(x)g3(x)=0的交点必然位于 g1(x)g2(x)

(g1(x)g3(x)) =

g2(x)g3(x) =0的直线上，即g2(x)g3(x) =0过它们的交点。

5．已知模式样本集：

1 = {(0,0)

T, (1,1)T}, 2 = {(0,1)T, (1,0)T}。采纳误差平方准那么算法

（即Ho-kashyap算法）验证它是线性不可分的。（提示：迭代时b=(1,1,1,1)T）

k固定取1,初始

解：第一对第二类样本，进行齐次表示，然后再进行标准化表示，取得如下标准化增广训练数据矩阵：

001111 Y01110122221T1TY的伪逆矩阵为：Y(YY)Y2222

43111进行第一次迭代a=Y+b=(0,0,0)T 计算误差e=Ya-b=(-1,-1,-1,-1) T

现在，没必要再更新b即可明白不等式组Ya>0无解。因为e中部份元素为负（现在全为负）。依照Ho-kashyap算法相关（收敛性）原理，可知原样本集线性不可分。

6. Consider the hyperplane used in discrimination:

(a) Show that the distance from the hyperplane g(x) = wTx + w0 = 0 to the point

xa is |g(xa)|/||w|| by minimizing ||xxa||2 subject to the constraint g(x) = 0. （提示需要证明两点：其一，点xa到超平面g(x) = 0的距离为|g(xa)|/||w||；其二，该距离是位于超平面g(x) = 0上使目标函数||xxa||2最小的点x到点xa的距离。） (b) Show that the projection of xa onto the hyperplane is given by (即证明点xa到超平面g(x) = 0的投影xp为如下公式)：

xpxa证明

g(xa)w ||w||2

注意，在以下表达中，x要换成xa

(b) 依照对(a)的证明的第二个公式，结论显然成立。

第二部份：运算机编程题

本章所利用的数据：

1．Write a program to implement the “batch perception” algorithm (see page 44 or 45 in PPT).

(a). Starting with a = 0, apply your program to the training data from and

2. Note that the number of iterations required for convergence（即记录下

收敛的步数）。

(b). Apply your program to the training data from that the number of iterations required for convergence.

(c). Explain the difference between the iterations required in the two cases. 2. Implement the Ho-Kashyap algorithm and apply it to the training data from

1 and

3. Repeat to apply it to the training data from

2 and

4. Point out the

3 and

2. Again, note

training errors, and give some analyses.

3. Consider relaxation methods as described in the PPT. (See the slides for the \"Batch Relaxation with Margin\" algorithm and page 62 in PPT for the \"Single Sample Relaxation with Margin\" algorithm):

(a) Implement the batch relaxation with margin, set b = and initialize a = 0, and apply it to the data in

1 and

3. Plot the criterion function as a function of the

number of passes through the training set.

(b) Repeat for b = and a0 = 0 (namely, initialize a = 0). Explain qualitatively any differences you find in the convergence rates.

(c) Modify your program to use single sample learning. Again, Plot the criterion function as a function of the number of passes through the training set.

因篇幅问题不能全部显示，请点此查看更多更全内容

查看全文