Face Alignment by MobileNetV2
Face alignment with MobileNetV2. Note that MTCNN is used to provide the input bounding box. You need to modify the path of the images in order to run the demo.
The most important part of the MobileNetV2 network is the design of the bottleneck. In our experiments, we crop the face image by the bounding box and resize it to 64 × 64, which is the input size of the network. Based on this, we design the structure of our customized MobileNetV2 for facial landmark localization. Note that the receptive field is a key factor in the design of the network.
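The cropping step above can be sketched as follows. This is a minimal illustration, not the repository's actual preprocessing code: the function name, the 10% box expansion margin, and the nearest-neighbor resize are all my assumptions (in practice `cv2.resize` would typically be used).

```python
import numpy as np

def crop_and_resize(image, bbox, out_size=64, margin=0.1):
    """Crop a face from `image` given an MTCNN-style box (x1, y1, x2, y2),
    expand the box by `margin` so the whole face fits, and resize the crop
    to out_size x out_size with nearest-neighbor sampling."""
    h, w = image.shape[:2]
    x1, y1, x2, y2 = bbox
    bw, bh = x2 - x1, y2 - y1
    # expand the detector box slightly and clip to the image bounds
    x1 = max(0, int(x1 - margin * bw))
    y1 = max(0, int(y1 - margin * bh))
    x2 = min(w, int(x2 + margin * bw))
    y2 = min(h, int(y2 + margin * bh))
    crop = image[y1:y2, x1:x2]
    # nearest-neighbor resize via index arrays (cv2.resize in practice)
    ys = (np.arange(out_size) * crop.shape[0] / out_size).astype(int)
    xs = (np.arange(out_size) * crop.shape[1] / out_size).astype(int)
    return crop[ys][:, xs]
```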
This structure has two main features:
The training data includes:
Data augmentation is important to the performance of face alignment. I have tried several kinds of data augmentation methods, including:
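Two common augmentations for landmark training can be sketched as below. These are generic examples of the kind of methods listed, not the repository's code; the function names are mine. Note that for a real horizontal flip with 68-point annotations, the landmark *indices* must also be remapped (left eye ↔ right eye, etc.), which is omitted here for brevity.

```python
import numpy as np

def random_flip(image, landmarks, rng):
    """Horizontally flip the image and mirror the landmark x-coordinates
    with probability 0.5. Index remapping for 68-point symmetry omitted."""
    if rng.random() < 0.5:
        image = image[:, ::-1]
        landmarks = landmarks.copy()
        landmarks[:, 0] = image.shape[1] - 1 - landmarks[:, 0]
    return image, landmarks

def rotate_landmarks(landmarks, center, angle_deg):
    """Rotate (N, 2) landmarks around `center`; the image would be
    rotated with a matching affine warp (e.g. cv2.warpAffine)."""
    theta = np.deg2rad(angle_deg)
    c, s = np.cos(theta), np.sin(theta)
    R = np.array([[c, -s], [s, c]])
    return (landmarks - center) @ R.T + center
```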
The performance on 300W is not good enough yet. Maybe I need to try more configurations. If you have any ideas, please contact me or open an issue.
Method | Input Size | Common | Challenge | Full Set | Training Data
---|---|---|---|---|---
VGG-Shadow (with dropout) | 70 × 60 | 5.66 | 10.82 | 6.67 | 300W
MobileNetV2-stage1 | 64 × 64 | 6.07 | 10.60 | 6.96 | 300W and Menpo
MobileNetV2-stage2 | 64 × 64 | 5.76 | 8.93 | 6.39 | 300W and Menpo
Dataset | Number of training images
---|---
300-W | 3148
Menpo | 12006
The ground-truth landmarks are drawn in white, while the predicted ones are in blue.
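The color convention can be sketched as a small drawing helper. This is an illustrative pixel-level version with an assumed function name; the actual demo would normally use `cv2.circle` to draw visible dots.

```python
import numpy as np

def draw_landmarks(image, gt, pred):
    """Mark ground-truth landmarks in white and predictions in blue
    on a BGR image. Single-pixel marks for simplicity."""
    out = image.copy()
    for x, y in np.round(gt).astype(int):
        out[y, x] = (255, 255, 255)   # ground truth: white
    for x, y in np.round(pred).astype(int):
        out[y, x] = (255, 0, 0)       # prediction: blue (BGR order)
    return out
```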
The pre-trained models can be downloaded from baiduyun or GoogleDisk.
I wrote a demo to view the alignment results. Besides, the yaw, roll, and pitch angles are estimated from the predicted landmarks. To run the demo, please do:
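Of the three angles, roll is the simplest to estimate directly from 2D landmarks; yaw and pitch typically require fitting a 3D face model (e.g. with `cv2.solvePnP`). The sketch below shows only the roll estimate from the outer eye corners (68-point indices 36 and 45); the function name and this particular estimator are my assumptions, not necessarily what the demo does.

```python
import numpy as np

def estimate_roll(landmarks, left_eye=36, right_eye=45):
    """Roll angle in degrees from the line through the outer eye corners
    of a (68, 2) landmark array. 0 means the eyes are level."""
    dx, dy = landmarks[right_eye] - landmarks[left_eye]
    return float(np.degrees(np.arctan2(dy, dx)))
```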
To reproduce the results with my code, you need to use my fork of Caffe, to which I have added some useful layers.