Compucon Roadmap 2019-2020
Image Captioning Development Roadmap for 2019 – 2020
The immediate Sampling release in 2019-06 is a model trained by Yueyuan up to 2018-06. The model was trained on nearly 100,000 images from the Microsoft COCO dataset and reached a BLEU1 score of 0.651. The sampler for our peers contains only 100 images and is strictly not for commercial use. When executed, the model generates captions for 10 images and displays the captions alongside the images for appraisal. The run time should be a matter of seconds. Some captions will be good and some will be poor; this reflects the current state of the model.
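For readers unfamiliar with the BLEU1 figure quoted above, the following is a minimal sketch of how a sentence-level BLEU-1 score can be computed: clipped unigram precision multiplied by a brevity penalty. It is an illustration only and not the evaluation script used for the model; the function name `bleu1`, the whitespace tokenisation, and the example captions are assumptions made for this sketch. Reported scores are usually computed over a whole test corpus rather than a single sentence.

```python
import math
from collections import Counter


def bleu1(candidate, references):
    """Sentence-level BLEU-1: clipped unigram precision times brevity penalty.

    Hypothetical helper for illustration; tokenisation is a plain
    lowercase whitespace split, which is an assumption of this sketch.
    """
    cand = candidate.lower().split()
    refs = [r.lower().split() for r in references]
    if not cand or not refs:
        return 0.0

    # For each word, the largest count seen in any single reference.
    max_ref_counts = Counter()
    for ref in refs:
        for tok, n in Counter(ref).items():
            max_ref_counts[tok] = max(max_ref_counts[tok], n)

    # Clip each candidate word count so repeated words are not over-rewarded.
    cand_counts = Counter(cand)
    clipped = sum(min(n, max_ref_counts[tok]) for tok, n in cand_counts.items())
    precision = clipped / len(cand)

    # Brevity penalty uses the reference length closest to the candidate
    # length (ties resolved in favour of the shorter reference).
    ref_len = min((len(r) for r in refs), key=lambda n: (abs(n - len(cand)), n))
    bp = 1.0 if len(cand) >= ref_len else math.exp(1 - ref_len / len(cand))
    return bp * precision


if __name__ == "__main__":
    refs = ["a dog is running on the grass", "the dog runs across a field"]
    print(round(bleu1("a dog runs on the grass", refs), 3))
    print(round(bleu1("the the the", ["the cat sat"]), 3))
```

A score of 1.0 means every word of the generated caption appears in a reference caption and the caption is not too short; a caption that repeats one correct word scores low because of clipping.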
The next Sampling release will bring 2 types of improvement. The first is that our peers will have the option of supplying up to 10 images of their own for testing. Some conditioning of the input data is required, and this will take extra time. If the peer-supplied images differ in content and style from the 100,000 images used for training, the captions are expected to be poor, reflecting the model's limited ability to adapt to unseen images.
The second is an improved model with a peak BLEU1 score close to 0.700. This model should produce captions that are more sensible and less absurd. In early 2020, the model will be trained on video surveillance footage so that it can interpret footage for property owners. However, this model will remain in the sampling stage and will not be mature enough for commercial deployment.
Two further streams of exploration, not directly related to the above model, are in progress. One deals with natural language processing. The other deals with physics-assisted machine learning methods. Both are in the domain of academic exploration.