What is Image Captioning?
June 2019
Image captioning is a software application that converts an image, such as a photo, into a text caption describing the main theme of the image. For example, if an image shows a dog leaping to catch a ball, then the ideal caption would be exactly ‘a dog leaping to catch a ball’. In other words, the software behaves like a human: it reads the digital version of an image and writes a line of text about it, on screen or in a diary. This capability suggests that computers can be trained to perform tasks as humans do.

Image captioning belongs to the domain of machine learning because the conversion relies on a computer learning from examples (images paired with captions) given to it over a long period of training. Compared with humans, computers are not efficient at this kind of learning. For example, a two-year-old child can learn to identify a taxi or a bus after seeing it once with guidance from a parent or another child, and can readily tell a car from a tree. A computer needs to be trained on tens of thousands or even millions of images with human-written captions to learn object classification. The advantage of computers is that, once trained, they do not forget and can classify objects 24 hours a day, 365 days a year, without complaint. In 2015, computers surpassed human performance at classifying images that contain a single object.