Main Article Content
Handwritten text recognition, a sub-domain of pattern recognition, is widely studied in the computer vision community because of its scope for improvement and its applicability to daily life. With the growth in computational power over the last few decades, neural network based systems have contributed heavily to state-of-the-art handwritten text recognizers. In the same direction, we take two state-of-the-art neural network systems and merge the attention mechanism with them. The attention technique has been widely used in neural machine translation and automatic speech recognition, and is now being applied in the text recognition domain. In this study, we achieve a 4.15% character error rate and a 9.72% word error rate on the IAM dataset, and a 7.07% character error rate and a 16.14% word error rate on the GW dataset, after merging attention and the word beam search decoder with the existing Flor et al. architecture. For further analysis, we also use a system similar to the Shi et al. neural network system with a greedy decoder, and observe a 23.27% improvement in character error rate over the base model.
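The character error rate (CER) and word error rate (WER) quoted above are conventionally derived from the Levenshtein edit distance between the recognized text and the ground truth, divided by the length of the ground truth. The following is a minimal sketch of that conventional definition, not the evaluation code used in the study. The function names, the whitespace tokenization for words, and the choice to pool edits over the whole test set (rather than averaging per line) are assumptions made for illustration.

```python
def edit_distance(ref, hyp):
    """Levenshtein distance between two sequences (strings or token lists)."""
    prev = list(range(len(hyp) + 1))  # distances from the empty reference prefix
    for i, r in enumerate(ref, start=1):
        curr = [i]  # deleting i reference items yields an empty hypothesis
        for j, h in enumerate(hyp, start=1):
            cost = 0 if r == h else 1
            curr.append(min(prev[j] + 1,          # deletion
                            curr[j - 1] + 1,      # insertion
                            prev[j - 1] + cost))  # match or substitution
        prev = curr
    return prev[-1]


def error_rate(refs, hyps, tokenize):
    """Percentage of edits over total reference length, pooled over all pairs.

    Assumption: edits are summed over the whole set before dividing.
    """
    total_edits = 0
    total_len = 0
    for ref, hyp in zip(refs, hyps):
        r, h = tokenize(ref), tokenize(hyp)
        total_edits += edit_distance(r, h)
        total_len += len(r)
    if total_len == 0:
        raise ValueError("reference text is empty")
    return 100.0 * total_edits / total_len


def cer(refs, hyps):
    """Character error rate in percent (every character, including spaces, counts)."""
    return error_rate(refs, hyps, list)


def wer(refs, hyps):
    """Word error rate in percent (assumption: words are split on whitespace)."""
    return error_rate(refs, hyps, str.split)


if __name__ == "__main__":
    truth = ["hello world"]
    output = ["hallo world"]
    # One substituted character out of 11, and one wrong word out of 2.
    print(f"CER = {cer(truth, output):.2f}%")
    print(f"WER = {wer(truth, output):.2f}%")
```

Because a single wrong character makes the whole word wrong, WER is usually well above CER for the same output, which matches the pattern in the figures reported for IAM and GW.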
Article Details
J. Almazan, A. Gordo, A. Fornes, and E. Valveny. Word spotting and recognition with embedded attributes. IEEE Transactions on Pattern Analysis and Machine Intelligence, 36(12):2552-2566, 2014. https://doi.org/10.1109/TPAMI.2014.2339814. (Crossref)
D. Bahdanau, K. Cho, and Y. Bengio. Neural machine translation by jointly learning to align and translate. In Proc 3rd Int Conf Learning Representations, ICLR 2015, San Diego, CA, 7-9 May 2015. Accessible in arXiv. https://doi.org/10.48550/arXiv.1409.0473.
R. E. Bellman and S. E. Dreyfus. Applied Dynamic Programming, volume 2050 of Princeton Legacy Library. Princeton University Press, 2015. https://doi.org/10.1515/9781400874651. (Crossref)
A.-L. Bianne-Bernard, F. Menasri, R. Al-Hajj Mohamad, et al. Dynamic and contextual information in HMM modeling for handwritten word recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(10):2066-2080, 2011. https://doi.org/10.1109/TPAMI.2011.22. (Crossref)
T. Bluche. Joint line segmentation and transcription for end-to-end handwritten paragraph recognition. arXiv, 2016. arXiv:1604.08352. https://doi.org/10.48550/arXiv.1604.08352.
T. Bluche. Joint line segmentation and transcription for end-to-end handwritten paragraph recognition. In Advances in Neural Information Processing Systems 29 - Proc 30th Conf NIPS 2016, volume 29, pages 838-846, Barcelona, Spain, 5-10 Dec 2016. Curran Associates, Inc. https://proceedings.neurips.cc/paper/2016/file/2bb232c0b13c774965ef8558f0fbd615-Paper.pdf.
T. Bluche, J. Louradour, and R. Messina. Scan, Attend and Read: End-to-end handwritten paragraph recognition with MDLSTM attention. arXiv, 2016. arXiv:1604.03286. https://doi.org/10.48550/arXiv.1604.03286.
T. Bluche, J. Louradour, and R. Messina. Scan, Attend and Read: End-to-end handwritten paragraph recognition with MDLSTM attention. In Proc 2017 14th IAPR Int Conf Document Analysis and Recognition (ICDAR), pages 1050-1055, Kyoto, Japan, 9-15 Nov 2017. IEEE. https://doi.org/10.1109/ICDAR.2017.174. (Crossref)
T. Bluche, H. Ney, and C. Kermorvant. Tandem HMM with convolutional neural network for handwritten word recognition. In Proc 2013 IEEE Int Conf Acoustics, Speech and Signal Processing (ICASSP), pages 2390-2394, Vancouver, Canada, 26-31 May 2013. IEEE. https://doi.org/10.1109/ICASSP.2013.6638083. (Crossref)
K.-N. Chen, C.-H. Chen, and C.-C. Chang. Efficient illumination compensation techniques for text images. Digital Signal Processing, 22(5):726-733, 2012. https://doi.org/10.1016/j.dsp.2012.04.010. (Crossref)
W.-T. Chen, P. Gader, and H. Shi. Lexicon-driven handwritten word recognition using optimal linear combinations of order statistics. IEEE Transactions on Pattern Analysis and Machine Intelligence, 21(1):77-82, 1999. https://doi.org/10.1109/34.745738. (Crossref)
A. Chowdhury and L. Vig. An efficient end-to-end neural model for handwritten text recognition. arXiv, 2018. arXiv:1807.07965v2. https://doi.org/10.48550/arXiv.1807.07965.
D. Coquenet, Y. Soullard, C. Chatelain, and T. Paquet. Have convolutions already made recurrence obsolete for unconstrained handwritten text recognition? In Proc 2019 Int Conf Document Analysis and Recognition Workshops (ICDARW), volume 5, pages 65-70, Sydney, NSW, Australia, 20-25 Sep 2019. https://doi.org/10.1109/ICDARW.2019.40083. (Crossref)
A. Das, J. Li, G. Ye, et al. Advancing acoustic-to-word CTC model with attention and mixed-units. IEEE/ACM Transactions on Audio, Speech, and Language Processing, 27(12):1880-1892, 2019. https://doi.org/10.1109/TASLP.2019.2933325. (Crossref)
A. F. de Sousa Neto, B. L. D. Bezerra, A. H. Toselli, and E. B. Lima. HTR-Flor: A deep learning system for offline handwritten text recognition. In Proc 2020 33rd SIBGRAPI Conference on Graphics, Patterns and Images (SIBGRAPI), pages 54-61, Porto de Galinhas, Brazil, 07-10 Nov 2020. https://doi.org/10.1109/SIBGRAPI51738.2020.00016. (Crossref)
A. Flor de Sousa Neto. handwritten-text-recognition. GitHub repository, 2020. https://github.com/arthurflor23/handwritten-text-recognition.
P. Doetsch, M. Kozielski, and H. Ney. Fast and robust training of recurrent neural networks for offline handwriting recognition. In Proc 2014 14th Int Conf Frontiers in Handwriting Recognition (ICFHR), pages 279-284, Hersonissos, Greece, 01-04 Sep 2014. IEEE. https://doi.org/10.1109/ICFHR.2014.54. (Crossref)
P. Dreuw, P. Doetsch, C. Plahl, and H. Ney. Hierarchical hybrid MLP/HMM or rather MLP features for a discriminatively trained Gaussian HMM: A comparison for offline handwriting recognition. In Proc 2011 18th IEEE Int Conf Image Processing (ICIP), pages 3541-3544, Brussels, Belgium, 11-14 Sep 2011. IEEE. https://doi.org/10.1109/ICIP.2011.6116480. (Crossref)
S. España-Boquera, M. J. Castro-Bleda, J. Gorbe-Moya, and F. Zamora-Martinez. Improving offline handwritten text recognition with hybrid HMM/ANN models. IEEE Transactions on Pattern Analysis and Machine Intelligence, 33(4):767-779, 2011. https://doi.org/10.1109/TPAMI.2010.141. (Crossref)
A. Fischer. Handwriting Recognition in Historical Documents. PhD thesis, Universität Bern, Switzerland, 13 Mar 2012. https://www.researchgate.net/publication/259346163.
A. Fischer, A. Keller, V. Frinken, and H. Bunke. Lexicon-free handwritten word spotting using character HMMs. Pattern Recognition Letters, 33(7):934-942, 2012. Special Issue on Awards from ICPR 2010. https://doi.org/10.1016/j.patrec.2011.09.009. (Crossref)
A. Fischer, K. Riesen, and H. Bunke. Graph similarity features for HMM-based handwriting recognition in historical documents. In Proc 2010 12th Int Conf Frontiers in Handwriting Recognition (ICFHR), pages 253-258, Kolkata, India, 16-18 Nov 2010. IEEE. https://doi.org/10.1109/ICFHR.2010.47. (Crossref)
V. Frinken and S. Uchida. Deep BLSTM neural networks for unconstrained continuous handwritten text recognition. In Proc 2015 13th Int Conf Document Analysis and Recognition (ICDAR), pages 911-915, Tunis, Tunisia, 23-26 Aug 2015. IEEE. https://doi.org/10.1109/ICDAR.2015.7333894. (Crossref)
A. Giménez, I. Khoury, J. Andrés-Ferrer, and A. Juan. Handwriting word recognition using windowed Bernoulli HMMs. Pattern Recognition Letters, 35:149-156, 2014. https://doi.org/10.1016/j.patrec.2012.09.002. (Crossref)
A. Graves, S. Fernández, F. Gomez, and J. Schmidhuber. Connectionist temporal classification: labelling unsegmented sequence data with recurrent neural networks. In ICML '06: Proc 23rd Int Conf Machine Learning, pages 369-376, Pittsburgh, PA, USA, 25-29 Jun 2006. https://doi.org/10.1145/1143844.1143891. (Crossref)
A. Graves and N. Jaitly. Towards end-to-end speech recognition with recurrent neural networks. In Proc 31st Int Conf Machine Learning (ICML'14), volume 32 of ACM Proceedings, pages II–1764-II–1772, Beijing, China, 21-26 Jun 2014. JMLR.org. https://dl.acm.org/doi/abs/10.5555/3044805.3045089.
A. Graves, M. Liwicki, S. Fernández, et al. A novel connectionist system for unconstrained handwriting recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 31(5):855-868, 2009. https://doi.org/10.1109/TPAMI.2008.137. (Crossref)
A. Graves and J. Schmidhuber. Offline handwriting recognition with multidimensional recurrent neural networks. In Advances in Neural Information Processing Systems 21 - Proc 22nd Conf NeurIPS 2008, volume 21, pages 545-552. Curran Associates, Inc., 2008. https://proceedings.neurips.cc/paper/2008/file/66368270ffd51418ec58bd793f2d9b1b-Paper.pdf.
Keras Special Interest Group. Keras. simple. flexible. powerful. https://keras.io.
S. Johansson, G. N. Leech, and H. Goodluck. Manual of Information to accompany the Lancaster-Oslo/Bergen Corpus of British English, for use with digital Computers. Department of English, University of Oslo, Oslo, Norway, 1978.
L. Kang, P. Riba, M. Rusiñol, et al. Pay attention to what you read: Non-recurrent handwritten text-line recognition. arXiv, 2020. arXiv:2005.13044. https://doi.org/10.48550/arXiv.2005.13044.
L. Kang, P. Riba, M. Rusiñol, et al. Pay attention to what you read: Non-recurrent handwritten text-line recognition. Pattern Recognition, 129:108766, 2022. https://doi.org/10.1016/j.patcog.2022.108766. (Crossref)
G. Kim, V. Govindaraju, and S. N. Srihari. An architecture for handwritten text recognition systems. International Journal on Document Analysis and Recognition, 2(1):37-44, 1999. https://doi.org/10.1007/s100320050035. (Crossref)
M. Kozielski, P. Doetsch, and H. Ney. Improvements in RWTH's system for off-line handwriting recognition. In Proc 2013 IAPR 12th Int Conf Document Analysis and Recognition (ICDAR), pages 935-939, Washington, DC, USA, 25-28 Aug 2013. IEEE. https://doi.org/10.1109/ICDAR.2013.190. (Crossref)
A. Krizhevsky, I. Sutskever, and G. E. Hinton. ImageNet classification with deep convolutional neural networks. Communications of the ACM, 60(6):84-90, 2017. https://doi.org/10.1145/3065386. (Crossref)
L. Kumari and A. Sharma. A review of deep learning techniques in document image word spotting. Archives of Computational Methods in Engineering, 29(2):1085-1106, 2022. https://doi.org/10.1007/s11831-021-09605-7. (Crossref)
L. Kumari, S. Singh, and A. Sharma. Page level input for handwritten text recognition in document images. In J. H. Kim et al., editors, Proc 7th Int Conf Harmony Search, Soft Computing and Applications (ICHSA), volume 140 of Lecture Notes on Data Engineering and Communications Technologies, pages 171-183, Seoul, South Korea, 23-24 Feb 2022. Springer Nature Singapore. https://doi.org/10.1007/978-981-19-2948-9_17. (Crossref)
Y. Le Cun, B. Boser, J. S. Denker, et al. Handwritten digit recognition with a back-propagation network. In Advances in Neural Information Processing Systems 2 - Proc Conf NeurIPS 1989, volume 2, pages 396-404, San Francisco, CA, USA, 1990. Morgan Kaufmann Publishers Inc. https://proceedings.neurips.cc/paper/1989/file/53c3bce66e43be4f209556518c2fcb54-Paper.pdf.
M.-T. Luong, H. Pham, and C. D. Manning. Effective approaches to attention-based neural machine translation. In Proc EMNLP 2015, Lisbon, Portugal, 17-21 Sep 2015. Accessible in arXiv. https://doi.org/10.48550/ARXIV.1508.04025. (Crossref)
U.-V. Marti and H. Bunke. The IAM-database: an English sentence database for offline handwriting recognition. International Journal on Document Analysis and Recognition, 5(1):39-46, 2002. https://doi.org/10.1007/s100320200071. (Crossref)
J. Michael, R. Labahn, T. Grüning, and J. Zöllner. Evaluating sequence-to-sequence models for handwritten text recognition. In Proc 2019 IAPR Int Conf Document Analysis and Recognition (ICDAR), pages 1286-1293, Sydney, NSW, Australia, 20-25 Sep 2019. IEEE. https://doi.org/10.1109/ICDAR.2019.00208. (Crossref)
J. Poulos and R. Valle. Character-based handwritten text transcription with attention networks. Neural Computing and Applications, 33(16):10563-10573, 2021. https://doi.org/10.1007/s00521-021-05813-1. (Crossref)
A. Poznanski and L. Wolf. CNN-N-Gram for handwriting word recognition. In Proc 2016 IEEE Conf Computer Vision and Pattern Recognition (CVPR), pages 2305-2314, Las Vegas, NV, USA, 27-30 Jun 2016. https://doi.org/10.1109/CVPR.2016.253. (Crossref)
R. Ptucha, F. Petroski Such, S. Pillai, et al. Intelligent character recognition using fully convolutional neural networks. Pattern Recognition, 88:604-613, 2019. https://doi.org/10.1016/j.patcog.2018.12.017. (Crossref)
J. Puigcerver. Are multidimensional recurrent layers really necessary for handwritten text recognition? In Proc 2017 14th IAPR Int Conf Document Analysis and Recognition (ICDAR), pages 67-72, Kyoto, Japan, 9-15 Nov 2017. IEEE. https://doi.org/10.1109/ICDAR.2017.20. (Crossref)
H. Sak, A. Senior, and F. Beaufays. Long short-term memory recurrent neural network architectures for large scale acoustic modeling. In Proc Annual Conf of the International Speech Communication Association (Interspeech), pages 338-342, Singapore, 14-18 Sep 2014. https://doi.org/10.21437/Interspeech.2014-80. (Crossref)
J. Sauvola and M. Pietikäinen. Adaptive document image binarization. Pattern Recognition, 33(2):225-236, 2000. https://doi.org/10.1016/S0031-3203(99)00055-2. (Crossref)
H. Scheidl. CTCWordBeamSearch. GitHub repository, 2019. https://github.com/githubharald/CTCWordBeamSearch.
H. Scheidl, S. Fiel, and R. Sablatnig. Word Beam Search: A connectionist temporal classification decoding algorithm. In Proc 2018 16th Int Conf Frontiers in Handwriting Recognition (ICFHR), pages 253-258, Niagara Falls, NY, USA, 5-8 Aug 2018. IEEE. https://doi.org/10.1109/ICFHR-2018.2018.00052. (Crossref)
B. Shi, X. Bai, and C. Yao. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. arXiv, 2015. arXiv:1507.05717. https://doi.org/10.48550/arXiv.1507.05717.
B. Shi, X. Bai, and C. Yao. An end-to-end trainable neural network for image-based sequence recognition and its application to scene text recognition. IEEE Transactions on Pattern Analysis and Machine Intelligence, 39(11):2298-2304, 2017. https://doi.org/10.1109/TPAMI.2016.2646371. (Crossref)
F. Petroski Such, D. Peri, F. Brockler, et al. Fully convolutional networks for handwriting recognition. In Proc 2018 16th Int Conf Frontiers in Handwriting Recognition (ICFHR), pages 86-91, Niagara Falls, NY, USA, 5-8 Aug 2018. IEEE. https://doi.org/10.1109/ICFHR-2018.2018.00024. (Crossref)
D. Suryani, P. Doetsch, and H. Ney. On the benefits of convolutional neural network combinations in offline handwriting recognition. In Proc 2016 15th Int Conf Frontiers in Handwriting Recognition (ICFHR), pages 193-198, Shenzhen, China, 23-26 Oct 2016. IEEE. https://doi.org/10.1109/ICFHR.2016.0046. (Crossref)
J. I. Toledo, S. Dey, A. Fornes, and J. Llados. Handwriting recognition by attribute embedding and recurrent neural networks. In Proc 2017 14th IAPR Int Conf Document Analysis and Recognition (ICDAR), volume 01, pages 1038-1043, Kyoto, Japan, 9-15 Nov 2017. IEEE. https://doi.org/10.1109/ICDAR.2017.172. (Crossref)
A. Vinciarelli. A survey on off-line cursive word recognition. Pattern Recognition, 35(7):1433-1446, 2002. https://doi.org/10.1016/S0031-3203(01)00129-7. (Crossref)
A. Vinciarelli and J. Luettin. A new normalization technique for cursive handwritten words. Pattern Recognition Letters, 22(9):1043-1050, 2001. https://doi.org/10.1016/S0167-8655(01)00042-3. (Crossref)
M. Yousef and T. Bishop. OrigamiNet: Weakly-supervised, segmentation-free, one-step, full page text recognition by learning to unfold. In Proc 2020 IEEE/CVF Conf Computer Vision and Pattern Recognition (CVPR), pages 14698-14707, Seattle, WA, USA, 13-19 Jun 2020. IEEE. https://doi.org/10.1109/CVPR42600.2020.01472. (Crossref)
M. Yousef, K. F. Hussain, and U. S. Mohammed. Accurate, data-efficient, unconstrained text recognition with convolutional neural networks. Pattern Recognition, 108:107482, 2020. https://doi.org/10.1016/j.patcog.2020.107482. (Crossref)