Detecting Persian poem metre (وزن شعر) using a sequence-to-sequence deep learning model

Afshin Khashei
Feb 26, 2020

Over the past 1,000 years, the Persian language has enjoyed a very rich literature, especially in poetry. Until the advent of free verse in the 20th century, this poetry was always quantitative; that is, the lines were composed of various patterns of long and short syllables. These patterns are known as metres. Knowledge of metre is essential for the correct recitation of Persian poetry. Because short vowels are not written in the Persian script, it is also very often needed to interpret the meaning of a verse correctly in cases of ambiguity. (See: Persian metres on Wikipedia)

There are different ways to represent metre patterns. One method uses a series of made-up base words (a.k.a. Arkān, or feet) derived from the Arabic verb فعل‎ f’l (meaning "to do"). For instance, the following line, in both the Persian original and the English translation, follows the same metre pattern, which can be represented as “me fâ’ î lün | me fâ’ î lün | me fâ’ î lün | me fâ’ î lün” (مفاعیلن مفاعیلن مفاعیلن مفاعیلن)

درخت دوستی بنشان که کام دل به بار آرد
نهال دشمنی برکن که رنج بی‌شمار آرد

Breed only rapport sapling to get hearty delightfulness
Weed out all of feud bushes because they lead to hatefulness

Hundreds of different metres have been identified across millions of lines of classical Persian poems. Extracting the metre is a technical task. It starts with replacing the characters of the line with a string of symbols representing vowels and consonants, then combining them into short, long and overlong syllables. After that, a series of rules is applied to match the resulting sequence against one of the existing patterns. This process requires a fair bit of knowledge about poetry.
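To make the first two steps of that rule-based process concrete, here is a minimal Python sketch. It is not the code behind this project. It assumes a made-up Latin transliteration in which `a`, `e`, `o` are short vowels, `A`, `i`, `u` are long vowels, and every other character (including `'` for the glottal stop) is a single consonant. It also assumes every syllable starts with exactly one consonant followed by a vowel. Real scansion needs many more rules and poetic licences than this.

```python
# Hypothetical transliteration scheme (an assumption for illustration only).
SHORT_VOWELS = set("aeo")
LONG_VOWELS = set("Aiu")

# Syllable shape -> weight. "v" = short vowel, "V" = long vowel, "C" = consonant.
WEIGHTS = {
    "Cv": "short",
    "CV": "long",
    "CvC": "long",
    "CVC": "overlong",
    "CvCC": "overlong",
    "CVCC": "overlong",
}


def to_cv(text):
    """Step 1: replace each character with a vowel/consonant symbol."""
    symbols = []
    for ch in text:
        if ch.isspace():
            continue  # syllables may run across word boundaries
        if ch in SHORT_VOWELS:
            symbols.append("v")
        elif ch in LONG_VOWELS:
            symbols.append("V")
        else:
            symbols.append("C")
    return "".join(symbols)


def syllabify(cv):
    """Step 2: cut the symbol string so each syllable starts one consonant before a vowel."""
    starts = [i - 1 for i, sym in enumerate(cv) if sym in "vV"]
    if not starts or starts[0] != 0:
        raise ValueError("text must start with a consonant followed by a vowel")
    bounds = starts + [len(cv)]
    return [cv[a:b] for a, b in zip(bounds, bounds[1:])]


def scan(text):
    """Return the weight (short / long / overlong) of each syllable."""
    weights = []
    for syllable in syllabify(to_cv(text)):
        if syllable not in WEIGHTS:
            raise ValueError(f"unsupported syllable shape: {syllable}")
        weights.append(WEIGHTS[syllable])
    return weights


# The foot name itself carries the pattern it stands for.
print(scan("mafA'ilon"))  # ['short', 'long', 'long', 'long']
print(scan("dust"))       # ['overlong']
```

The remaining step, matching the weight sequence against the known metres, is where most of the poetry knowledge goes, and it is left out here.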

In this project, I tried an end-to-end deep learning model to detect the metre. The model is a multi-layer LSTM encoder-decoder with an attention mechanism, trained on samples of poems that were categorized manually. It was then tested on a set of 60,000 samples, covering 50 different metres, that were not used in training, and it achieved an accuracy of 97.8% on this set.
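For readers unfamiliar with sequence-to-sequence models, the sketch below shows one way a (line, metre) pair could be turned into the integer sequences that an encoder-decoder consumes. The encoding is an assumption for illustration: this post does not say how the inputs and outputs were tokenized. Here the source is split into characters, the target into feet, and the special tokens and the `max_len` values are hypothetical.

```python
# Special tokens commonly used with encoder-decoder models (names are illustrative).
PAD, SOS, EOS = "<pad>", "<sos>", "<eos>"


def build_vocab(sequences):
    """Assign an integer id to every token seen, after the special tokens."""
    vocab = {PAD: 0, SOS: 1, EOS: 2}
    for seq in sequences:
        for tok in seq:
            vocab.setdefault(tok, len(vocab))
    return vocab


def encode(tokens, vocab, max_len):
    """Wrap tokens in <sos>/<eos>, map them to ids, and pad to a fixed length."""
    ids = [vocab[SOS]] + [vocab[t] for t in tokens] + [vocab[EOS]]
    if len(ids) > max_len:
        raise ValueError("sequence is longer than max_len")
    return ids + [vocab[PAD]] * (max_len - len(ids))


line = "درخت دوستی بنشان که کام دل به بار آرد"
metre = "مفاعیلن مفاعیلن مفاعیلن مفاعیلن"

src_tokens = list(line)     # assumed: character-level input
tgt_tokens = metre.split()  # assumed: one output token per foot

src_vocab = build_vocab([src_tokens])
tgt_vocab = build_vocab([tgt_tokens])

src_ids = encode(src_tokens, src_vocab, max_len=48)
tgt_ids = encode(tgt_tokens, tgt_vocab, max_len=8)

print(tgt_ids)  # [1, 3, 3, 3, 3, 2, 0, 0]
```

In a real setup the vocabularies would be built from the whole training set, and the encoder would read `src_ids` while the decoder learns to emit `tgt_ids` one token at a time.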

Previous attempts to detect Persian metre with a computer program relied either on rule-based algorithms or on statistical approaches using Markov models. They were limited to a smaller set of patterns (~30) that cover the majority of Persian poems, and they achieved lower accuracy. (see this)

As far as I know, this is the first time Persian metre has been detected with a deep learning model at this level of accuracy. I don’t have access to all journals published in Iran, so feel free to comment if this statement is not accurate.

If you know Persian, feel free to try the result, which is hosted here: http://vaznyab.nosokhan.com


Originally published at https://www.linkedin.com on April 26, 2018
