Writing Persian Poetry with GPT-2.0

Afshin Khashei
Mar 29, 2020


Ibrahim Jabbar-Beik (1923–2002) — Chehelstotoun restaurant, Abbasi Hotel, Isfahan, Iran

I should confess that generating Persian poetry using deep neural networks has been my number one hobby for the past couple of years. I have designed, trained and fine-tuned several models, from plain multilayer LSTM and GPT-2 language models to custom CharCNNs, RNNs and Transformers mixed with pre-trained ELMo, BERT, RoBERTa and ALBERT. I don’t think I ever found the answer I was looking for, but I learnt a lot from these experiments, and once in a while I got a result that was interesting and encouraging enough to keep me going.

In January 2020 I pre-trained a Persian GPT-2 medium model on a large text corpus that was collected from the internet. After that, I tried different ways to fine-tune this model to generate classical and modern Persian poetry.

For anyone curious about Persian poetry, this post includes a basic introduction to its rhythmic structure and classical forms. I also describe how I went about fine-tuning Persian GPT-2. For Persian readers, there are some examples of the generated poetry along the way.

Rhyme and Rhythm in Persian Poetry

Classical Persian poetry was almost always written as couplets. A couplet usually consists of two successive lines that rhyme and have the same rhythmic pattern (meter, or وزن شعر). A poem may have anywhere from two to hundreds of couplets, depending on its “form” (e.g. Ghazal, Masnavi).

There are about 15 different classical forms of Persian poetry (source: قالب شعر). For the purposes of this article, the forms differ in their typical number of lines and their rhyming patterns. The most popular patterns are:

  • Masnavi (مثنوی): Each couplet has its own rhyme
    ……………. A ……………. A
    ……………. B ……………. B
    ……………. C ……………. C
  • Ghazal or Ghaside (غزل یا قصیده): Both lines of the first couplet and the second line of every other couplet share the same rhyme
    ……………. A ……………. A
    ……………. B ……………. A
    ……………. C ……………. A
  • Qate’ (قطعه): The second lines of all couplets share the same rhyme
    ……………. B ……………. A
    ……………. C ……………. A
    ……………. D ……………. A
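The three patterns above can be told apart mechanically once each line ending has a rhyme label. The following is a minimal sketch under that assumption: the function name `classify_form` and the returned labels are made up for illustration, and the hard part (deciding whether two Persian line endings actually rhyme) is not attempted here.

```python
from typing import List, Tuple


def classify_form(couplets: List[Tuple[str, str]]) -> str:
    """Guess the form from rhyme labels.

    Each couplet is a (first_line_label, second_line_label) pair; equal
    labels mean that the two line endings rhyme. The labels are assumed
    to be given, they are not computed from text here.
    """
    if len(couplets) < 2:
        return "unknown"
    firsts = [a for a, _ in couplets]
    seconds = [b for _, b in couplets]

    # Masnavi: every couplet rhymes internally and each couplet has its own rhyme.
    if all(a == b for a, b in couplets) and len(set(seconds)) == len(seconds):
        return "masnavi"

    # The other two forms need one shared rhyme on every second line.
    if len(set(seconds)) != 1:
        return "unknown"
    shared = seconds[0]
    later_firsts_free = all(a != shared for a in firsts[1:])

    if firsts[0] == shared and later_firsts_free:
        return "ghazal_or_ghaside"
    if firsts[0] != shared and later_firsts_free:
        return "qate"
    return "unknown"
```

For example, `classify_form([("A", "A"), ("B", "A"), ("C", "A")])` returns `"ghazal_or_ghaside"`, matching the second diagram above.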

Unlike most English poetry, where the rhythmic structure of the verses is based on stressed syllables coming at regular intervals, classical Persian poetry is based not on stress but on quantity, that is, on the length of syllables. A syllable ending in a short vowel is short (u), but one ending in a long vowel or a consonant is long (–). Thus a word like ja-hān “world” is considered to have a short syllable plus a long one (u –), whereas far-dā “tomorrow” consists of two long syllables (– –). (source: Wikipedia)
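The quantity rule quoted above is simple enough to write down as code. This is only a sketch: it assumes the word is already romanized and split into syllables with hyphens, that the short vowels are written a, e, o, and it ignores the finer points of real scansion. The names `syllable_quantity` and `scan` are illustrative.

```python
# Assumed romanization: short vowels are written a, e, o.
# Anything else at the end of a syllable (a long vowel such as ā, or a
# consonant) makes the syllable long.
SHORT_VOWELS = set("aeo")


def syllable_quantity(syllable: str) -> str:
    """Return 'u' for a short syllable and '–' for a long one."""
    return "u" if syllable[-1] in SHORT_VOWELS else "–"


def scan(word: str) -> str:
    """Scan a hyphen-separated, romanized word, e.g. 'ja-hān'."""
    return " ".join(syllable_quantity(s) for s in word.split("-"))


print(scan("ja-hān"))  # u –
print(scan("far-dā"))  # – –
```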

Classical Persian poetry is written in more than 130 different rhythmic patterns, known as meters (وزن شعر), which are defined using the above notation. About 70% of the surviving poetry is in one of the four most popular meters. Here is an example of a couplet whose two lines rhyme (بار آرد / شمار آرد):

درخت دوستی بنشان که کام دل به بار آرد
نهال دشمنی بر کن که رنج بی شمار آرد

In English (with a similar rhythmic pattern and meaning to the original):
Breed only rapport sapling to get hearty delightfulness
Weed out all of feud bushes because they lead to hatefulness

Meter (in Persian notation):
مفاعیلن مفاعیلن مفاعیلن مفاعیلن

Meter (in syllable type notation):
u – – – u – – – u – – – u – – –
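The two notations above are related by a simple lookup: each Persian foot name stands for a fixed sequence of short and long syllables. A small sketch of that expansion follows. Only مفاعیلن comes from the example above; the other entries in the table are standard feet added here for illustration, and the names `FEET` and `expand_meter` are hypothetical.

```python
MAFAILUN = "مفاعیلن"  # the foot used in the couplet above

# Foot name -> syllable pattern. Entries other than MAFAILUN are standard
# feet added for illustration; the table is far from complete.
FEET = {
    MAFAILUN: "u – – –",
    "فاعلاتن": "– u – –",
    "مستفعلن": "– – u –",
    "فعولن": "u – –",
    "فاعلن": "– u –",
}


def expand_meter(meter: str) -> str:
    """Expand a space-separated sequence of foot names into u / – notation."""
    return " ".join(FEET[foot] for foot in meter.split())


print(expand_meter(" ".join([MAFAILUN] * 4)))
```

A foot name that is not in the table raises a `KeyError`, which seems preferable to silently guessing a pattern.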

Today we have access to millions of lines of classical poetry written in these forms by hundreds of poets, some of whom lived more than 1,000 years ago.

During the 20th century, new forms of poetry, mostly inspired by Western forms, became more popular. In the new forms, the restrictions on rhythmic structure and rhyming patterns are relaxed. A poem may consist of lines of different lengths, which may or may not rhyme.

GPT-2 Persian

GPT-2 is a language model. A language model predicts (generates) the next word in a text, given the previous words. The following example was generated by Persian GPT-2, which learned the Persian language from a large corpus of text collected from the internet. The opening paragraph was taken from a news website, but the rest of the text was generated by the GPT-2 Persian model word by word:


یک مقام ارشد دولتی در پایتخت ایران گفته است که خیابان‌های تهران امروز شاهد یکی از شلو‌ترین روزها بوده و هشدار داده که این وضعیت « بسیار خطرآفرین خواهد بود » . شمار افراد مبتلا به ویروس کرونا در این کشور به ۱۲ هزار ۷۲۹ نفر رسیده و تاکنون ۶۱۱ نفر هم جان خود را از دست داده‌اند .


این ویروس که در میان جوانان و زنان در حال فراگیر شدن است ، در سال جاری میلادی در کشور ایران نیز شیوع پیدا کرده است .

در حال حاضر بیش از ۳ ‌میلیون و ۳۰۰‌ هزار نفر در ایران به این ویروس مبتلا هستند .

به گفته مقامات بهداشتی ایران ، بیماری کرونا ، یکی از شایع‌ترین بیماری‌های عفونی در منطقه خاورمیانه و شمال آفریقاست .

سازمان بهداشت جهانی هم در گزارش خود با اشاره به افزایش ابتلا به این بیماری در ایران ، از شیوع ۱۵ درصدی آن در سال جاری میلادی خبر داده است .

در سال جاری میلادی ، ۴۴‌ هزار نفر به این بیماری مبتلا شده‌اند که آمار بالایی است .

با این حال ، کارشناسان بهداشت عمومی بر این باورند که آلودگی به ویروس کرونا در ایران ، به‌هیچ‌وجه به‌معنای مرگ نیست و در صورت شیوع گسترده …
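To make the idea of word-by-word generation concrete, here is a toy sketch. It is not GPT-2: it is a bigram counter that always picks the most frequent follower of the last word, and the names `train_bigram` and `generate` are made up for this illustration. The loop has the same shape as the generation described above, though: look at the text so far, pick a next word, append it, repeat.

```python
from collections import Counter, defaultdict
from typing import Dict, List


def train_bigram(corpus: List[str]) -> Dict[str, Counter]:
    """Count, for every word, which words follow it in the corpus."""
    counts: Dict[str, Counter] = defaultdict(Counter)
    for sentence in corpus:
        words = sentence.split()
        for prev, nxt in zip(words, words[1:]):
            counts[prev][nxt] += 1
    return counts


def generate(counts: Dict[str, Counter], prompt: str, max_new_words: int = 5) -> str:
    """Extend a non-empty prompt one word at a time (greedy choice)."""
    words = prompt.split()
    for _ in range(max_new_words):
        followers = counts.get(words[-1])
        if not followers:
            break  # the last word was never seen with a follower
        words.append(followers.most_common(1)[0][0])
    return " ".join(words)


corpus = ["the rose is red", "the rose is fair", "the rose is red today"]
print(generate(train_bigram(corpus), "the", max_new_words=3))  # the rose is red
```

A real language model conditions on the whole preceding context rather than on one word, and usually samples from the predicted distribution instead of always taking the top choice.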

If I give the same model an opening line from a poem, the result will be something like this:


سالم بود آن کس که تو را یار نباشد
کس نیست که از عشق تو بیمار نباشد


در عشق تو ای شاه جهان هیچ کس از هیچ کس آزرده نگردد
جان‌ها همه در خدمت تو باشد
ای دوست ، بیا و دیده‌ها را بگشای
تا دیده و دل در بر تو باز شود
زان پیش که جمله عالم از هم بشکافند
تا تو ز درون پرده‌ها باز شوی

(سعدی )

As you can see, the model was not able to continue the poem with the same meter and a matching rhyme, given the first couplet. That is because Persian GPT-2 was trained on a plain text corpus and learned the semantic regularities and grammatical rules of that dataset. My goal was to generate classical Persian poetry on a given topic while adhering to the strict rhythmic and rhyming structure of Persian poetry. To achieve that, I tried three different approaches:

  1. Pre-training GPT-2 medium on a large corpus of plain text (like the above model), then continuing training on classical Persian poetry for a while.
  2. Training GPT-2 medium on a large corpus including both poetry and plain text, with heavier weight on poetry (poetry content was repeated in the training set a few times more often than the plain text).
  3. Training GPT-2 small only on a corpus of classical and modern poetry.
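The weighting in the second approach amounts to oversampling: poetry documents appear several times in the training set while plain text appears once. A minimal sketch of building such a mix is below. The function name, the repeat factor of 3 and the shuffling step are assumptions for illustration, not the exact procedure used for the models in this post.

```python
import random
from typing import List


def build_weighted_corpus(
    plain_docs: List[str],
    poetry_docs: List[str],
    poetry_repeat: int = 3,  # hypothetical value for "a few times"
    seed: int = 0,
) -> List[str]:
    """Mix plain text and poetry, repeating each poetry document."""
    corpus = list(plain_docs) + list(poetry_docs) * poetry_repeat
    # Shuffle so the repeated poetry is spread across the training set.
    random.Random(seed).shuffle(corpus)
    return corpus


mixed = build_weighted_corpus(["news 1", "news 2"], ["poem 1", "poem 2"])
print(len(mixed))  # 2 plain documents + 2 poems * 3 repeats = 8
```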

In all models, the standard GPT-2 hyperparameters were used, with an input size of 128 or 256 subwords. While all three approaches improved the poeticness of the result, only the 3rd model was able to consistently generate output with proper rhythmic and rhyming structure. Let’s see some examples:


سالم بود آنکس که تو را یار نباشد
کس نیست که از عشق تو بیمار نباشد


از دست غمت جز غم و تیمار ندارم
زانست که جز غصه مرا یار نباشد

هر کس که شود کشته تیغ ستم او
در مذهب عشاق گنهکار نباشد

در عشق رخ و زلف تو ما را همه کاریست
کاری که به از دولت بیدار نباشد

گر روی تو بینم همه عالم به بهشتی
جز روی تو ما را دل بیدار نباشد

آن را که بود عشق تو، گویی به بهشتی
دوزخ بود آن مجلس و در کار نباشد

در مذهب ما زهد و ورع در گرو عشق
این است که اینها همه بیکار نباشد

دانی که چه خیزد ز دعاهای سحرگه
تا بوی گل و ناله عطار نباشد

گر خلق جهانت بدهندم به امیدی
یک وعده به پرسش دل بیمار نباشد

از یار چو یاری طلبم، یار دگر جوی
تا یار بود یار وفادار نباشد

خسرو، چه کنی ناله و فریاد که روزی
یک بنده به جان ماند که دلدار نباشد ؟

I think the first two couplets are amazing, relevant and poetic!
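One surface regularity of the poem above is easy to check by machine: every couplet's second line ends with the same word (نباشد). The sketch below tests only that. It says nothing about meter, meaning or overall quality, and the helper names and the punctuation list are assumptions made for this illustration.

```python
import string
from typing import List, Tuple

# ASCII punctuation plus a few Persian/Arabic marks (an assumed, partial list).
PUNCT = string.punctuation + "؟،؛«»"


def last_word(line: str) -> str:
    """Return the last word of a line, ignoring punctuation tokens."""
    words = [w.strip(PUNCT) for w in line.split()]
    words = [w for w in words if w]
    return words[-1] if words else ""


def second_lines_share_ending(couplets: List[Tuple[str, str]]) -> bool:
    """True if the second line of every couplet ends with the same word."""
    endings = {last_word(second) for _, second in couplets}
    return len(endings) == 1 and "" not in endings


poem = [
    ("first line one", "there is no friend"),
    ("first line two", "who is no friend ?"),
]
print(second_lines_share_ending(poem))  # True
```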

Since the 3rd model is also trained on modern poetry (شعر نو), it will continue textual input lines with modern poetry:


تولدت مبارک


تو در عین ناباوری به دنیا اومدی
اما چه خوشبختی
چه نازه آغوش گرم و مهربونش
به اندازه‌ی یک عمر با تو بودن و تا ابد به یاد تو بودن لذت بخشه .
تولدت مبارک
تو روزای بی تو بودن چقدر دوستت دارم
دوستت دارم دوستت دارم
من به تو خندیدم
تو به من خندیدی
من به این لبخند ، مدیونم
« من » با تو می‌مانم
و برایت دعا می‌خوانم
به من لبخند بزن
مرا شاد کن
مرا در شادی خود شریکم کن
چون برای هر روز تو و من روزهای متفاوتی هستیم

The 3rd model, however, has a weak semantic understanding outside the context of poetry, because it was trained only on poetry. That means that if the opening line is not poetic, or is about a topic that does not usually appear in poems, the continuation will be irrelevant.

یک مقام ارشد دولتی در پایتخت ایران گفته است که خیابان‌های تهران امروز شاهد یکی از شلو‌ترین روزها بوده و هشدار داده که این وضعیت « بسیار خطرآفرین خواهد بود » . شمار افراد مبتلا به ویروس کرونا در این کشور به ۱۲ هزار ۷۲۹ نفر رسیده و تاکنون ۶۱۱ نفر هم جان خود را از دست داده‌اند


و به این جهت ، خود را در مهلکهٔ هلاکت خود می‌اندازد

هر که را در راه او باشد قدم، باید زدن
هر که را در کوی او باشد قدم ، باید زدن

هر که را باشد هوای سلطنت در سر، چو شمع
هر چه او خواهد بگوید ، بر زمین باید زدن

هر که را باشد هوس در سینه، چون آیینه است
بر سر هر سنگ ، چون آیینه ، دم باید زدن

تا توانی در حریم وصل، گامی ترک کن
در ره توفیق ، چون مردان قدم باید زدن

Next Step

My goal is to generate fluent poems that are grammatically and poetically correct, semantically coherent and relevant to the given topic. Technical evaluation metrics such as loss or accuracy only tell how well the model learns the regularities of the training dataset in general; there is no metric that systematically measures the individual aspects (e.g. poeticness, fluency, grammar, …) that ultimately determine the quality of the result from the reader’s perspective. At least so far, I haven’t come up with any automated evaluation mechanism to compare the results of different models.

One way to measure the quality of a generative model is to use human feedback. For that, I would like to put this model online and build a tool that allows readers to like/dislike or compare original/generated poetry.
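As a rough idea of what such a tool would compute, the sketch below turns like/dislike votes into an approval rate per model. The vote format and the name `approval_rates` are hypothetical, since no such tool exists yet.

```python
from collections import defaultdict
from typing import Dict, Iterable, Tuple


def approval_rates(votes: Iterable[Tuple[str, bool]]) -> Dict[str, float]:
    """Compute the fraction of 'like' votes for each model.

    Each vote is a (model_name, liked) pair; this format is an assumption.
    """
    likes: Dict[str, int] = defaultdict(int)
    totals: Dict[str, int] = defaultdict(int)
    for model, liked in votes:
        totals[model] += 1
        likes[model] += int(liked)
    return {model: likes[model] / totals[model] for model in totals}


votes = [("model_3", True), ("model_3", True), ("model_1", False), ("model_1", True)]
print(approval_rates(votes))  # {'model_3': 1.0, 'model_1': 0.5}
```

With only a handful of votes per model the rates are noisy, so a real tool would also need to report how many votes each rate is based on.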

Generating one of these examples (128–256 subwords) takes between 5 and 10 seconds, depending on the model size, on my Intel Core i7 laptop with a 4GB GPU. The same process takes ~5 times longer on a CPU. That means one execution takes roughly half a minute to a minute to complete, which is quite impractical for a web application.

Unfortunately, at this time I can’t pay for a decent GPU-based host out of my own pocket, but I keep searching for viable options for doing a human evaluation.


This project is supported by Cloud TPUs from Google’s TensorFlow Research Cloud (TFRC).