Bolbol-Zaban — Writing Persian Poetry with AI
Back in 2017, I trained a character-based language model on Persian poetry using a multi-layer LSTM. The model predicted the next character given the previous 256 characters, and could generate random poetry starting from an empty string. It picked up the rhyme and rhythm (meter) of Persian poetry and generated valid words, phrases, and verses with decent grammar, but the semantic coherence was weak. Despite that, I enjoyed the magical beauty of the result and couldn’t stop reading the endless sequence of verses generated by the model, finding one meaningful verse every few lines, like these examples:
روان را به دانش به کار آوری (~Empower soul with knowledge)
or
جهان از دلیران بسی کرد یاد (~The world remembers brave people)
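To make the next-character loop concrete, here is a minimal sketch of how generation from an empty string can work. It is not the original LSTM: a toy bigram counter stands in for the trained network, and the names `train_bigram_model`, `generate`, and `CONTEXT_LEN` are made up for this illustration. Only the 256-character context window comes from the description above.

```python
import random
from collections import Counter, defaultdict
from typing import Callable, Dict

CONTEXT_LEN = 256  # the model described above saw the previous 256 characters

# Any function mapping a context string to next-character probabilities.
NextCharModel = Callable[[str], Dict[str, float]]


def train_bigram_model(corpus: str) -> NextCharModel:
    """Toy stand-in for the LSTM: next-char probabilities from bigram counts."""
    if not corpus:
        raise ValueError("corpus must not be empty")
    start_counts = Counter(corpus)  # fallback when there is no usable context
    follow_counts = defaultdict(Counter)
    for prev, nxt in zip(corpus, corpus[1:]):
        follow_counts[prev][nxt] += 1

    def predict(context: str) -> Dict[str, float]:
        # This toy model only looks at the last character of the context.
        counts = follow_counts.get(context[-1]) if context else None
        if not counts:  # empty context, or a character with no known follower
            counts = start_counts
        total = sum(counts.values())
        return {ch: n / total for ch, n in counts.items()}

    return predict


def generate(model: NextCharModel, length: int, seed: int = 0, prefix: str = "") -> str:
    """Sample `length` characters one at a time, feeding the text back in."""
    rng = random.Random(seed)
    text = prefix
    for _ in range(length):
        probs = model(text[-CONTEXT_LEN:])  # truncate to the context window
        chars = list(probs)
        weights = list(probs.values())
        text += rng.choices(chars, weights=weights, k=1)[0]
    return text
```

Calling `generate(train_bigram_model(some_poetry_text), 200)` yields 200 sampled characters. With a real network, only `predict` would change; the sampling loop stays the same.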
I ran the same model on the Harry Potter books and added some new chapters to the story. This was a smaller dataset, which led to a lower-quality result, but the example below should help readers who can’t read the Persian examples shared in my previous post understand how foolish the model was at generating meaningful text:
“what do you think i was doing there?” said harry, “what do you mean?” said hermione pleasantly, but he had not expected this evening.
“you haven’t got to straight him out of the ministry,” said harry. “i don’t know what he does.”
he turned away from him, shouting over his shoulder and pulling out his wand and reappearing down the spot, then turned to look at the desk and took the crowd at harry and hermione. “so what are you supposed to do with the ministry of magic?”
“i don’t look like it,” said hermione quickly. “i have no idea what you’d go to the castle.”
“i don’t think you are supposed to be looking for anything,” said harry quickly. “harry potter, i have still got a student of the dark lord, but i can tell you what it is.”
“what do you mean, my dear?” said harry.
Remembering that the above text was generated character by character still amazes me!
I thought I could create a tool that lets everybody play with this model and “probably” enjoy the beauty of AI-generated poetry. Before that, though, I wanted to improve the semantic coherence of the verses as much as possible and find a way to give the end user more control over the generation process. After trying many different ideas, I got a better result by combining a character-based language model trained on a large collection of poetry with a word-based language model (ELMo) pre-trained on a larger dataset of Persian text, including the Persian Wikipedia dump. These were the only two datasets available at that time. Pre-training ELMo and training the mixed model took about a month on my laptop with a 4GB GPU, and the result was semantically improved.
In early 2019, I released an application based on this model, named Bolbol-Zaban. Bolbol is the Persian word for nightingale, but Bolbolzaban is a phrase used for a toddler who has just started speaking and enjoys chatting all the time. It also translates to “sweet-spoken” in English. It sounded like a relevant name to me!
How it works
Each line of classical Persian poetry normally includes two verses in the same meter (rhythm), like this:
زلف آشفته و خوی کرده و خندان لب و مست
پیرهن چاک و غزل خوان و صراحی در دست

zolf-’āšofte-vo xoy-karde-vo xandān-lab-o mast
pīrhan-čāk-o qazal-xān-o sorāhī dar dast

“Hair-tousled, perspiring, smiling-lipped, and drunk
Shirt torn, singing songs, and wine-flask in hand”
Bolbolzaban lets the user give a pattern for the line as input and uses that pattern to generate one line of poetry. The pattern can include some words and some question marks. During generation, Bolbolzaban tries to keep the user’s words as far as possible and generates one new word for each question mark. The new words are chosen to fit the semantic context while keeping the line “poetically” correct (correct rhyme and meter).
This is an English translation of an example input pattern and its output:
Input Pattern:
Will stay safe ??????
Who ??????? is not sick
Output:
who is not your friend will stay safe
There is no one who is your lover and is not sick (~lovesick)
I had to reorder the words in the translation to match English grammar, but Bolbolzaban itself keeps the word order unchanged as far as possible.
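The pattern mechanism can be sketched roughly as follows. This is only an illustration of the "fixed words plus one slot per question mark" idea, not Bolbolzaban’s actual code: `parse_pattern`, `fill_pattern`, and the `propose_word` callback are hypothetical names, and the sketch always keeps the user’s words, whereas the real application only tries to keep them as far as possible. In the real system the callback would be the language model choosing a word that fits meaning, rhyme, and meter.

```python
from typing import Callable, List, Optional

SLOT = None  # marks a position where a new word must be generated

Tokens = List[Optional[str]]
# Hypothetical callback: (words chosen so far, full pattern, slot index) -> new word
ProposeWord = Callable[[List[str], Tokens, int], str]


def parse_pattern(pattern: str) -> Tokens:
    """Split a pattern line into fixed words and empty slots.

    Each '?' character stands for exactly one word to generate, so a run
    such as '???' expands into three consecutive slots.
    """
    tokens: Tokens = []
    for chunk in pattern.split():
        if set(chunk) == {"?"}:
            tokens.extend([SLOT] * len(chunk))
        else:
            tokens.append(chunk)  # a user word, kept as-is
    return tokens


def fill_pattern(tokens: Tokens, propose_word: ProposeWord) -> List[str]:
    """Fill slots left to right while keeping user words in their order."""
    line: List[str] = []
    for position, token in enumerate(tokens):
        if token is SLOT:
            line.append(propose_word(line, tokens, position))
        else:
            line.append(token)
    return line


def placeholder_word(prefix: List[str], tokens: Tokens, position: int) -> str:
    """Dummy proposer used only to show the data flow; a model would go here."""
    return f"<word{position}>"
```

For example, `parse_pattern("Who ??? is not sick")` gives `['Who', None, None, None, 'is', 'not', 'sick']`, and passing that list to `fill_pattern` with `placeholder_word` replaces the three `None` entries while leaving the four user words in place.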
To be fair, it still takes some effort to get something meaningful from Bolbolzaban, but there is at least one user on the internet who has proved that, with enough trial and error over different input patterns, one can write a complete poem with Bolbolzaban’s help without any prior poetic background. Silly enough, that person is me. If you read and write Persian, please read my story here, give Bolbolzaban a try, and let me know what you think. Meanwhile, for Persian readers, here is the poem that I wrote with Bolbolzaban’s help about her origin:
بلبلزبان
چو بشنید افشین زبان برگشاد
ز گفتار بلبلزبان کرد یاد

بیاموخت چون نوسخن را بدید
به گفتن برآورد شعری جدید

ز نیکی نمیناممش پهلوان
ولی یاد کرد آشکار و نهان

به کوشش بیاموزد از کاربر
همی دانش و گویش و هم هنر

کنی استفاده لغاتی جدید
که بلبلزبان این شگفتی ندید

دهد شعرکی از برای جواب
تو را پاسخی اندر آید به خواب

اگر وقت اندر نهادی به کار
بسازی تو یک شعر در کارزار

چو آمد بر دوست کودک ربات
به آوا در آورد شاخ نبات

بگفت از زمین و زمان داستان
بدو کاربر گشت همداستان

چو با یکدیگر مثنوی ساختند
همه گفتِ دل را بیاراستند

به بهنام گفتم بیا و ببین
زمین را ببوسید و کرد آفرین

بگفت این شگفتی که آورد پیش
مرا پندها داد ز اندازه بیش

که تا هیچ کس می نگوید که گنج
نمی یابی از شاعری و ز رنج