RIFMA: A Dataset of Russian Poetry with Accentuation Annotations

The RIFMA dataset consists of approximately 3,598 fragments of Russian-language poetry (stanzas and whole poems) annotated with stress marks and rhyme scheme information. The fragments are drawn from poems of different genres and forms and by different poets, to ensure the widest possible coverage.

An example of poem accentuation is as follows:

За́йку бро́сила хозя́йка —
Под дождё́м оста́лся за́йка.
Со скаме́йки сле́зть не мо́г,
Ве́сь до ни́точки промо́к.
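
The stress marks in the example above are combining characters placed directly after the stressed vowel (the dataset uses U+0301 for primary stress). The following is a minimal sketch of how such markup could be processed; the helper names `strip_stress` and `stressed_vowels` are hypothetical and not part of the dataset or of any accompanying tool.

```python
from typing import List

PRIMARY_STRESS = "\u0301"  # combining acute accent, written after the stressed vowel


def strip_stress(text: str, marks: str = PRIMARY_STRESS) -> str:
    """Remove the given combining stress marks and keep every other character."""
    return "".join(ch for ch in text if ch not in marks)


def stressed_vowels(text: str, mark: str = PRIMARY_STRESS) -> List[str]:
    """Return the letters that are immediately followed by the stress mark."""
    return [text[i - 1] for i, ch in enumerate(text) if ch == mark and i > 0]


# First line of the example, with the stress marks written as explicit escapes.
line = "За\u0301йку бро\u0301сила хозя\u0301йка"

print(strip_stress(line))
print(stressed_vowels(line))
```

Passing a longer `marks` string to `strip_stress` would remove several kinds of marks at once, for instance when secondary stresses are also annotated.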

This dataset is supplementary material for our paper Automated Evaluation of Meter and Rhyme in Russian Generative and Human-Authored Poetry.

This dataset was initially created to validate the Russian Poetry Scansion Tool. Recognizing the lack of similar resources for Russian poetry, we have made it publicly available under an open license to support research in natural language processing (NLP) and computational poetry analysis.

Dataset file

rifma_dataset.json
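
A minimal loading sketch follows. It assumes that the file holds a top-level JSON list of objects keyed by the field names given under "Structure of dataset"; that layout is an assumption and is not confirmed here. The functions `parse_entries` and `load_entries` are hypothetical helpers, and a small inline sample stands in for the real file so that the sketch runs without it.

```python
import json
from typing import Any, Dict, List

EXPECTED_FIELDS = ("poem_text", "accentuation_markup", "rhyme_scheme")


def parse_entries(raw_json: str) -> List[Dict[str, Any]]:
    """Parse dataset JSON, assuming a top-level list of objects (an assumption)."""
    data = json.loads(raw_json)
    if not isinstance(data, list):
        raise ValueError("expected a top-level JSON list of entries")
    for i, entry in enumerate(data):
        if not isinstance(entry, dict):
            raise ValueError(f"entry {i} is not a JSON object")
        missing = [key for key in EXPECTED_FIELDS if key not in entry]
        if missing:
            raise ValueError(f"entry {i} is missing fields: {missing}")
    return data


def load_entries(path: str = "rifma_dataset.json") -> List[Dict[str, Any]]:
    """Read the dataset file as UTF-8 and validate its assumed layout."""
    with open(path, encoding="utf-8") as f:
        return parse_entries(f.read())


# Tiny inline stand-in for the real file.
sample = json.dumps(
    [
        {
            "poem_text": "Зайку бросила хозяйка",
            "accentuation_markup": "За\u0301йку бро\u0301сила хозя\u0301йка",
            "rhyme_scheme": None,  # serialized as JSON null
        }
    ],
    ensure_ascii=False,
)

entries = parse_entries(sample)
print(len(entries), entries[0]["rhyme_scheme"])
```

With the real file in the working directory, `load_entries()` would be the entry point; if the actual layout differs, the validation raises a `ValueError` instead of failing later on a missing key.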

Structure of dataset

Each entry in the dataset is a tuple with the following fields:

poem_text - The plain text of the poem.

accentuation_markup - The poem text with stress annotations. Primary stresses are marked with \u0301 and secondary stresses with \u0302, as in this example: И сло̀вони́тью сде́лайте окно́.

rhyme_scheme - The rhyme scheme of the poem. If not applicable, this field is null. Common patterns include ABAB (where the first line rhymes with the third, and the second with the fourth).
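
To illustrate how a scheme such as ABAB can be read, the sketch below groups line indices by rhyme letter. It assumes one character per line of the fragment, with letters naming rhyme classes and any other character standing for an unrhymed line; the exact notation used in the dataset may differ. The function `rhyme_groups` is a hypothetical helper.

```python
from collections import defaultdict
from typing import Dict, List, Optional


def rhyme_groups(scheme: Optional[str]) -> Dict[str, List[int]]:
    """Map each rhyme letter to the 0-based indices of the lines that share it.

    Returns an empty dict when the scheme is None (the field is null).
    Letters that occur only once are dropped, since a single line cannot rhyme.
    """
    if scheme is None:
        return {}
    groups: Dict[str, List[int]] = defaultdict(list)
    for index, label in enumerate(scheme):
        if label.isalpha():  # assumption: non-letters mark unrhymed lines
            groups[label].append(index)
    return {label: lines for label, lines in groups.items() if len(lines) > 1}


print(rhyme_groups("ABAB"))  # {'A': [0, 2], 'B': [1, 3]}
print(rhyme_groups("AABB"))  # {'A': [0, 1], 'B': [2, 3]}
print(rhyme_groups(None))    # {}
```

For ABAB the result pairs line 0 with line 2 and line 1 with line 3, which matches the description above: the first line rhymes with the third, and the second with the fourth.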

Statistics

Some statistics are presented in an automatically generated report file.

Additional Resources

Ars Poetica is a dataset of Russian-language syllabo-tonic poetry with stress placement performed by the Russian Poetry Scansion Tool.

Citation

If you use this dataset in your research or projects, please cite it as follows:

@misc{koziev2025automatedevaluationmeterrhyme,
      title={Automated Evaluation of Meter and Rhyme in Russian Generative and Human-Authored Poetry},
      author={Ilya Koziev},
      year={2025},
      eprint={2502.20931},
      archivePrefix={arXiv},
      primaryClass={cs.CL},
      url={https://arxiv.org/abs/2502.20931},
}
