Markov chain: Difference between revisions

418 bytes added, 22:41, 22 March 2023
no edit summary
No edit summary
Line 1: Line 1:
A markov chain, used in the context of Caves of Qud, is a method of generating new text using a "corpus" of existing text. This corpus can be found at <code>QudCorpus1.txt</code> and <code>QudCorpus2.txt</code> where [[File locations|game data files are stored]]. This method of generation is used to generate text for white-titled [[books]], [[telepathy|dreaming creatures]], graffiti, urn engravings, and {{favilink|glowcrow}} dialogue.
A Markov chain, used in the context of Caves of Qud, is a method of generating new text using a "corpus" of existing text. The game pulls from two files [[File locations|in the base game directory]]: <code>QudCorpus.txt</code>, which is a compilation of all of the game's dialogue, descriptions, quest info, and help text; and <code>LibraryCorpus.txt</code>, which is a collection of public domain books from Project Gutenberg. Specific titles include [https://www.gutenberg.org/ebooks/29444 <i>The Machinery of the Universe: Mechanical Conceptions of Physical Phenomena</i>] and [https://www.gutenberg.org/ebooks/38687 <i>Zoological Mythology; or, The Legends of Animals</i>]. This method is used to generate text for white-titled [[books]], [[telepathy|dreaming creatures]], graffiti, urn engravings, and {{favilink|glowcrow}} dialogue.


Jason Grinblat, aka Ptychomancer, also gave a talk at the International Roguelike Celebration about Markov generation:
Line 53: Line 53:
* <b> Initial Phrasing -</b> Because of how the data is organized, a seed must be exactly two words. There is no fuzzy searching, so the seed must match the corpus exactly, including case and any punctuation. For example, the model considers "Of the" and "of the" to be two distinct phrases: the first occurs at the start of a sentence, while the second only appears somewhere in the middle.
   
   
* <b> Corpus Size -</b> The corpus consists of only the game's descriptions and dialogue and some public domain texts. This is 871KB, compared to GPT-2's training model of 40GB. These are two completely different machine learning algorithms, but this is a good way of showing scale. Because of this comparatively small corpus, putting in any phrase as the seed will not work. That exact phrase must be in the corpus. If you want to check if a phrase is in the corpus, you can <code>ctrl+F</code> the QudCorpus.txt to see if it appears. If you are using Cryptogull, you can use <code>?sleeptalk <word></code> to see all possible phrases that contain that word.
* <b> Corpus Size -</b> The corpus consists of only the game's descriptions and dialogue and some public domain texts. This is about 1MB, compared to the roughly 40GB of text used to train GPT-2. The two use completely different algorithms, but the comparison gives a sense of scale. Because the corpus is comparatively small, not every phrase will work as a seed: the exact phrase must appear in the corpus. To check whether a phrase is in the corpus, you can <code>ctrl+F</code> the QudCorpus.txt to see if it appears. If you are using Cryptogull, you can use <code>?incorpus <word></code> to see all possible phrases that contain that word.
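
The two constraints above (two-word seeds and exact, case-sensitive matching) can be illustrated with a minimal sketch of a two-word Markov chain. This is not the game's <code>XRL.MarkovChain</code> code or Cryptogull's module: the function names <code>build_chain</code>, <code>is_valid_seed</code>, and <code>generate</code> are hypothetical, and splitting the corpus on whitespace is an assumption made for illustration.

```python
import random
from collections import defaultdict


def build_chain(corpus):
    """Map each pair of consecutive words to the list of words that follow it.

    Assumption: words are split on whitespace, so case and punctuation
    stay attached to each word ("Of" and "of" are different keys).
    """
    words = corpus.split()
    chain = defaultdict(list)
    for first, second, following in zip(words, words[1:], words[2:]):
        chain[(first, second)].append(following)
    return dict(chain)


def is_valid_seed(chain, seed):
    """A seed works only if it is exactly two words found verbatim in the corpus."""
    key = tuple(seed.split())
    return len(key) == 2 and key in chain


def generate(chain, seed, max_words=20, rng=None):
    """Extend the seed one word at a time until no follower exists or the cap is hit."""
    rng = rng or random.Random()
    if not is_valid_seed(chain, seed):
        raise KeyError(f"seed {seed!r} is not a two-word phrase from the corpus")
    output = seed.split()
    while len(output) < max_words:
        followers = chain.get((output[-2], output[-1]))
        if not followers:
            break  # the last two words never continue in the corpus
        output.append(rng.choice(followers))
    return " ".join(output)
```

With a toy corpus such as <code>"Of the salt we sing. Live and drink of the water."</code>, the seeds "Of the" and "of the" are separate keys with different followers, and a seed like "OF THE" is rejected because nothing in the corpus matches it exactly.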


== Further Reading ==
*<code>XRL.MarkovChain.cs</code>
*<code>XRL.MarkovChainData.cs</code>
*[https://github.com/TrashMonks/cryptogull/blob/main/helpers/corpus.py Cryptogull's markov generation module]
*[https://github.com/TrashMonks/cryptogull/blob/main/bot/helpers/corpus.py Cryptogull's markov generation module]
[[Category:Guides]]