Markov chain: Difference between revisions

225 bytes added ,  01:19, 3 May 2022
m
no edit summary
(Created page with "A markov chain, used in the context of Caves of Qud, is a method of generating new text using a "corpus" of existing text. This corpus can be found at <code>QudCorpus1.txt</co...")
 
Line 8: Line 8:
The general algorithm is that the corpus is chopped up into key-value groups, with the size of each key determined by the "order" of the Markov chain. Caves of Qud uses an order of two, so each word group used as a key consists of at most two words.
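
The chopping step described above can be sketched roughly as follows. This is a minimal illustration in Python, not the game's actual code: the function name <code>build_chain</code>, the whitespace tokenisation, and the made-up sample sentence are all assumptions, and the real logic lives in the game's own <code>MarkovChain</code> classes.

```python
from collections import defaultdict


def build_chain(text, order=2):
    """Map each run of `order` consecutive words to the words seen right after it.

    Hypothetical helper for illustration only; splitting on whitespace is an
    assumption and may not match how the game tokenises its corpus.
    """
    words = text.split()
    chain = defaultdict(list)
    # Slide a window of `order` words across the text; the word that follows
    # each window is recorded as one possible continuation of that key.
    for i in range(len(words) - order):
        key = tuple(words[i:i + order])
        chain[key].append(words[i + order])
    return dict(chain)


# Made-up sample text standing in for the real corpus.
SAMPLE = "the cat sat on the mat and the cat ran off"
chain = build_chain(SAMPLE)

for key, followers in chain.items():
    print(key, "->", followers)
```

In this sample the key <code>("the", "cat")</code> occurs twice, so it ends up with two possible followers. That branching is what lets a generator produce text that differs from the corpus.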


As a simplified example, take the sentence "Oh, a quetzal is a pretty bird in the trogon family." The game will split this into the following pairs:


<pre>
<pre>
Line 53: Line 53:
* <b> Corpus Size -</b> The corpus consists only of the game's descriptions and dialogue plus some public domain texts. It is 871KB, compared to the roughly 40GB of text used to train GPT-2. These are two completely different machine learning algorithms, but the comparison is a good way of showing scale. Because the corpus is comparatively small, an arbitrary phrase will not work as the seed: that exact phrase must appear in the corpus. To check whether a phrase is in the corpus, you can <code>ctrl+F</code> the QudCorpus.txt to see if it appears. If you are using Cryptogull, you can use <code>?sleeptalk <word></code> to see all possible phrases that contain that word.
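
The seed restriction can be illustrated with a small sketch. Everything here is hypothetical: the helper names (<code>build_chain</code>, <code>keys_containing</code>, <code>generate</code>), the whitespace tokenisation, and the tiny sample text are assumptions made for the example, and this is not how the game or Cryptogull actually implements the lookup.

```python
import random
from collections import defaultdict


def build_chain(text, order=2):
    """Hypothetical helper: map each `order`-word key to its observed followers."""
    words = text.split()
    chain = defaultdict(list)
    for i in range(len(words) - order):
        chain[tuple(words[i:i + order])].append(words[i + order])
    return dict(chain)


def keys_containing(chain, word):
    """List every key that includes `word` (loosely similar in spirit to a word lookup)."""
    return sorted(key for key in chain if word in key)


def generate(chain, seed, max_words=12, rng=None):
    """Extend `seed` word by word; fail if the seed never occurs as a key."""
    key = tuple(seed.split())
    if key not in chain:
        raise KeyError(f"seed {seed!r} does not appear in the corpus")
    rng = rng or random.Random()
    out = list(key)
    while len(out) < max_words:
        followers = chain.get(tuple(out[-len(key):]))
        if not followers:
            break  # dead end: nothing in the corpus follows this key
        out.append(rng.choice(followers))
    return " ".join(out)


# Made-up sample text standing in for the real corpus.
CHAIN = build_chain("the cat sat on the mat and the cat ran off")

print(keys_containing(CHAIN, "cat"))
print(generate(CHAIN, "cat ran"))
```

With a corpus this small, most seeds simply do not exist as keys, which is the same effect the real corpus shows at a larger scale.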


 
== Further Reading ==
*<code>XRL.World.MarkovChain.cs</code>
*<code>XRL.World.MarkovChainData.cs</code>
*[https://github.com/TrashMonks/cryptogull/blob/main/helpers/corpus.py Cryptogull's markov generation module]
[[Category:Guides]]