There are a few separate programs. The majority of the work was in the code to generate the Markov chain content. This was split into two parts -- the content was partially digested on my desktop computer so that the demands on the Arduino CPU could be minimized.
First the code to digest the text input:
<?php
/**
* This script takes input from stdin and uses it to generate a set of tables
* that can be used to efficiently produce Markov chain output.
*
* It produces three tables, all stored contiguously in the same binary file:
* - Words. Each word is a fixed-length string, presented in order of word ID.
* For example, if the longest word is 20 characters, word ID 10 will be
* found in the file between bytes 200 and 219. Null terminators are not
* included but shorter words are padded up to size using nulls.
* - Keys. Each key represents a specific sequence of $n (only tested with 2)
* words. For example, "is a" is a single key and the its key ID is used to
* represent all occurrences of "is a" in the text. Each key entry contains
* two little-endian 16-bit integers representing, respectively, the first
* and last option IDs that can follow this key. Keys are organized in
* sequence, starting with key ID 0, and each key entry is 4 bytes long; thus
* it is possible to find a key entry from its ID by calculating the offset
* from the end of the Words table.
* - Options. Each option represents a possible word that can follow a given
* sequence of words (key). For example, the key "is a" may have option
* entries that refer to the words "housecat", "newsworthy", "limp", and
* "limp", if the input text contained the phrases "is a housecat", "is a
* newsworthy", and "is a limp" (the last appearing twice). Each option
* entry contains two little-endian 16-bit integers representing,
* respectively, the word ID for this option, and the key ID to use in the
* next iteration. (If one of the two "limp" options is chosen randomly,
* the word ID will refer to the single word entry for "limp" and the key ID
* will refer to the single key entry for "a limp", which can then be used
* to repeat the process for the next word.)
*
* Written by Alec Smecher. Please include a thank-you if you use this code
* for anything. Distributed under the GNU GPLv2 license.
*/
// Consider a "history" of 2 words when pondering what next to say. This has
// not been tested with n <> 2; you'll at least have to modify the content
// generator.
$n = 2;
// $history stores the history as an array of word IDs. Initialize it.
$history = array_fill(0, $n, null);
// Keep a few simple statistics
$maxWordLength = $maxKeyLength = 0;
// Initialize the tables that hold the bulk of the data.
$words = array(); // word => wordIndex (inverted index)
$keys = array(); // key => keyIndex (inverted index)
$options = array(); // optionIndex => array(...option data...)
$in = fopen('php://stdin', 'r');
while (($l = fgets($in)) !== false) {
foreach (explode(' ', $l) as $word) {
// Find the word index in the $words list, creating it if needed
if (!isset($words[$word])) {
$wordIndex = count($words);
$words[$word] = $wordIndex;
} else {
$wordIndex = $words[$word];
}
// Keep track of the longest word length
$maxWordLength = max($maxWordLength, strlen($word));
// Build the history into a string key
$key = join($history, ' ');
// Find the key index in the $keys list, creating it if needed
if (!isset($keys[$key])) {
$keyIndex = count($keys);
$keys[$key] = $keyIndex;
$maxKeyLength = max($maxKeyLength, strlen($key));
} else {
$keyIndex = $keys[$key];
}
// Store this key index as the next key index from the last iteration.
// This is funky reference stuff.
$someKeyIndex = $keyIndex;
unset($someKeyIndex); // Clear the reference
// Maintain the $options list.
$someKeyIndex = 0; // We will back-fill this in the next iteration.
$options[] = array(
'wordIndex' => $wordIndex,
'thisKeyIndex' => $keyIndex,
'nextKeyIndex' => &$someKeyIndex // by reference
);
// Shift the history along for the next iteration.
for ($i=1; $i<$n; $i++) {
$history[$i-1] = $history[$i];
}
$history[$n-1] = $wordIndex;
}
}
fclose($in);
// Use stderr for helpful output
$stderr = fopen('php://stderr', 'w');
// We've generated the tables. Now present some statistics.
fputs($stderr, 'Number of words: ' . count($words) . "\n");
fputs($stderr, 'Number of keys: ' . count($keys) . "\n");
fputs($stderr, "Maximum word length: $maxWordLength\n");
fputs($stderr, "Number of options: " . count($options) . "\n");
// Sort the options array by current key. This will mean the
// options table will contain contiguous blocks each containing all the options
// for each key.
usort($options, create_function('$a, $b', 'return $b[\'thisKeyIndex\']-$a[\'thisKeyIndex\'];'));
// Find out the first and last option IDs for each key.
$keyFirstOptions = $keyLastOptions = array();
$lastKeyId = null;
foreach ($options as $optionId => $option) {
if ($lastKeyId != $option['thisKeyIndex']) {
$keyFirstOptions[$option['thisKeyIndex']] = $optionId;
$lastKeyId = $option['thisKeyIndex'];
}
$keyLastOptions[$option['thisKeyIndex']] = $optionId;
}
// The tables are all generated. Start saving the output to a file.
$out = fopen('markovchain.bin', 'w');
// 1. Write out the table of words. This can later be looked up starting at bytes [index * maxWordLength]
foreach ($words as $word => $wordIndex) {
fputs($out, pack('a' . $maxWordLength, $word));
}
$wordTableEnds = ftell($out);
fputs($stderr, 'Word table: 0 - ' . ($wordTableEnds-1) . " bytes\n");
// 2. Write out the key index.
foreach ($keys as $key => $keyIndex) {
// Find and store the first and last occurrences of the key in the option set.
fputs($out, pack('v', $keyFirstOptions[$keyIndex]));
fputs($out, pack('v', $keyLastOptions[$keyIndex]));
}
$keyTableEnds = ftell($out);
fputs($stderr, "Key table: $wordTableEnds - " . ($keyTableEnds-1) . " bytes\n");
// 3. Write out the options.
foreach ($options as $optionData) {
fputs($out, pack('v', $optionData['wordIndex']));
fputs($out, pack('v', $optionData['nextKeyIndex']));
}
$optionTableEnds = ftell($out);
fputs($stderr, "Option table: $keyTableEnds - " . ($optionTableEnds-1) . " bytes\n\n");
// Done writing output.
fclose($out);
// Display some constants that will need to be imported into the Arduino C code
// or PHP test program.
fputs($stderr, "Constants for C/C++:\n");
fputs($stderr, "#define WORDS_OFFSET 0\n");
fputs($stderr, "#define KEYS_OFFSET $wordTableEnds\n");
fputs($stderr, "#define OPTIONS_OFFSET $keyTableEnds\n");
fputs($stderr, "#define MAX_WORD_LENGTH $maxWordLength\n\n");
fputs($stderr, "Constants for PHP:\n");
fputs($stderr, "define('WORDS_OFFSET', 0);\n");
fputs($stderr, "define('KEYS_OFFSET', $wordTableEnds);\n");
fputs($stderr, "define('OPTIONS_OFFSET', $keyTableEnds);\n");
fputs($stderr, "define('MAX_WORD_LENGTH', $maxWordLength);\n\n");
// Clean up
fclose($stderr);
?>
Now a quick PHP test implementation to use the resulting table to generate new text:
<?php
/**
* Generate Markov chain output until forcibly stopped.
* This takes a set of data from markovchain.bin, which can be generated
* using the gentables.php script.
*
* Written by Alec Smecher. Please include a thank-you if you use this code
* for anything. Distributed under the GNU GPLv2 license.
*/
// These constants need to correspond to those given by gentables.php
define('WORDS_OFFSET', 0);
define('KEYS_OFFSET', 92600);
define('OPTIONS_OFFSET', 162100);
define('MAX_WORD_LENGTH', 20);
($fp = fopen('markovchain.bin', 'rb')) || die("Unable to open markovchain.bin; did you generate it?\n");
$keyId = 0; // Start at the beginning. (Where else?)
while (true) {
// Get the information for the current key.
fseek($fp, KEYS_OFFSET + ($keyId * 4));
extract(unpack('vfirstOptionId/vlastOptionId', fread($fp, 4)));
// From the options available for this key, choose one.
$optionId = rand($firstOptionId, $lastOptionId);
// Get the information for the chosen option (including the next key).
fseek($fp, OPTIONS_OFFSET + ($optionId * 4));
extract(unpack('vwordId/vkeyId', fread($fp, 4)));
// Display the word we just chose.
fseek($fp, WORDS_OFFSET + ($wordId * MAX_WORD_LENGTH));
echo trim(fread($fp, MAX_WORD_LENGTH)) . ' ';
}
fclose($fp);
?>
Just the length of the two listings should give some impression of how much work is saved on the Arduino side of things by pre-computing the tables.Now the Arduino implementation, including both the TVout code and the Markov chain code.
// Arduino code for the Dymaxion Auto-Matic Buckminster Fuller.
// Written by Alec Smecher. Please include a thank-you if you use this code
// for anything. Distributed under the GNU GPLv2 license.
// SD stuff
#include "Sd2PinMap.h"
#include "SdInfo.h"
#include "Sd2Card.h"
#define BLOCKSIZE 512
int CS_pin = 10; // Chip Select pin for the SD interface
Sd2Card c;
// TVOut stuff
#include <TVout.h>
#include "font6x4.h" // This is the rotated font
TVout TV;
// Markov constants from the gentables.php script
#define WORDS_OFFSET 0
#define KEYS_OFFSET 92600
#define OPTIONS_OFFSET 162100
#define MAX_WORD_LENGTH 20
// Buffers / useful runtime stuff
unsigned int keyId = 0;
char thisWord[MAX_WORD_LENGTH+1]; // +1 for null termination of longest word
// Cursor coordinates
unsigned char curx = 0, cury = 0;
// Print a character on the rotated display with a delay effect.
char print_c(char c) {
// Translate cursor position into rotated coordinates and print
TV.print_char((cury+1)*8, 72-(curx*8), c);
// Fake a typing-speed effect by delaying between characters
delay(random(200) + 200);
curx++; // Move the cursor along
if (curx>9) { // If the cursor is too far along...
curx=0; // reset it to the left and
cury++; // move the cursor down.
if (cury>11) { If the cursor is off the bottom of the screen,
cury=11; // move it back to the last row
TV.shift(8, LEFT); // and shift the contents of the display "up"
}
}
return c;
}
// Display a string on the rotated display with a delay effect.
void print_s(char *s) {
char c;
while (c = *s++) {
print_c(c);
}
}
void setup() {
// Prepare the TV display
TV.begin(NTSC,120,96);
TV.select_font(font6x4);
// Prepare the SD interface
pinMode(CS_pin, OUTPUT);
c.init(SPI_HALF_SPEED, CS_pin);
// Useful for debugging
Serial.begin(9600);
// Seed the random number generator
randomSeed(analogRead(0));
// Null-terminate thisWord (just in case we hit the longest word).
memset(thisWord, 0, MAX_WORD_LENGTH+1);
}
// Helper function to read a chunk of data off the SD card
void getData(Sd2Card *card, unsigned long o, char *s, unsigned int n) {
card->readData(o / BLOCKSIZE, o % BLOCKSIZE, n, (uint8_t *) s);
}
void loop() {
unsigned int numbers[2]; // Scratch pad
// Read from the keys table.
getData(&c, KEYS_OFFSET + (keyId * 4), (char *) numbers, sizeof(numbers));
// Option ids from numbers[0] to numbers[1] are available. Choose one.
unsigned int optionId = random(numbers[1] - numbers[0] + 1) + numbers[0];
// Read from the options table.
getData(&c, OPTIONS_OFFSET + (optionId * 4), (char *) numbers, sizeof(numbers));
// Numbers now contains wordId, keyId.
keyId = numbers[1]; // Store key ID for next loop
// Read the word we've selected and display it. (Both serial and TV.)
getData(&c, WORDS_OFFSET + (numbers[0] * MAX_WORD_LENGTH), thisWord, MAX_WORD_LENGTH);
Serial.print(thisWord);
Serial.print(" ");
print_s(thisWord);
print_s(" ");
}
Just for giggles, one last script -- this one used to generate the rotated font, using as its basis one of the fonts from the TVout library:
<?php
/**
* Quick hack to rotate a TVout font. Pass in the C code via stdin and
* receive the resulting C code on stdout.
* Written by Alec Smecher. Please include a thank-you if you use this code
* for anything. Distributed under the GNU GPLv2 license.
*/
$in = fopen('php://stdin', 'r');
$out = fopen('php://stdout', 'w');
$pixbuf = array();
while (($l = fgets($in)) !== false) {
// Watch for lines containing only "0b........"
if (preg_match('/^\w*0b(.*),$/', trim($l), $matches)) {
$pixbuf[] = $matches[1];
} else {
// If we hit a line that doesn't match, rotate and dump what
// we have in the pixel buffer.
if (!empty($pixbuf)) {
for ($i=strlen($pixbuf[0])-1; $i>=0; $i--) {
fputs($out, "0b");
for ($j=0; $j<count($pixbuf); $j++) {
fputs($out, $pixbuf[$j][$i]);
}
fputs($out, "00,\n");
}
$pixbuf = array();
}
fputs($out, $l);
}
}
fclose($in);
fclose($out);
?>
Forgive the lack of pretty-printing and happy hacking!
No comments:
Post a Comment
Note: Only a member of this blog may post a comment.