Wednesday, February 22, 2012

Addendum to Bucky TV: the source code

I've had a couple of requests to post the source code for the Dymaxion Auto-Matic Buckminster Fuller, so here's a quick follow-up. Hey, it's probably the most effective backup mechanism I've used recently.


There are a few separate programs. The majority of the work was in the code to generate the Markov chain content. This was split into two parts -- the content was partially digested on my desktop computer so that the demands on the Arduino CPU could be minimized.

First the code to digest the text input:
<?php

/**
* This script takes input from stdin and uses it to generate a set of tables
* that can be used to efficiently produce Markov chain output.
*
* It produces three tables, all stored contiguously in the same binary file:
* - Words. Each word is a fixed-length string, presented in order of word ID.
*   For example, if the longest word is 20 characters, word ID 10 will be
*   found in the file between bytes 200 and 219. Null terminators are not
*   included but shorter words are padded up to size using nulls.
* - Keys. Each key represents a specific sequence of $n (only tested with 2)
*   words. For example, "is a" is a single key and the its key ID is used to
*   represent all occurrences of "is a" in the text. Each key entry contains
*   two little-endian 16-bit integers representing, respectively, the first
*   and last option IDs that can follow this key. Keys are organized in
*   sequence, starting with key ID 0, and each key entry is 4 bytes long; thus
*   it is possible to find a key entry from its ID by calculating the offset
*   from the end of the Words table.
* - Options. Each option represents a possible word that can follow a given
*   sequence of words (key). For example, the key "is a" may have option
*   entries that refer to the words "housecat", "newsworthy", "limp", and
*   "limp", if the input text contained the phrases "is a housecat", "is a
*   newsworthy", and "is a limp" (the last appearing twice). Each option
*   entry contains two little-endian 16-bit integers representing,
*   respectively, the word ID for this option, and the key ID to use in the
*   next iteration. (If one of the two "limp" options is chosen randomly,
*   the word ID will refer to the single word entry for "limp" and the key ID
*   will refer to the single key entry for "a limp", which can then be used
*   to repeat the process for the next word.)
*
* Written by Alec Smecher. Please include a thank-you if you use this code
* for anything. Distributed under the GNU GPLv2 license.
*/

// Consider a "history" of 2 words when pondering what next to say. This has
// not been tested with n <> 2; you'll at least have to modify the content
// generator.
$n = 2;

// $history stores the history as an array of word IDs. Initialize it.
$history = array_fill(0, $n, null);

// Keep a few simple statistics
$maxWordLength = $maxKeyLength = 0;

// Initialize the tables that hold the bulk of the data.
$words = array(); // word => wordIndex (inverted index)
$keys = array(); // key => keyIndex (inverted index)
$options = array(); // optionIndex => array(...option data...)

$in = fopen('php://stdin', 'r');
while (($l = fgets($in)) !== false) {
    foreach (explode(' ', $l) as $word) {
        // Find the word index in the $words list, creating it if needed
        if (!isset($words[$word])) {
            $wordIndex = count($words);
            $words[$word] = $wordIndex;
            } else {
            $wordIndex = $words[$word];
        }
        
        // Keep track of the longest word length
        $maxWordLength = max($maxWordLength, strlen($word));
        
        // Build the history into a string key
        $key = join($history, ' ');
        
        // Find the key index in the $keys list, creating it if needed
        if (!isset($keys[$key])) {
            $keyIndex = count($keys);
            $keys[$key] = $keyIndex;
            $maxKeyLength = max($maxKeyLength, strlen($key));
            } else {
            $keyIndex = $keys[$key];
        }
        
        // Store this key index as the next key index from the last iteration.
        // This is funky reference stuff.
        $someKeyIndex = $keyIndex;
        unset($someKeyIndex); // Clear the reference
        
        // Maintain the $options list.
        $someKeyIndex = 0; // We will back-fill this in the next iteration.
        $options[] = array(
        'wordIndex' => $wordIndex,
        'thisKeyIndex' => $keyIndex,
        'nextKeyIndex' => &$someKeyIndex // by reference
        );
        
        // Shift the history along for the next iteration.
        for ($i=1; $i<$n; $i++) {
            $history[$i-1] = $history[$i];
        }
        $history[$n-1] = $wordIndex;
    }
}
fclose($in);

// Use stderr for helpful output
$stderr = fopen('php://stderr', 'w');

// We've generated the tables. Now present some statistics.
fputs($stderr, 'Number of words: ' . count($words) . "\n");
fputs($stderr, 'Number of keys: ' . count($keys) . "\n");
fputs($stderr, "Maximum word length: $maxWordLength\n");
fputs($stderr, "Number of options: " . count($options) . "\n");

// Sort the options array by current key. This will mean the
// options table will contain contiguous blocks each containing all the options
// for each key.
usort($options, create_function('$a, $b', 'return $b[\'thisKeyIndex\']-$a[\'thisKeyIndex\'];'));

// Find out the first and last option IDs for each key.
$keyFirstOptions = $keyLastOptions = array();
$lastKeyId = null;
foreach ($options as $optionId => $option) {
    if ($lastKeyId != $option['thisKeyIndex']) {
        $keyFirstOptions[$option['thisKeyIndex']] = $optionId;
        $lastKeyId = $option['thisKeyIndex'];
    }
    $keyLastOptions[$option['thisKeyIndex']] = $optionId;
}

// The tables are all generated. Start saving the output to a file.

$out = fopen('markovchain.bin', 'w');

// 1. Write out the table of words. This can later be looked up starting at bytes [index * maxWordLength]
foreach ($words as $word => $wordIndex) {
    fputs($out, pack('a' . $maxWordLength, $word));
}
$wordTableEnds = ftell($out);
fputs($stderr, 'Word table: 0 - ' . ($wordTableEnds-1) . " bytes\n");

// 2. Write out the key index.
foreach ($keys as $key => $keyIndex) {
    // Find and store the first and last occurrences of the key in the option set.
    fputs($out, pack('v', $keyFirstOptions[$keyIndex]));
    fputs($out, pack('v', $keyLastOptions[$keyIndex]));
}
$keyTableEnds = ftell($out);
fputs($stderr, "Key table: $wordTableEnds - " . ($keyTableEnds-1) . " bytes\n");

// 3. Write out the options.
foreach ($options as $optionData) {
    fputs($out, pack('v', $optionData['wordIndex']));
    fputs($out, pack('v', $optionData['nextKeyIndex']));
}
$optionTableEnds = ftell($out);
fputs($stderr, "Option table: $keyTableEnds - " . ($optionTableEnds-1) . " bytes\n\n");

// Done writing output.
fclose($out);

// Display some constants that will need to be imported into the Arduino C code
// or PHP test program.
fputs($stderr, "Constants for C/C++:\n");
fputs($stderr, "#define WORDS_OFFSET 0\n");
fputs($stderr, "#define KEYS_OFFSET $wordTableEnds\n");
fputs($stderr, "#define OPTIONS_OFFSET $keyTableEnds\n");
fputs($stderr, "#define MAX_WORD_LENGTH $maxWordLength\n\n");
fputs($stderr, "Constants for PHP:\n");
fputs($stderr, "define('WORDS_OFFSET', 0);\n");
fputs($stderr, "define('KEYS_OFFSET', $wordTableEnds);\n");
fputs($stderr, "define('OPTIONS_OFFSET', $keyTableEnds);\n");
fputs($stderr, "define('MAX_WORD_LENGTH', $maxWordLength);\n\n");

// Clean up
fclose($stderr);

?>

Now a quick PHP test implementation to use the resulting table to generate new text:

<?php

/**
* Generate Markov chain output until forcibly stopped.
* This takes a set of data from markovchain.bin, which can be generated
* using the gentables.php script.
*
* Written by Alec Smecher. Please include a thank-you if you use this code
* for anything. Distributed under the GNU GPLv2 license.
*/

// These constants need to correspond to those given by gentables.php
define('WORDS_OFFSET', 0);
define('KEYS_OFFSET', 92600);
define('OPTIONS_OFFSET', 162100);
define('MAX_WORD_LENGTH', 20);

($fp = fopen('markovchain.bin', 'rb')) || die("Unable to open markovchain.bin; did you generate it?\n");
$keyId = 0; // Start at the beginning. (Where else?)

while (true) {
    // Get the information for the current key.
    fseek($fp, KEYS_OFFSET + ($keyId * 4));
    extract(unpack('vfirstOptionId/vlastOptionId', fread($fp, 4)));
    
    // From the options available for this key, choose one.
    $optionId = rand($firstOptionId, $lastOptionId);
    
    // Get the information for the chosen option (including the next key).
    fseek($fp, OPTIONS_OFFSET + ($optionId * 4));
    extract(unpack('vwordId/vkeyId', fread($fp, 4)));
    
    // Display the word we just chose.
    fseek($fp, WORDS_OFFSET + ($wordId * MAX_WORD_LENGTH));
    echo trim(fread($fp, MAX_WORD_LENGTH)) . ' ';
}

fclose($fp);

?>
Just the length of the two listings should give some impression of how much work is saved on the Arduino side of things by pre-computing the tables.

Now the Arduino implementation, including both the TVout code and the Markov chain code.

// Arduino code for the Dymaxion Auto-Matic Buckminster Fuller.
// Written by Alec Smecher. Please include a thank-you if you use this code
// for anything. Distributed under the GNU GPLv2 license.

// SD stuff
#include "Sd2PinMap.h"
#include "SdInfo.h"
#include "Sd2Card.h"
#define BLOCKSIZE 512
int CS_pin = 10; // Chip Select pin for the SD interface
Sd2Card c;

// TVOut stuff
#include <TVout.h>
#include "font6x4.h" // This is the rotated font
TVout TV;

// Markov constants from the gentables.php script
#define WORDS_OFFSET 0
#define KEYS_OFFSET 92600
#define OPTIONS_OFFSET 162100
#define MAX_WORD_LENGTH 20

// Buffers / useful runtime stuff
unsigned int keyId = 0;
char thisWord[MAX_WORD_LENGTH+1]; // +1 for null termination of longest word

// Cursor coordinates
unsigned char curx = 0, cury = 0;

// Print a character on the rotated display with a delay effect.
char print_c(char c) {
    // Translate cursor position into rotated coordinates and print
    TV.print_char((cury+1)*8, 72-(curx*8), c);
    
    // Fake a typing-speed effect by delaying between characters
    delay(random(200) + 200);
    
    curx++; // Move the cursor along
    if (curx>9) { // If the cursor is too far along...
        curx=0; // reset it to the left and
        cury++; // move the cursor down.
        if (cury>11) { If the cursor is off the bottom of the screen,
            cury=11; // move it back to the last row
            TV.shift(8, LEFT); // and shift the contents of the display "up"
        }
    }
    return c;
}

// Display a string on the rotated display with a delay effect.
void print_s(char *s) {
    char c;
    while (c = *s++) {
        print_c(c);
    }
}

void setup() {
    // Prepare the TV display
    TV.begin(NTSC,120,96);
    TV.select_font(font6x4);
    
    // Prepare the SD interface
    pinMode(CS_pin, OUTPUT);
    c.init(SPI_HALF_SPEED, CS_pin);
    
    // Useful for debugging
    Serial.begin(9600);
    
    // Seed the random number generator
    randomSeed(analogRead(0));
    
    // Null-terminate thisWord (just in case we hit the longest word).
    memset(thisWord, 0, MAX_WORD_LENGTH+1);
}

// Helper function to read a chunk of data off the SD card
void getData(Sd2Card *card, unsigned long o, char *s, unsigned int n) {
    card->readData(o / BLOCKSIZE, o % BLOCKSIZE, n, (uint8_t *) s);
}

void loop() {
    unsigned int numbers[2]; // Scratch pad
    // Read from the keys table.
    getData(&c, KEYS_OFFSET + (keyId * 4), (char *) numbers, sizeof(numbers));
    
    // Option ids from numbers[0] to numbers[1] are available. Choose one.
    unsigned int optionId = random(numbers[1] - numbers[0] + 1) + numbers[0];
    
    // Read from the options table.
    getData(&c, OPTIONS_OFFSET + (optionId * 4), (char *) numbers, sizeof(numbers));
    
    // Numbers now contains wordId, keyId.
    keyId = numbers[1]; // Store key ID for next loop
    
    // Read the word we've selected and display it. (Both serial and TV.)
    getData(&c, WORDS_OFFSET + (numbers[0] * MAX_WORD_LENGTH), thisWord, MAX_WORD_LENGTH);
    Serial.print(thisWord);
    Serial.print(" ");
    print_s(thisWord);
    print_s(" ");
}
Just for giggles, one last script -- this one used to generate the rotated font, using as its basis one of the fonts from the TVout library:

<?php

/**
* Quick hack to rotate a TVout font. Pass in the C code via stdin and
* receive the resulting C code on stdout.
* Written by Alec Smecher. Please include a thank-you if you use this code
* for anything. Distributed under the GNU GPLv2 license.
*/

$in = fopen('php://stdin', 'r');
$out = fopen('php://stdout', 'w');

$pixbuf = array();
while (($l = fgets($in)) !== false) {
    // Watch for lines containing only "0b........"
    if (preg_match('/^\w*0b(.*),$/', trim($l), $matches)) {
        $pixbuf[] = $matches[1];
        } else {
        // If we hit a line that doesn't match, rotate and dump what
        // we have in the pixel buffer.
        if (!empty($pixbuf)) {
            for ($i=strlen($pixbuf[0])-1; $i>=0; $i--) {
                fputs($out, "0b");
                for ($j=0; $j<count($pixbuf); $j++) {
                    fputs($out, $pixbuf[$j][$i]);
                }
                fputs($out, "00,\n");
            }
            $pixbuf = array();
        }
        fputs($out, $l);
    }
}

fclose($in);
fclose($out);

?>
Forgive the lack of pretty-printing and happy hacking!

No comments:

Post a Comment

Note: Only a member of this blog may post a comment.