
truncate history to early 2019
Leonora Tindall 11 months ago
Signed by: nora GPG Key ID: 7A8B52EC67E09AAF


@@ -0,0 +1,12 @@
# Hugo default output directory
## OS Files
# Windows



@@ -0,0 +1,8 @@
"spellright.language": "English (American)",
"spellright.documentTypes": [


@@ -0,0 +1,9 @@
## My personal blog's source code!
![screenshot of the website](static/images/screenshot.png)
This is the source code for my personal website, at [](
It is written in Markdown (and HTML/CSS for styling) with minimal JavaScript (just for
better syntax highlighting) and is built using Hugo, the Go static site generator.


@@ -0,0 +1,6 @@
title: "{{ replace .TranslationBaseName "-" " " | title }}"
date: {{ .Date }}
draft: true


@@ -0,0 +1,52 @@
baseURL = ""
languageCode = "en-us"
title = "Patterns in Cyberspace"
theme = "loma"
# dateFormat = "date -u +'%Y-%m-%d %R:%S%:z'"
# Allow Hugo to munge, e.g., "Programming Stuff" into "programming-stuff"
preserveTaxonomyNames = false
# Allow Hugo to rewrite URLs so they work on IPFS as well as locally and via URL
relativeURLs = true
# Highlighting
pygmentsStyle = "fruity"
pygmentsUseClassic = false
pygmentsCodefences = true
[taxonomies]
category = "categories"

[permalinks]
post = "/post/:slug"
tutorial = "/tutorial/:slug"

[[menu.main]]
name = "Blog"
url = "/post/"
weight = 1

[[menu.main]]
name = "Tutorials"
url = "/tutorial/"
weight = 2

[[menu.main]]
name = "Projects"
url = "/projects/"
weight = 3

[[menu.footer]]
name = "RSS"
url = "/index.xml"
weight = 1

[params]
description = "Leonora Tindall's personal website"
author = "Leonora Tindall"
MetaDescription = "Leonora Tindall's personal website, regarding programming, hacking, and politics."
footer = "Text: [CC-BY-SA]( ≥ 4 | Code: [AGPL]( ≥ v3 | Font: [Bitter](, [OFL]( | <img class='inline' alt='GNU Natalie Nguyen' src='/images/tipsytentacle.gif' /> | [No AMP]("


@@ -0,0 +1,102 @@
title: Home
{{< div "intro-paragraph" >}}
I'm Leonora.
I'm a full-time computer science student at [Beloit](
I build and break open source software for fun.
{{< /div >}}
{{< div "info-container" >}}
{{< div "resume" >}}
## 💼 Experience
**Software Engineering Intern**
{{< div "job-info" >}}
**CancerIQ, Inc. Summer 2018.**
I developed user interface components and designed algorithms used to analyze health data,
and worked with DevOps technologies including Kubernetes, Apache Kafka, Prometheus,
and Grafana.
{{< /div >}}
**Software Engineering Intern**
{{< div "job-info" >}}
**GudTech, Inc. Summer 2017.**
I used agile methodology to work with a small team, gaining experience with Go and the
inner workings of Docker in a Service Oriented Architecture environment. I created
developer tooling used for onboarding and external (SDK) development.
{{< /div >}}
This is a selection of my experience. See my [full resume](/resume.pdf) for more,
including volunteer positions and college work experience.
{{< /div >}}
{{< div >}}
## 💻 Code
I write systems code in [**Rust**](/categories/rust) and **Go**,
scripts and utilities in [**Python**](/categories/python) and **Lua**, and
websites in HTML5, CSS, JavaScript, and TypeScript.
## ⚛ Projects
- Contributor to [Open Energy Dashboard]( (React/Redux)
- [rloris]( - Rust implementation of layer 7 HTTP DoS attacks
- [Evolve SBrain]( — Rust genetic programming engine
- [RandomUA]( — browser extension to enhance privacy
- [workctl]( — Rust library for multithreaded programming
- [libUI-rs]( — Rust bindings to platform APIs for GUI apps
- [More...](/projects)
{{< /div >}}
{{< div >}}
## ⚖ Ethics
It is the duty of software engineers to build software in an ethical way. In short:
- computer systems should serve their users before their makers
- software should empower users, not restrict them
- users should decide how data is used before engineers or businesspeople
Whether it means embracing software freedom or another ideology, or going by gut feeling,
we must proceed as best we can.
{{< /div >}}
{{< div >}}
## 🎓 Tutorials
I've written tutorials on [type systems](/tutorial/a-gentle-introduction-to-practical-types/) and what can be done with them (for instance, [session types](/tutorial/session-types/)), [x86_64 binary reverse engineering](/tutorial/an-intro-to-x86_64-reverse-engineering/), and application security topics on [embedding malware in PDF files](/post/pdf-embedding-attacks/).
## 💬 Culture & Practices
The culture, business, and sociology of technology fascinate me.
I've written about topics ranging from [hacker superstitions](/post/hacker-superstitions/) and how spending way too much time customizing my desktop is [productive, actually](/post/modding-vim-i3-and-efficiency/) to [repairable hardware](/post/i-repaired-my-headphones/) and [free software](/post/open-source-for-normal-people/) and [why it matters](/post/a-story-about-my-personal-trainer/), for individuals and for [society](/post/deletefacebook-and-fosta/) as a whole.
{{< /div >}}
{{< div "recent-posts" >}}
## 📓 Recent Posts
{{< recent >}}
{{< /div >}}
{{< div >}}
## 🗺 Around the 'Net
You can find me:
- On the <a href="" rel="me">Fediverse</a>.
- On <a href="" rel="me">GitHub</a>.
- On <a href="" rel="me">Patreon</a>.
- At nora AT nora DOT codes.
- Everywhere, with my <a href="/leonoratindall.asc" rel="pgpkey authn">PGP Key</a>.
{{< /div >}}
{{< /div >}}


@@ -0,0 +1,19 @@
title: Meanings
Eli Johnson, 2018.
Final project for Matt Vadanais's Creative Writing (ENGL 250), spring 2018.
Runtime: 20ish minutes.
[Psalm 151](/eli/p151.ogg)
[Heaven's Gate](/eli/hg.mp3)
Hosted with ❤ by Leo Tindall.


@@ -0,0 +1,22 @@
date: 2017-06-16
title: BattleDome VR, a Review
slug: battledome-vr-a-review
- Video Games
- Virtual Reality
description: One of the best VR games I've played, and why it's so great.
I hadn't really decided whether or not I liked _Battle Dome_ until I punched a wall with my Vive wand while trying to poke my rifle out from behind cover to distract the sniper that was killing all my teammates. Then I decided that I liked it very, very much.
_Battle Dome_, available on [Steam](, is a 5v5 first person shooter that manages to combine all the best elements of second-generation tethered VR with a solid core of multiplayer shooter gameplay. It's not a particularly innovative game in the way _Quanero_ or _Accounting_ are; its graphics are decent, but not mindblowing, and it has no story to speak of. Upon starting the game, you're presented with a "lobby" area in which you can try out any of the vast array of weapons, from pistols to laser rifles to grenade launchers (each one of which has separately tracked personal stats) and, when ready, join or create a game.
There are a number of gamemodes available, from traditional FPS deathmatch to an interesting cooperative horror mode. There is also a lot of variation between maps, even in mechanics; some allow players to alter gravity, some have jet-packs available, and many are so-called "paint" maps. This is where one of the most interesting mechanics comes into play: in order to cater to both players who enjoy free movement in VR and those who get motion sick from the difference between perceived movement and inner-ear inertial measurement, the game allows both teleportation _and_ trackpad-based free movement, but on paint maps, players can only teleport to places where their team's color has been painted. This leads to some very exciting _Splatoon_-style paint raids. In one of my recent matches, two players with a machine gun and a grenade launcher guarded me while I used dual paint guns to spray a path to the enemy base so we could destroy their core in the attack-defense mode.
The huge selection of weapons in the game is one of its greatest features. Three basic types of weapons exist: bullet-firing, plasma, and laser weapons, each of which has unique advantages and disadvantages. There are large and small guns, too, from pistols and one-handed automatic weapons to assault rifles, light machine guns, and even a huge-scoped sniper rifle, and all of them are designed for VR. Every one, even the huge rocket launcher, can be wielded one-handed, but many of the larger ones have an optional two-handed mode which I've found increases both accuracy and immersion. There are more unique weapons, too, like the multi-use grenade launcher, which has utility grenades such as smoke, and the ricochet gun, whose multicolored rounds bounce off walls, ceiling, and floor to devastating effect. In addition, the shortsword and lance can be paired with the energy shield for that authentic space-Spartan feel.
I do have a few complaints. The graphics, especially the character models, can be somewhat off-putting, with low-resolution textures and weapons floating in mid-air. The teleporting movement is balanced using a cooldown timer which isn't adjusted based on distance, so tiny adjustments freeze you in place for just as long as full-range hops. There are a lot of maps, but each has different features enabled, like paint, gravity control, and jetpacks; it would be very nice to be able to enable and disable those features at will, for maps that support them.
That's not to say you shouldn't play it, though. _Battle Dome_, like the original _Unreal Tournament_, is a well-built, solid game that, while not particularly innovative, combines most of the mechanics of a new medium to create a game that can be learned in a few minutes but has nearly infinite replayability. I give it an 8/10.


@@ -0,0 +1,27 @@
date: 2017-05-27
title: Quanero VR, a Review
slug: quanero-vr-a-review
- Video Games
- Virtual Reality
_Quanero VR_, available from “Laserboys3000” on Steam for nothing, is the best of the many VR experiences I’ve encountered so far.
This isn’t because it’s particularly beautiful (its graphics are competent, much more so than those I’m able to create, but not at a AAA level) nor because its gameplay is particularly well designed (in fact, it has almost no “gameplay”). Quanero is amazing because it’s the first truly “player-motivated” VR experience I’ve seen.
VR provides the ability to create almost perfectly immersive virtual worlds, and Quanero demonstrates how to use that immersion to tap into intrinsic motivations of players - in this case, mostly curiosity and a little bit of completionism. Quanero presents a tiny virtual world - just a terrace and small bar - with the hints of a large, cyberpunky, Blade Runner-esque world around it.
The experience begins with a brief explanation. You are informed that you’re a detective and that you can run time forwards and backwards, but you may only observe, not alter things. This sets the perfect stage for pure exploration, since the player’s actions can’t have any consequences. You’re then shown an array of thumbnails, blurred out, of events and interactions you haven’t yet seen. That sets up the completionism; what player could resist at least trying to fill those in?
You’re dropped onto the terrace and, running the time forward with the right wand’s trigger, you see a peaceful scene rudely interrupted by a massive explosion from the outdoor grill, which leads to a dramatic rescue and a fistfight. Personally, I was very curious as to why the explosion occurred, so I teleported myself over to the grill and ran time backward, forward, backward again - and this was where the experience really got immersive for me. I saw a flash of green just before the explosion and I was hooked.
Long story short (and less spoilerific), I spent about twenty minutes figuring out what the green flash was, then where it had come from, and then piecing together the whole story. The scene is maybe five minutes long, but it took me much longer to discover the true sequence of events, and the experience was utterly compelling throughout, without any extrinsic motivation. Looking at that fistfight in great detail didn’t unlock a new level or improve my combat stats - it was just interesting. That’s the true promise of VR, in my opinion - it allows game designers to engage players in a much more immersive and intrinsic manner, and to drop many of the traditional gimmicks and Skinner-box tactics that motivate players in traditional games.
This is not to say that VR games automatically (or necessarily) tap into these motivations, however. For instance, Raw Data from Survios Inc. is a traditional first person combat game that smoothly integrates the unique traits of VR without drastically changing the formula; the player’s motivation basically comes from the infinite cycle of beat up robots, get better at beating up robots, find more and more interesting robots to beat up, repeat. That doesn’t make it a bad game; in fact, Raw Data is one of my favorite VR experiences. It just isn’t revolutionary in the way Quanero VR is.
My ultimate wishlist VR item is a longer, more intense, more complex, and more diverse version of Quanero in which the player must find clues in one part of the environment to understand events in other parts. In Quanero, this essentially only happens with the strange green flash that causes the explosion; in a larger game, there could be more such events and more levels of uncertainty. There could also be a few more people who weren’t ripped white dudes - and no, one blue alien, one skinny female pool player with no characterization, and a fat dude who dies right away don’t count.
So, since this seems to have become a review, I guess I'll give a score. Quanero gets 8/10; not perfect, but pretty damn good, especially given the price.


@@ -0,0 +1,13 @@
date: 2017-11-06 16:08:38+00:00
slug: resources-for-a-new-rustacean
title: Resources for a New Rustacean
draft: true
- Programming
- Rust


@@ -0,0 +1,3 @@
title: Blog


@@ -0,0 +1,30 @@
title: "A Methodology for Fontconfig Editing"
date: 2018-03-07T11:16:38-06:00
- Linux
- Open Source
- Modding
description: Making the font cascade behave as you want can be kind of difficult, but it's not impossible, especially with the right methodology and mindset.
One of the hardest parts of building beautiful Linux systems is fonts. Font precedence on Linux is generally handled with [fontconfig](
In essence, `fontconfig` is used to permit many fonts to be installed and uninstalled over time without breaking applications which specify a font or font family, while letting users configure which fonts are used when a missing font, font family, or missing glyph is requested.
This is a really useful piece of technology; having a defined configuration system for which fonts are used in which scenarios is a boon for configurability, but fontconfig has no real GUI editors or usable interactive configuration tools. Users are expected to manually edit XML configuration files.
As with most Unix styling topics, Eevee has [a great piece]( on fontconfig's complexities. She digs into how to disable and re-configure fonts, how to set fallbacks, and how to verify that the correct resolution order is set. Fontconfig relies on a set of config files, generally in `/etc/fonts/conf.d`, which are loaded in alphabetical order. These are usually prefixed with a number, so it's easy to determine the order.
Unfortunately, it can be very complex to determine where a specific font or option is configured. In my recent case, I wanted to switch from `DejaVu` as my default to `Bitstream Vera`, and I spent the better part of an hour flipping around different files changing mentions of `Deja Vu X` to `Bitstream Vera X`.
Eventually, I settled on the following methodology:
1. Identify problematic resolution result (either by observing it or using `fc-match -s`).
1. In `/etc/fonts/conf.d`, use `grep` or `rg` to search for the incorrectly resolved font (e.g. `rg DejaVu .*`).
1. Open highest-numbered file with a match. For me, this was `69-language-selector-zh-tw.conf`.
1. Determine whether or not this config file is causing the problematic match. In the case of `69-language-selector-zh-tw.conf`, it was only selecting DejaVu Sans Mono for language `zh-tw`, which is actually correct as Bitstream Vera Mono doesn't include `zh-tw` glyphs.
1. If that file might be causing the problematic match, modify it.
1. Check if the problematic resolution still occurs (using fc-match). If so, repeat.
I've been very successful with this methodology so far. In my specific case, I had to modify `56-emojione.conf`, which was setting the default serif, sans serif, and monospace fonts to resolve to DejaVu followed by Emoji One.


@@ -0,0 +1,22 @@
date: 2016-06-01 04:39:51+00:00
slug: a-story-about-my-personal-trainer
title: A Story About My Personal Trainer
- Open Source
- Culture
description: Microsoft's forced updates caused major issues for a small business I used to work with.
This is a story about a woman who runs her own small business as a personal trainer. She uses Microsoft Publisher to create her at-home training programmes and Excel for keeping track of payments, et cetera. She has a Win XP computer from 2004.
I upgraded said computer to Windows 7 at XP EOL and replaced her ancient GPU so that she could use two monitors. I also got her to use LibreOffice Calc instead of MS Excel, but she loves her Publisher, so we decided it wasn't worth switching to Linux. This has been working just fine for several years, and there was no reason to believe she'd have to spend a cent on the thing for years to come. Until last week.
Last week, Microsoft decided it would be fitting to install Windows 10 without her permission. (And don't give me the excuse that she implicitly accepted it by closing the window or whatever, that's bullshit. She was very clear on that.) Her Publisher software immediately stopped working, and she phoned me. I told her not to worry and scheduled a visit so that I could reinstall 7. I backed up her user directory and went to install Windows 7. The key I'd sold her didn't work. I phoned Microsoft and was told to wait, then told that the key was not valid. Luckily, I had another key, so I used that one to install Windows 7. I ran Ninite, jimmied the ancient CD-ROM drive to install Publisher, and restored the backup. All was going well. She paid me for the new Windows license and my time, and took her computer back to her house, plugged it in, and began working.
Twenty minutes later, the computer froze and she was forced to hard reboot. It is now not working, and she'll have to bring it by _again_ so I can figure out what bootloader rubbish Windows 10 did that is causing it to have some kind of intermittent fault. In the meantime, she's basically unable to work.
When we say "user choice", this is what we mean. Not some nebulous idea of "freedom" or extreme Stallman-level trust, but rather the ability to choose whether you want to risk screwing up your perfectly working system with an upgrade. And, Microsoft? This is unacceptable, period.
> A note, two years on. Windows 10 doesn't suck as much as it did - for the most part - but the business practices it is a part of haven't changed. Microsoft thinks it can treat paying customers as beta testers, shipping code that deletes important software and files or renders machines unbootable and forcing users to accept it with no repercussions. So far, as consumers and as an industry, we have failed to prove them wrong. We need to do better.


@@ -0,0 +1,65 @@
title: "#DeleteFacebook and FOSTA/SESTA"
slug: "deletefacebook-and-FOSTA"
date: 2018-03-28T11:16:18-05:00
- Culture
- Privacy
- Open Source
description: The fallout of the FOSTA and SESTA bills and recent revelations about Facebook's unethical business practices have made it clear that users need to retake control over our data.
We, each and every one of us, need to make the decision to move to free, open source, and **decentralized** online services.
It will be painful. It will be difficult. It may mean giving up some comforts, like sending money instantly to friends without fees.
It is also the only way to prevent some seriously bad things from happening.
## Inciting Events
In recent weeks, two major things happened:
* Facebook's business model - gathering as much information about you as possible, then selling it - was used in [a totally predictable way]( that, nonetheless, nobody seemed prepared for. Cambridge Analytica paid 270,000 users and ended up with data from **50 million**, none of whom consented to that data being used. They then targeted vulnerable individuals in a (successful) attempt to play on their fears and get Donald Trump elected president.
* The United States government passed two laws, FOSTA and SESTA, which are completely ass-backwards attempts to combat sex trafficking that actually [severely harm victims and sex workers]( In response, many web services are [cracking down]( on _any_ sexually explicit content, including articles written about the porn industry and its issues.
These are not good things, but, and this may be hard to hear: **they are entirely our fault, as Web users**.
## How Could This Happen?
A lot of people are running around with their hair on fire, wondering, "how could this happen?". The answer is simple: **we trusted large, centralized, profit-driven services with everything we care about** - our photos, our messages, our opinions, our livelihoods, and even our relationships.
Had Facebook not been the primary way for hundreds of millions of people to communicate and express their opinions, Cambridge Analytica could not have used the fears and anxieties expressed in those communications to manipulate voters.
Had we not become reliant on Skype and Google Drive to carry out business and pleasure, FOSTA and SESTA would not be hitting sex workers and survivors of rape so hard.
Had we interrogated, for even one instant, what it was that funded and powered the "free" online services we entrusted with our social and professional lives, we would have glimpsed the writhing horror we were building, and turned away in disgust.
## Is There Any Hope?
Fortunately, there is a light at the end of the tunnel. Those of us who _did_ understand the true nature of "free" services like Facebook have been building a solution.
This solution is **decentralized services**. This means that anyone - any organization or individual with Internet access - can participate, not just as a user but as a service provider.
One of the key pieces of software here is called [Mastodon]( It is a microblogging platform, like Tumblr or Twitter, but rather than every user handing over their data to Twitter-the-company, there are thousands of "Mastodon sites". There is []( and [](, []( and []( The crucial feature is this: **if you have an account on (or any other Mastodon site), and I have one on (or any other Mastodon site), I can talk to you and you can talk to me**.
> If you want to see a sample Mastodon profile, check out mine: [](
Let's say I join with all my friends, but the people who run the site don't agree with me - perhaps they don't adequately protect me from harassment, and I want to move to []( which is more aggressively moderated. With a traditional, centralized model, I would have to convince my friends to come with me, or be resigned to losing contact. **With decentralized social media, you stay connected even if you move service providers**.
That means that the problems mentioned above would be very, very short-lived. Say one Mastodon site (or "instance", as they're called) starts doing shady things with user data. **All the users can simply move to another instance with practically zero effort.** The network of instances can also cut off instances that are sources of spam or harassment, and that decision is up to the moderators and admins for each instance. **If you don't like the decisions the moderation team of your instance makes, you can move to another, more agreeable one with almost no effort.**
> Remember, there have been massive issues with moderation on Twitter. Those issues are largely solved by smaller, more tailored moderation teams, which Mastodon enables.
While Mastodon is modelled after Twitter, the same underlying systems are being applied to Facebook-like and Tumblr-like sites. Even more interestingly, **these will be able to talk to each other**. Imagine being able to see Twitter posts in Facebook, or Facebook posts on Tumblr. This further reduces lock-in. Don't like the Mastodon format? Move to Aardwolf or Pleroma or ... wherever. You will still be connected to your friends.
<iframe width="560" height="315" src="" frameborder="0" allow="autoplay; encrypted-media" allowfullscreen></iframe>
## What You Need To Do
There are a few things you can do to help.
1. Make a Mastodon account. []( will help you get started. Then, **aggressively convince your friends to switch from Facebook, Twitter, et cetera.** Be "that person". It's painful at first but the benefits are worth it.
1. Use other decentralized software. E-mail, for instance, is better than Facebook Messenger, even if you use GMail or Yahoo! Mail or another large e-mail provider, because there is the _option_ to move to another.
1. Share this post. Share it on Facebook and Twitter and Instagram and _everywhere_. Or write your own and share that. Make it resonate.
Together, we can build an internet that's not just safer, but more enjoyable. Thank you. Your effort and struggle are valuable.


@@ -0,0 +1,326 @@
title: "Fedidict Implementation: Setting Up the Database"
slug: fedidict_implementation_01_setting_up_the_database
date: 2018-09-06T12:31:37-05:00
- Fedidict
- Programming
- Rust
- Databases
description: The core of Fedidict is the data it stores, for which it uses a traditional SQL database with a simple but powerful schema. This post explores both the SQL and Rust sides of the schema.
*This is the first of several posts on the implementation of my ActivityPub-enabled Rust
web application, FediDict. Each of these posts was released to my Patreon patrons before
being made publicly available. I recommend that you read the
[security design post](/post/fedidict_ux_design_part_2/) first.*
The core of any stateful service is its datastore, and FediDict uses PostgreSQL, a very
performant, ACID-compliant, free and open source relational database. In this post, I'll
look at how to make an efficient, normalized database for FediDict's data.
That's a lot of jargon, so let's break it down.
- "Performant", here, means that it is capable of many simultaneous reads, since that is
likely to be the main load.
- "ACID-compliant" basically means that PostgreSQL won't lose data, unless I explicitly
ask it to.
- "Free and open source" is important because I want anyone to be able to run FediDict.
- "Relational" means that relations, such as "user x owns definition y", are encoded in
the database itself, rather than being something FediDict has to handle at the application
level. This is good for both performance and security.
# From the Ground Up
FediDict is mostly interested in its list of definitions, and that's where I'll begin with
the database implementation as well. A definition requires a lot of attributes. These will
go in the `Definitions` table of the database.
- An arbitrary unique ID value
- The term being defined
- The part of speech being defined (refers to a part of speech; a **foreign key** for PartOfSpeech)
- The actual text of the definition
- A list of related ("see also") terms
- Date of creation
- Created by (foreign key for User)
- Date of approval (it may not exist; we call this **nullable**)
- Approved by (foreign key for User; nullable)
## Normalization
This is a lot of columns for one database table, and furthermore, some of these items
depend on one another. For instance, it's valid to have no approval date AND no approved
by user ID, or to have both an approval date and an approved by user ID, but not one or
the other.
This is not desirable; it's better to have such constraints encoded in the database, and
this is impossible with a single table solution. Furthermore, a bulk approval could
result in multiple records storing data that refers to a single event, which is a
violation of database normalization. To avoid this, I'll split out approval into its own table.
I'll also replace the two fields for date of approval and approved by user ID with a
single nullable foreign key reference for an Approval. If it's null, the definition
has not been approved; if it has a value, the definition has been approved, and the system
can look up which user approved it.
- An arbitrary unique ID value
- The term being defined
- The part of speech being defined (that is, "kick, n." is not the same as "kick, v.")
- The actual text of the definition
- A list of related ("see also") terms
- Date of creation
- Created by (foreign key for User)
- Approval record (nullable, foreign key for Approval)
Written in SQL, the language of the database, that looks like:
CREATE TABLE Definitions (
id SERIAL PRIMARY KEY,
term TEXT NOT NULL,
part_of_speech INT NOT NULL,
definition TEXT NOT NULL,
see_also TEXT NOT NULL,
created_at TIMESTAMP NOT NULL,
created_by INT NOT NULL,
approval INT,
FOREIGN KEY (part_of_speech) REFERENCES PartsOfSpeech(id),
FOREIGN KEY (created_by) REFERENCES Users(id),
FOREIGN KEY (approval) REFERENCES Approvals(id)
);
- An arbitrary unique ID value
- The user responsible for approving the definition (foreign key for User)
- The date on which the definition was approved
In SQL, this looks like:
CREATE TABLE Approvals (
id SERIAL PRIMARY KEY,
approved_by INT NOT NULL,
approved_at TIMESTAMP NOT NULL,
FOREIGN KEY (approved_by) REFERENCES Users(id)
);
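With both tables in place, the "has this been approved, and by whom?" question from the normalization discussion becomes a single join. A sketch (column names follow the bullet lists above; an INNER JOIN on the nullable approval foreign key naturally excludes unapproved definitions):

SELECT Definitions.term, Definitions.definition, Approvals.approved_by
FROM Definitions
JOIN Approvals ON Definitions.approval =;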
The final definition-related table is PartsOfSpeech. I considered using a hard-coded list
of parts of speech, but I realized that this isn't scalable across languages, so I will
definitely need to let users define their own. Fortunately, it's very simple.
Part of Speech:
- An arbitrary unique ID value
- The name of the part of speech, like "noun"
- The plural of the name, like "nouns"
- The symbol for the part of speech, like "n."
CREATE TABLE PartsOfSpeech (
id SERIAL PRIMARY KEY,
name TEXT NOT NULL,
plural TEXT NOT NULL,
symbol TEXT NOT NULL
);
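Once the table exists, seeding it with the examples above might look like this (assuming the columns are named `name`, `plural`, and `symbol`):

INSERT INTO PartsOfSpeech (name, plural, symbol) VALUES
('noun', 'nouns', 'n.'),
('verb', 'verbs', 'v.');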
# Identity Crisis
The other important entity in the system is the user. Users have already been referred to
several times in the schema, but given the discussion in the [previous post](/post/fedidict_ux_design_part_2/), how I refer
to them is pretty important.
- An arbitrary unique ID value
- The user's name
- The user's email
- The hash salt for the user's password
- The hashed password used for login
- The federation partner associated with this user (foreign key for FederationPartner, nullable)
- The date upon which the account was created in FediDict's database
- The RBAC record for the user (foreign key for Roles)
This definition has brought in a new structure, as well: a Federation Partner. That's what
I'm calling the other ActivityPub sites that FediDict will cooperate with. For now, I'll
keep them simple.
- An arbitrary unique ID value
- A domain name for the partner
- The date upon which this partner became known to FediDict.
Users with a non-null partner ID will be written as **username**@**partner.domain**, while
users with a null partner ID are just **username**.
Here's how these two look in SQL:
CREATE TABLE Users (
id SERIAL PRIMARY KEY,
username TEXT NOT NULL,
email TEXT NOT NULL,
salt TEXT NOT NULL,
passhash TEXT NOT NULL,
partner INT,
created_at TIMESTAMP NOT NULL,
role INT NOT NULL,
FOREIGN KEY (partner) REFERENCES FederationPartners(id),
FOREIGN KEY (role) REFERENCES Roles(id)
);
CREATE TABLE FederationPartners (
id SERIAL PRIMARY KEY,
domain TEXT NOT NULL,
first_seen TIMESTAMP NOT NULL
);
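The username display rule can be sketched as a query, too, assuming a `username` column on Users and a `domain` column on FederationPartners:

SELECT CASE
WHEN Users.partner IS NULL THEN Users.username
ELSE Users.username || '@' || FederationPartners.domain
END AS handle
FROM Users
LEFT JOIN FederationPartners ON Users.partner =;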
Finally, I need to define the RBAC table. This is based on the previous post:
CREATE TABLE Roles (
id SERIAL PRIMARY KEY,
definition_reader BOOLEAN,
definition_submitter BOOLEAN,
definition_evaluator BOOLEAN,
definition_remover BOOLEAN,
queue_reader BOOLEAN,
queue_approver BOOLEAN,
queue_rejecter BOOLEAN,
account_creator BOOLEAN,
account_remover BOOLEAN,
account_role_assigner BOOLEAN
);
And that's it!
# Next Steps
Now that the database is defined, I need to actually create it. I'm going to use the
Diesel database mapper for Rust. In essence, Diesel allows me to write Rust code that is
automatically converted into SQL, with all the type safety benefits of Rust code.
## Installing Diesel
Most Rust crates are just a `Cargo.toml` edit away, but Diesel is a bit more complex, as
it also has a CLI tool which I'd like to use. So, I first have to:
```shell
$ cargo install diesel_cli
```
Note that this requires both the `mysql` and `postgresql` client libraries to be installed
on my system, despite the fact that I'm only using PostgreSQL. (Diesel's guide also
describes building with a single backend: `cargo install diesel_cli --no-default-features --features postgres`.)
Now I do need to add it to my `Cargo.toml`, along with some other libraries. I'm using
`serde` for serialization and deserialization, along with its code generator from
`serde_derive` and JSON functionality from `serde_json`. I'm also using `dotenv` to
configure the database with a `.env` file and `chrono` to handle dates and times.
```toml
[package]
name = "fedidict"
version = "0.1.0"
authors = ["LeoTindall <>"]

[dependencies]
serde = "1"
serde_derive = "1"
serde_json = "1"
diesel = { version = "1", features = ["postgres"] }
dotenv = "0.13"
chrono = "0.4"
```
## Installing PostgreSQL
I'm going to use Docker to handle the database.

```shell
docker create -p 5432:5432 \
  --name fedidict-db \
  -e POSTGRES_PASSWORD="password" \
  -t postgres:alpine
docker start fedidict-db
```
And, in a file called `.env` in the project folder, I'll make an environment variable:
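Diesel reads its connection string from a `DATABASE_URL` variable, so (assuming the Docker settings above, and `fedidict` as the database name) the `.env` file might contain:

```shell
DATABASE_URL=postgres://postgres:password@localhost/fedidict
```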
and then run `source .env` to get this variable in my shell.
Now I can use Diesel to set up the database. This creates both the database and a
directory to hold SQL files for database setup.
```shell
diesel setup
```
## Heading North
Diesel's CLI tool helps manage database state, and in fact I'll use it to create the
database to begin with.
I'll add a new **migration** using `diesel_cli` and put some SQL code in it telling
Diesel how to set up (and tear down) the database. This is the only SQL I expect to write,
since Diesel will handle interaction with the database in most cases.
```shell
$ diesel migration generate initial_schema
Creating migrations/2018-09-10-170446_initial_schema/up.sql
Creating migrations/2018-09-10-170446_initial_schema/down.sql
```
In Diesel, migrations go both "up" and "down"; up migrations add new functionality, while
down migrations remove that functionality.
The up migration is the combination of the SQL I've written so far in the post, but the
order is important; tables must be created in dependency order. For instance, Approvals
must be created before Definitions because Definitions has a foreign key that points
to Approvals.
The order I decided on is PartsOfSpeech, FederationPartners, Roles, Users, Approvals,
and finally Definitions.
The down migration is much simpler: I simply drop all the tables I created in the up
migration.

```sql
DROP TABLE Roles, Definitions, Approvals, PartsOfSpeech, Users, FederationPartners;
```
It's good practice to test migrations by running them and then rerunning them.
```shell
$ diesel migration run
Running migration 2018-09-10-170446_initial_schema
$ diesel migration redo
Rolling back migration 2018-09-10-170446_initial_schema
Running migration 2018-09-10-170446_initial_schema
```
This verifies that the down migration at least doesn't leave anything obvious hanging
around.
After running this migration, two new files will show up: `diesel.toml`, Diesel's config
file, and `src/`, the Rust mapping of the database schema.
That's it - the database is set up. In the next post, I'll discuss how I map these
data structures into Rust, how I test them, and what behavior I need to set up for them.
title: "FediDict UX Design, Part 1"
slug: "fedidict_ux_design_part_1"
date: 2018-08-23 18:04:14+00:00
- Fedidict
- Programming
- Rust
description: My first stab at defining the user experience for my federated dictionary software.
_This is the first of several posts on the design of my current ActivityPub-enabled
Rust web application, FediDict. Each of these posts was released to my Patreon
patrons before being made publicly available._
FediDict, a portmanteau of Federated and Dictionary, is my current open source project.
Most of the time, I've taken the approach of "dive in and write some code; the design
will shake itself out". This isn't a great idea for something I want to be widely
adopted, so I took a different approach for this project.
I do have some code written - mostly dealing with Diesel and the database, as well as
simple HTTP server boilerplate - but I have yet to work on the meat of the application.
In this series, I'll walk through some of the design work.
# What does it do?
FediDict lets people define jargon that they know and look up jargon that they don't, and
share those definitions between disciplines and across the web.
'Jargon' is an important word here; I want FediDict to help people who are reading a text
from a specific discipline find out the meaning of words they don't know, and get some
context on those words in other disciplines as well.
While the word "proband" probably only has meaning in the medical community, something
like "stall" means one thing in aviation and another in automotive repair.
Some words, like "[normal](https://en.wikipedia.org/wiki/Normal)", are so overloaded as
to have multiple meanings in multiple fields. Wikipedia alone lists six in psychology,
one in chemistry, two in physics, and upwards of 25 in various fields of mathematics,
including linear algebra, geometry, and statistics!
Such words can be confusing if misinterpreted, so I'd like each instance of FediDict to
connect to others. That way, they can send definitions around, with attribution, so
that each FediDict instance has access to the whole corpus of information while making it
clear where each definition comes from.
# Seeking Knowledge
## The Guest
A guest coming to the FediDict site just wants to know a specific piece of information.
Let's imagine that I host "ComputerJargon.Online" and that Danielle A. Hacker wants to
know what a "PIC" is. She knows that ComputerJargon.Online is the place to be to learn
about computer jargon, so she enters that URL.
My top priority here is to get out of her way and let her get the info she needs. She
needs to have a big search bar, front and center, where she can start typing the word
she wants to define, so we need a distraction-free home page with a search bar.
While she's typing, she might wonder if the ComputerJargon FediDict instance even has
the definition for "PIC". To reassure her, and to make typing long terms (or terms a
user doesn't fully know how to spell) easier, it would be great to have some kind of
typeahead for terms the FediDict instance already knows.
Once the term is entered, FediDict needs to search the database for all the terms
matching the search, and display them.
It's also important to have some way to order these terms. In order to maintain relevance,
the terms from the instance the user actually searched on should come first. In addition,
there should be some mechanism for determining whether or not a given definition is
accurate and relevant. For now, that's the "score"; manipulating it comes later.
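A minimal sketch of that ordering (hypothetical, simplified types; local results first, then by descending score):

```rust
use std::cmp::Reverse;

// Hypothetical, simplified search result record.
struct SearchResult {
    is_local: bool,
    score: i64,
}

// Local definitions come first (false sorts before true), then
// higher-scored definitions within each group.
fn order_results(results: &mut Vec<SearchResult>) {
    results.sort_by_key(|r| (!r.is_local, Reverse(r.score)));
}
```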
## The Member
Imagine that another person, Jake P. Grammer, wants to look up the same information.
He has a good idea that a "PIC" is a programmable _something_, but he isn't sure what.
He goes through basically the same workflow as Danielle the guest, but with three
preconditions: he's made an account, been approved by the moderation team, and signed in.
He navigates to ComputerJargon.Online, enters "PIC" in the search bar, and gets a list
of results, sorted as mentioned above. Unlike Danielle, however, he has the option to
voice his opinion on the relevance of the results. Let's say the results are as follows.
> **PIC**, _n._, by duck3345
> A PIC, or photonic integrated circuit, is an electronic component
> which uses light rather than electricity for its internal operation.
> **PIC**, _n._, by user123
> A PIC, standing for programmable interrupt controller, is a component
> of a computer which does x, y, z, and q, and blah blah.
> **PIC**, _n._, by farmer1372
> PIC, particulate inorganic carbon, is some chemistry thing that Leo
> doesn't know about, but it's certainly not relevant to
> ComputerJargon.Online's database.
The first definition is totally irrelevant to Jake, but he doesn't know if it's wrong, and
it doesn't break any rules. He does, however, know that the entry by user123 on
programmable interrupt controllers is relevant to his needs. He should be able to signal
this by liking the definition. This action is associated with his account and increments
the score of the post.
The third definition might also catch his eye. It makes no sense to have a chemistry term
on the ComputerJargon database, and Jake wants to help out the site by making the
moderator team aware of this. He should be able to do that by reporting the definition,
another action associated with his account. It will reduce the score of the definition,
as well as sending a message of his choosing to the moderators of the site.
Jake has successfully gotten the information he needs, and has contributed some
contextual information back to the site. As more and more users use the site and look up
"PIC", the most domain-relevant information will be liked most often and will drift
to the top of the list, while lesser-used information will drift downwards and totally
irrelevant information (and downright rule-breaking) can be deleted by moderators after
being reported.
# Sharing Knowledge
Perhaps another user of the site, Eric D. Velopoer, just attended a seminar on frobnitzem
and wants to make sure that everyone knows the definition of "frobnicate".
He follows the basic flow of the other users to determine whether or not the
definition already exists in the database. In this case, it doesn't exist in the
ComputerJargon database, but it might exist on other instances. Whether or not it does,
the result of the search is clear.
Once he has found that the definition doesn't exist, Eric has a clear path to
create a new definition.
(He also has the option to add a new definition for a term that already exists.)
Note that I deliberately do _not_ want people to be able to add definitions without
searching for them first, because that is likely to lead to duplication.
Once Eric has decided to add a new definition, he's presented with a form to
add the definition. It has a prefilled, uneditable entry with the term being defined,
a dropdown for part of speech (verb, noun, etc), and a large text box for the definition.
It might also be useful to provide a way to enter related terms or "see also"s for a
definition. For instance, Eric might want to mention that "frobnicate" is related to
"frobnitz", "twiddle", and "tweak".
Eric's definition might end up looking like this (from the Original Hacker's Dictionary):
> **frobnicate**, _v._
> To manipulate or adjust, to tweak. Derived from "frobnitz".
> Usually abbreviated to frob. Thus one has the saying "to frob a frob".
> "Frob", "twiddle", and "tweak" sometimes connote points along a continuum.
> Frob connotes aimless manipulation; twiddle connotes gross manipulation,
> often a coarse search for a proper setting; tweak connotes fine-tuning.
> If someone is turning a knob on an oscilloscope, then if he's carefully
> adjusting it he is probably tweaking it; if he is just turning it but
> looking at the screen he is probably twiddling it; but if he's just doing
> it because turning a knob is fun, he's frobbing it.
> See also: frobnitz, tweak, twiddle.
That definition, once submitted, goes onto a moderation queue, so that the site
moderation team can decide whether or not to allow it to go public.
# Next Steps
These are the basic workflows for FediDict users - looking things up, giving feedback on
existing definitions, and submitting new definitions. In the next post, we'll look at the
security aspect of design, including anti-abuse tooling.
title: "FediDict UX Design, Part 2"
slug: "fedidict_ux_design_part_2"
date: 2018-09-02 00:00:00+00:00
- Fedidict
- Programming
- Rust
description: More in-depth analysis and design for the UX of FediDict, focused on security.
*This is the second of several posts on the design of my current ActivityPub-enabled
Rust web application, FediDict. Each of these posts was released to my Patreon
patrons before being made publicly available. I recommend that you read the [first
UX design post](/post/fedidict_ux_design_part_1) first.*
Any software that accepts input over the network will, eventually, be subject to
attack of some kind. Any federated software service must accept input over the network -
that's the whole point. In the case of FediDict, these attacks could come from either the
user-facing side or the federation-facing, or server-to-server, side.
Either way, some of the technology choices I've made will help FediDict withstand attack,
and in this post, I'll explore how a solid access control model can help secure a system
while still enabling rich control by administrators and a good user experience.
# Inherent Sturdiness
FediDict is built in Rust, a language that promises memory safety for all programs written
without the `unsafe` keyword. This eliminates (or, at least, mitigates to the point of
insignificance) the majority of common memory-based attacks, like buffer overruns,
use-after-free bugs, double-free bugs, and other such issues.
I'm also using a proven database system, though I'm not sure if PostgreSQL or MariaDB will
be my final choice yet. Either one has a track record of high security, and FediDict will
interface with them using Diesel, an object-relational mapping library which uses static
analysis to ensure that certain types of database errors and bugs (like SQL injection) are
not possible.
# Access Control
Nonetheless, bugs exist in any piece of software, and application-level bugs are often far
more impactful than those in underlying code, because they are so hard to
predict and they expose precisely what the attacker wants - the data held in the application.
One of the best ways to mitigate application-level bugs is to have a consistent (and
consistently enforced) access control model. Rust's type system helps me here.
## A Pyramid Model
A dictionary produced by a small team of professionals is an easy thing to secure, but a
crowdsourced (or semi-crowdsourced) one is not. In FediDict, I'll have a few basic
levels of access:
1. The Guest. A guest has absolutely no write access, at all, but they have read access
to most things. I'll define exactly what later.
2. The Contributor. A contributor has all the privileges a Guest has, as well as write
access to the moderation queue (but, you'll note, not to the site directly), and to
the vote count of existing definitions, and to the report queue.
3. The Moderator. A moderator has all the privileges a Contributor has, as well as being
able to read the moderation queue and write from the moderation queue to the public
database of definitions.
This is a tidy and convenient pyramid of access. Each level builds on the last, each group
is smaller than the last, and all these privileges can be enforced with simple conditions,
even at the database level.
Access becomes slightly more complex when I bring federated users into the mix. Because
ActivityPub supports "liking" posts remotely, and I wish to support this as well, some
people can write to the vote count of definitions without being Contributors.
## Roles, not Tiers
This would appear to pose a problem, but in fact, it's easy to solve with a concept known
as Role Based Access Control, or RBAC. RBAC is a technique used by many complex systems,
including Amazon Web Services and Google Cloud,
but is applicable to simpler systems as well.
With RBAC, I'll define a set of roles and assign those roles to users based on various
factors. Then, when a user attempts to perform an action, FediDict will check for these
roles. For instance, the roles could be:
- **DefinitionReader**: can access the public database of definitions.
- **DefinitionSubmitter**: can submit new definitions to the moderation queue.
- **DefinitionEvaluator**: can submit likes and reports for definitions.
- **DefinitionRemover**: can remove definitions from the public database.
- **QueueReader**: can read definitions submitted to the moderation queue.
- **QueueApprover**: can move definitions from the moderation queue to the public database.
- **QueueRejecter**: can remove definitions from the moderation queue.
- **AccountCreator**: can create new local FediDict accounts.
- **AccountRemover**: can remove local FediDict accounts.
- **AccountRoleAssigner**: can assign and remove roles on accounts.
I could use these roles in Rust through an `enum`. I'm not sure just how it will look,
but for example:
```rust
impl Definition {
    // ... other code for definitions

    /// approve() sets a definition as approved, by the authority of the
    /// given user.
    fn approve(self, authority: &User) -> Result<Definition, Error> {
        if authority.has_role(Roles::QueueApprover) {
            let mut new = self;
            new.approved_by = Some(authority.id);
            Ok(new)
        } else {
            Err(Error::PermissionDenied(
                "approve a definition from the moderation queue"))
        }
    }
}
```
This could then be used in a pipeline. For instance:

```rust
// Assume I have user, the current user, and def, a definition, from elsewhere
// in the code.
let def = def.approve(&user)?;
```
In the case of an error, the information given is enough to render a message like:
> Error: Permission denied. In order to approve a definition from the moderation queue,
> you need the QueueApprover role, which you do not have. Contact an administrator for
> assistance.
## More Flexibility Without (Much) Complexity
I can still model tiers with these roles. A Guest has only the **DefinitionReader**
role, while a Contributor has **DefinitionReader**, **DefinitionSubmitter**, and
**DefinitionEvaluator**. A Moderator has all of those roles, in addition to
**QueueReader**, **QueueApprover**, and **QueueRejecter**.
I can also construct other types of user from these roles. For instance, a site admin
needs **AccountCreator**, **AccountRemover**, and **AccountRoleAssigner**, but
could delegate account creation to others by giving them only **AccountCreator**.
Someone the organization trusts to remove spam but not to fully evaluate the validity of
new definitions could have **QueueReader** and **QueueRejecter** but _not_
**QueueApprover**.
A user interacting over ActivityPub can like and report as well as viewing the database,
so they would get **DefinitionReader** and **DefinitionEvaluator**, but no other roles.
With roles, I get the security provided by a solid tiered model with a flexibility that
a tiered model can never provide.
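One hedged sketch of that composition (the `Role` enum and tier helpers are hypothetical names following the lists in this post):

```rust
// Hypothetical sketch: each tier is just a set of roles, built by
// extending the tier below it, mirroring the lists above.
#[derive(Clone, Copy, PartialEq, Debug)]
enum Role {
    DefinitionReader,
    DefinitionSubmitter,
    DefinitionEvaluator,
    QueueReader,
    QueueApprover,
    QueueRejecter,
}

fn guest_roles() -> Vec<Role> {
    vec![Role::DefinitionReader]
}

fn contributor_roles() -> Vec<Role> {
    let mut roles = guest_roles();
    roles.extend([Role::DefinitionSubmitter, Role::DefinitionEvaluator]);
    roles
}

fn moderator_roles() -> Vec<Role> {
    let mut roles = contributor_roles();
    roles.extend([Role::QueueReader, Role::QueueApprover, Role::QueueRejecter]);
    roles
}
```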
## Implementing Roles
This might seem like a difficult technique to implement, but actually, it's very natural,
especially using the combination of Rust's expressive type system and a powerful RDBMS
like PostgreSQL.
Each user will have a database record in the Users table with an associated ID.
That record will specify their user name, e-mail address, (hashed) password, and any
other information, including the ID of a row in the Roles table.
The Roles table will associate user IDs with a set of roles. Each role can be on, off,
or none. This maps naturally to a nullable Boolean value in SQL, or to an `Option<bool>`
in Rust.
Say, for instance, there are three users: an admin, a moderator, and a contributor.
The users table might hold three records, like so:
| ID | Username | email | password | roles |
|----|----------|-------|----------|-------|
| 12 | ltindall | a@b.c | d7239dhs | 02358 |
| 87 | chughes  | c@x.y | a2342nb0 | 83950 |
| 34 | djanes   | q@m.z | passw0rd | 22741 |
And, the Roles table would have some associated records:
| ID    | definition-reader | definition-submitter | queue-reader | queue-approver | account-creator |
|-------|-------------------|----------------------|--------------|----------------|-----------------|
| 02358 | null              | yes                  | yes          | yes            | yes             |
| 83950 | null              | yes                  | yes          | yes            | no              |
| 22741 | null              | null                 | null         | null           | null            |
Here, some roles have explicit values and some do not. For instance, entry 83950
(associated with UID 87, user `chughes`) has no explicit value (`null`) for
**definition-reader** and explicit values for all the other fields, while entry 22741
(associated with UID 34, user `djanes`) has default values everywhere.
This is valuable because, if a problem with the default settings is discovered, it would
be quite difficult to set the roles for every user created before that point. With this
system, values are only non-`null` if they have been explicitly set; null values resolve
to defaults set elsewhere.
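That resolution rule can be sketched in Rust (a minimal sketch, using the `Option<bool>` mapping described above; the function name is mine):

```rust
// Sketch of the "null means use the site default" rule: an explicit
// per-user value, when present, overrides the default.
fn resolve_role(explicit: Option<bool>, site_default: bool) -> bool {
    explicit.unwrap_or(site_default)
}
```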
This approach also provides the flexibility I was looking for. If I decide to trust
`djanes` with creating accounts but not, say, approving new definitions, I need only
change the `account-creator` column of the Roles table to `yes` (or `true`, or however
the database stores Boolean values) and that account can now create new accounts.
There is a major issue with this model, however. It does not have a good way to handle
access control for users with no identity, or with an identity I don't know about until
they first interact with us over a federation protocol.
## Anonymous Identity
Anonymous users - those interacting through an unauthenticated web interface and without
identity provided by a federation partner - are actually pretty easy to handle. I already
laid out the need for default permissions for logged-in users, so I can just add another
set of defaults for not-logged-in users (probably just **DefinitionReader**).
## New Introductions
Like other federated systems (such as e-mail), an ActivityPub identity is determined by
both a username (like `ltindall`) and a domain (for instance, ``). Assuming
I hosted a FediDict instance on `ComputerJargon.Online` with the user accounts shown
above, their full federated usernames would be `ltindall@ComputerJargon.Online`,
`chughes@ComputerJargon.Online`, and `djanes@ComputerJargon.Online`. These are identities
ComputerJargon.Online already knows about, but the system could get a "like" from a user on
another ActivityPub-supporting instance at any time.
Let's say Eugen, creator of Mastodon, follows the definitions created by users of
ComputerJargon.Online and sees a definition he really likes. He could click the "like"
button in the Mastodon UI, which would eventually notify FediDict that a like had been
sent by `Gargron@mastodon.social`. Assuming ComputerJargon.Online had never interacted
with Eugen before, he would not have an account in the ComputerJargon.Online database.
So, how does FediDict determine his access rights?
Defaults come to the rescue again. With a set of defined defaults for accounts coming in
from federation partners, FediDict simply has to create a new record for Eugen's account
in the accounts table (noting its remote origin) with a null access control row, giving
it default values for remote accounts.
# A Holistic View
These considerations alone are not enough to make a system secure, but with good underlying
technologies and a solid access control model, which I'll refine in future posts and as
I begin to write more code, it should be easy to keep egregious bugs out and fix any issues
that do make it into the codebase.
Security comes in layers, and this post has laid out a few of them: one at the very bottom,
the language and underlying technology used to build the system, and one near the top, the
access control model. I can fill in the rest as time goes on.


@ -0,0 +1,30 @@
date: 2017-02-21 17:04:20+00:00
slug: hacker-superstitions
title: Hacker Superstitions
- Writing
- Culture
description: Hackers pride ourselves on being logical and empirical, but that doesn't make us immune to superstition.
I'm currently taking a course called "Macbeth from Page to Stage". We're discussing superstitions in theatre, and it got me thinking about hacker myths. I wrote this for the assignment, but I thought others might find it interesting.
As a hacker and programmer, I’m part of a culture that is both dismissive of and fascinated by the supernatural. Of course, as people who have to perform extreme rationality and logic in order to be taken seriously, nobody really believes in ghosts or spirits or divination - and yet we have the “demo gods” to whom laptops, chickens, and goats are (figuratively) sacrificed in order to ensure the proper operation of live demonstrations at conferences.
We also have a strange level of personification of the machines and computer programs on which we work - machines are said to be “fighting over” resources, protocol handlers sometimes “get confused” when given incorrect input, and the phrase, “this subroutine’s goal in life is…” is quite common. The personification of these machines isn’t literal, but it’s not totally figurative either. Sometimes computers do things we just can’t understand.
We call the most skilled hackers “wizards”. Hacking on compilers or writing machine code directly is “deep wizardry”, and doing so maliciously (or sometimes just in a way nobody else can understand) is “black magic”. We also have our koans, little stories about "disciples" becoming "enlightened" or the exploits of the "masters" in the heyday of the AI Lab. Consider this (from the New Hacker’s Dictionary):
<blockquote>A novice was trying to fix a broken Lisp machine by turning the power off and on.
Knight, seeing what the student was doing, spoke sternly: “You cannot fix a machine by just power-cycling it with no understanding of what is going wrong.”
Knight turned the machine off and on.
The machine worked.</blockquote>
Such stories from the MIT AI Lab abound, and are the foundation for much of the folklore of the hacker community.
Finally, some notable hackers who have died are considered to haunt our systems. These legends are then blamed (or applauded) for everyday events. For instance, at DEF CON 24, all the Bally’s Casino ATMs were broken, so someone hung a sign saying “Barnaby Jack Was Here”, Barnaby Jack being a hacker who died under mysterious circumstances a few months after demonstrating a remote, Internet-based attack which could entice an ATM to literally spit out cash onto the floor.
I don’t think I personally believe in these things - that the AI Lab masters were bodhisattvas, that the Demo Gods are watching over us, that my thirty thousand lines of C are alive, or that Barnaby Jack still haunts Vegas ATMs. But I do still participate in the customs, still make the jokes, still run the fortune command on Stallman’s birthday every year. So whether or not a ghost has ever taken over my computer, the supernatural has certainly affected my life.
date: 2016-06-04 14:53:55+00:00
slug: i-repaired-my-headphones
title: I Repaired My Headphones
- Hardware
- Hacking
- Culture
Hardware manufacturers are missing out on a huge potential source of revenue: the thrifty tech user market. No, really. Let me explain.
I just woke up to find that my favorite pair of headphones was making only one sound, and it wasn't the one I was putting in as an electrical signal. It was rattling.
Feeling adventurous, I popped the around-the-ear pads off and - lo and behold! - found four Torx screws. I removed them, found that the drivers had separated from their sockets, and put them back in. I also glued them in with a bit of hot snot, since the drivers happened to be a little deeper than their sockets, giving me a nice lip around which I could run the hot glue gun.
These headphones cost about $25. They're not expensive and they're not fancy, but they're repairable. I bought them for that, in a sort of shallow sense: they're advertised as having a 3.5mm accepting jack on the left ear, which means that the cable can be replaced. I abuse my headphones constantly, and I've already replaced the 3.5mm cable twice.
My only question is, why didn't they advertise to me that the insides were trivial to repair as well? Every other pair of around-ear headphones I've owned has been ultrasonically welded, not affixed with screws; the drivers have been glued to the frame, not inserted into a socket that's integral to the frame; and the around-the-ear cups have been either glued on or attached with a huge lip that's nearly impossible to remove.
In _Idoru_, Gibson predicted that, by around now, we would have reverted to a model of non-disposable, easily repairable electronics. I'd pay so much more for headphones that were advertised not just with shots of the outside but with schematics and examples of how easy repairs are. I'd be willing to wait longer for my shipment if I didn't think I'd need to buy a new pair of headphones next month.
I want headphones made by the Sandbenders out of coral and turquoise and the interior surface of renewable nuts, and I'd be willing to pay a lot for them, not for aesthetic reasons but because I know I won't have to replace them for a long, long time.
title: "Improved User Interface 0.3.0!"
date: 2018-06-13T08:46:55-05:00
- Open Source
- Programming
- Rust
The Improved User Interface crate has had its 0.3.0 release, adding new input fields (Checkbox and Combobox) and new layout options (LayoutGrid), as well as finally working 100% on Windows, with many bug fixes.
This comes with the 0.1.3 release of the underlying ui-sys crate to support these features.
It's been a big undertaking to get to this point, and I'm excited to grow from here, now that `libui` itself is moving forward again as well.
I want to give a huge shoutout to several GitHub users who helped with the project, specifically:
* pgvee, who helped fix Windows CI and worked on LayoutGrid
* huangjj27, who did an amazing job fixing Windows builds
* ZakCodes, who fixed an API bug
* masche842, who implemented several new features
From the project management side, we have a vastly improved README, a CONTRIBUTING guideline file, and have resolved to keep a changelog.
I'd like to extend a huge thank you to all my patrons for keeping this ball rolling. I'm excited to see where we go from here.
date: 2016-04-10 15:37:11+00:00
slug: ipfs-the-interplanetary-file-system
title: IPFS, the Interplanetary File System
- Networking
- Open Source
**IPFS**, the **InterPlanetary File System**, is a content-addressable network.
This means that rather than asking the network for a particular site or domain name (like google.com), you ask for a particular piece of content, and you're guaranteed to receive it.
There's no possibility of a "monkey in the middle" attack, in which someone maliciously modifies the web page you're trying to access.
To explain it another way, on the normal Web, when you access google.com, the network translates it to an IP address, like `2607:f8b0:4003:c00::6a`. Then, your computer connects to the server that address refers to and asks it, "Could you send me the content for google.com, please?". This means that Google can change the content on their front page whenever they like - or, if someone malicious is in between you and Google, they could change the content. They might, for example, change the login form so that the passwords you enter are sent to them.
On IPFS, however, when you ask for something, you don't request an IP address from the network, but instead ask for a _hash_ of a file - a web page, an image, a video, or whatever. For example, /ipfs/QmbKM1C3ggNVdQtTnQuhvWruyodK6TUnoxjYwg31Q3crcn is the address of a specific version of my GNU/Linux tutorial series. If I change the content, the hash changes.
Of course, people still want to be able to change their content without breaking all the links to it. For that, we have **IPNS**, **InterPlanetary Name System**. IPNS allows you to securely point to mutable content with a hash-like address (/ipns/<whatever>).
These addresses are still not human-readable, but DNS can be used to resolve human-readable names to IPNS addresses, just like it's currently used to resolve IP addresses from human-readable names.
### Benefits
One of the coolest things about IPFS is that the load of serving files is shared between users.
Ever heard of the "Slashdot effect", wherein something cool gets linked to on social media and the server hosting it collapses under the sudden load? With IPFS, that won't happen.
That's because when you request a particular piece of content over IPFS, it gets cached to your machine and sits there until the allotted space is used up, or you decide to remove it.
Anyone who asks for that piece of content has a chance of connecting to your IPFS node before anyone else's and receiving the content from you.
They in turn will cache it, and serve it to future clients.
Everyone who is looking at a particular piece of content shares the load of serving it to everyone else.
### Technical Concerns
There have been worries about hash collisions in the data store. IPFS uses multihash, which allows its hashing function to be upgraded, and currently, implementations use SHA-256. [This]( StackOverflow post makes a good point about the likelihood of such a collision:
> If we have a "perfect" hash function with output size n, and we have p messages to hash (individual message length is not important), then the probability of a collision is about p^2 / 2^(n+1) (this is an approximation which is valid for "small" p, i.e. substantially smaller than 2^(n/2)). For instance, with SHA-256 (n = 256) and one billion messages (p = 10^9), then the probability is about 4.3×10^-60. A mass-murderer space rock happens about once every 30 million years on average. This leads to a probability of such an event occurring in the next second to about 10^-15. That's **45** orders of magnitude more probable than the SHA-256 collision. **Briefly stated, if you find SHA-256 collisions scary then your priorities are wrong.**
So, perhaps, when IPFS is truly interplanetary, we will have to switch to a new hash function. That's fine - the way IPFS is built, that's entirely possible.
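The quoted figure is easy to sanity-check by evaluating the p^2 / 2^(n+1) approximation directly:

```python
# Check the quoted birthday-bound figure: collision probability ~ p^2 / 2^(n+1)
# for p hashed messages and an n-bit hash, with p = 10^9 and n = 256.
from fractions import Fraction

p = 10**9
n = 256
prob = Fraction(p**2, 2**(n + 1))  # exact rational, then convert for display
print(float(prob))  # roughly 4.3e-60, matching the quote
```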


@ -0,0 +1,157 @@
date: 2016-08-29 22:06:19+00:00
slug: learning-japanese-the-python-way
title: Learning Japanese the Python Way
- Programming
- Python
Now that I'm in college, I'm taking a lot of non-computer science classes, and one of them is Japanese. I'm just starting out, and I need to be able to rapidly read numbers in Japanese and think about them without translating them consciously. I could make a bunch of flash cards, or use a service like Quizlet... or I could write some Python!
For those of you who are unfamiliar, Japanese doesn't have the ridiculous numerical system that English does. One through ten are defined, and eleven is simply (ten)(one). Twenty three, for example, is (two)(ten)(three) (に じゅう さん). This means that rather than having a long list of numbers and special cases, I can just have the numbers zero to ten "hard coded".
After that, the program is pretty simple: if the number is less than 11, simply look it up. If it's at least 11 but less than 20, build it with じゅう plus the second digit. If it's 20 or larger, build it with the first digit plus じゅう plus the second digit.
The interactive part is pretty simple too: it runs a loop that randomly generates numbers, checking that they haven't been done before, translates them, and asks me to translate them back. If I succeed, it moves on; if not, it doesn't record the number as having been completed, so I have to do it again at some point in the same run.
This [simple program]( came out to 136 lines of very verbose and error-checked Python. It's a good piece of code for a beginner to try and modify - for example, can you get it to incorporate the alternate form of four (し) as well as the primary form? Can you make one that teaches Kanji numbers? (I plan to do both of those things at some point.)
```python
#!/usr/bin/env python3
"""
This is a program to help you study the Japanese numbers.
It currently goes from 0 to 99; I will extend it at a later date.
It can be executed as follows:
which will do all the available numbers, or
./ 10
which will go only up to 10.
"""

# 0 through 10 are "hard coded"; everything else is composed from them.
# (Only 'ZERO' survives from the original listing; the kana for 1-10
# are reconstructed.)
numbers = ['ZERO', 'いち', 'に', 'さん', 'よん', 'ご',
           'ろく', 'なな', 'はち', 'きゅう', 'じゅう']


class OutOfRangeException(Exception):
    pass


def small_to_japanese(n):
    "Convert a number (0-10) to Japanese."
    if n > 10 or n < 0:
        raise OutOfRangeException
    return numbers[n]


def medium_to_japanese(n):
    "Convert a number from 11 - 100 to Japanese"
    if n > 100 or n < 11:
        raise OutOfRangeException
    digits = list(map(
        int, str(n)))
    out = ""
    # Omit いち in numbers > 10
    if digits[0] > 1:
        out += numbers[digits[0]] + " "
    out += numbers[10] + " "
    # Omit the ones place entirely for multiples of ten
    if digits[1] > 0:
        out += numbers[digits[1]]
    return out.rstrip()


def number_to_japanese(n):
    try:
        return small_to_japanese(n)
    except OutOfRangeException:
        try:
            return medium_to_japanese(n)
        except OutOfRangeException:
            print("No way to represent numbers of that magnitude!")


if __name__ == "__main__":
    from random import randint
    from sys import argv

    # Check if there is a command line option for max numbers
    MAX_NUM = -1
    if len(argv) >= 2:
        try:
            MAX_NUM = int(argv[1])
        except ValueError:
            MAX_NUM = -1

    # A little edge case handling
    if MAX_NUM > 99:
        print("Impossible - this program doesn't "
              "work with numbers over 99.")
        exit(1)
    if MAX_NUM < 0:
        # If a max wasn't given, default to 99
        MAX_NUM = 99

    given = ""
    done_so_far = []
    number_done = 0
    while True:
        n = randint(0, MAX_NUM)
        # If and as long as n has already been done, get a new number.
        while n in done_so_far:
            n = randint(0, MAX_NUM)
        try:
            given = input("What is {} in Roman numbers? ".format(
                number_to_japanese(n)))
        except KeyboardInterrupt:
            break
        except EOFError:
            break
        if given.lower() == 'quit':
            break
        try:
            given_n = int(given)
        except ValueError:
            given_n = -1
        if given_n == n:
            print("You got it!")
            number_done += 1
            done_so_far.append(n)
        else:
            print("No, that's incorrect. This is {}.".format(n))
        if number_done > MAX_NUM:
            print("You did all the numbers in that set!")
            break
```


@ -0,0 +1,45 @@
date: 2017-04-17 04:48:50+00:00
title: MLeM, a VM for genetic programming
slug: mlem-a-vm-for-genetic-programming
- Programming
- Machine Learning
I've recently been working on a project called the Machine Learning Machine, or MLeM.
It's a VM implemented in the Rust programming language which I hope to use as a basis for some genetic programming.
It's a Harvard architecture machine, meaning that it has separate representations and memory for data and program segments. While this is not the way most modern computers work, it does model the more secure W^X (write xor execute) functionality that exists in operating systems such as BSD, and allows me to properly utilize the amazing type system of the Rust language to do compile-time verification of much of the system.
The machine has 8 general purpose registers, a “hardware” stack (that is, it has Stack Pointer and Base Pointer registers and PUSH and POP instructions), and built-in I/O instructions. This makes it slightly abstracted over a real machine, but still makes any code written for it relatively easy to turn into actual machine code for real CPUs.
All data is stored in 64-bit registers and memory cells. Instructions, however, are represented using Rust’s enums. I chose to go with a unified-addressing scheme. This glosses over the realities of memory access, but because all accesses are already tagged by virtue of the system itself, I can easily add performance penalties for main memory access and even a cache simulation when needed.
A sample instruction looks like this:
`Add(a, b)`
where a and b are of type `Address`. So
`Add(RegAbs(R1), MemReg(R2))`
would add the value in R1 to the value at the memory address in R2 and place the result in R1. Note also that R1 and R2 are enum variants of `Register`; trying to
create something like
`Add(RegAbs(R1), MemReg(100))`
would produce a compiler error.
I’ve also developed an extremely simple assembly syntax for testing purposes; the above instruction would look like:
`add r:r1 p:r2`
That is, add (register) r1 to (pointer) r2.
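The tagged-address idea can be sketched outside Rust, too. Here is a toy Python version of the semantics described above; the names and structure here are illustrative, not MLeM's actual API:

```python
# Sketch (not MLeM's real Rust code) of tagged addressing:
# Add(RegAbs(R1), MemReg(R2)) adds the value in R1 to the value at the
# memory address held in R2, storing the result in the first operand.
from dataclasses import dataclass


@dataclass
class RegAbs:      # operand is the register itself
    reg: int


@dataclass
class MemReg:      # operand is memory at the address held in a register
    reg: int


R1, R2 = 1, 2      # register indices for the example


class Machine:
    def __init__(self):
        self.regs = [0] * 8    # 8 general purpose registers
        self.mem = [0] * 256   # a small unified memory

    def load(self, operand):
        if isinstance(operand, RegAbs):
            return self.regs[operand.reg]
        return self.mem[self.regs[operand.reg]]

    def add(self, a, b):
        total = self.load(a) + self.load(b)
        # The result always lands in the first operand.
        if isinstance(a, RegAbs):
            self.regs[a.reg] = total
        else:
            self.mem[self.regs[a.reg]] = total


m = Machine()
m.regs[R1] = 40
m.regs[R2] = 10    # R2 holds a memory address...
m.mem[10] = 2      # ...where the value 2 lives
m.add(RegAbs(R1), MemReg(R2))
print(m.regs[R1])  # 42
```

Because Python is dynamically typed, nothing stops `Add(RegAbs(R1), MemReg(100))` here; in the Rust version, that mistake is a compiler error, which is the whole point of the design.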
The primary goal for this project is to provide a more stable and closer-to-real-CPU language than Muller’s Brainf\*\*\* or my own [SBrain](, both of which I’ve [previously used]( for genetic programming with rather mixed results.
The ability to generate instructions I know are valid helps short-circuit the process of culling unparseable genes from the population, which, while not particularly difficult, slows the process of evolution significantly.
Look out for a future post to discuss my previous experiences with GP and the future of this project.


@ -0,0 +1,57 @@
title: "Modding, Vim, i3, and Efficiency"
date: 2018-03-06T18:50:13-06:00
- Modding
- Linux
- Open Source
I spend a great deal of time _modding_ my Linux machine. Practiced by many Linux users, modding is the process of making a Linux installation pretty, by changing the color schemes, fonts and font sizes, icons, default applications, and the desktop background. As a noun, a _mod_ is the final product of that process: a computer system which looks pretty while remaining functional.
For example, here are screenshots of my two most recent mods.
> ![Screenshot of my dark NASA mod](/images/rices/2018-nasa-02.png)
> My previous mod, a dark blue/grey theme with red highlights.
> ![Screenshot of my purple space mod](/images/rices/2018-purple-space-01.png)
> My current mod, a purple theme with pink and white highlights.
A great many other examples of very pretty mods can be found on [Reddit](
I've always found modding my system to be a very calming process. It re-familiarizes me with the machine I use to access the world, makes me re-think my work process, and encourages me to heavily optimize every aspect of what I do.
I have a deep sense of aestheticism regarding the software that I use, and my mods reflect this. I use the [i3]( window manager which, rather than drawing borders with titles, close buttons, and resize handles around windows, simply "tiles" them so that the whole screen space is always used. It also allows me to use the machine exclusively via the keyboard.
My editor of choice, [vim](, is of a similar philosophy: simple, almost entirely keyboard driven, and easily configurable.
Even my computer's name, Asfaloth (after [Glorfindel's horse](, reflects this preference. I don't need it to be fancy; it just needs to get me where I'm going, rapidly and without fail.
Despite this, I've historically used heavy-weight, mouse driven IDEs like IntelliJ for development. Even when not working in an IDE, I tend to use a graphical editor like Visual Studio Code or Atom. I've gotten used to it, but every time I mod my machine, it grates on me. Everything I do in the terminal lets me enjoy my mod, seeing the color scheme and background blend beautifully as I work, but these "advanced" editors don't.
![Screenshot of my VSCode development setup](/images/rices/vscode.png)
The reason I use them is their efficiency - they integrate spell checking, style checking, type checking, container management, find/replace, and many other features into one application. The other day, I realized that there's no real reason I can't do this all through the terminal!
I spent some time setting up Vim and the rest of my environment to replicate the functionality I need from VS Code, but it didn't take nearly as long as I thought it would. It looks a little something like this.
![Screenshot of my terminal-only development setup](/images/rices/vim-only-pink.png)
I found that working this way wasn't faster in terms of code production or editing, but ended up being faster overall, because I spent less time flipping between and skimming files. Rather than typing code to get completion suggestions to refresh my memory (e.g., typing "CompareC", reading the suggested "CompareChartContainer", deleting the symbol, and then opening the appropriate file), I began simply pausing until the appropriate symbol name came to me.
I also noticed that a better mental representation of the overall project structure began to form. Using [CtrlP]( rather than a tree-based representation of the file structure forced me to think linearly through the process of making each multi-file change, which in several cases reminded me of other changes I needed to make, or revealed a problem with my approach.
> ![CtrlP in action, completing file names in a fuzzy fashion](/images/rices/ctrl-p.png)
> CtrlP.vim finds files using fuzzy searching. It is very configurable; in this case, it's searching the whole repository but excluding the `node_modules` directory.
I also found that using [ripgrep]( was much faster and easier than using the Visual Studio Code search tool, even accounting for the time required to open files it finds with CtrlP.
> ![ripgrep finds references to CompareChart in my entire project](/images/rices/ripgrep.png)
> ripgrep finds references to CompareChart in my entire project, respecting the .gitignore file
I really enjoy working this way, and I think I'm going to keep it up, at least for a while. While it does feel pretty cool to get code completion, linting, and type checking in the same window as my code, it's even cooler to notice the ways my brain internalizes the structure of a project when I don't have an assistant like Visual Studio Code to help me.
Ultimately, a combination of the two is likely to be most efficient; if I ever figure out the perfect workflow, I'll make sure to write it up.
> NOTE: After publishing this article, I was made aware that the term "ricing" comes from the racing community, where it was originally coined as a pejorative racist term referring to over-modded Japanese motorcycles. Yikes. Because of that, I've decided to use the term "modding" instead.


@ -0,0 +1,28 @@
title: "Moving to Subdomains"
date: 2018-05-13T16:26:58-05:00
- Networking
- System Administration
- Open Source
- Linux
I just finished moving my EtherPad Lite instance and my Gogs Git VCS instance to subdomains, rather than subdirectories.
This involved two sources of pain:
1. a lot of _waiting_ for the DNS to propagate. First I had to wait for my new NS settings to take effect, then for the actual domain name updates allowing the []( and []( domains to point to this server.
1. a lot of config file updating in multiple places. I did eventually unify my Nginx config, which will save me some grief in the future, but I still had to update the Gogs and Etherpad Lite configs to be aware of their new locations.
My new Nginx config is in a few pieces:
* `` config. This holds the config for this blog, some other static content, my Keybase proof, and most importantly a server block which serves on port 80, on both IPv4 and IPv6, and issues 301 redirects to the same page at port 443 over HTTPS.
* `` config for PeerTube. This has to do a little bit of magic to do DNS resolution (for lookup for webtorrent hosts) and websockets.
* `` config for EtherPad Lite. It's relatively simple, but does have to do some websockets config.
* `` config for Gogs. The simplest of the configs, it just points to the Gogs webserver.
* `longview` config for the Linode Longview server.
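The port-80 redirect block mentioned above looks something like this (the domain and details here are a sketch, not copied from the real config):

```nginx
# Catch-all HTTP server block: listen on both IPv4 and IPv6,
# and send every request to the HTTPS equivalent with a 301.
server {
    listen 80;
    listen [::]:80;
    server_name example.com;  # illustrative; stands in for the real domain

    return 301 https://$host$request_uri;
}
```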
Overall, it's a much less convoluted system than I had in the past, and it means I can trivially separate these services onto multiple servers in the future.
I also recently installed Cockpit, which has been... a bit underwhelming. I'm hoping it will improve, but either way it'll be useful if I do end up leasing additional servers, since it can chart CPU and memory usage from more than one machine at a time. It doesn't, however, offer a way to monitor Nginx or other webservers, or PostgreSQL.


@ -0,0 +1,98 @@
date: 2016-04-11 05:54:17+00:00
title: Open Source for Normal People
slug: open-source-for-normal-people
- Culture
- Open Source
This is not an article for technical people. It is an article for normal people who just use their computers to get things done: to look at Facebook and Twitter, check their work email, write novels, design houses… whatever.
Maybe you’ve heard about open source by using an open source program like Firefox, OpenOffice, or Inkscape. Maybe a friend suggested you look into it. Whatever the reason, you’re here, so here you go: a simple, non-technical explanation of what it is and why it’s important.
**TL;DR (Too long, didn't read)**
For the quick overview: Most software is secret; open source software isn't. As counterintuitive as it may seem, it's not necessary to keep software secret in order to make money, and when software is not secret, competition is fiercer and innovation often progresses at a faster pace. Keeping software transparent also prevents the authors of that software from using anti-consumer and anti-competitive tactics like proprietary format lock-in.
<!-- more -->
#### Background
Some background first (really, just a little bit, I promise!)
Computers really just do math. That’s it; they don’t know what words are, or what a web page is. **Software** turns things humans are interested in, like words or web pages or building plans, into numbers that computers know how to work with. Software is made of **code**. First, there’s the **source code**, which is a kind of code that humans can (sort of) read. Programmers make source code that describes what a computer should do to numbers to get the result that you, the user, want. It’s a little something like a recipe.
That recipe gets “baked” into **machine code**, which very, very few humans understand. It's only for computers. That’s what you download from the App Store or get on a CD when you buy a piece of software like Angry Birds or Microsoft Word.
#### Closed Source
Most software, like Microsoft Word or Angry Birds, is **closed source** or **proprietary**. This means that all you, the customer, get is the machine code. You can’t read it, but your computer can, so you can click on the little W icon and your computer will open up a window with a blank page, ready to be filled with the next great American novel. The machine code tells the computer what each keypress means, how to save and load files, and when to draw that little red squiggly to inform you that you spelled “antidiesstablishmentarianism” wrong.
That seems all fine and dandy, right? Well, it is, except for two things.
**First, the people who made the software are the only ones who know how it works.**
This is problematic for a lot of reasons. There’s the “bus factor”: what happens if the developer is hit by a bus, or, more realistically, the company shuts down or decides to stop supporting that product? No more updates for you, and there’s nothing anyone can do about it, because nobody has the recipe, only the pre-baked machine code. It also means that, if the people who made the software don’t want you to use it with their competitors’ software, they can stop you (or, at least, try).
This happens a lot in the real world, and it’s called **lock-in**. If your whole company uses Solidworks or Adobe Photoshop, for example, it’s going to be a real pain for you to switch to an alternative, even if some other company came out with a better, cheaper CAD package or photo editor. Once you’ve started using a particular piece of proprietary software, there’s no guarantee that you can switch to a better or cheaper program without a lot of headache. This is nothing, though, compared to the other problem.
**Second, the people who made the software are the only ones who are really sure about what it does.**
In the same way that the cook is the only one who _really_ knows what’s in the cake you’re eating, the developers are the only ones who _really_ know what their program does. This is really, really bad when the closed source software is something that’s essential to a lot of people’s everyday lives and businesses, like Microsoft Windows, which, starting with version 10, forces users to install updates, sends information about them back to Microsoft, and even removes programs they’ve installed [without asking]( Imagine if that happened to a program you used for work the day before a major deadline!
The creators of closed source software can also artificially limit what the software is able to do. This is the model of the EagleCAD printed circuit board design program, which sells licenses ranging from $0 to $50,000. The software you get with each license is exactly the same, but the more you pay, the more features get unlocked, so most users end up with a powerful CAD package they can never use to its fullest extent.
#### Open Source to the Rescue
These problems are why many users and even businesses are embracing **open source** software. Open source software is software whose source code, or recipe, is available for everyone to see. This means that anyone who can understand source code (a lot more people than can understand machine code) can make sure that the software does only what it’s supposed to.
It also means that anyone can _change_ that code. Not like a wiki, mind you! Changes have to be reviewed by the product's maintainers before they are accepted into the mainstream distribution of the product, so if you decide to download an open source product, you can be sure that you're getting the best version that's available. Just like deciding to throw in that pinch of cinnamon, even if the original baker hated spices, you can change your own copy whenever you like, and (usually) redistribute that copy to others. If you have a problem with the way your software works, you can just change it or, if you can’t program, you can hire any developer in the world to do it for you. This may not seem like a big deal, but it has far-reaching consequences.
Imagine trying to get Microsoft to change something in Windows or Word for you, or trying to convince Adobe to add your favorite feature to Photoshop. They’d never do it because they can’t justify the cost, and even if they did agree to your change, it might take years before it was incorporated into a consumer version. On the other hand, with an open source product, you can download the software and use it just like with a closed source product, but if you need something to be changed and can’t wait for the people who made the software to do it, you can get it done _right away_ to your exact specifications. Open source gives you that freedom. If you don't like the way something works, you can often change it yourself, and if you can't, any developer in the world can.
#### The Open Source Business
While on the surface it might seem impossible to make money on open source software, many companies do so. Red Hat Inc. sells an enterprise operating system based on the open source Linux kernel. While their product is almost entirely open source, and is freely available to anyone, many industrial facilities, financial institutions, and governments pay huge sums for technical support. MongoDB Inc. publishes an eponymous open source database solution under a similar business model.
Other companies, especially small ones with between one and ten developers, work partially on what are called **open source bounties**. The software is free, but customers can make a contract with the developers which essentially says, "for some amount of money, the developers will create some feature by some specific date". This allows developers to make money, allows the customer to be sure that they will get the features they need when they are needed, and allows the entire community to benefit from these new features once they've been created.
Even electronic hardware companies are embracing open source. While Linux, KiCAD, and other open source software has been popular in electronics for a long time, businesses like SparkFun, Adafruit Industries, and many others are creating **open source hardware**. Electronics that are open source are sold by the companies that create them just like normal, but all the schematics, diagrams, printed circuit board designs, and bills of materials are distributed for free to anyone who wants them.
#### Open Source and Competition
All of these practices encourage competition and make anticompetitive tactics like lock-in via proprietary file formats impossible. What is more, the moment an open source product is improved (whether by the community or the product's owners), that innovation is available to everyone. Other developers can immediately begin to improve it and build upon it, and users can immediately begin to integrate it into their workflows and business processes.
#### Try It!
Open source is not the answer to every problem, but it is the answer to many of them. Open source products exist for nearly everything.
* Office applications (like word processing, presentations, and spreadsheets);
* web browsing;
* computer aided design in architecture, mechanical engineering, and electrical engineering;
* digital photo processing;
* digital art;
* professional and amateur video editing;
* accounting and financial auditing;
* tax processing;
* professional and amateur audio processing;
* web servers;
* blogging software (in fact, this site is powered by the open source Jekyll software);
* almost anything you can think of
It costs you nothing but a download and a few clicks to try a piece of open source software, so go ahead. If you have a problem, most products have a method by which you can bring that problem to the developers' attention (called **filing a bug** against the product), so they can fix it.
If you're curious about open source as a concept, and how it can help you or your business, please contact me at I'm more than happy to answer any questions you might have.


@ -0,0 +1,136 @@
title: "PDF Embedding Attacks"
date: 2018-08-04T12:17:11-05:00
- Hacking
- Programming
- JavaScript
description: Turns out, it's possible to embed files that automatically execute as soon as a PDF is opened, making it an optimal malware delivery mechanism.
PDF, or Portable Document Format, is an incredibly complex file format, governed by many
standards and semi-standards. Like HTML and CSS, it was primarily designed for document
layout and presentation. Also like HTML and CSS, it has been augmented with a JavaScript
engine and document API that allows programmers to turn PDF documents into applications -
or vehicles for malware.
# Embedding Files in PDF Documents
It's very easy to embed any kind of file in a PDF document. Every document includes the
`EmbeddedFiles` name tree, along with support for collections of files, known as portfolios.
Most PDF libraries provide support for this; we'll examine PyPDF2, which supports
everything we need and is pure Python.
PyPDF2's `PdfFileWriter` provides a method called `addAttachment` which takes a name
and some bytes and embeds them as a file in the PDF ([docs](
This is how malware is usually concealed in a PDF document - as an embedded file.
# Opening Files from PDF Documents
Now that we have a payload embedded in a PDF document, we need to actually open it.
The basic method for this is to also embed a script in the PDF document. In our case, we
want to add a document level script. This script will execute as soon as the PDF is opened.
Fortunately, PyPDF2 also supports this! We can simply add a JavaScript object with the
method `addJS`, and that JavaScript will be registered to run on the PDF opening.
Our JavaScript payload is pretty simple: we just add a single call to `exportDataObject`,
a function provided by the PDF reader. This function takes an object with 2 parameters:
- `cName`, the name of the embedded object, and
- `nLaunch`, an instruction as to what the PDF reader should do with the exported object
`nLaunch` is just an integer, and it has three valid values:
0. Prompt the user for a path and save the file there
1. Prompt the user for a path, save the file, and ask the operating system to open it
2. Pick a temporary location, save the file there, and ask the operating system to open it
That last option sounds great for malware. Assuming we embedded a file called
`myExploit.exe`, we would add the following JavaScript:
```javascript
this.exportDataObject({
    cName: "myExploit.exe",
    nLaunch: 2,
});
```
and it would run as soon as the PDF was opened, right? Well, not quite. Unfortunately,
there's a bit more to it; Adobe Reader (and most other readers) will prevent the launch
of common executable files. For example, `.exe`, `.js`, `.vba`, and `.bat` files cannot
be opened.
# Evading the Blacklist
There are many ways to evade the blacklist, such as Microsoft Word documents with malicious macros embedded in them,
but recently, researchers discovered that another kind of file could be used:
`.SettingContent-ms`. As explai