
Explaining Security Risks with the 1,000 Most Common Words


Many people who work in tech need to create secure systems without being security experts. Inspired by Randall Munroe, creator of the webcomic xkcd, I decided to try a communication experiment.

Munroe created a comic and a book explaining technical topics using only the 1,000 most commonly used English words. The results are at once hilarious and informative. Some ideas can be expressed just as clearly using plain language, and some become amusingly distorted. For example, oxygen is “the part of air you need to breathe, but not the other stuff,” and tectonic plates are “the big flat rocks we live on.”

Being a security engineer, I thought I’d take a crack at rewriting the Open Web Application Security Project (OWASP) Top 10, a list of ten key information security weaknesses, in this style. (Munroe’s xkcd website includes a tool to check that your words are all within the “top ten hundred.”) It started as a joke in our internal company Slack, but people liked it more than I expected. So I’ve attempted to clean up the results a bit to share them with you, the general public.

The results are... mixed. Here’s what happens when I try my best to explain the OWASP Top 10 using the 1,000 (or ten hundred) most common English words:

  1. Injection: A bad guy can answer the questions a computer asks with the same kinds of words the computer uses to give itself directions. Then the bad guy can make the computer do bad things.
  2. Broken Authentication: If bad guys pretend to be other people, and nothing stops them, they can see other people’s information and do bad things.
  3. Sensitive Data Exposure: If you don’t guard special information that people don’t want others to know about, bad guys can steal it and use it to hurt people.
    [Image: a hooded figure, face unseen, bent over a laptop. Caption: A "bad guy"]
  4. XML External Entities: Computers can store information as words about things. If the words point to outside things, a bad guy can tell them to point to bad words, and make the computer do bad things.
  5. Broken Access Control: If the computer knows who I am, but does not stop me from looking at stuff that belongs to other people, I can steal or change their information.
  6. Security Misconfiguration: Bad guys can break into computers that are set up wrong.
  7. Cross-Site Scripting: If a computer system stores words that some people have written on one computer, and shows those words to other people on another computer, a bad guy can tell the computer system to store bad words that do bad things when other people's computers read the words.
  8. Insecure Deserialization: If someone stores information as words, and your computer turns those words back into information, then a bad guy can put in words that make the computer do bad things.
  9. Using Components with Known Vulnerabilities: If you use computer directions from other people, and bad guys know that those directions let them do bad things, then the bad guys can do bad things to your computer.
  10. Insufficient Logging and Monitoring: If you don't watch your computer system hard enough, or tell the computer to write down enough information about what it does, you won’t know if bad stuff happens to it.
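
To make item 1 a little more concrete, here is a minimal sketch of injection using Python's built-in sqlite3 module. It is an illustration only, not taken from the OWASP text: the table, the function names (make_db, lookup_unsafe, lookup_safe), and the sample data are all hypothetical. The unsafe version pastes the user's answer straight into the "directions" (the SQL query), while the safe version passes it as a parameter so the database treats it as plain data.

```python
import sqlite3


def make_db():
    """Build a throwaway in-memory database with two hypothetical users."""
    conn = sqlite3.connect(":memory:")
    conn.execute("CREATE TABLE users (name TEXT, secret TEXT)")
    conn.executemany(
        "INSERT INTO users VALUES (?, ?)",
        [("alice", "a-secret"), ("bob", "b-secret")],
    )
    return conn


def lookup_unsafe(conn, name):
    """Vulnerable: the input is spliced into the SQL text itself."""
    query = f"SELECT secret FROM users WHERE name = '{name}'"
    return [row[0] for row in conn.execute(query)]


def lookup_safe(conn, name):
    """Safer: the input is bound as a parameter and never parsed as SQL."""
    query = "SELECT secret FROM users WHERE name = ?"
    return [row[0] for row in conn.execute(query, (name,))]
```

With an input such as x' OR '1'='1, lookup_unsafe ends up running a query whose WHERE clause is always true and returns every user's secret, while lookup_safe simply finds no user with that odd name.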

Because this is the internet, where nuance goes to die, I feel a need to be extremely clear: I am not advocating that everyone should literally write all their technical communications in this extremely simplified style. I’m more interested in what we can learn from the attempt, where it works and where it doesn’t.

Some of them work

I think some of these translate quite well—they sound funny, but they convey the ideas you need to understand in order to grasp this type of security risk.

People shouldn’t be able to “see other people’s information and do bad things.” It’s important to “guard special information that people don’t want others to know about.” Systems should “stop me from looking at stuff that belongs to other people.” They should record enough information that administrators will “know if bad stuff happens” to them. These are all foundational ideas in information security, foundational enough that we create specialized terms for them, like authentication, sensitive data, access controls, and detection.

Specialized terms let us communicate quickly about foundational concepts whose meanings we already understand and agree on. Unfortunately, these terms can also intimidate people who aren’t as versed in security. That includes not just people in non-technical roles, but also software engineers and system administrators with many years of experience who aren’t versed in infosec jargon. Sometimes that’s fine, because everyone has areas where they aren’t experts. But sometimes, like when you’re trying to get everyone on the same page to follow security best practices, specialized security terminology can scare people off from taking simple precautions they’re perfectly qualified to implement.

You don’t have to literally warn against “bad guys” or tell people to make sure they’re “watching their computer systems hard enough.” But sometimes technical communication can become a little more accessible if you think about what kind of language is going to feel the most approachable to your audience. In addition to choosing simpler language, it can help to pick examples or metaphors your audience is familiar with. You can link to external explanations of terms you use in an online document. Or, you can start a presentation with brief definitions of some of the key terms you plan to use throughout the talk. (Asking someone less familiar with the topic to read your draft or listen to your practice presentation helps!)

Some of them don’t

There are some concepts that just aren’t possible to explain without technical jargon, or at least without words specialized enough that they don’t show up in the top ten hundred. Take this example, which was my favorite one to write:

XML External Entities: Computers can store information as words about things. If the words point to outside things, a bad guy can tell them to point to bad words, and make the computer do bad things.

On the one hand, figuring out how to write this made me think about what the fundamental problem is with XML from a security standpoint. I know that by “things” I meant “objects” and by “words” I meant “data stored in a standardized text format.” I also know that “point to outside things” means references to other files (in this case, primarily through XML Document Type Definitions, or DTDs).

On the other hand... to someone who can’t look inside my brain and get the definitions of all these terms, the end result is completely incoherent. “Words about things” obscures the specific meaning of “object” in object-oriented programming, and the ways that serialization formats differ from human language. “A bad guy can tell them to point to bad words”—what makes the words “bad,” and how is the bad guy “telling” the benign words to point to unintended places? What kind of bad things can the computer be made to do? Even asking these questions is a generous interpretation; if you’re looking at sentences like these and trying to make sense of them beyond the entertainment value, you probably can’t get enough of a grasp on what they’re trying to say to even know where to start asking technical questions.

So the value isn’t in reading these sentences and suddenly becoming an expert on XML External Entities and why they pose security risks. For me, it’s in forcing my brain to question the assumptions I might hold about how much of these risks I really understand, and getting into the details I might have overlooked. I’m looking at the problem from a different angle and in doing so, catching details I might have missed the first time around. Why is XML so powerful, and therefore so appealing to application developers? Why is it especially vulnerable compared to other data serialization formats? What are the specific means an attacker would use to hijack XML to run malicious commands, and how can we guard against them?
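
As one partial answer to those last two questions, here is a minimal sketch of what “words that point to outside things” can look like, along with one common precaution: refusing any document that declares a DTD at all. It uses Python's standard xml.parsers.expat module. The payload is the classic illustrative shape of an external entity declaration, and the names contains_dtd and DTDForbidden are hypothetical. It assumes your application never needs DTDs, which is not true for every system.

```python
import xml.parsers.expat


class DTDForbidden(ValueError):
    """Raised internally when a document declares a DOCTYPE."""


# Illustrative payload: the entity "xxe" points at a file outside the document.
XXE_PAYLOAD = """<?xml version="1.0"?>
<!DOCTYPE note [
  <!ENTITY xxe SYSTEM "file:///etc/passwd">
]>
<note>&xxe;</note>"""


def contains_dtd(xml_text):
    """Return True if the XML declares a DTD, without resolving any entity."""

    def on_doctype(name, system_id, public_id, has_internal_subset):
        # Stop as soon as a DOCTYPE starts, before any entity is expanded.
        raise DTDForbidden(name)

    parser = xml.parsers.expat.ParserCreate()
    parser.StartDoctypeDeclHandler = on_doctype
    try:
        parser.Parse(xml_text, True)
    except DTDForbidden:
        return True
    return False
```

A caller could run contains_dtd on untrusted input and reject the document before handing it to a full XML parser. Malformed XML still raises the parser's own error here, which a real application would also need to handle.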

Sometimes trying and failing to explain a concept in a simpler way can uncover ideas you might not have run into otherwise, or prompt you to explain more thoroughly the attack vectors you had assumed your audience already understood. But it doesn’t mean that this approach is an effective way to communicate those ideas to other people.

How do we communicate?

You can find countless examples of people trying to make technical concepts easier to understand. Consider people like Julia Evans and Bubblesort Zines’ Amy Wibowo, who make cartoons simplifying technical concepts; projects like Simple English Wikipedia; and even programming tools like Scratch (a graphical programming language designed for children) and Glitch (a site that helps people of all skill levels to quickly build, deploy, and get help with JavaScript applications).

Specifically for security, intentionally vulnerable applications like OWASP Juice Shop help demonstrate security vulnerabilities in a digestible way. These apps are valuable to learners of all levels, helping people get over the hurdle of a technology that seems unapproachable. That matters in information security, where many people who work in technology need to create secure systems without being security experts.

At the same time, there can be a temptation to oversimplify things. You can’t treat reading the words “the computer [should] stop me from looking at stuff that belongs to other people” as interchangeable with having the experience needed to implement access controls in a complex application. Additionally, creating a sentence from “simple” words does not always make the sentence easier to understand; language doesn’t work that way.
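
To illustrate the gap, here is a deliberately tiny sketch of that sentence as code. It assumes a single-owner model, and the names Document, AccessDenied, and read_document are hypothetical. The check itself is one line; the experience comes in when it has to be applied on every code path, with roles, sharing, and admin overrides layered on top.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Document:
    owner: str  # the user this document belongs to
    body: str


class AccessDenied(Exception):
    """Raised when a user asks for something that is not theirs."""


def read_document(requesting_user, doc):
    """Return the body only if the requester owns the document."""
    # Knowing who the user is (authentication) is not enough;
    # we also check what they are allowed to see (access control).
    if requesting_user != doc.owner:
        raise AccessDenied(f"{requesting_user} may not read this document")
    return doc.body
```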

At a minimum, it’s necessary to be able to define domain-specific terms. First, those terms were created for a reason (“object” has a specific meaning in object-oriented programming, and translating it to “thing” does not convey the same meaning). In addition, having to constantly repeat definitional phrases (like “storing information as words” for “serialization”) is tedious and distracting.

At the end of the day, choosing the words to show an idea to the people who need to understand it is hard, and using only the top ten hundred words in English is one approach that can help us look at it in a different way. For information security, anyone who uses or makes computer systems has best practices they can follow to stop bad guys from stealing their information. I think we can all look for some ways to help our users, developers, and administrators understand a little better.
