Data science vs Orbicide lyrics

Our lyrics are said to refer to the Bible. Is that true? What patterns emerge when we look at the vocabulary? We ran a simple text-mining exercise to answer these questions.

This post is mostly addressed to geeks. The weaponry includes Python, the spaCy NLP package and a couple of other simple tools.

Which words are most characteristic of Orbicide lyrics? Which words stand out most visibly when compared to “standard English”?

To analyse the vocabulary, we downloaded the lyrics from this site and applied some simple pre-processing to reduce them to plain words:

  1. stripping off HTML tags and template elements; this was easy thanks to the BeautifulSoup package
  2. tagging the text with spaCy to obtain base forms (for instance, if we saw went, we wanted to get go); this is one of the standard Natural Language Processing techniques, known as lemmatisation

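The two steps above can be sketched roughly as follows. This is a minimal illustration, not the code from our notebook: the function names strip_markup and lemmatize are made up for this post, the choice of the "html.parser" backend is an assumption, and so is the spaCy model name shown in the usage comment.

```python
def strip_markup(html):
    """Return the visible text of an HTML page as a single string."""
    # Third-party import (pip install beautifulsoup4), kept local so the
    # rest of the sketch can be loaded without it.
    from bs4 import BeautifulSoup

    soup = BeautifulSoup(html, "html.parser")
    # Drop elements whose content is never lyrics text.
    for tag in soup(["script", "style"]):
        tag.decompose()
    # Join text nodes with spaces, then collapse runs of whitespace.
    return " ".join(soup.get_text(separator=" ").split())


def lemmatize(text, nlp):
    """Return lower-cased base forms of the alphabetic tokens in text.

    nlp is a loaded spaCy pipeline, i.e. a callable that maps a string
    to tokens exposing .is_alpha and .lemma_.
    """
    return [token.lemma_.lower() for token in nlp(text) if token.is_alpha]


# Usage, assuming spaCy and its small English model are installed
# (the model name is an assumption, not taken from the notebook):
#   import spacy
#   nlp = spacy.load("en_core_web_sm")
#   words = lemmatize(strip_markup(page_html), nlp)
```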
What resulted was a list of words (base forms) in the order they appeared in the lyrics. We counted them directly, but the most frequent words turned out to be function words (the, be, and, of etc.).
This is typical of any text. What is interesting is that the word lord ranked 12th most common, which gives a hint about the topic.

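Counting base forms and looking up a word's rank takes only a few lines with the standard library. The snippet below is a sketch on a tiny invented word list (not our real data), and rank_of is a hypothetical helper named for this post.

```python
from collections import Counter

# Invented stand-in for the list of base forms extracted from the lyrics.
lemmas = ["the", "lord", "be", "the", "lord", "smite", "the", "and", "be", "the"]

# Map each base form to the number of times it occurs.
counts = Counter(lemmas)


def rank_of(word, counts):
    """Return the 1-based frequency rank of word, or None if it is absent."""
    for rank, (candidate, _) in enumerate(counts.most_common(), start=1):
        if candidate == word:
            return rank
    return None


# Print the most frequent base forms, most common first.
for word, n in counts.most_common(3):
    print(word, n)
```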
To get a better insight into the characteristic features of Orbicide language, we had to compare our word list with a word list for standard English and look for differences.

There are many approaches to this. One of them is the log-likelihood ratio, a formula that scores each word appearing in two texts so that we can find the words whose counts differ most strikingly. We compared Orbicide lyrics to the Brown corpus, a well-known collection of English texts of various origins and genres. Here are the words that are most characteristic of Orbicide lyrics compared to the Brown corpus:

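For the curious, here is a rough sketch of how such a score can be computed. It uses one common formulation of the log-likelihood ratio (the two-term version popular in corpus linguistics); the notebook may differ in details. For a word seen a times in a text of c words and b times in a text of d words, the expected counts are E1 = c(a+b)/(c+d) and E2 = d(a+b)/(c+d), and the score is 2(a ln(a/E1) + b ln(b/E2)). All counts below are invented, and the function names are made up for this post.

```python
import math
from collections import Counter


def log_likelihood(a, b, total_a, total_b):
    """Score a word seen a times in text A and b times in text B."""
    # Expected counts if the word were equally frequent in both texts.
    expected_a = total_a * (a + b) / (total_a + total_b)
    expected_b = total_b * (a + b) / (total_a + total_b)
    score = 0.0
    # A zero count contributes nothing (0 * ln 0 is treated as 0).
    if a > 0:
        score += a * math.log(a / expected_a)
    if b > 0:
        score += b * math.log(b / expected_b)
    return 2.0 * score


def characteristic_words(target, reference, top_n=10):
    """Return (word, score) pairs over-represented in target, best first."""
    total_t = sum(target.values())
    total_r = sum(reference.values())
    scored = []
    for word, a in target.items():
        b = reference.get(word, 0)
        # The score itself is symmetric, so keep only words that are
        # relatively more frequent in the target text.
        if a / total_t > b / total_r:
            scored.append((word, log_likelihood(a, b, total_t, total_r)))
    scored.sort(key=lambda pair: pair[1], reverse=True)
    return scored[:top_n]


# Toy counts, invented for illustration only.
lyrics = Counter({"the": 50, "lord": 20, "smite": 5, "of": 25})
reference = Counter({"the": 5000, "lord": 10, "of": 2500, "house": 100})

for word, score in characteristic_words(lyrics, reference):
    print(f"{word}\t{score:.1f}")
```

In this toy example the function words drop out because they are relatively less frequent in the lyrics than in the reference counts, which is the effect we were after.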

And here is a visualisation containing more words. The bigger a word, the more characteristic it is (according to the log-likelihood ratio, at least) of the language of Orbicide.

Orbicide lyrics — word cloud
The vocabulary clearly refers to biblical topics (words such as lord, transgress, Sodom, Gomorrah, blessing, altar). Words referring to murder, destruction and other forms of violence are perhaps even more evident (smite, plunder, persecutor, genocide, rape), not unlike in the Bible itself.

Here is the full record of the exercise, containing more technical details and intermediate steps: Orbicide_lyrics_analysis.ipynb (the notebook renders properly in desktop browsers; we are not sure about mobile).

