June 13, 2024

Interactively analyze and visualize the family tree using Neo4j

It’s been a while since I wrote a blog post. I’ve been busy watching the House of the Dragon show. However, now that the wait for season two has begun, I suddenly have some spare time on my hands. I’ve seen some social posts that describe the connections between characters in the House of the Dragon and Game of Thrones shows. Since connections of any kind are in my domain, and I like to write about them, I decided to write a post that shows how easy it is to analyze and visualize family connections in the Neo4j ecosystem.

The House of the Dragon storyline takes place around 200 years before Game of Thrones. Therefore, some of the characters in Game of Thrones are descendants of the House of the Dragon characters. For example, Daenerys Targaryen’s ancestors include Viserys and Rhaenyra Targaryen. Some visualizations of a specific family tree are available, but I have yet to see any attempt to put them all together. As you might know, the royal families in the Ice and Fire universe use marriage as a political tool to spread their influence. As a result, many royal families have joined through marriage at one point or another.

This blog post will teach you how to interactively search and visualize relationships between various characters in a way that requires no coding or Cypher knowledge. If you are a data analyst or aspire to become one, I will show you how to execute graph algorithms from a Python notebook and evaluate their results in the last part of this blog.

A subset of the family tree from the perspective of Sansa Stark. Image by the author.

Family trees consist of people and their relationships. In graph theory, a tree is a particular type of graph. Given that the dataset is best represented by a graph model, it only makes sense to use a graph database to store and analyze the information. If you have read my posts, you know that I like to use Neo4j in my examples as it offers visualization and algorithm support out of the box without having to install anything on your computer. That makes it much easier for readers like you to follow the examples, as you don’t have to prepare the environment on your local computer. However, if you want to use any other tools in your analysis, you are more than welcome.

Neo4j Sandbox

If you want to follow along with the code examples, you need to set up a Neo4j database. I suggest you use a blank project on Neo4j Sandbox. The Sandbox option comes with pre-installed APOC and Graph Data Science libraries that you will need as well as Neo4j Bloom, which is a visualization tool that allows you to interactively explore and analyze the graph.

On the other hand, if you want to create a local environment, you can always download and install the Neo4j Desktop application.

Dataset

We need a dataset that contains characters from both the House of the Dragon and the Game of Thrones timelines. Luckily, we have some options that don’t involve combining different datasets. Like so many times before, I searched for community-based wikis with all the relevant information. I stumbled upon the A Wiki of Ice and Fire webpage, which looked like a good candidate containing all the information we need. All the content on the website is available under the CC BY-SA 3.0 license.

Like other wikis, the Ice and Fire page contains a list of all characters and nicely structured information in an infobox for each character.

Some of the information for Rhaenyra Targaryen. Data from Wiki of Ice and Fire.

As you can see, a lot of information is available for characters ranging from family ties to royal duties and allegiances. I have produced a CSV file to make it easier to construct a graph without having to scrape the website every time. When importing the data, I noticed that a couple of data cleaning steps were needed, so I published a Jupyter Notebook dedicated to importing and cleaning the dataset into Neo4j.

You only need to change the Neo4j credentials, and then you can run all the cells to produce the following graph model.

Graph model. Image by the author.

Characters are at the center of our graph model. We know their family connections, which are represented by the FATHER, MOTHER, and SPOUSE relationships. We also know to which culture they belong and their allegiance to various factions. Lastly, I’ve also imported the books or TV shows in which the characters appear.

The universe of Ice and Fire is enormous, as we have 3653 characters and 563 factions in the graph. The Wiki of Ice and Fire provides even more information about factions, culture, locations, and other entities, which I skipped as the focus of this analysis is the family connections.

Interactive visualizations

We will begin by interactively analyzing the relationship between characters through network visualizations in Neo4j Bloom. Neo4j Bloom allows users with no coding or Cypher knowledge to effectively explore and analyze any graph dataset, which means you can use it to impress your friends and family and help them explore the world of Ice and Fire in a compelling way.

You can open Neo4j Bloom in Sandbox by clicking the dropdown arrow and selecting the Neo4j Bloom option.

How to open Bloom in Sandbox environment. Image by the author.

Next, you will need to click the Create New button on the right-hand side of the screen and pick the Generate Perspective option, which automatically infers the graph schema.

Automatically generate Bloom perspective. Image by the author.

Now you are all set to go. Bloom offers near-natural language search that you can use to explore the graph.

Near-natural search in Neo4j Bloom. Video by the author.

Neo4j Bloom offers a search bar that can be used to find relevant nodes and examine their relationships and properties. We can also define more complex graph patterns in the search bar, but those are beyond the scope of this article. You can check some of the newer features in the following Medium post by Jonathan Thein.

I like to define custom search phrases in Neo4j Bloom that can help you find relevant graph patterns more quickly. Instead of describing the whole graph pattern in the search bar, you can simply define it using a Cypher statement. The Cypher statement also supports optional parameters.

For example, say that we want to define a custom search phrase that will help us find connections between arbitrary pairs of characters. In practice, we will try to find the shortest path between two characters while only considering relationships of the family tree.

The following Cypher statement finds the shortest path between two characters with a constraint that it can only traverse FATHER, MOTHER, or SPOUSE relationships.

MATCH (s:Character {name:$person1}), (t:Character {name:$person2})
MATCH p=shortestPath((s)-[:FATHER|MOTHER|SPOUSE*]-(t))
RETURN p

Parameters in Cypher statements are prefixed with a dollar sign ($). You can observe that we did not hardcode any values for the characters’ names as we want to create a tool that will help users find connections between any pair of characters.
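To make the idea behind the shortest-path search phrase more tangible outside of Neo4j, here is a minimal Python sketch of a breadth-first search that ignores relationship direction and only follows family relationships. It is not how Neo4j implements shortestPath internally. The edge list, the ALLIED_WITH relationship type, and the function name shortest_family_path are assumptions made up for this illustration.

```python
from collections import deque

# Toy edges (source, relationship type, target); illustrative only,
# not taken from the imported dataset.
EDGES = [
    ("Tyrion", "FATHER", "Tywin"),
    ("Cersei", "FATHER", "Tywin"),
    ("Cersei", "SPOUSE", "Robert"),
    ("Robert", "FATHER", "Steffon"),
    ("Tyrion", "ALLIED_WITH", "Robert"),  # hypothetical non-family edge, must be ignored
]

FAMILY_TYPES = {"FATHER", "MOTHER", "SPOUSE"}


def shortest_family_path(edges, start, goal, allowed=FAMILY_TYPES):
    """Breadth-first search that ignores direction and non-family edges."""
    neighbors = {}
    for source, rel_type, target in edges:
        if rel_type in allowed:
            neighbors.setdefault(source, set()).add(target)
            neighbors.setdefault(target, set()).add(source)

    previous = {start: None}  # also serves as the visited set
    queue = deque([start])
    while queue:
        node = queue.popleft()
        if node == goal:
            # Walk back through the predecessors to rebuild the path.
            path = []
            while node is not None:
                path.append(node)
                node = previous[node]
            return path[::-1]
        for nxt in sorted(neighbors.get(node, ())):
            if nxt not in previous:
                previous[nxt] = node
                queue.append(nxt)
    return None  # no family path exists


path = shortest_family_path(EDGES, "Tyrion", "Steffon")
print(path)
```

The names passed to the function play the same role as the $person1 and $person2 parameters in the Cypher statement: the pattern stays fixed while the endpoints change.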

To define a search phrase with parameters, we need to include the parameters in the search phrase as well as the Cypher statement. In both instances, the parameters are prefixed with a dollar sign. In addition, the names of the parameters in the search phrase have to match those in the Cypher query.

Define a custom Search phrase in Neo4j Bloom. Image by the author.

The search phrase will be activated every time we type “Find connection from” in the search bar. A nice feature of Neo4j Bloom is that it can autocomplete parameters out of the box. To enable autocomplete, you must describe which suggestions Bloom should provide in the lower part of the search phrase definition.

Bloom autocomplete suggestion definition. Image by the author.

As you can observe from the image, we configure the autocomplete suggestions with the Label-key option, setting the label to Character and the property key to name. Essentially, suggestions will be provided from the name property of Character nodes.

Now we can go ahead and test out the custom search phrase we just defined. In this example, I wanted to find the shortest path between Tyrion Lannister, one of the main characters in the Game of Thrones series, and Viserys I, who appears in the House of the Dragon storyline.

Shortest path between Tyrion Lannister and Viserys I. Image by the author.

Interestingly, the connection from Tyrion to Viserys starts by traversing to Cersei Lannister and Robert I Baratheon. I never knew that Robert’s grandmother is Rhaelle Targaryen, who is the entry point to the Targaryen family tree in this instance. Targaryen names are confusing to me, so I won’t even try to decipher the rest of the way to Viserys I.

If you remember, Robert Baratheon usurped the throne from the Mad King Aerys II Targaryen. Since Robert Baratheon is at least one-quarter Targaryen, I wonder how closely related they are. Luckily, we can simply change the names in the search phrase and examine the shortest path.

Shortest path between Robert Baratheon and Aerys II Targaryen. Image by the author.

It seems that Betha Blackwood is the most recent common ancestor of Robert Baratheon and the Mad King Aerys II. They are more closely related than one might imagine. It is easy to go down the rabbit hole of exploring connections in the Ice and Fire universe. Therefore, I suppose you will want to explore more relationships.

Next, we will prepare another search phrase that visualizes all the known ancestors of a particular person. The Cypher statement to define this graph pattern is the following.

MATCH p=(c:Character {name:$person})-[:FATHER|MOTHER*]->()
RETURN p

We just need to input this Cypher statement along with the search phrase definition into Neo4j Bloom.
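The variable-length pattern above follows FATHER and MOTHER relationships upward for as many hops as exist. A rough Python equivalent, assuming a tiny hand-written mapping instead of the database, looks like this. The PARENTS dictionary and the ancestors function are hypothetical names used only for this sketch.

```python
# Toy parent links: child -> {relationship type: parent}; illustrative only.
PARENTS = {
    "Robb": {"FATHER": "Eddard", "MOTHER": "Catelyn"},
    "Eddard": {"FATHER": "Rickard"},
    "Catelyn": {"FATHER": "Hoster"},
}


def ancestors(parents, person):
    """Collect everyone reachable by following FATHER/MOTHER links upward."""
    found = set()
    stack = [person]
    while stack:
        current = stack.pop()
        for parent in parents.get(current, {}).values():
            if parent not in found:
                found.add(parent)
                stack.append(parent)
    return found


result = ancestors(PARENTS, "Robb")
print(sorted(result))
```

A character with no recorded parents simply yields an empty set, which mirrors the Cypher query returning no paths.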

Define the ancestor search phrase in Neo4j Bloom. Image by the author.

As soon as the search phrase is defined, we can use it in the search bar. In this example, we can visualize all the ancestors of a given person. I decided to examine the ancestors of Margaery Tyrell.

Family tree of Margaery Tyrell. Image by the author.

Not a lot is known or defined about the ancestors of Margaery Tyrell. However, you might be familiar with the Hightower family name. It is said that Margaery Tyrell is distantly related to Otto and Alicent Hightower from House of the Dragon, although no explicit connections are defined.

Graph data science

Perhaps you are a data scientist, or you aspire to become one. Graph analytics and data science offer a wide variety of algorithms that can enhance your analytical toolbox and help you find meaningful insights in highly connected datasets. In this section, I will show how easily you can integrate graph algorithms into your analytical workflows. Neo4j offers a Python client for the Neo4j Graph Data Science library that allows you to execute graph algorithms using only Python code.

In this example, we will execute the Weakly Connected Components algorithm on the network of family ties. The Weakly Connected Components algorithm (WCC) is used to find disparate islands or components of nodes within a given network. A node can reach all the other nodes in the same component when you disregard the relationship direction.

Visualized weakly connected components in a sample graph. Image by the author.

Thomas, Amy, and Michael form a weakly connected component, and Alicia and John form the other. For example, Alicia and Michael are not in the same component as no path exists between the two.
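If you would like to see the idea in plain Python before running it in Neo4j, the following sketch computes weakly connected components with a small union-find structure. It is only an illustration and not the Graph Data Science implementation. The exact relationships between the five people are an assumption, since the text only states which people end up in the same component.

```python
def weakly_connected_components(nodes, edges):
    """Group nodes into components, treating every edge as undirected."""
    parent = {node: node for node in nodes}

    def find(node):
        # Follow parent pointers to the representative, shortening the path as we go.
        while parent[node] != node:
            parent[node] = parent[parent[node]]
            node = parent[node]
        return node

    for source, target in edges:
        parent[find(source)] = find(target)  # merge the two components

    components = {}
    for node in nodes:
        components.setdefault(find(node), set()).add(node)
    return list(components.values())


people = ["Thomas", "Amy", "Michael", "Alicia", "John"]
# Assumed edges; direction is ignored by the algorithm.
relationships = [("Thomas", "Amy"), ("Michael", "Amy"), ("Alicia", "John")]

components = weakly_connected_components(people, relationships)
print(components)
```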

In the context of the family network, the WCC algorithm can help you detect which families were joined together through marriage or kids at some point in time. In addition, the WCC might help us evaluate if any larger family components are fighting for power. For example, if the Lannisters and Starks were joined by marriage, and the Targaryens and Baratheons formed another component, then we might hypothesize that these two components are fighting for overall control of the Seven Kingdoms. Note that this is just a made-up example and has no roots in the actual Ice and Fire universe.

I have prepared a Jupyter notebook that contains all the code to follow along with the WCC example.

First, we need to project an in-memory graph that contains relevant nodes and relationships for our use case. For example, since we want to focus on family ties, we will include the Character nodes with the MOTHER, FATHER, and SPOUSE relationships.

G, res = gds.graph.project("family", "Character", ["MOTHER", "FATHER", "SPOUSE"])

To execute the Weakly Connected Components algorithm on the family network projection, we simply run the gds.wcc.stream method. The stream mode of the algorithm returns the results as a pandas DataFrame.

# Execute WCC algorithm
wcc_df = gds.wcc.stream(G)
# Fetch name property from the database
wcc_df["name"] = [el["name"] for el in gds.util.asNodes(wcc_df["nodeId"].to_list())]
# Derive the last name
wcc_df["last_name"] = [
    el.split(" ")[-1] if len(el.split(" ")) > 1 and len(el.split(" ")[-1]) > 3 else None
    for el in wcc_df["name"]
]

Results

We added two additional lines that retrieve the name property of characters from the database and extract the last name. If you have some experience with Python code, you can see that the last name is extracted from people with more than a single word in their name and the last word in their name has more than 3 characters. The last name extraction is not perfect, but it is good enough for our presentation.
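To see where the heuristic works and where it falls short, here is the same rule written as a standalone function and applied to a few names. The function name derive_last_name and the example names are chosen for this sketch only.

```python
def derive_last_name(name):
    """Return the last word of a multi-word name if it is longer than 3 characters."""
    parts = name.split(" ")
    if len(parts) > 1 and len(parts[-1]) > 3:
        return parts[-1]
    return None


examples = ["Robert Baratheon", "Aerys II Targaryen", "Viserys I", "Hodor", "Jon Snow"]
last_names = {name: derive_last_name(name) for name in examples}

for name, last_name in last_names.items():
    print(f"{name!r} -> {last_name!r}")
```

Names that end in a regnal number, such as Viserys I, lose their family name, while a name like Jon Snow is treated as if Snow were a family, which is the kind of imperfection mentioned above.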

The componentId column describes the component to which a node belongs. If we want to calculate the size of the components, we can use the pandas groupby method to aggregate the rows and count the members of each component.

wcc_df.groupby("componentId").size().sort_values(ascending=False).to_frame(
    "componentSize"
).reset_index().head()

Results

It looks like the largest component contains 785 members. Believe it or not, this is actually a common occurrence in real-life networks, where a network is composed of a single super component that spans a decent part of the network and then a couple of smaller components.
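If the aggregation feels opaque, the following sketch reproduces the same counting logic on a handful of made-up rows using only the Python standard library. The rows are invented for illustration and do not come from the actual WCC output.

```python
from collections import Counter

# Made-up rows shaped like the WCC stream output (nodeId, componentId).
rows = [
    {"nodeId": 0, "componentId": 0},
    {"nodeId": 1, "componentId": 0},
    {"nodeId": 2, "componentId": 0},
    {"nodeId": 3, "componentId": 3},
    {"nodeId": 4, "componentId": 3},
    {"nodeId": 5, "componentId": 5},
]

# Count how many nodes share each componentId, largest first.
component_sizes = Counter(row["componentId"] for row in rows)
ranked = component_sizes.most_common()
largest_component, largest_size = ranked[0]

print(ranked)
print(largest_component, largest_size)
```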

p.s. If you prefer Cypher aggregations to Pandas aggregations, you can always use Cypher instead.

It probably takes a lot of different families joining to get a total of 785 members in a single component. Also, note that a single marriage between two royal families is enough to join them in one component. As the last part of the analysis, we will examine the ten families with the most members in the largest component.

largest_component = wcc_df.groupby('componentId').size().sort_values(
    ascending=False
).reset_index()['componentId'][0]
wcc_df[wcc_df["componentId"] == largest_component].groupby("last_name").size().sort_values(
    ascending=False
).to_frame("count").reset_index().head(10)

Results

I would say that the history of the Ice and Fire universe is royal families marrying and killing each other. Interesting how that works. For example, we learned that Robert Baratheon was at least one-quarter Targaryen through his grandmother but was hellbent on killing the last true living Targaryen.

Since we have some information about the time of birth and death of characters, it might be interesting to run the WCC algorithm on the family network through time and evaluate how marriages affect the alliances and power of the families.

Summary

A graph is a great data model for representing highly connected datasets. When dealing with many explicit or implicit relationships between your data points, it might be worth investigating them through the lens of graph data science. The graph data science toolbox offers a variety of algorithms that derive valuable insights by analyzing how data points are connected. The Ice and Fire universe is a lovely first network dataset that can help you learn the basics of graph visualization and analytics that you can use on your real-world problems. Therefore, I encourage you to create a Sandbox project and get started on your graph analytics journey.

As mentioned, both the import and the analysis notebooks are available on GitHub.

Investigate family connections between House of the Dragon and Game of Thrones characters Republished from Source https://towardsdatascience.com/investigate-family-connections-between-house-of-dragon-and-game-of-thrones-characters-ff2afd5bdb82?source=rss—-7f60cf5620c9—4 via https://towardsdatascience.com/feed
