Data visualization is a key step in a data science pipeline. Python offers great possibilities when it comes to representing some data graphically, but it can be hard and time-consuming to create the appropriate chart.

The Python Graph Gallery is here to help. It displays many examples, each with reproducible code, so you can build the desired chart in minutes.

The gallery currently provides more than 400 chart examples. Those examples are organized in 40 sections, one for each chart type: scatterplot, boxplot, barplot, treemap and so on. The chart types are in turn grouped into 7 big families, as suggested by data-to-viz.com: one for each visualization purpose.

It is important to note that not only the most common chart types are covered. Lesser-known charts like chord diagrams, streamgraphs or bubble maps are also available.

Each section starts with some very basic examples that show how to build a chart type in a few seconds. Applying the same technique to another dataset should then be very quick.

For instance, the scatterplot section starts with this matplotlib example. It shows how to create a dataset with pandas and plot it with the `plot()` function. The main graph arguments, like `linestyle` and `marker`, are described to make sure the code is understandable.
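
As a rough illustration of that kind of starter example, here is a minimal sketch (not the gallery's own code). The data values are made up, and plain lists stand in for the pandas dataset to keep the snippet short; only `plot()`, `linestyle` and `marker` come from the description above.

```python
import matplotlib
matplotlib.use("Agg")  # non-interactive backend, so the sketch runs without a display
import matplotlib.pyplot as plt

# Hypothetical toy data (not taken from the gallery)
x_values = [1, 2, 3, 4, 5]
y_values = [2.1, 3.9, 6.2, 7.8, 10.1]

fig, ax = plt.subplots()
# linestyle="none" hides the connecting line; marker="o" draws one dot per point
(points,) = ax.plot(x_values, y_values, linestyle="none", marker="o")
ax.set_xlabel("x")
ax.set_ylabel("y")
```

Switching `linestyle` to `"-"` would turn the same call into a line chart with markers.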

The gallery uses several libraries like seaborn or plotly to produce its charts, but is mainly focused on matplotlib. Matplotlib comes with great flexibility and makes it possible to build almost any kind of chart.

A whole page is dedicated to matplotlib. It describes how to solve recurring issues like customizing axes or titles, adding annotations (see below) or even using custom fonts.

The gallery is also full of non-straightforward examples. For instance, it has a tutorial explaining how to build a streamchart with matplotlib. It is based on the `stackplot()` function and adds some smoothing to it.

Last but not least, the gallery also displays some publication-ready charts. They usually involve a lot of matplotlib code, but showcase the fine-grained control one has over a plot.

Here is an example with a post inspired by Tuo Wang’s work for the tidyTuesday project. (Code translated from R available here)

The Python Graph Gallery is an ever-growing project. It is open-source, with all its related code hosted on GitHub.

Contributions to the gallery are very welcome. Each blog post is just a Jupyter notebook, so suggestions should be very easy to make through issues or pull requests!

The Python Graph Gallery is a project developed by Yan Holtz in his free time. It can help you improve your technical skills when it comes to visualizing data with Python.

The gallery belongs to an ecosystem of educational websites. Data to viz describes best practices in data visualization, while the R, Python and d3.js graph galleries provide technical help to build charts with the 3 most common tools.

For any question regarding the project, please say hi on Twitter at @R_Graph_Gallery!

Code-switching is the practice of alternating between two or more languages in the context of a single conversation, either consciously or unconsciously. As someone who grew up bilingual and is currently learning other languages, I find code-switching a fascinating facet of communication from not only a purely linguistic perspective, but also a social one. In particular, I’ve personally found that code-switching often helps build a sense of community and familiarity in a group and that the unique ways in which speakers code-switch with each other greatly contribute to shaping group dynamics.

This is something that’s evident in seven-member pop boy group WayV. Aside from their discography, artistry, and group chemistry, WayV is well-known among fans and many non-fans alike for their multilingualism and code-switching, which many fans have affectionately coined as “WayV language.” Every member in the group is fluent in both Mandarin and Korean, and at least one member in the group is fluent in one or more of the following: English, Cantonese, Thai, Wenzhounese, and German. It’s an impressive trait that’s become a trademark of WayV as they’ve quickly drawn a global audience since their debut in January 2019. Their multilingualism is reflected in their music as well. On top of their regular album releases in Mandarin, WayV has also released singles in Korean and English, with their latest single “Bad Alive (English Ver.)” being a mix of English, Korean, and Mandarin.

As an independent translator who translates WayV content into English, I’ve become keenly aware of the true extent and rate of WayV’s code-switching when communicating with each other. In a lot of their content, WayV frequently switches between three or more languages every couple of seconds, a phenomenon that can make translating quite challenging at times, but also extremely rewarding and fun. I wanted to be able to present this aspect of WayV in a way that would both highlight their linguistic skills and present this dimension of their group dynamic in a more concrete, quantitative, and visually intuitive manner, beyond just stating that “they code-switch a lot.” This prompted me to make step charts - perfect for displaying data that changes at irregular intervals but remains constant between the changes - in hopes of enriching the viewer’s experience and helping make a potentially abstract concept more understandable and readily consumable. With a step chart, it becomes more apparent to the viewer the extent of how a group communicates, and cross-sections of the graph allow a rudimentary look into how multilinguals influence each other in code-switching.

This tutorial on creating step charts uses one of WayV’s livestreams as an example. There were four members in this livestream and a total of eight languages/dialects spoken. I will go through the basic steps of creating a step chart that depicts the frequency of code-switching for just one member. A full code chunk that shows how to layer two or more step chart lines in one graph to depict code-switching for multiple members can be found near the end.

First, we import the required libraries and load the data into a Pandas dataframe.

```
import pandas as pd
import matplotlib.pyplot as plt
import seaborn as sns
```

This dataset includes the timestamp of every switch (in seconds) and the language of switch for one speaker.

```
df_h = pd.read_csv("WayVHendery.csv")
HENDERY = df_h.reset_index()
HENDERY.head()
```

index | time | lang |
---|---|---|
0 | 2 | ENG |
1 | 3 | KOR |
2 | 10 | ENG |
3 | 13 | MAND |
4 | 15 | ENG |

With the dataset loaded, we can now set up our graph in terms of determining the size of the figure, dpi, font size, and axes limits. We can also play around with the aesthetics, such as modifying the colors of our plot. These few simple steps easily transform the default all-white graph into a more visually appealing one.

```
fig, ax = plt.subplots(figsize = (20,12))
```

```
sns.set(rc={'axes.facecolor':'aliceblue', 'figure.facecolor':'c'})
fig, ax = plt.subplots(figsize = (20,12), dpi = 300)
plt.xlabel("Duration of Instagram Live (seconds)", fontsize = 18)
plt.ylabel("Cumulative Number of Times of Code-Switching", fontsize = 18)
plt.xlim(0, 570)
plt.ylim(0, 85)
```

Following this, we can make our step chart line easily with matplotlib.pyplot.step, in which we plot the x and y values and determine the text of the legend, color of the step chart line, and width of the step chart line.

```
ax.step(HENDERY.time, HENDERY.index, label = "HENDERY", color = "palevioletred", linewidth = 4)
```
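
To make the idea behind the step line concrete, here is a small sketch in plain Python (no plotting) of the function a step chart draws: a count that jumps at each switch and stays flat in between. The helper name `switches_so_far` is my own, and the timestamps are the first five rows of the table above. Note that the tutorial plots the zero-based row index, so its line sits one below this count.

```python
from bisect import bisect_right

# First five switch timestamps (seconds) from the table above
switch_times = [2, 3, 10, 13, 15]

def switches_so_far(times, t):
    """Number of switches that have happened at or before time t."""
    return bisect_right(times, t)

for t in (0, 2, 5, 12, 20):
    # The count only changes at a switch; between switches it stays constant
    print(t, switches_so_far(switch_times, t))
```

One design detail worth knowing: `ax.step` defaults to `where="pre"`, which draws each value on the interval leading up to its timestamp. Passing `where="post"` holds each value until the next timestamp instead, which is the reading sketched here.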

Of course, we want to know not only how many switches there were and when they occurred, but also to what language the member switched. For this, we can write a for loop that labels each switch with its respective language as recorded in our dataset.

```
for x,y,z in zip(HENDERY["time"], HENDERY["index"], HENDERY["lang"]):
    label = z
    ax.annotate(label, #text
                (x,y), #label coordinate
                textcoords = "offset points", #how to position text
                xytext = (15,-5), #distance from text to coordinate (x,y)
                ha = "center", #alignment
                fontsize = 8.5) #font size of text
```

Now add a title, save the graph, and there you have it!

```
plt.title("WayV Livestream Code-Switching", fontsize = 35)
fig.savefig("wayv_codeswitching.png", bbox_inches = "tight", facecolor = fig.get_facecolor())
```

Below is the complete code for layering step chart lines for multiple speakers in one graph. You can see how easy it is to take the code for visualizing the code-switching of one speaker and adapt it to visualizing that of multiple speakers. In addition, you can see that I’ve intentionally left the title blank so I can incorporate external graphic adjustments after I created the chart in Matplotlib, such as the addition of my social media handle and the use of a specific font I wanted, which you can see in the final graph. With visualizations being all about communicating information, I believe using Matplotlib in conjunction with simple elements of graphic design can be another way to make whatever you’re presenting that little bit more effective and personal, especially when you’re doing so on social media platforms.

```
# Initialize graph color and size
sns.set(rc={'axes.facecolor':'aliceblue', 'figure.facecolor':'c'})
fig, ax = plt.subplots(figsize = (20,12), dpi = 120)
# Set up axes and labels
plt.xlabel("Duration of Instagram Live (seconds)", fontsize = 18)
plt.ylabel("Cumulative Number of Times of Code-Switching", fontsize = 18)
plt.xlim(0, 570)
plt.ylim(0, 85)
# Layer step charts for each speaker
ax.step(YANGYANG.time, YANGYANG.index, label = "YANGYANG", color = "firebrick", linewidth = 4)
ax.step(HENDERY.time, HENDERY.index, label = "HENDERY", color = "palevioletred", linewidth = 4)
ax.step(TEN.time, TEN.index, label = "TEN", color = "mediumpurple", linewidth = 4)
ax.step(KUN.time, KUN.index, label = "KUN", color = "mediumblue", linewidth = 4)
# Add legend
ax.legend(fontsize = 17)
# Label each data point with the language switch
for i in (KUN, TEN, HENDERY, YANGYANG): #for each dataset
    for x,y,z in zip(i["time"], i["index"], i["lang"]): #looping within the dataset
        label = z
        ax.annotate(label, #text
                    (x,y), #label coordinate
                    textcoords = "offset points", #how to position text
                    xytext = (15,-5), #distance from text to coordinate (x,y)
                    ha = "center", #alignment
                    fontsize = 8.5) #font size of text
# Add title (blank to leave room for external graphics)
plt.title("\n\n", fontsize = 35)
# Save figure
fig.savefig("wayv_codeswitching.png", bbox_inches = "tight", facecolor = fig.get_facecolor())
```

Languages/dialects: Korean (KOR), English (ENG), Mandarin (MAND), German (GER), Cantonese (CANT), Hokkien (HOKK), Teochew (TEO), Thai (THAI)

186 total switches! That’s approximately one code-switch in the group every 2.95 seconds.
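
For readers who want to check the arithmetic, the two figures above can be turned around to see what they imply. This is only a back-of-the-envelope sketch: the duration it prints is inferred from the stated numbers, not a measured value from the dataset.

```python
total_switches = 186        # stated in the text
seconds_per_switch = 2.95   # stated in the text

# Duration implied by the two figures above (an inference, not a measurement)
implied_duration = total_switches * seconds_per_switch
print(round(implied_duration))            # 549 seconds

# The same rate expressed per minute
switches_per_minute = 60 / seconds_per_switch
print(round(switches_per_minute, 1))      # 20.3
```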

And voilà! There you have it: a brief guide on how to make step charts. While I utilized step charts here to visualize code-switching, you can use them to visualize whatever data you would like. Please feel free to contact me here if you have any questions or comments. I hope you enjoyed this tutorial, and thank you so much for reading!

The other day I was homeschooling my kids, and they asked me: “Daddy, can you draw us all possible non-isomorphic graphs of 3 nodes”? Or maybe I asked them that? Either way, we happily drew all possible graphs of 3 nodes, but already for 4 nodes it got hard, and for 5 nodes - plain impossible!

So I thought: let me try to write a brute-force program to do it! I spent a few hours sketching some smart dynamic programming solution to generate these graphs, and went nowhere, as apparently the problem is quite hard. I gave up, and decided to go with a naive approach:

- Generate all graphs of N nodes, even if some of them look the same (are isomorphic). For \(N\) nodes, there are \(\frac{N(N-1)}{2}\) potential edges to connect these nodes, so it’s like generating a bunch of binary numbers. Simple!
- Write a program to tell if two graphs are isomorphic, then remove all duplicates, unworthy of being presented in the final picture.
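
The "binary numbers" idea in the first step can be sketched like this. It is a minimal illustration using `itertools` and a bitmask rather than the recursive generator used in this post, and the function name `all_edge_subsets` is my own.

```python
from itertools import combinations

def all_edge_subsets(n):
    """Yield every graph on n labelled nodes as a list of edges (i, j) with i < j."""
    edges = list(combinations(range(n), 2))  # the N(N-1)/2 potential edges
    for mask in range(2 ** len(edges)):      # one binary number per graph
        # Bit number `bit` of the mask decides whether edge number `bit` is included
        yield [e for bit, e in enumerate(edges) if (mask >> bit) & 1]

graphs = list(all_edge_subsets(3))
print(len(graphs))  # 8
```

For 3 nodes there are 3 potential edges, hence the 8 edge subsets, from the empty graph up to the full triangle.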

This strategy seemed more reasonable, but writing a “graph-comparator” still felt like a cumbersome task, and more importantly, this part would itself be slow, as I’d still have to go through a whole tree of options for every graph comparison. So after some more head-scratching, I decided to simplify it even further, and use the fact that memory is cheap these days:

- Generate all possible graphs (some of them totally isomorphic, meaning that they would look like repeats if plotted on a figure)
- For each graph, generate its “description” (like an adjacency matrix, or an edge list), and check if a graph with this description is already on the list. If yes, skip it, we got its portrait already!
- If however the graph is unique, include it in the picture, and also generate all possible “descriptions” of it, up to node permutation, and add them to the hash table. This makes sure no other graph of this particular shape would ever be included in our pretty picture again.

For the first task, I went with the edge list, which made the task identical to generating all binary numbers of length \(\frac{N(N-1)}{2}\) with a recursive function, except instead of writing zeroes you skip edges, and instead of writing ones, you include them. Below is the function that does the trick, and has an additional bonus of listing all edges in a neat orderly way. For every edge \(i \rightarrow j\) we can be sure that \(i\) is lower than \(j\), and also that edges are sorted as words in a dictionary. Which is good, as it restricts the set of possible descriptions a bit, which will simplify our life later.

```
def make_graphs(n=2, i=None, j=None):
    """Make a graph recursively, by either including, or skipping each edge.
    Edges are given in lexicographical order by construction."""
    out = []
    if i is None:  # First call
        out = [[(0, 1)] + r for r in make_graphs(n=n, i=0, j=1)]
    elif j < n - 1:
        out += [[(i, j + 1)] + r for r in make_graphs(n=n, i=i, j=j + 1)]
        out += [r for r in make_graphs(n=n, i=i, j=j + 1)]
    elif i < n - 1:
        out = make_graphs(n=n, i=i + 1, j=i + 1)
    else:
        out = [[]]
    return out
```

If you run this function for a small number of nodes (say, \(N=3\)), you can see how it generates all possible graph topologies, but that some of the descriptions would actually lead to identical pictures, if drawn (graphs 2 and 3 in the list below).

```
[(0, 1), (0, 2), (1, 2)]
[(0, 1), (0, 2)]
[(0, 1), (1, 2)]
[(0, 1)]
```

Also, while building a graph from edges means that we’ll never get lonely unconnected points, we can get graphs that are smaller than \(n\) nodes (the last graph in the list above), or graphs that have unconnected parts. It is impossible for \(n=3\), but starting with \(n=4\) we would get things like `[(0,1), (2,3)]`, which is technically a graph, but you cannot exactly wear it as a piece of jewelry, as it would fall apart. So at this point I decided to only visualize connected graphs (those that come in one piece) of exactly \(n\) vertices.

To continue with the plan, we now need to make a function that for every graph would generate a family of its “alternative representations” (given the constraints of our generator), to make sure duplicates would not slip under the radar. First we need a permutation function, to permute the nodes (you could also use a built-in function such as `itertools.permutations`, but coding this one from scratch is always fun, isn’t it?). Here’s the permutation generator:

```
def perm(n, s=None):
    """All permutations of n elements."""
    if s is None:
        return perm(n, tuple(range(n)))
    if not s:
        return [[]]
    return [[i] + p for i in s for p in perm(n, tuple([k for k in s if k != i]))]
```

Now, for any given graph description, we can permute its nodes, sort the \(i,j\) within each edge, sort the edges themselves, remove duplicate alt-descriptions, and remember the list of potential impostors:

```
def permute(g, n):
    """Create a set of all possible isomorphic codes for a graph,
    as nice hashable tuples. All edges are i<j, and sorted lexicographically."""
    ps = perm(n)
    out = set([])
    for p in ps:
        out.add(
            tuple(sorted([(p[i], p[j]) if p[i] < p[j] else (p[j], p[i]) for i, j in g]))
        )
    return list(out)
```

Say, for an input description of `[(0, 1), (0, 2)]`, the function above returns three “synonyms”:

```
((0, 1), (1, 2))
((0, 1), (0, 2))
((0, 2), (1, 2))
```

I suspect there should be a neater way to code that, to avoid using the `list → set → list` pipeline to get rid of duplicates, but hey, it works!
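
One possibly neater variant, offered as a sketch rather than a drop-in replacement: a set comprehension over `itertools.permutations` builds the set directly. The name `isomorphic_codes` is my own, and unlike `permute` it returns a set instead of a list.

```python
from itertools import permutations

def isomorphic_codes(g, n):
    """Set of all relabelled edge lists of g, each a sorted tuple of edges (i, j) with i < j."""
    return {
        # Relabel both ends of every edge with p, order each edge, then order the edges
        tuple(sorted(tuple(sorted((p[i], p[j]))) for i, j in g))
        for p in permutations(range(n))
    }

codes = isomorphic_codes([(0, 1), (0, 2)], 3)
print(sorted(codes))
# [((0, 1), (0, 2)), ((0, 1), (1, 2)), ((0, 2), (1, 2))]
```

Returning a set also fits how the result is used afterwards, since the descriptions end up in a set-based lookup anyway.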

At this point, the only thing that’s missing is the function to check whether the graph comes in one piece, which happens to be a famous and neat algorithm called the “Union-Find”. I won’t describe it here in detail, but in short, it goes through all edges and connects nodes to each other in a special way; then counts how many separate connected components (like, chunks of the graph) remain in the end. If all nodes are in one chunk, we like it. If not, I don’t want to see it in my pictures!

```
def connected(g):
    """Check if the graph is fully connected, with Union-Find."""
    nodes = set([i for e in g for i in e])
    roots = {node: node for node in nodes}

    def _root(node, depth=0):
        if node == roots[node]:
            return (node, depth)
        else:
            return _root(roots[node], depth + 1)

    for i, j in g:
        ri, di = _root(i)
        rj, dj = _root(j)
        if ri == rj:
            continue
        if di <= dj:
            roots[ri] = rj
        else:
            roots[rj] = ri
    return len(set([_root(node)[0] for node in nodes])) == 1
```
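
If Union-Find feels opaque, the same yes/no answer can be obtained with a breadth-first search. The sketch below is an alternative technique, not the method used in this post, and `connected_bfs` is a name I made up: start from any node and check that every other node is reachable.

```python
from collections import deque

def connected_bfs(g):
    """True if every node that appears in edge list g is reachable from the first one."""
    neighbours = {}
    for i, j in g:
        neighbours.setdefault(i, set()).add(j)
        neighbours.setdefault(j, set()).add(i)
    if not neighbours:
        return False  # an empty edge list has no nodes, so nothing is connected
    start = next(iter(neighbours))
    seen = {start}
    queue = deque([start])
    while queue:
        node = queue.popleft()
        for other in neighbours[node] - seen:  # neighbours not visited yet
            seen.add(other)
            queue.append(other)
    return len(seen) == len(neighbours)

print(connected_bfs([(0, 1), (1, 2)]))  # True
print(connected_bfs([(0, 1), (2, 3)]))  # False
```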

Now we can finally generate the “overkill” list of graphs, filter it, and plot the pics:

```
def filter(gs, target_nv):
    """Filter all improper graphs: those with not enough nodes,
    those not fully connected, and those isomorphic to previously considered."""
    mem = set({})
    gs2 = []
    for g in gs:
        nv = len(set([i for e in g for i in e]))
        if nv != target_nv:
            continue
        if not connected(g):
            continue
        if tuple(g) not in mem:
            gs2.append(g)
            mem |= set(permute(g, target_nv))
    return gs2


# Main body (plot_graphs is defined in the next code block)
NV = 6
gs = make_graphs(NV)
gs = filter(gs, NV)
plot_graphs(gs, figsize=14, dotsize=20)
```
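
As a sanity check on the whole pipeline, here is a compact, self-contained sketch that counts the surviving graphs for small \(n\). It uses a slightly different deduplication trick from the one above (it is my own variation, with made-up function names): instead of storing every permuted description, it reduces each graph to one canonical description, the smallest one over all node permutations, and counts distinct canonical forms.

```python
from itertools import combinations, permutations

def canonical(g, n):
    """Smallest relabelled edge list of g: identical for all isomorphic graphs."""
    return min(
        tuple(sorted(tuple(sorted((p[i], p[j]))) for i, j in g))
        for p in permutations(range(n))
    )

def is_connected(g, n):
    """True if the edges of g link all n nodes (0..n-1) into one piece."""
    reached = {0}
    frontier = [0]
    while frontier:
        node = frontier.pop()
        for i, j in g:
            for a, b in ((i, j), (j, i)):  # edges are undirected
                if a == node and b not in reached:
                    reached.add(b)
                    frontier.append(b)
    return len(reached) == n

def count_connected_graphs(n):
    """Count connected graphs on n nodes, up to isomorphism, by brute force."""
    edges = list(combinations(range(n), 2))
    shapes = set()
    for size in range(n - 1, len(edges) + 1):  # a connected graph needs >= n-1 edges
        for g in combinations(edges, size):
            if is_connected(g, n):
                shapes.add(canonical(g, n))
    return len(shapes)

print([count_connected_graphs(n) for n in (3, 4, 5)])  # [2, 6, 21]
```

The trade-off is time for memory: computing a canonical form tries all \(n!\) permutations for every candidate graph, while the hash-table approach in this post only does that work once per unique shape.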

For plotting the graphs I wrote a small wrapper for the Matplotlib-based NetworkX visualizer, splitting the figure into lots of tiny little facets using the Matplotlib `subplot` command. The “Kamada-Kawai” layout below is a popular and fast version of a spring-based layout that makes the graphs look really nice.

```
import matplotlib.pyplot as plt
import networkx as nx
import numpy as np


def plot_graphs(graphs, figsize=14, dotsize=20):
    """Utility to plot a lot of graphs from an array of graphs.
    Each graph is a list of edges; each edge is a tuple."""
    n = len(graphs)
    fig = plt.figure(figsize=(figsize, figsize))
    fig.patch.set_facecolor("white")  # To make copying possible (white background)
    k = int(np.sqrt(n))  # A (k+1) x (k+1) grid always has room for n facets
    for i in range(n):
        plt.subplot(k + 1, k + 1, i + 1)
        g = nx.Graph()  # Generate a Networkx object
        for e in graphs[i]:
            g.add_edge(e[0], e[1])
        nx.draw_kamada_kawai(g, node_size=dotsize)
        print(".", end="")  # Progress indicator: one dot per graph drawn
```
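
The `(k + 1) x (k + 1)` grid above always has enough room, but it can leave a whole row or column empty (16 graphs would get a 5 x 5 grid). A tighter layout is easy to compute; the helper below is a hypothetical alternative with a name of my own choosing, not part of the original notebook.

```python
import math

def grid_shape(n_plots):
    """Rows and columns for n_plots panels: ceil(sqrt(n)) columns, just enough rows."""
    cols = math.isqrt(n_plots - 1) + 1   # exact integer ceil(sqrt(n_plots)) for n_plots >= 1
    rows = math.ceil(n_plots / cols)
    return rows, cols

print(grid_shape(6))    # (2, 3)
print(grid_shape(16))   # (4, 4)
print(grid_shape(21))   # (5, 5)
```

The result would then be passed as `plt.subplot(rows, cols, i + 1)` in place of `plt.subplot(k + 1, k + 1, i + 1)`.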

Here are the results. To build the anticipation, let’s start with something trivial: all graphs of 3 nodes:

All graphs of 4 nodes:

All graphs of 5 nodes:

Generating figures above is of course all instantaneous on a decent computer, but for 6 nodes (below) it takes a few seconds:

For 7 nodes (below) it takes about 5-10 minutes. It’s easy to see why: the brute-force approach generates all \(2^{\frac{n(n-1)}{2}}\) possible graphs, which means that the number of operations grows exponentially! Each time we go from \(n-1\) to \(n\) nodes, we get \(n-1\) new edges to consider, which means that the time to run the program increases by a factor of \(\sim 2^{n-1}\). For \(n=7\) it brought me from seconds to minutes, for \(n=8\) it would have shifted me from minutes to hours, and for \(n=9\), from hours to months of computation. Isn’t it fun? We are all specialists in exponential growth these days, so here you are :)
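
The growth is easy to tabulate. This short sketch just evaluates the formula above for each \(n\) and prints the ratio between consecutive sizes; the function name is my own, and nothing here is timed or measured.

```python
def brute_force_size(n):
    """Number of edge subsets on n labelled nodes: 2 ** (n * (n - 1) / 2)."""
    return 2 ** (n * (n - 1) // 2)

for n in range(3, 10):
    ratio = brute_force_size(n) // brute_force_size(n - 1)
    # Each extra node multiplies the search space by 2 ** (n - 1)
    print(n, brute_force_size(n), ratio)
```

For \(n=7\) this gives 2,097,152 candidate graphs, and going from 8 to 9 nodes multiplies the count by 256.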

The code is available as a Jupyter Notebook on my GitHub. I hope you enjoyed the pictures, and the read! Which of those charms above would bring most luck? Which ones seem best for divination? Let me know what you think! :)
