Listening to macaques

Post by Josue L., an undergraduate in Computer Science and Interdisciplinary Neuroscience at Portland State University. Josue is a research intern at the Oregon National Primate Research Center where he studies animal vocalizations using computational methods.

Language is all around us

Language is all around us, extending far beyond human speech. Many animal species communicate through complex vocalizations, yet we still lack a clear understanding of what these signals represent. In some cases, humans have even been able to communicate directly with other species. One well-known example is Koko, a gorilla who was taught to use a form of sign language to interact with humans.

While cases like this suggest that some level of cross-species communication is possible, they also highlight a larger gap in our understanding. Animal vocalizations can convey information about environment, social structure, and behavior, yet their meaning is not always clear.

LEARN MORE: The Gorilla Foundation

LEARN MORE: Koko knows over 1100 signs in American Sign Language (ASL)

LEARN MORE: Sign Language: How the Brain Represents Phonology without Sound

LEARN MORE: The signing brain: the neurobiology of sign language

This raises a fundamental question: how can we begin to systematically analyze and understand a form of communication that we cannot directly translate?

I ended up getting lucky with this project, because it lined up really well with my interest in language. Growing up, I moved around quite a bit, and the more countries I lived in, the more I got into learning new languages. I grew up speaking English, Spanish, and Portuguese, and picked up German almost a decade ago. For me it was never just about learning basic vocabulary or grammar. I like understanding the culture and the history behind the language. Because of that, I’ve always been interested in how language works, and how different influences (including geography, culture, diet, and accent) create what feels like an endless range of variation.

LEARN MORE: Multilingual Brains!

LEARN MORE: Code Switching and the Bilingual Brain

LEARN MORE: The effect of bilingualism on brain development from early childhood to young adulthood

LEARN MORE: Want a younger brain? Learn another language

In early 2024, at Portland State University, I enjoyed Winterreise, a German song cycle by Franz Schubert that tells a deeply emotional story through music and poetry. It was sung in the original German, and it was one of the most memorable performances that I’ve attended. My favorite song is ‘Die Post’.

LEARN MORE: PSU Voice Area

LEARN MORE: PSU Music & Theater

LEARN MORE: Franz Schubert: Winterreise – Ian Bostridge Live Concert

Beyond the technical side, what I’ve always found interesting about language is how it shapes the way we think and interact. I noticed early on as a child that my personality shifts (sometimes drastically) depending on the language I’m speaking or the country I’m living in, influencing how confident, formal, and expressive I am. That idea became even more apparent while learning German. With a piece like Winterreise, the lyrics alone tell only part of the story. Understanding the cultural background, historical setting, emotional themes, and even how German expresses certain ideas changes how the language feels and how it is interpreted.

Language is more than vocabulary. It carries history, culture, identity, and different ways of thinking.

And in many ways, that idea unintentionally carried over into this project. Instead of working with words or grammar, I was focusing on patterns in sound, trying to understand communication without directly knowing what was being said. It reminded me of when I first visited Germany (Berlin) shortly after starting my learning journey. I would listen closely and try to recognize patterns, slowly building a sense of how certain sounds connected to meaning.

Just as human language depends on context, macaque vocalizations also depend on social relationships, environment, hierarchy, behavior, and the situation in which the call was produced.

After spending more time in the lab and talking with other lab members, I started to realize that understanding animal communication is more complex than simply labeling sounds. Factors like sex (male versus female), motivation (mating versus hunger), or age (young versus old) can influence not only what types of calls are produced, but also how those calls are produced. We also have to consider acoustic features such as call duration, harmonics-to-noise ratio (measured in dB), jitter, mean peak frequency, and interquartile range, all of which may help distinguish between different types of vocalizations.
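To make a few of these features concrete, here is a minimal sketch in Python with NumPy. It is not the lab's actual pipeline. The function names, the synthetic 440 Hz tone, and the cycle periods are all made up for illustration. Jitter is computed with one common "local jitter" definition: the mean absolute difference between consecutive cycle periods, divided by the mean period.

```python
import numpy as np

def call_duration(start_s, end_s):
    """Duration of a call in seconds, given onset and offset times."""
    return end_s - start_s

def local_jitter(periods_s):
    """Local jitter: mean absolute difference between consecutive
    cycle periods, divided by the mean period (a unitless fraction)."""
    periods = np.asarray(periods_s, dtype=float)
    return float(np.mean(np.abs(np.diff(periods))) / np.mean(periods))

def peak_frequency(signal, sample_rate):
    """Frequency (Hz) of the strongest component in the magnitude spectrum."""
    spectrum = np.abs(np.fft.rfft(signal))
    freqs = np.fft.rfftfreq(len(signal), d=1.0 / sample_rate)
    return float(freqs[np.argmax(spectrum)])

# Hypothetical example values, not real macaque data.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate            # one second of samples
tone = np.sin(2 * np.pi * 440.0 * t)                # pure 440 Hz tone
steady_periods = [0.0025, 0.0025, 0.0025, 0.0025]   # perfectly regular cycles
wobbly_periods = [0.0025, 0.0027, 0.0024, 0.0026]   # slightly irregular cycles

print(call_duration(1.20, 1.55))
print(local_jitter(steady_periods), local_jitter(wobbly_periods))
print(peak_frequency(tone, sample_rate))
```

A perfectly regular voice gives a jitter of zero, and any cycle-to-cycle wobble pushes it above zero. That is why jitter can serve as a rough measure of how "rough" a call sounds.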

LEARN MORE: Vocal variation across young rhesus macaques

LEARN MORE: Acoustic Measurements: Jitter, Shimmer, and Standard Deviation of Fundamental Frequency

As Charles Darwin recognized in his foundational 1872 work, The Expression of the Emotions in Man and Animals, studying emotional communication has the potential to improve our understanding of both how signals are produced and how they evolve over time. In the context of this project, it highlights that vocalizations are not simply random sounds, but products of biology, behavior, and evolutionary history. To fully understand what we are studying, all of these factors must be considered.

IMAGE SOURCE: A Framework for Studying Emotions across Species

LEARN MORE: Gestures, Vocalizations, and Memory in Language Origins

LEARN MORE: Evolution of vocal learning and spoken language

“Die Grenzen meiner Sprache bedeuten die Grenzen meiner Welt.” (“The limits of my language mean the limits of my world.”)
— Ludwig Wittgenstein

Research context

One way researchers approach this problem is by studying the vocalizations of non-human primates, such as rhesus macaques. These primates produce a variety of calls that are used in different social and environmental contexts, including alarm signals, social interactions, and group coordination.

IMAGE SOURCE: The vocal repertoire of Tibetan macaques (Macaca thibetana)

LEARN MORE: Rhesus Macaques (Macaca mulatta) in Biomedical Research

Because of their similarities to humans in both brain structure and social behavior, rhesus macaques provide a valuable model for understanding the relationship between communication, behavior, and the brain.

Comparison of the prefrontal cortex (the shaded region in each brain) across four closely related primates, highlighting subtle differences in cell types and genetic expression

IMAGE SOURCE: Study shows differences between brains of primates

The brains of humans and other primates are remarkably similar at the cellular and genetic level. Much of the underlying brain structure is shared, with only small differences leading to significant changes in how each species processes information and communicates.

LEARN MORE: Connectivity reveals homology between the visual systems of the human and macaque brains

LEARN MORE: Molecular and cellular reorganization of neural circuits in the human lineage

LEARN MORE: Mapping macaque to human cortex with natural scene responses

LEARN MORE: Intentional communication between wild bonnet macaques and humans

LEARN MORE: Bio-Linguistics: Monkeys Break Through the Syntax Barrier

LEARN MORE: Communication tied to brain structure and social behavior

LEARN MORE: Brain structural networks underlying language

¡TO THE COMPUTER!

¡WAIT!

Before we can even begin to understand how auditory neurons in the rhesus macaque respond to different vocalizations, we first need data to work with. In my case, this meant working with a large collection of macaque audio recordings, turning them into something a computer can actually analyze. From there, the process involves “cleaning” the data, identifying meaningful segments, and transforming raw sound into visual representations that capture patterns over time.
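As a rough illustration of the "identifying meaningful segments" step, here is a minimal sketch of energy-based segmentation in Python with NumPy. It is one simple approach, not necessarily the method used in this project. The function name find_segments, the frame length, and the threshold are assumptions chosen for a synthetic example, and real recordings would need tuning.

```python
import numpy as np

def find_segments(signal, sample_rate, frame_len=80, threshold=0.1):
    """Return (start_s, end_s) pairs where short-time RMS energy exceeds
    `threshold`. frame_len (in samples) and threshold are illustrative."""
    n_frames = len(signal) // frame_len
    frames = signal[: n_frames * frame_len].reshape(n_frames, frame_len)
    rms = np.sqrt(np.mean(frames ** 2, axis=1))
    active = rms > threshold

    # Pad with False so every run of active frames has a clear start and end.
    padded = np.concatenate(([False], active, [False]))
    changes = np.diff(padded.astype(int))
    starts = np.flatnonzero(changes == 1)
    ends = np.flatnonzero(changes == -1)
    return [
        (float(s * frame_len / sample_rate), float(e * frame_len / sample_rate))
        for s, e in zip(starts, ends)
    ]

# Synthetic stand-in for a recording: quiet noise with one loud tone burst.
rng = np.random.default_rng(0)
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
recording = rng.normal(0, 0.01, size=sample_rate)
recording[2400:4000] += 0.5 * np.sin(2 * np.pi * 500 * t[2400:4000])  # 0.3 s to 0.5 s

segments = find_segments(recording, sample_rate)
print(segments)
```

A fixed threshold like this works on a clean synthetic signal. With real background noise, an adaptive threshold or a trained detector is usually a safer choice.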

LEARN MORE: Computational bioacoustics with deep learning: a review and roadmap

LEARN MORE: A Primer of Data Cleaning in Quantitative Research: Handling Missing Values and Outliers

LEARN MORE: Data Cleaning: Detecting, Diagnosing, and Editing Data Abnormalities

This sounds like a problem….

It’s not just noise, I promise.

Unlike in class, where you’re typically given clean datasets and everything is neatly set up for you, real research data is (most of the time) messy, slow to process, and far from perfect. The recordings are long and continuous, filled with silence, background noise, and overlapping sounds, and a computer can’t just “listen” to these files the way we do.

This is where spectrograms come in.

A spectrogram converts sound into a visual representation of how its frequency content changes over time. Time runs along one axis and frequency along the other, while color or brightness shows how much energy is present at each frequency at each moment.

LEARN MORE: Spectrogram MATLAB

LEARN MORE: spectrogram improve the accuracy of bioacoustic classification

By transforming audio into a spectrogram, we turn sound into an image-like representation that can then be expressed as numerical data. I find it interesting to think that something like an image can ultimately be represented as numerical patterns.
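Here is a minimal sketch of that idea using only NumPy: a short-time Fourier transform that turns a one-dimensional signal into a two-dimensional array of numbers (frequency by time). The window length, hop size, and the synthetic rising tone are assumptions for illustration. In practice, libraries such as SciPy or Librosa provide ready-made spectrogram functions.

```python
import numpy as np

def spectrogram(signal, sample_rate, window_len=256, hop=128):
    """Magnitude spectrogram in dB via a short-time Fourier transform.
    Returns (freqs_hz, times_s, spec_db) with spec_db shaped (freq, time)."""
    window = np.hanning(window_len)
    n_frames = 1 + (len(signal) - window_len) // hop
    frames = np.stack([
        signal[i * hop : i * hop + window_len] * window
        for i in range(n_frames)
    ])
    magnitude = np.abs(np.fft.rfft(frames, axis=1)).T       # (freq, time)
    spec_db = 20 * np.log10(magnitude + 1e-10)              # small offset avoids log(0)
    freqs = np.fft.rfftfreq(window_len, d=1.0 / sample_rate)
    times = (np.arange(n_frames) * hop + window_len / 2) / sample_rate
    return freqs, times, spec_db

# Synthetic stand-in for a call: a tone that rises from 500 Hz to 2000 Hz.
sample_rate = 8000
t = np.arange(sample_rate) / sample_rate
instantaneous_freq = 500 + 1500 * t
chirp = np.sin(2 * np.pi * np.cumsum(instantaneous_freq) / sample_rate)

freqs, times, spec_db = spectrogram(chirp, sample_rate)
print(spec_db.shape)                        # (frequency bins, time frames)
print(freqs[np.argmax(spec_db[:, 0])])      # strongest frequency in the first frame
print(freqs[np.argmax(spec_db[:, -1])])     # strongest frequency in the last frame
```

The array spec_db is exactly the "image as numbers" described above: each column is a moment in time, each row is a frequency, and each value is an intensity.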

First spectrogram! It’s super basic, but I have to start somewhere 🙂

LEARN MORE: Listen to Rhesus Macaque sounds!

Pattern Recognition (literally)

Once the audio has been transformed into spectrograms, we can start teaching the computer what to look for. This begins with labeling the data, where different segments of sound are identified as specific types of vocalizations.

Labeling these vocalizations isn’t always straightforward. It usually comes down to a mix of both the sound itself and the surrounding context. While previous research gives us a good baseline for known call types, observing the animals’ behavior (such as social interactions, aggression, or isolation) helps provide additional meaning.

From the computational side, we also have to consider data quality. We prioritize clear vocalizations with minimal background noise, overlapping calls, or interruptions so the model can learn from the best possible examples. The goal right now is to build a strong foundation before moving into finer distinctions and more challenging subtypes.
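One simple way to put a number on "minimal background noise" is a signal-to-noise ratio. The sketch below is an assumption about how such a filter could look, not the project's actual quality check. The function names, the 15 dB cutoff, and the synthetic calls are invented for illustration.

```python
import numpy as np

def snr_db(segment, noise_reference):
    """Approximate signal-to-noise ratio in dB: mean power of the segment
    (which still contains some noise) relative to a noise-only stretch."""
    signal_power = np.mean(np.square(segment))
    noise_power = np.mean(np.square(noise_reference))
    return float(10 * np.log10(signal_power / noise_power))

def keep_clean(segments, noise_reference, min_snr_db=15.0):
    """Keep only segments whose SNR meets the (illustrative) cutoff."""
    return [s for s in segments if snr_db(s, noise_reference) >= min_snr_db]

# Synthetic example: one loud call and one faint call over the same noise floor.
rng = np.random.default_rng(0)
sample_rate = 8000
t = np.arange(1600) / sample_rate                     # 0.2 s per segment
noise_reference = rng.normal(0, 0.01, size=1600)      # noise-only stretch
loud_call = 0.5 * np.sin(2 * np.pi * 500 * t) + rng.normal(0, 0.01, size=1600)
faint_call = 0.02 * np.sin(2 * np.pi * 500 * t) + rng.normal(0, 0.01, size=1600)

kept = keep_clean([loud_call, faint_call], noise_reference)
print(snr_db(loud_call, noise_reference), snr_db(faint_call, noise_reference))
print(len(kept))
```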

Subtypes are more challenging because calls that may initially sound similar can differ in very subtle ways depending on context, behavior, social setting, or acoustic characteristics. They also often require larger datasets and more detailed distinctions in both behavior and acoustic features for the model to reliably separate them.

LEARN MORE: Impact of Data Quality on Detection

Some of the more common macaque vocalizations include:

Coos: Generally lower-intensity calls often associated with social contact, group cohesion, or maintaining communication between individuals.
Squeals: High-pitched, tonal vocalizations often emitted by subordinates or juveniles during social interactions to signal submission or prevent aggression.
Grunts: Short calls commonly observed during social interactions and close-range communication.
Screams: Higher-intensity vocalizations frequently associated with conflict, distress, or aggressive encounters.
Threat calls / barks: Calls associated with alarm situations, warning signals, or defensive behavior.
Pant threats: More aggressive displays that can occur during dominance disputes or other social conflicts.

LEARN MORE: Variation in Rhesus Macaque vocalizations

These labeled examples are then fed into a computer model, which learns to recognize patterns associated with each type of call. Over time, the model improves its ability to classify new, unseen vocalizations by comparing them to patterns it has already learned. At its core, this process is about pattern recognition, where the computer is learning to make increasingly accurate predictions based on the data it has seen.
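To show the core idea of "comparing new calls to learned patterns" in the simplest possible form, here is a toy nearest-centroid classifier in NumPy. This is a deliberately simplified stand-in and not the model used in this project, which could just as well be a neural network trained on spectrograms. The feature values (duration in seconds, peak frequency in kHz) are invented for the example.

```python
import numpy as np

class NearestCentroid:
    """Toy classifier: remember the mean feature vector of each labeled
    call type, then assign new calls to the closest mean."""

    def fit(self, features, labels):
        features = np.asarray(features, dtype=float)
        labels = np.asarray(labels)
        self.classes_ = np.unique(labels)
        self.centroids_ = np.stack(
            [features[labels == c].mean(axis=0) for c in self.classes_]
        )
        return self

    def predict(self, features):
        features = np.asarray(features, dtype=float)
        # Distance from every sample to every centroid: shape (samples, classes)
        dists = np.linalg.norm(
            features[:, None, :] - self.centroids_[None, :, :], axis=2
        )
        return self.classes_[np.argmin(dists, axis=1)]

# Made-up features (duration in s, peak frequency in kHz) for two call types.
rng = np.random.default_rng(0)
coos = rng.normal([0.40, 0.6], 0.05, size=(50, 2))
screams = rng.normal([0.80, 3.0], 0.05, size=(50, 2))
X = np.vstack([coos, screams])
y = np.array(["coo"] * 50 + ["scream"] * 50)

model = NearestCentroid().fit(X, y)
new_calls = np.array([[0.42, 0.65], [0.78, 2.9]])
predictions = model.predict(new_calls)
print(predictions)
```

With real data the features would normally be standardized first, so that no single feature dominates the distance just because of its units.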

When I mention patterns, I’m mostly talking about the acoustic features themselves: frequency, duration, timing, harmonics, modulation, spectral energy, and pitch contour. These are the measurable pieces of information the model learns from. One advantage of computers is that they can pick up subtle patterns and relationships that are not obvious to us or easy to detect manually.

At the same time, behavioral context matters because similar acoustic features can appear in very different situations. A macaque might be grooming, playing, showing aggression, giving an alarm call, caring for offspring, responding to another macaque, competing within the social hierarchy, or maintaining group cohesion. Combining the acoustic features with that behavior is what gives a more complete picture.

LEARN MORE: 3Blue1Brown: Large Language models explained briefly

LEARN MORE: 3Blue1Brown: What is a neural network?

LEARN MORE: IBM: What are convolutional neural networks (CNN)?

LEARN MORE: Computerphile: Convolutional neural network explained

Neural versus Artificial

I ended up using the computer programming language Python for most of the workflow and project development, and for running the model. Python is commonly used for this type of work because of its strong ecosystem for machine learning, deep learning, signal processing, and scientific computing, with tools like NumPy, SciPy, Librosa, TensorFlow, PyTorch, and visualization libraries that help to process, analyze, and display data.

However, one of the biggest shifts for me in this project was learning how to think like a researcher. Much of the early effort wasn’t about working in Python at all. It was about understanding the problem at a fundamental level.

LEARN MORE: Python scientific computing ecosystem

WHAT are we trying to solve? WHERE do we begin? HOW do we plan to execute?

I set out to learn more about the primates we were studying, how their auditory neurons respond to different vocalizations, and what previous research has been done in this area, which, in many cases, was fairly limited.

LEARN MORE: Comparing human and monkey neural circuits for processing

LEARN MORE: Rhesus Vocalizations and Their Representation in the Ventrolateral Prefrontal Cortex

LEARN MORE: Macaques in research | How brain signals control movement

From there, the challenge became figuring out how to move forward.

This meant working with others in the lab to plan out a research strategy, while also dealing with the technical side of the project. Tasks like segmenting noisy audio, deciding which features to extract, managing high-dimensional data, and avoiding issues like overfitting or underfitting all became part of the process. At the same time, there was a personal learning curve, getting up to speed with new concepts, tools, and ways of thinking.
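For readers new to overfitting and underfitting, here is a small generic illustration in NumPy. It has nothing macaque-specific in it, and the data are synthetic. It fits polynomials of increasing degree to noisy points and compares the error on the training points with the error on held-out validation points. The degrees and noise level are arbitrary choices for the demo.

```python
import numpy as np

rng = np.random.default_rng(0)

# Hypothetical data: a smooth curve plus noise, split into train and validation.
x = np.linspace(-1, 1, 40)
y = np.sin(np.pi * x) + rng.normal(0, 0.2, size=x.shape)
train = np.arange(len(x)) % 2 == 0       # even indices for training
x_tr, y_tr = x[train], y[train]
x_va, y_va = x[~train], y[~train]        # odd indices held out for validation

def mse(coeffs, x, y):
    """Mean squared error of a polynomial with the given coefficients."""
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))

results = {}
for degree in (1, 3, 9):
    coeffs = np.polyfit(x_tr, y_tr, degree)
    results[degree] = (mse(coeffs, x_tr, y_tr), mse(coeffs, x_va, y_va))
    print(degree, results[degree])       # (training error, validation error)
```

The straight line (degree 1) is too simple to follow the curve, so it does poorly on both sets, which is underfitting. Training error can only go down as the degree increases. When validation error stops improving, or gets worse, while training error keeps falling, the model has started to overfit.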

LEARN MORE: Cornell University: Underfitting vs Overfitting

It was a reminder that research isn’t just about applying what you already know, but about constantly learning and adapting to new situations and data. Sometimes that means figuring things out as you go.

Another challenge for me in this project was learning to think in two different modes: computational and neurological.

Coming from a computer science background, with most of my past experience in software engineering internships, I naturally tend to approach problems from a computational perspective, focusing on data, patterns, optimization, space/time complexity, algorithmic design/analysis, and more.

However, this project pushed me to also think in terms of biology (I used to be a biological anthropology major): how auditory neurons respond to sound, how vocalizations are produced, and what they might represent in a behavioral or social context.

Balancing these two perspectives wasn’t always straightforward. At times, I would focus too much on building a model without fully realizing the biological meaning behind the data, or the other way around. It was a bit tricky at the start, but over time it got easier.

¡BIG PICTURE!

IMAGE SOURCE: The Wedding at Cana

Alright, so what’s the point?

Well…

Being able to classify primate vocalizations (and, eventually, to ask what they mean) is an important first step toward understanding how communication is structured in the brain. Once we can reliably identify different types of calls, we can begin to connect those patterns to neural activity, behavior, and social context. This not only helps us better understand how animals communicate with each other, but also provides a framework for studying how the brain processes complex signals more generally. Building on this, more advanced models could analyze how specific acoustic features, like frequency, duration, or temporal patterns, relate to neural responses, track how vocalizations change across different social or environmental conditions, and even begin to predict neural activity directly from sound.

Projects like this, which sit at the intersection of computer science and neuroscience, two fields that have traditionally been somewhat separate, are exactly where I want to be.

Being able to apply computational methods to biological and neurological problems opens up new ways of understanding how the brain works and how communication is processed. For me, this goes beyond just building models or analyzing data. It’s about using those tools to help bridge the gap between disciplines and contribute to something with real-world impact.

Whether that means improving our grasp of neural systems or helping develop new approaches to studying behavior and communication – or understanding what macaques are actually saying to each other! – for me the potential is incredibly exciting. It’s this combination of technical problem-solving and meaningful application that makes me want to continue pursuing research.