June 13, 2024


What was it that first drew you to the field of natural language processing?

Before college, I was equally interested in languages and science. At the time, this seemed like an irreconcilable dichotomy, and I assumed I had to make a hard choice between the two. I decided to pursue computer science in college and was starting to make peace with the thought that language would not be a significant part of my career. Then a light bulb went on: while looking for a topic for my college dissertation, I found a project proposal for analyzing sentiment in tweets.

It sounds cliché now, because sentiment analysis has been beaten to death since. But at the time, this topic brought together multiple elements with growing potential: deep learning, social media as an untapped data source, and natural language processing. In grad school, I studied machine translation (specifically, we were investigating whether recurrent neural networks should be bidirectional); I remember being fascinated with the idea of projecting language down into a vectorial semantic space, and all its philosophical implications, like the meaning of meaning. It felt like I had found my place at the intersection of computer science, language, and philosophy that I had initially given up on.

How has your approach to NLP evolved over time?

During my career, the most significant event so far has been the rise of transfer learning: reusing a large general-purpose language model like BERT or GPT-3 to solve almost any language task. My focus has shifted away from task-specific techniques (e.g., should we use a bidirectional RNN for machine translation?) toward broader transfer learning research: how can we improve the foundation models and thus benefit all NLP tasks at once?

More specifically, in the last few years I have studied ways of enhancing Transformers, the building block of transfer learning: how to make them more computationally efficient, how to increase their multilingual ability, and how to prevent them from “hallucinating” (i.e., producing plausible-sounding falsehoods).

Based on your own career path, do you have any advice to share with early-career ML practitioners about the kinds of projects they should focus on?

I think machine learning is moving too fast to be strategic about the specific subfields or projects that you pursue. The most sustainable strategy is to follow your own curiosity and dig deep into whatever sparks your interest, be it out of style (like word2vec embeddings) or the shiniest new toy (e.g. text-to-image models).

Creativity starts when you truly internalize a concept or technology and are able to question it, improve it, or use it in a new context. For me, the highest-ROI approach is diving straight into the code, which is the ground truth, immune to the faults of ML literature (embellishments, hyperbole, and forced comparisons with the human mind).

I remember having a hard time understanding the Transformer paper until I looked at the code and realized that, behind the pretentious terminology (e.g., “multi-headed self-attention”), there are just some matrix multiplications, which is what all models really boil down to. I went from being overwhelmed by a seemingly impenetrable piece of research to being asked to film a tutorial on Transformers for TensorFlow’s official YouTube channel.
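A minimal NumPy sketch (my own illustration, not the paper’s actual code) makes the point concrete: a single self-attention head is three projections, one similarity matrix, and one weighted sum, all of them matrix multiplications plus a softmax.

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def self_attention(X, Wq, Wk, Wv):
    """One attention head: project tokens to queries, keys, and values,
    score every query against every key, then average the values."""
    Q, K, V = X @ Wq, X @ Wk, X @ Wv          # three linear projections
    scores = Q @ K.T / np.sqrt(K.shape[-1])   # scaled dot-product similarity
    return softmax(scores) @ V                # attention-weighted sum of values

rng = np.random.default_rng(0)
seq_len, d_model, d_head = 4, 8, 8            # toy sizes for illustration
X = rng.normal(size=(seq_len, d_model))       # 4 token embeddings
Wq, Wk, Wv = (rng.normal(size=(d_model, d_head)) for _ in range(3))
out = self_attention(X, Wq, Wk, Wv)
print(out.shape)  # (4, 8)
```

“Multi-headed” just means running several of these in parallel and concatenating the results.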

What inspired you to publish on these topics for a broader audience—and how do you choose your topics?

When participating in reading groups at Google, I found a lot of satisfaction in presenting math-heavy research papers in the simplest terms possible, stripping away the academic obfuscation and getting to the core message. I often received very encouraging feedback from my peers and realized that, if this sort of reframing could benefit experts in the field, it would have an even bigger impact on a broader, less specialized audience.

Regarding topic selection, I always write about topics closely related to my current work, since writing forces me to gain clarity on my own projects.

Looking ahead, what changes do you hope to see in your field over the next year or two?

I strongly believe that the future of machine learning is multimodal. Transfer learning has become central to both natural language processing and computer vision: it is now inconceivable to operate in either field without starting from a general-purpose pre-trained model.

Additionally, the scale of unimodal models (text-only, image-only) is nearing the ceiling of data available on the internet. It is only natural that the next great unification brings together multiple modalities: text, image, continuous sensor data, etc. The first signs are here: CLIP already fuses text and image into a single common semantic space and powers applications like DALL·E 2, with tremendous real-world impact. This trend will most likely continue, enabling well-rounded agents that can navigate the world around us in all its complexity.
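The shared-space idea can be sketched in a few lines: embed each modality into the same vector space, unit-normalize, and compare with a dot product. Here random projections stand in for CLIP’s real trained encoders, so this is a toy illustration of the mechanism, not CLIP itself.

```python
import numpy as np

rng = np.random.default_rng(0)
d = 64  # dimensionality of the shared semantic space

def embed(features, proj):
    """Map modality-specific features into the shared space
    and unit-normalize, as CLIP does before comparing."""
    v = features @ proj
    return v / np.linalg.norm(v)

# Toy stand-ins for trained text and image encoders.
text_proj = rng.normal(size=(128, d))
image_proj = rng.normal(size=(256, d))

text_vec = embed(rng.normal(size=128), text_proj)
image_vec = embed(rng.normal(size=256), image_proj)

# Cross-modal similarity is just a dot product of unit vectors,
# so it is a cosine similarity bounded between -1 and 1.
similarity = float(text_vec @ image_vec)
print(-1.0 <= similarity <= 1.0)  # True
```

Training pushes matching text–image pairs toward high similarity and mismatched pairs toward low similarity, which is what makes the space “common” to both modalities.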

The Most Sustainable Strategy Is to Follow Your Own Curiosity. Republished from Towards Data Science: https://towardsdatascience.com/the-most-sustainable-strategy-is-to-follow-your-own-curiosity-8a852649bff3

