Stanford CS330: Multi-Task and Meta-Learning, 2019 | Lecture 1 – Introduction & Overview


Okay, let's get started. Hi everyone, welcome to CS330; if you were expecting to be in a different class, you're probably in the wrong place. I'm Chelsea Finn, a professor in the Computer Science Department. First I'll go over course logistics, and then we'll cover a little bit of content to motivate why we care about multi-task learning and meta-learning.

First, logistics: information and resources. This is the full staff for the course. We have four TAs; I think one or two of them are here today — maybe you can stand up and introduce yourselves. Great, those were the ones able to make it today, and the other TAs will also be great resources for you as you go through the course. The course website is shown here, and it has a lot of information beyond what we'll cover in these logistics. We also have a Piazza for questions as they come up, and this is the staff mailing list. Each of us will hold one hour of office hours per week; mine are Wednesday after class, and the others will be posted on the course website. Office hours start this Wednesday.

Prerequisites and enrollment: the main prerequisite is machine learning experience, at the level of CS229 or equivalent. We also highly recommend some previous reinforcement learning experience, because a large portion of the course will cover topics in reinforcement learning. If you're not currently enrolled, please fill out the enrollment form on the website. We still have some open spots, and we'll go through the form around Tuesday or Wednesday this week, so please fill it out as soon as possible if you're not enrolled. If you filled out the form and didn't receive a permission number yet, it's either because you filled it out very recently or because we weren't quite sure you had the experience needed to succeed in this class; we'll get back to everyone by Wednesday this week.

All of the lectures are being recorded. The recordings will be released internally on Canvas as soon as they're available, and they'll be publicly released after the course for the benefit of the general public. Student anonymity will be preserved in the recordings, so please still feel free to ask questions throughout the course. There are also around 20 remote students joining through the Stanford Center for Professional Development; SCPD is providing the captioning and other services that let us release all the videos online after the course. Any questions about anything up here?

Okay, assignments. The assignments will all require training neural networks using TensorFlow, so if you're not familiar with TensorFlow already, we'll be holding a review session on Thursday of this week.
Basically, you should come out of that session understanding an overview of TensorFlow. If you don't have that yet, I'd highly encourage you to go to the review session and ask plenty of questions — it's your opportunity to learn how to use TensorFlow if you aren't familiar with it already.

Topics we're covering in this course — all of the lecture topics are listed on the course website — include: what multi-task learning and meta-learning are; the basics of these algorithms, including approaches ranging from black-box approaches to optimization-based approaches and metric-learning methods; how to view these methods in a hierarchical Bayesian framework, as learning priors over multiple tasks and then using those priors; multi-task reinforcement learning, goal-conditioned reinforcement learning, and hierarchical reinforcement learning; topics in meta-reinforcement learning; and some open problems, along with invited lectures and research talks from faculty in the field. One thing worth noting is that there will be an emphasis on deep learning, as indicated by the title of the course, as well as an emphasis on reinforcement learning: a little under half of the course will focus on topics in reinforcement learning, because that's where some of these techniques become a lot more interesting and a lot more challenging.

Topics we won't cover, due to the time constraints of the course: AutoML, which includes things like architecture search, hyperparameter optimization, and learning optimizers. Beyond that, I think we'll cover most topics in meta-learning and multi-task learning, but with an emphasis on deep learning approaches, so we won't be thoroughly covering methods that don't use neural networks, for example. I'll describe why later in this lecture.

Course format: we'll have three types of course sessions — nine lectures given by me, seven student reading sessions consisting of presentations by you and discussions of papers from the multi-task learning and meta-learning literature, and three guest lectures. All students enrolled in the course are responsible for giving one group presentation of a paper. In each student reading session we'll present four papers; the papers are posted on the course website, and further instructions on how to prepare your presentation are posted on Piazza. Even if you're not presenting a paper, we highly encourage you to come to class and participate in the discussion. While the lectures are being recorded, I think you get a lot more from the course if you actively participate in the discussions, to better understand the literature and dig into why these papers are good or what their limitations are. Also, this format will change in future offerings.
In future offerings we'll probably introduce more lectures and fewer student reading sessions as the years go on.

More details on assignments. We'll have three homework assignments. The first homework covers multi-task data processing and black-box meta-learning methods: you'll implement how to process data in a multi-task fashion so that these types of algorithms can be applied to it, and you'll implement a black-box meta-learning algorithm. In Homework 2, you'll implement gradient-based meta-learning algorithms as well as metric-learning algorithms. Homework 3 will be on things like goal-conditioned reinforcement learning and multi-task reinforcement learning. So you'll get the opportunity to play around with all of these algorithms yourself and see how well they work on various domains, including image classification and simulated robotic control.

Lastly, we'll have a final project. This will be a research-level project of your choice, and we encourage you to integrate any research you're already doing into it, as long as it's on topics pertinent to the course, like multi-task learning and meta-learning. You can form groups of one to three students, and you're welcome to start early — start forming groups now and thinking about what you actually want to do for your final project. Grading will be 20% for your paper presentation (done as a group), 30% for the homeworks (10% each), and 50% for the final project. You're also given five late days across the homeworks and your project paper submission, which you can allocate as you see fit. One more thing about the project: we'll post guidelines, and there will be milestones throughout the course where you propose your project, communicate the progress you've made, and then present the final result.

Things you need to do today: sign up for Piazza if you haven't already, and fill out your paper presentation preferences — we have seven paper sessions with four papers each, so indicate which of those 28 papers you're most interested in presenting. Do this by Thursday so we can make the assignments as quickly as possible, especially for the first presentation, which happens next week on Wednesday. The instructions for the presentations and for signing up are on Piazza. Lastly, start forming final project groups if you want to work in a group, and if you're not familiar with TensorFlow, review the TensorFlow intro or go to the review session on Thursday.

Anything else about logistics? Any questions? "Could this be done in PyTorch?" We'll be providing a lot of the infrastructure around these different problems in TensorFlow.
If you want to do everything in PyTorch and produce the deliverables — the curves you need to produce as part of your write-up for the assignment — that's completely fine. But it will probably involve writing a fair amount of code in addition to what you'd write in TensorFlow, because the provided infrastructure is written in TensorFlow.

"Are all the homeworks runnable on laptops, or will they require a GPU?" The first homework will be runnable on a laptop. The second will be a bit more compute-heavy, and we're still looking into cloud compute options that we can provide for people who don't have GPUs to run their code on. The third will probably be runnable on a laptop as well, but we're still finalizing that assignment.

"Will we be using TensorFlow 2 or 1.14?" We'll be using TensorFlow, but not TensorFlow 2. [LAUGHTER]

"What's the policy on sharing final projects [inaudible]?" That's a good question — let me get back to you on that. We'll post it in the project guidelines.

Two more things. First, ask plenty of questions — you're off to a great start. This course is really for your benefit, not for me to stand here and talk, so if something is unclear or there's something you're curious to learn more about, please raise your hand and ask. Second, this course is new — it's the first time we're offering it — so it will likely be rough around the edges compared to courses that run every single quarter. Please bear with me and the rest of the course staff; that's all the more reason to ask plenty of questions as we figure out how the course will run throughout the quarter.

Great. Now I'd like to talk about why we should actually care about multi-task learning and meta-learning, and what sorts of things you'll be learning in this course. I'd like to start with some of my own research and some of the reasons I care about these problems, and then talk more broadly about where these types of algorithms are being used and why they're a really fundamental part of machine learning research.

One of the questions my research group likes to think about is: how can we enable agents to learn skills in the real world? And by the real world, I mean robots — robots like these, robots that can use tools to lift up objects and put them in bowls, that can play with children's toys, that can watch a video of a human doing something and learn from it. You might ask, "Why robots?" It seems like robots might be a lot of work to deal with. I think the reason studying robots is really exciting is that robots can teach us things about intelligence. I know that might sound a little silly, because robots are not very smart.
But robots are faced with many of the challenges that humans are faced with as they learn and develop over the course of their lifetime. Robots are faced with the real world and have to deal with it. They have to generalize across tasks, across objects, across environments in order to be successful in real-world settings. They need some sort of common-sense understanding to do well. And lastly, supervision can't be taken for granted: it's not easy to provide labels, or even to figure out what labels mean, in the context of getting a robot to do something. So I think that if you can build intelligent robots in the real world, you can convince yourself that you've solved some important problems regarding intelligence. Of course, my goal in this lecture isn't to convince you to work on robots; it's to discuss some of the challenges that come up, guide the discussion toward multi-task learning and meta-learning, and describe where they fit in.

I'll start with a bit of a story. At the beginning of my PhD, around five years ago, I was in a lab at Berkeley, and there was a research project going on trying to get a robot to learn how to assemble a toy airplane by putting the wheels into the correct position. The robot is learning how to do the task through trial and error: at the beginning it starts with very random motions, and over time it's able to figure out how to do the task successfully. I thought this was really cool. The video is 20x real time, but if you watched the robot in real time, you'd actually see it learning in real time through trial and error. And what's really exciting is that this isn't just an algorithm for learning that one task — it's an algorithm that can handle a really wide range of manipulation tasks, so in principle it could be applied to a wide range of settings.

Now, one thing that was a little disappointing is that the robot essentially had its eyes closed — it was doing this completely blind, just blindly trying to get the wheels into the right place. That motivated some of the work I did at the beginning of my PhD, where we wanted the robot to do things while also seeing — tasks that required vision. In this case, it needed to see where the shape-sorting cube was in order to figure out where to place the block. Again, you can see it learning over time, starting from scratch and figuring out how to get the block into the respective hole. And the reason these sorts of algorithms are really exciting is that the same reinforcement learning algorithm can be applied to many different tasks.
So you don't just learn how to insert the block into the shape-sorting cube, which it eventually figures out — here's the final policy, with me pulling the cube around in front of the robot. The same algorithm can also learn things like placing the claw of a hammer underneath a nail, or screwing a cap onto a bottle. The most challenging thing we got it to do was use a spatula to lift an object into a bowl, which was really hard because the robot has to be fairly aggressive in maneuvering the spatula underneath the object. So this was really exciting — at least we were really excited about these results — and a range of other people built upon the method as well, getting robots to do other tasks like using a hockey stick to hit a puck into a goal, opening a door, and throwing objects.

While this all seems pretty exciting, and at the time it was, we have a bit of an issue. The issue is that the robot didn't learn how to use spatulas to lift objects into bowls; it learned how to use that spatula to lift that object into that bowl. If you give it a different bowl, or a different spatula, or even a different tablecloth in the background, the policy won't generalize — it will fail, because it was trained in exactly that environment. This doesn't necessarily seem like that much of an issue: you could just say, "Why don't we give the robot more spatulas, more bowls, more tablecloths, and have it learn in more settings?" In many ways you'd be right — that would be a reasonable approach to learning a more generalizable policy. But the tricky thing is that this algorithm was designed with the intention of being a single-task learning algorithm in a single environment.

So let's look at that. Behind the scenes, for one of these algorithms — especially for tasks that require moving objects in different ways — the learning process looks like this: the robot tries to do the task, and then a human comes in and puts the puck back. [LAUGHTER] By nature of focusing on a single-task learning problem, the methods aren't scalable to learning more tasks without starting again from scratch, and without having my friend Yevgen here repeatedly reset the robot after each trial. Yevgen is arguably doing more work than the robot, and more generally it's just not practical to collect a lot of data this way. So to build algorithms that are effective at learning across many different tasks, we need in many ways to fundamentally rethink how we design these algorithms in the first place. The issue here was that it relied on very detailed supervision, very detailed guidance, that can't be scaled to many different tasks. And this is not just a problem with robot learning; it's also a problem with standard reinforcement learning algorithms more broadly.
Algorithms that learn how to play Atari games or do locomotion require a lot of data and a lot of supervision in the form of reward functions in order to learn effectively. And it's not just a problem with reinforcement learning. If you think about some of the biggest successes in supervised learning, those systems are able to handle much more diverse data and learn across different users and different languages, but in many ways they're still learning one task, starting from scratch, with very detailed supervision. Essentially, these systems are what I would call specialists: they're trained in a single-task learning setting.

Instead, how could we get a more generalist system? That's in many ways part of what this course is about. One way to get inspiration is to look at how humans learn. Humans don't learn in the settings I mentioned before; they learn by rolling around on the floor, by interacting with a very rich and diverse environment. Humans are what I would call generalists: they learn many simple skills, such as crawling, picking up objects, and playing with toys, before trying to learn much more complex tasks. Perhaps to build machine learning systems that are generalists, we need them to look a bit more like this — systems that can build upon previous experience to learn new things more quickly, and that learn many simple things before trying to learn more complex things. If we make an analogy between the way machine learning systems learn and the way humans learn, training a system to play Go from scratch would be like having a baby learn to play Go from scratch — which seems a little bit off.

Okay, so those are some of the reasons why I care about multi-task learning and meta-learning. Even if you don't care about robots, why should we generally care about these algorithms? Fortunately, there are a lot of reasons beyond robots and beyond trying to build more general-purpose machine learning systems. First, though: why deep multi-task learning and meta-learning — why do we care about deep learning in particular? Actually, before I move on, are there any questions on the robots part?

All right. If we go back about ten years, the standard approach to computer vision looked something like this: you took an image, then extracted some mid-level features — things like HOG features or SIFT features, designed by hand by researchers. On top of those you might have more mid-level features like deformable part models, as visualized on the right, and then on top of that a small classifier, like a support vector machine, on those features.
Now fast-forward ten years, to basically now: modern computer vision looks more like this — you take an image, pass it through a neural network, and have it produce the desired output. One of the most salient differences between these two approaches is that deep learning allows us to handle unstructured inputs. It allows us to operate directly on the image pixels shown on the left, and directly on things like language and sensor readings, without requiring hand-engineered features like HOG, SIFT, or deformable part models, and with less domain knowledge. It means we can apply a single class of techniques — neural networks — to a wide range of problem domains. That's one benefit of deep learning systems. The second benefit, of course, is that they work really well in a variety of situations. If you look at results on the ImageNet dataset over the course of around five years, in terms of error rate, this dot right here is AlexNet in 2012, in many ways one of the first neural networks successfully demonstrated on ImageNet. Before that, things were plateauing around 0.25 to 0.3, and then there was this real shift: after 2012, most if not all of the dots are deep learning models, which were far more successful than previous approaches that relied on hand-engineered features. And this isn't specific to computer vision. In machine translation, for example, Google started switching Google Translate to a system based on neural networks in 2016. Here PBMT is phrase-based machine translation, whereas GNMT — or NMT more broadly, neural machine translation — uses neural networks, and we see a pretty big difference: in human evaluation scores, improvements ranging from roughly 50% to 80%.

Okay, so that's why deep learning — and beyond these two slides, there are other reasons as well. Now, why multi-task learning and meta-learning? If there's something we've learned from deep learning, it's that if we give these neural networks large and diverse datasets, they can achieve broad generalization. The ImageNet dataset, or things like Transformers and GPT-2, have been extremely successful at producing models that generalize across many different images and many different inputs. But the caveat is that they require a large and diverse dataset. What if you don't have a large dataset? Then you're in a bit of trouble. And there is a wide range of domains where we don't have large datasets: medical imaging; robotics, as I mentioned before; education, where you don't have a lot of data for each individual student you're trying to teach; medicine; recommendation systems; translation systems, where you don't have a lot of paired data for every single pair of languages out there.
As a result, it's impractical to learn from scratch for each disease, for each robot, for each person, for each language — really, for each task. That's where these multi-task learning techniques can come in.

Beyond settings where we don't have a large dataset, what if your data has a long tail? What if your data distribution looks something like this, where the y-axis shows the number of data points and the x-axis shows different objects you've encountered in the world, or different interactions with people, or words you've heard over time, or driving situations, and so on? When you're in the kinds of situations on the left, you're in good shape — they look a lot like what you've seen before. But out in the tail on the right is where these algorithms start to break down, where supervised learning methods really struggle to perform well. This is a really big problem in autonomous driving, for example: cars can handle a wide variety of very common situations, but when they encounter very unusual situations that humans handle perfectly well, they really struggle.

Lastly, what if you need to quickly learn something new — using previous knowledge to learn something new about a person, for a new task, or about a new environment, without training from scratch? It turns out that people are pretty good at this, so let's see how good you are at it. I want to give you a little test. The left side shows your training data: six data points, three paintings by Braque and three by Cezanne. Your goal is to classify the test data point on the right. If you think the painting on the right is by Braque, raise your hand. And if you think it's by Cezanne, raise your hand. Great — I think there were more hands for Braque, which is the correct answer. One way you can tell is by looking at the types of straight lines and curved lines that are prominent in the image. So how did you actually do this? This kind of problem is known as few-shot learning: you're given only a few data points — in this case six — and your goal is to make predictions about new data points from that very small dataset. The way you accomplished it is that you weren't learning from scratch. You have previous experience: you probably haven't seen these exact paintings before, and maybe not even paintings by these painters, but you've seen paintings, you know what objects are, you know what textures are. Through that previous experience, you're able to quickly identify the correct painter.

So all of these things — if you want more general-purpose machine learning systems, if you don't have large datasets, if your data has long tails, if you want to be able to quickly learn something new — these are all settings where elements of multi-task learning and meta-learning can come into play and make machine learning more effective.
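A minimal sketch of the few-shot classification setup just described: six labeled "support" examples and one query to classify. One simple rule, in the spirit of the metric-learning methods listed among the course topics, is to assign the query to the class whose mean feature vector is closest. The 32-dimensional random features and the function name are hypothetical, purely for illustration; in the painting example, useful features would come from prior visual experience, which is exactly what the meta-learning methods covered later aim to learn.

```python
import numpy as np

def few_shot_predict(support_x, support_y, query_x):
    """Classify a query from a handful of labeled support examples by
    assigning it to the class with the nearest mean feature vector."""
    support_y = np.array(support_y)
    classes = sorted(set(support_y))
    # One prototype (mean feature vector) per class.
    prototypes = {c: support_x[support_y == c].mean(axis=0) for c in classes}
    # Pick the class whose prototype is closest to the query.
    distances = {c: np.linalg.norm(query_x - p) for c, p in prototypes.items()}
    return min(distances, key=distances.get)

# Six training points, three per painter, as in the in-class test.
support_x = np.random.randn(6, 32)                 # stand-in image features
support_y = ["Braque"] * 3 + ["Cezanne"] * 3
query_x = np.random.randn(32)                      # the held-out painting
print(few_shot_predict(support_x, support_y, query_x))
```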
Any questions on these four things before I move on? Okay. So, I've been talking a lot about multi-task learning — but what is a task? This is actually really important, because a task isn't necessarily what we associate with the word in everyday English. For now, I'm going to define a task as something that takes as input a dataset and a loss function and gives you a model. We'll generalize this and make it a bit more formal later in the course, but for now this is what we'll consider. Essentially, you can view a task as a machine learning problem: you have some data and some loss function, and you want to optimize that loss function in order to produce a model.

What this means is that different tasks can vary in different ways. They could vary by object: maybe one task is to classify between one type of cat and another type of cat, while another task is to classify between different types of water bottles. They could correspond to different people, if you want to personalize these systems to effectively make predictions about new users. They could correspond to different objectives: maybe in one case you want to classify someone's age from an image, and in another you want to classify their height. They could be different lighting conditions, different words you're encountering, different languages. So a task can encapsulate a wide range of things — not just different "tasks" in the everyday English sense.

Questions on that? "Is it like a different distribution over the dataset and the loss function?" Yes — it could correspond to a different dataset, a different loss function, or both. For example, different objects would manifest as the same loss function — matching labels — but a different dataset, whereas something like different objectives, classifying one thing versus another, might look like different loss functions with the same dataset. "So when we talk about the loss function, we're not talking about the specific form like cross-entropy — we're talking about the objective versus the dataset?" Yes, I think that's the case. You could also have one task with a cross-entropy loss and another task with a mean-squared-error loss, for example, although of course you can generalize both of those as log-likelihood. "Okay — I think we're talking about different tasks here, not about that." Yeah. Okay. "[inaudible] ... what would be called a task: say I have a network that goes from a circuit specification to the circuit. Given a new specification, can you say that's a new task?" Yes, you could say that that is precisely a new task.
There's also a somewhat fluid notion of when something is a new task versus when it gets lumped into a single-task learning problem — I'll talk about that fluidity in a couple of slides — but that is one thing you could view as different tasks.

All right, so in multi-task learning and meta-learning there's one critical assumption — this is where some of the bad news comes in — which is that different tasks need to share some structure in order to get a benefit from these algorithms. If the tasks you're trying to learn across don't share any structure, then you're better off just using single-task learning independently on each of those tasks, and then merging those into a single model if you want to produce a single model. The good news is that there are many tasks, and many task distributions, that do have shared structure, even if they don't appear to on the surface. One simple example: screwing a cap onto a jar, screwing on a bottle cap, and operating a pepper grinder all share a similar structure in terms of the underlying motion that needs to be performed. That's an example where the shared structure is fairly explicit. But even tasks that are seemingly unrelated still have something underlying them: the laws of physics underlie the real data we collect; people are all organisms that have intentions, so even if two people are very different, they still have some commonalities; the rules of English underlie a fair amount (though not all) of English-language data; languages are developed for similar purposes; and so on. These may seem like superficial relationships between different tasks, but in reality they lead to far greater structure than completely random tasks would have — completely random tasks would essentially look like random inputs and random labels, whereas in practice the real world underlies a lot of the data we're looking at.

"Do these assumptions apply to both settings? When you separate meta-learning and multi-task learning, do you get different assumptions?" It applies to meta-learning as well. I'll give problem definitions of multi-task learning and meta-learning on the next slide, but in essence, meta-learning is all about learning the structure underlying the tasks such that you can more quickly learn a new task, and if there isn't any shared structure, then you won't be able to learn more quickly than learning from scratch.

All right, let's informally go over the problem definitions; in the next lecture we'll formalize them a lot more, but this should give you a rough idea. You can think of the multi-task learning problem as trying to learn all of the tasks you're provided with more quickly or more proficiently than learning the tasks independently from one another.
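Before moving on to the meta-learning definition, here is a minimal sketch of the informal definitions so far: a task as a dataset plus a loss function, and multi-task learning as training shared parameters against the sum of per-task losses. The `model(theta, x)` interface and the names are hypothetical stand-ins rather than course code, and the formal versions come in the next lecture.

```python
from dataclasses import dataclass
from typing import Callable, List, Sequence, Tuple

@dataclass
class Task:
    """A task, informally: a dataset plus a loss function, which together
    specify the model you would like to produce."""
    dataset: Sequence[Tuple]  # e.g. (input, label) pairs
    loss_fn: Callable         # e.g. cross-entropy for one task, squared error for another

def multi_task_loss(theta, model: Callable, tasks: List[Task]) -> float:
    """Informal multi-task objective: one set of shared parameters theta is
    trained so that the summed loss over all training tasks is small."""
    total = 0.0
    for task in tasks:
        for x, y in task.dataset:
            total += task.loss_fn(model(theta, x), y)
    return total
```

Two tasks can share the loss function but differ in their data (different objects or users), or share the data but differ in the loss (different objectives), matching the discussion above.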
The meta-learning problem, in turn, is this: given data or experience on a set of previous tasks, learn a new task more quickly and/or more proficiently than learning from scratch, by leveraging the experience on those previous tasks. So the difference between the two is that in the first you're trying to learn a set of tasks and do well on those training tasks, while in the second you're trying to use experience on training tasks in order to do well on new tasks — to be able to learn new tasks more quickly given a small dataset. In this course, the scope won't be limited to what's conventionally labeled a multi-task learning algorithm or a meta-learning algorithm: really, anything that solves one of these two problem statements is fair game. So methods that allow you to build on previous experience to quickly learn new tasks, even if they aren't learning-to-learn techniques, are things I'll try to touch on in this course.

Questions on these problem statements? "How is meta-learning different from transfer learning?" In many ways I think this is a form of the transfer learning problem statement, where you want to take some data and use the knowledge acquired from it to do well at other tasks. One aspect of this problem statement is that you want to learn a new task more quickly, whereas in transfer learning you may also just want to perform a new task well zero-shot, where you simply want to share representations. I actually view transfer learning as something that encapsulates both of these problems: it's about how you can transfer information between different tasks, which covers the multi-task learning problem as well as the meta-learning problem.

"I thought meta-learning was learning to learn — would you say that's a consequence of this definition?" That's a good question. What I'm defining here is the meta-learning problem. Meta-learning algorithms are all learning-to-learn methods, and they solve this particular problem — but they're not the only way to solve it. Does that answer your question?

"So there could be no meta-learning with one single task." That's a good point. In principle, you could still perform meta-learning in the context of a single task; what you'd be doing in that case is, in some ways, breaking that single task down into sub-tasks or sub-components, and then, when you face something new within that task, using that experience to learn more quickly in the future. So the tasks could in some ways be something latent in your underlying problem.

"The meta-learning problem strikes me as quite similar to the problem of domain adaptation. Are they the same, or is there a clear distinction?" I'll formally cover the distinction from domain adaptation in the next lecture, but in some ways they are similar — in some ways one is more specific than the other, and in some ways it's the opposite.
In domain adaptation, you typically do want to transfer from one domain to another, so it's a form of transfer learning in some ways. When I get to the more formal definitions of these problems, this will become clearer, but one thing you typically assume in the meta-learning problem is that the tasks you see at test time are drawn from the same distribution as the tasks you saw during training, whereas many domain adaptation techniques consider a setting where your test domain may be out of distribution relative to what you saw during training. That's, in many ways, one of the distinctions.

Okay. Now, one question that was asked earlier — in the context of circuits — was whether something is a single-task learning problem or a multi-task learning problem. In some ways it gets at the question: doesn't multi-task learning just reduce to a single-task learning problem? In particular, you could say, "I have a dataset for each task, so take the union of those datasets into a single dataset; likewise, take the loss function for each task and sum them to get a single loss function. Now we have a single-task learning problem, with one dataset and one loss function, and we're done." [LAUGHTER] And in many ways, yes: aggregating the data across tasks and learning on all of it is one very successful approach to multi-task learning. But we can often do better. In particular, we can exploit the fact that we know the data is coming from different tasks, and use that to achieve greater performance — basically, exploit that structure in the optimization in order to perform better.

Okay, one of the last things I want to cover is: why now? Why should we study this topic now, rather than in ten years or ten years ago? Well, people were actually studying this problem a long time ago. This is 22 years ago at this point [LAUGHTER]: a survey in which Caruana is thinking about how we can train tasks in parallel while using a shared representation — doing multi-task inductive transfer and adding extra tasks to a backpropagation neural network. In 1998, Sebastian Thrun was thinking about this problem, wanting to exploit an enormous amount of training data and experience that stems from other, related learning tasks in order to generalize to new tasks, even from a single training example. And even earlier, in 1992, Samy Bengio, Yoshua Bengio, and others were looking at the possibility of learning a learning rule that can be used to solve new tasks. So these ideas are by no means new; people have been studying them for a very long time. But I think right now is a particularly exciting time to be studying these algorithms, because they're continuing to play a fundamental role in machine learning research, especially with the advent of powerful neural network function approximators, the amount of compute we have right now, and the kinds of datasets we're looking at.
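To make the meta-learning problem statement from a few slides back a bit more concrete, here is a sketch of its simplest instantiation: pretrain on the previous tasks, then fine-tune on the new task's small dataset. This is not any particular algorithm from the course — the learning-to-learn methods covered in later lectures address the same problem statement, often more effectively. The `grad_fn(loss, theta)` argument is a hypothetical stand-in for whatever gradient computation you use (for example, autodiff in TensorFlow), and `multi_task_loss` refers to the earlier sketch.

```python
def pretrain(theta, train_tasks_loss, grad_fn, lr=1e-3, steps=10_000):
    """Fold experience from the previous tasks into shared parameters by
    minimizing a loss over all training tasks (e.g. multi_task_loss)."""
    for _ in range(steps):
        theta = theta - lr * grad_fn(train_tasks_loss, theta)
    return theta

def adapt(theta_pretrained, new_task_loss, grad_fn, lr=1e-3, steps=100):
    """Learn the new task starting from the pretrained parameters rather
    than from scratch, so far fewer steps and far less data are needed."""
    theta = theta_pretrained
    for _ in range(steps):
        theta = theta - lr * grad_fn(new_task_loss, theta)
    return theta
```

Whether this counts as "meta-learning" is exactly the distinction raised in the questions above: it addresses the problem statement — building on previous tasks to learn a new one quickly — without any learning-to-learn machinery.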
As some examples of very recent work that leverages these kinds of algorithms: here's a paper on machine translation across over 100 languages, looking at how you can surpass strong baselines that use only two languages. Here's some work from my lab, which I previewed at the beginning, where we use these types of algorithms to learn from a video of a human: this is actually one of your TAs showing the robot a video of a person performing a task, and the robot can use that video to learn a policy that successfully places the peach into the bowl from just that single example — and the policy generalizes to different positions of the bowl. We'll cover the algorithms underlying this. People have also looked at multi-domain learning, where different tasks correspond to different domains: in this paper, they constructed different simulated domains with different textures and environments, and showed that they could use only that simulated data to enable a quadcopter to fly and navigate in the real world. And just within the past two days, a paper was published on how YouTube is using multi-task, multi-objective systems to make recommendations, developing algorithms that can handle multiple competing objectives. So these types of algorithms are playing a huge role in robotics, in deployed machine learning systems like YouTube's, and in other research.

I also think they're playing an increasing role in machine learning research. If you look at Google search queries over the past few years — blue is meta-learning and red is multi-task learning — we see an increase starting around 2014 and 2015. And if you look at paper citations for things like fine-tuning, meta-learning algorithms, and multi-task learning algorithms, we see an increasing trend: these algorithms are becoming of increasing interest. I think that's because they could be really important in the future for enabling things like learning from small datasets.

Lastly, I think the success of multi-task learning and meta-learning algorithms will be really critical for making deep learning more widely accessible. As I mentioned before, the settings where deep learning has been very successful are settings where you have 1.2 million images, 40.8 million paired sentences, or 300 hours of labeled data, and in a wide range of settings that's just not feasible. If we look at a diabetic retinopathy detection dataset, for example, it has around 35,000 labeled images — I think it's actually one of the larger medical imaging datasets — and yet it still has two orders of magnitude less data than the datasets above. There was a really interesting paper looking at reinforcement learning for epilepsy treatment; in that case, they had less than an hour of data.
And in some of the work we've done in robotics, we've had less than 15 minutes of data per individual task that we want the robot to learn. So if you care about making deep learning successful and more widely accessible in these kinds of domains, then it's going to be critical to build these kinds of algorithms. And lastly, beyond the things I've talked about, there are still many open questions and challenges in multi-task learning, which I think makes this a really exciting thing to study right now — there are a lot of problems to be solved.

Great, so that's it for today. As a reminder, please do the four things: sign up for paper presentations, sign up for Piazza, and so on. On Wednesday I'll be covering multi-task learning and meta-learning basics. I'll see you on Wednesday.
