Data science at Yale takes shape
As Yale embraces the future of data-based research, the Department of Statistics and Data Science is busy building up its program, with more students, more faculty, a new major, and an assortment of new classes.
Over the past two years, since the former Department of Statistics expanded into what is now called DS2, its leaders have assembled faculty with a wide-ranging body of research and academic expertise. Their collaborations delve into fields as varied as astrophysics, genetics, forestry, engineering, economics, computer science, radiology, mathematics, and law.
It’s no accident. If data can be found in every part of Yale, DS2 leaders say, then DS2 should be there as well.
“Like many things at Yale, it has started with teaching and education,” said John Lafferty, the John C. Malone Professor of Statistics and Data Science, who joined the department in 2017. “The new Statistics and Data Science major has been popular right from the start when it was introduced last year. Going forward, Yale is embracing the interdisciplinary nature of data science. It will be important to join forces across traditional departmental boundaries to advance data science and its application in different fields.”
The first DS2 undergraduate degrees were awarded last year, to 25 seniors. This year there are 45 seniors and 120 undergraduates overall. Just a month ago, DS2 organized its first “Project Pitch,” in which 10 faculty members from other departments pitched potential research projects to DS2 graduate and undergraduate students. The department plans to hold the event every semester.
“Everyone is engaged, and part of the reason for that is because our students are so excited,” said acting department chair Daniel Spielman, Sterling Professor of Computer Science, and professor of statistics and data science, and of mathematics. “Our senior undergraduates are helping advise the junior undergrads, and they’re giving us great feedback on the courses. Two of our undergraduates set up a system to help us hire our undergraduate learning assistants this year. Everyone is stepping up.”
Sekhar Tatikonda, an associate professor in the department and director of undergraduate studies, said DS2 students are “creating their own culture. They are studying together, organizing activities, and it’s been wonderful to see.”
Meanwhile, the faculty continues to grow. The three most recent additions are:
- Roy Lederman, an assistant professor, rejoined the Yale community this year with a dual appointment in DS2 and the Quantitative Biology Institute. He earned his Ph.D. at Yale in 2015. One of Lederman’s main research areas is developing algorithms for extracting information from imaging technologies, such as cryo-electron microscopy.
- Elisa Celis, an assistant professor, arrives at Yale this semester after a stint as a research scientist at the École Polytechnique Fédérale de Lausanne. Her expertise involves issues of fairness, accountability, and transparency in machine learning; part of her work has been to design algorithms that promote fairness. One of her algorithms is being used in an election in Switzerland to ensure diversity in the ages and genders of the people elected to Parliament.
- Zhou Fan, an assistant professor, comes to Yale from Stanford University, where he was a Ph.D. student in statistics. Fan is a theoretical statistician who also focuses on statistical applications in genetics and computational biology.
Spielman said the new hires represent two important goals for the department: bringing in faculty who enhance data science as a discipline by developing new methods and ways of analyzing data, and expanding the department’s expertise in different application areas.
Along with the new faculty, there are new courses enlivening data science at Yale.
One prominent example is YData, an introductory undergraduate course that will debut in 2019. Taught by assistant professor Jessi Cisewski and Lafferty, the class will offer students from all disciplines a core understanding of data science tools and how to use them.
A variety of other courses are being rolled out or revamped as well. There are plans for lab courses devoted to astronomy, political science, and text data; lecturer Susan Wang has received rave reviews for her course on case studies in data science; Lederman is creating a new course called computational mathematics in data science; department chair Harry Zhou, who has been on leave, will teach a class on neural nets; and Lafferty, whose expertise includes innovative work in machine learning, continues to expand Yale’s machine learning course offerings.
“The thoughtful, deliberate, but intensive way that Yale is approaching data science is very exciting,” Lafferty said. “After meeting with faculty from many different departments and schools, my colleagues and I have been amazed by how much researchers already know about data science, including many aspects of machine learning. We all have a lot to talk about, but we need to get outside of our traditional departments to engage. The energy and excitement around this is palpable.”
Indeed, another sign of Yale’s interest is the launch earlier this year of the Center for Biomedical Data Science (CBDS), which will be a hub for innovation, research, and education in biomedical data science at Yale. CBDS will take in large amounts of data from various sources around the university and hospital and perform analytics, from statistical associations to complex computational models, on biomedical data at the molecular, cellular, organismic, individual, and population levels.
Earlier this year, the University Science Strategy Committee named data science a priority for investment in science research at Yale, recommending that Yale invest in a university-wide initiative to integrate data science and mathematical modeling research across campus.
Together, these efforts are likely to impact students and researchers at Yale in a variety of ways, Lafferty explained: “I think it will be a mixture. On the one hand, there will be students and researchers who adopt advanced machine learning methods, push the envelope, and become quite sophisticated in their use of data science approaches to scientific questions, or to problem-solving in the humanities, medicine, etc. This is already happening in parts of physics, for example. But another mode will be the more gradual adoption of quantitative, data-centric methods as a standard way of carrying out research. There will be a shift in culture.”
Spielman said that culture shift is already starting to happen. He noted that in short order, DS2 has gone from a department that could fit in one small building to one that is spread over three buildings.
“What most universities have done in data science is build separate research institutes and centers. We’re pretty unusual in that we have started by building a department with an undergraduate program and a Ph.D. program,” Spielman said. “There’s no fighting here over who should own data science. Everyone is just helping.”