Why take YData? Because data science shouldn’t be a ‘black box’

A young woman looks into an open book, and is bombarded by light, math, DNA sequencing, and other data.
(Illustration by Michael S. Helfenbein)

“I don’t want students leaving Yale and going through life with a feeling that everything powered by big data and algorithms is a black box,” said Alan Gerber, dean of social science and the Charles C. & Dorathea S. Dilley Professor of Political Science. Gerber said he believes the new Yale course, “YData: An Introduction to Data Science,” will help to “demystify” data science for many students.

Next semester, YData debuts as a three-day-a-week, introductory-level lecture co-taught by Jessi Cisewski, assistant professor of statistics and data science, and John Lafferty, the John C. Malone Professor of Statistics & Data Science. The inspiration for YData — a highly interdisciplinary, introductory data science course called Data8 — debuted at the University of California-Berkeley in 2015 and quickly became one of Berkeley’s most popular courses. Much like Data8, YData is also being billed as a data science course “for everyone.”

Cisewski said she and Lafferty hope to make YData “an unintimidating course for students who don’t have a background in statistics, math, or computer science and aren’t necessarily interested in pursuing that as their degree.” She added, however, that it would of course be great if the class inspired a student to pursue the newly revamped statistics and data science (DS2) major.

The class does offer useful foundations for those who are already considering statistics or data science as a course of study, but YData will teach traditional statistics and data science concepts using computing to simplify the math and minimize rote memorization of formulas.

Prospective students are only expected to have high school-level math experience and will be taught Python 3 computing over the course of the semester. The professors have partnered with Ben Evans from Yale’s Center for Research Computing to make the computing interface as seamless and user-friendly as possible for the new coders.

YData has been tailored to fit Yale’s scale and course style. Where Berkeley’s Data8 now offers more than a dozen one-credit “modules” per semester — abbreviated courses that apply data science topics in different fields of study — in its first year YData will offer three half-credit seminars for students to take concurrently with the main course on diverse topics in applied data science.

Joshua Kalla, a new assistant professor of political science at Yale, will join Cisewski and Lafferty in instructing the initial YData seminars. Kalla’s section will focus on the use of data science for political campaigns, while Cisewski and Lafferty will teach how data science is used in exoplanet astronomy and in analyzing electronic sources of written text, respectively. Cisewski said the hope is to eventually “build a data science hub at Yale around this class” so that more YData seminars can be offered across a wider array of academic disciplines.

The seminars will meet once a week and function as “half-lab, half-seminar,” in Cisewski’s words. “They might teach part of the time about the topic as it relates to data science and then put that into practice with relevant datasets,” she said.

“I love these connector courses for YData,” said Gerber. “They reinforce the main ideas, themes, and techniques learned in the core class by allowing students a second chance to apply them in an area of substantive interest. They’ll give students a much higher level of mastery and understanding of the material.”

However, the core YData lecture will still provide ample opportunities for students to practice applying the statistics and data science concepts the professors cover. In their work for the course, students might be asked to consider, for example, how to create a representative sample of the Yale population to survey, or how to account for randomness in controlled human drug trials, or even how to fine-tune algorithms in targeted online marketing to ensure ads are reaching their intended audience.

Cisewski said she hopes that “students will also have the takeaway of knowing how to incorporate data science methods better into their later research projects, papers, theses,” regardless of their ultimate field of study. “We want to teach them the data science way of thinking,” she said, adding that whether the data-driven information comes from news articles or scientific studies, she expects students who’ve taken the class to have the ability to take “a critical look at the information that they’re given or bombarded with in their daily lives.”

For more information about YData, visit the course’s webpage. Prospective students can search for “YData: An Introduction to Data Science” in the Yale Blue Book under the codes S&DS 123 or 523.

Media Contact

Kendall Teare: kendall.teare@yale.edu, 203-836-4226