Inaugural Data Science Workshop highlights social sciences

The daylong event reflects Yale's increased commitment to becoming a center for machine learning and data science.

From quantifying the wisdom of crowds to measuring polarization in political speech, data science is enhancing a wide range of social science research.

On Oct. 20, Yale held its first Data Science Workshop, a daylong event focused on computational social science. Computer science professor Dragomir Radev and other organizers said the workshop reflected Yale’s increased commitment to data science, and may become part of an ongoing series of workshops. Earlier this year, Yale renamed its statistics department as the Department of Statistics and Data Science and launched a new undergraduate major in statistics and data science.

“The goal is to make Yale one of the centers of machine learning and data science,” said department chair Harrison Zhou.

Daniel Spielman, Yale’s Henry Ford II Professor of Computer Science and Statistics and Data Science, pointed out that data science is being used in teaching and research across disciplines at Yale, including social sciences. “Computational social science has a large intersection with data science,” he said. “Neither one of them subsumes the other.”

Jenn Wortman Vaughan of Microsoft Research gave the workshop’s first presentation, on the human contributions within machine learning. She explained that although most media reports on machine learning take on a “machines versus humans” tone, the more important story is how humans and machines can “work together to achieve more than either could achieve on their own.”

Yale's Brian Scassellati discussed his work with socially assistive robots that help with a variety of human needs.

Vaughan said humans provide much of the data involved in machine learning (political polls, prediction markets), participate in crowdsourcing work (via such avenues as Amazon Mechanical Turk), and are the ones interpreting the results (everyone from CEOs and government regulators to data scientists and the general public).

“We have humans in the loop, and it matters that we have them in the loop,” she said.

Several sessions at the workshop focused on language: Yale linguistics professor Claire Bowern shared information about how data science is helping her understand the origins and spread of languages in Australia. Ryan Cotterell of Johns Hopkins University talked about using linguistic typology in developing new languages, with references to Google Translate and even the Dothraki language from “Game of Thrones.”

Jesse Shapiro of Brown University gave a presentation on “Measuring Polarization in High-Dimensional Data” within congressional speech; Brendan O’Connor of the University of Massachusetts-Amherst discussed taking a computational approach to searching news archives for data on police killings of civilians in the U.S.; Duncan Learmouth of Durham University demonstrated how he is using data science to understand cultural variation in Australian Aboriginal societies.

Vineet Kumar, assistant professor of marketing at the Yale School of Management, shared his research on whether restricting information makes crowds more accurate or less accurate. Such data can be a boon for developing digital products and services, he noted.

The workshop also offered new opportunities for data science collaboration. For example, Yale’s Brian Scassellati, professor of computer science, cognitive science, and mechanical engineering, described his ongoing research with socially assistive robots.

Beginning with a series of studies about how robots interact with children who have autism spectrum disorder, Scassellati’s lab has spent years conducting research on how socially assistive robots can help humans. That research now includes robots helping to teach nutrition and sign language. Scassellati’s work on autism continues, as well, with an ongoing effort to place 30 robots in home settings for 30 days.

“We’re in the middle of our largest collection of data to date,” Scassellati told the workshop participants. “It’s more data than I know what to do with.”

Scassellati said he would welcome ideas and assistance from data scientists in analyzing the terabytes of information he is gathering from his research. “Come help us,” he encouraged.

Media Contact

Jim Shelton, 203-361-8332