April was an exciting month for Skyline College’s data science students, who showcased their hard work and data analysis skills in the DataJam community college pilot and the national Two-Year College DataFest competition.
The DataJam is a semester-long data-for-good project in which teams work with local and statewide open data under the mentorship of university students. This project was supported by Dr. Judy Cameron from The DataJam organization and Anthony Suen from UC Berkeley’s Data Science Discovery Program. Skyline College students worked hard throughout the semester, meeting weekly with UC Berkeley and San Francisco State University mentors and leveraging data science techniques to tackle transportation and social issues in our community. Previously a program for high school students, DataJam was piloted for the first time with community college students, with three teams from Skyline College and another three from Oakland’s Laney College.
The “Wednesday Team” of Ashley Gutierrez Carreto, Dasha Baitazarova, Tyler Kuwada, and Hansen Xiao analyzed socioeconomic factors affecting crime rates in San Francisco. As Hansen shared, “It was a great experience going through the entire data analysis process, beginning with developing a research question and finding data sets and being able to apply many data science concepts to the real world.” Teammate Tyler added, “Applying my data skills in a real team environment like this was a good learning experience. Learning these skills in class is one thing. Figuring out how to apply them to a real dataset is another.”
Not only did DataJam teams present their work to a panel of judges, seasoned data science experts in industry and academia, but some also presented at Skyline College’s uSOAR research symposium on April 19 and UC Berkeley’s Discovery Symposium on May 1. Computer science major Dasha reflected, “Presenting our findings at UC Berkeley’s Discovery Symposium and uSOAR was both exciting and beneficial for my public speaking and presentation skills.” Ashley added, “I was excited to present our findings to audiences like DataJam and UC Berkeley as we saw multiple inspiring projects for future data science work.” A bonus was meeting their mentors and members of Laney College’s DataJam teams in person.
The “Tuesday Team” of Camille Catolos and Zaw San used San Francisco open data to analyze factors impacting Muni ridership. “I learned about time and team management, which I can apply to future projects!” reflected Camille.
Skyline College’s “Friday Team,” with members Amapola Garcia, Pan Cho Thu, Eden Huang, Veronica Johnson, and Yuting Duan, examined barriers to equitable electric vehicle charging station distribution in San Mateo County. “We faced a lot of uncertainty over the semester about the direction of the project, but we worked hard and were able to pull everything together to win first place,” said Veronica Johnson. Teammate and data science major Eden Huang said, “Presenting at uSOAR and the UC Berkeley Data Science Discovery Symposium was a huge bonus in terms of experience, allowing us to practice our presentation skills and summarize our research in a short time.”
Simultaneously, three Skyline teams from the Data Science Club immersed themselves in DataFest, an intensive three-day competition from the American Statistical Association, on April 19-21. Students analyzed, visualized, and modeled a dataset about the CourseKata online statistics course learning platform to improve learning and the online experience for statistics and data science students.
Computer science major Yuting also participated in DataFest and learned some new skills for the competition. She said, “We learned a lot about how to deal with huge datasets that are too big to be uploaded to GitHub and how to merge different datasets to come up with useful visualizations.” PythonPuppies teammate Leo Lin noted, “DataFest was my second Data Science event that I had participated in, and as a team, we won the award in the category of Best Visualization.”
While DataFests are hosted nationwide during April, having a competition specifically for community college students made it welcoming to students at all levels of data science and statistics experience. Computer science major Chris Tse said, “This was a great learning experience where I was able to learn how to shape data in a way that was consumable for analysis.” Antal Berenyi added, “I did my best: finish the project on time and submit all required material. I didn’t win, but it motivated me to do another project!”
DataFest, DataJam, and the beginner-friendly DataLab Python for Data Analysis workshop series are just a few of the data science and coding opportunities for both new and seasoned statistics and data science students at Skyline College. Amapola, who participated in both DataJam and DataFest, said, “This year I got to work with many of my peers on various projects together. Throughout this semester, I participated in Hack for Impact, SFSU Hackathon, DataJam, DataFest, and our very first Data analysis project hosted by Our Data Science Club. For each project, I got to work on different programming languages and tools. For anyone thinking of starting a project, I say take the initiative and just do it. You’ll be amazed at what you can do, and it’s okay to be intimidated at first. You’ll find what works for you.”
Furthermore, these opportunities have helped students get internships. Yuting, a first-year college student brand new to data science last semester, credits these opportunities for her upcoming summer internship at San Jose State University, where she will work on deep learning, a method in artificial intelligence (AI). She said, “The biggest reason I am selected as an intern is that the mentors want some computer science majors who know Python to do data analysis so that they can train the AI model. I never thought I could get an internship and win a prize in the first year in a new country and study environment.”
This work is supported by grants from the National Science Foundation through the Improving Undergraduate STEM Education: Education and Human Resources program under award number 2021488, the California Education Learning Lab Grand Challenge, the INCLUDES project, and San Mateo County Measure K.
Article by Denise Hum