{"id":979917,"date":"2023-10-26T13:45:23","date_gmt":"2023-10-26T20:45:23","guid":{"rendered":"https:\/\/www.microsoft.com\/en-us\/research\/?post_type=msr-research-item&#038;p=979917"},"modified":"2023-11-27T13:19:41","modified_gmt":"2023-11-27T21:19:41","slug":"investigating-student-mistakes-in-introductory-data-science-programming","status":"publish","type":"msr-research-item","link":"https:\/\/www.microsoft.com\/en-us\/research\/publication\/investigating-student-mistakes-in-introductory-data-science-programming\/","title":{"rendered":"Investigating Student Mistakes in Introductory Data Science Programming"},"content":{"rendered":"<p>Data Science (DS) has emerged as a new academic discipline where students are introduced to data-centric thinking and to generating data-driven insights through programming. Unlike traditional introductory programming education, which focuses on program syntax and core Computer Science (CS) topics (e.g., algorithms and data structures), introductory DS education emphasizes skills such as studying the data at hand to gain insights and making effective use of programming libraries (e.g., re, NumPy, pandas, scikit-learn). To better understand learners\u2019 needs and pain points when they are introduced to DS programming, we investigated a large online course on data manipulation designed for graduate students who do not have a CS or Statistics undergraduate degree. We qualitatively analyzed incorrect student code submissions for computational notebook-based programming assignments in Python. We identified common mistakes and grouped them into the following themes: (1) programming language and environment misconceptions, (2) logical mistakes caused by misunderstanding the data or problem statement or by mishandling missing values, (3) semantic mistakes from incorrect usage of DS libraries, and (4) suboptimal coding. Our work gives instructors valuable insights for understanding student needs in introductory DS courses and improving course pedagogy, along with recommendations for developing assessment and feedback tools that better support students in large courses.<\/p>\n","protected":false},"excerpt":{"rendered":"<p>Data Science (DS) has emerged as a new academic discipline where students are introduced to data-centric thinking and generating data-driven insights through programming. 
Unlike traditional introductory programming education, which focuses on program syntax and core Computer Science (CS) topics (e.g., algorithms and data structures), introductory DS education emphasizes skills such as studying the data at [&hellip;]<\/p>\n","protected":false},"featured_media":0,"template":"","meta":{"msr-url-field":"","msr-podcast-episode":"","msrModifiedDate":"","msrModifiedDateEnabled":false,"ep_exclude_from_search":false,"_classifai_error":"","msr-author-ordering":null,"msr_publishername":"ACM","msr_publisher_other":"","msr_booktitle":"","msr_chapter":"","msr_edition":"","msr_editors":"","msr_how_published":"","msr_isbn":"","msr_issue":"","msr_journal":"","msr_number":"","msr_organization":"","msr_pages_string":"","msr_page_range_start":"","msr_page_range_end":"","msr_series":"","msr_volume":"","msr_copyright":"","msr_conference_name":"Technical Symposium on Computer Science Education","msr_doi":"","msr_arxiv_id":"","msr_s2_paper_id":"","msr_mag_id":"","msr_pubmed_id":"","msr_other_authors":"","msr_other_contributors":"","msr_speaker":"","msr_award":"","msr_affiliation":"","msr_institution":"","msr_host":"","msr_version":"","msr_duration":"","msr_original_fields_of_study":"","msr_release_tracker_id":"","msr_s2_match_type":"","msr_citation_count_updated":"","msr_published_date":"2024-1-1","msr_highlight_text":"","msr_notes":"","msr_longbiography":"","msr_publicationurl":"","msr_external_url":"","msr_secondary_video_url":"","msr_conference_url":"","msr_journal_url":"","msr_s2_pdf_url":"","msr_year":0,"msr_citation_count":0,"msr_influential_citations":0,"msr_reference_count":0,"msr_s2_match_confidence":0,"msr_microsoftintellectualproperty":true,"msr_s2_open_access":false,"msr_s2_author_ids":[],"msr_pub_ids":[],"msr_hide_image_in_river":0,"footnotes":""},"msr-research-highlight":[],"research-area":[13560],"msr-publication-type":[193716],"msr-publisher":[],"msr-focus-area":[],"msr-locale":[268875],"msr-post-option":[],"msr-field-of-study":[249775
,248116],"msr-conference":[],"msr-journal":[],"msr-impact-theme":[],"msr-pillar":[],"class_list":["post-979917","msr-research-item","type-msr-research-item","status-publish","hentry","msr-research-area-programming-languages-software-engineering","msr-locale-en_us","msr-field-of-study-computer-science-education","msr-field-of-study-data-science"],"msr_publishername":"ACM","msr_edition":"","msr_affiliation":"","msr_published_date":"2024-1-1","msr_host":"","msr_duration":"","msr_version":"","msr_speaker":"","msr_other_contributors":"","msr_booktitle":"","msr_pages_string":"","msr_chapter":"","msr_isbn":"","msr_journal":"","msr_volume":"","msr_number":"","msr_editors":"","msr_series":"","msr_issue":"","msr_organization":"","msr_how_published":"","msr_notes":"","msr_highlight_text":"","msr_release_tracker_id":"","msr_original_fields_of_study":"","msr_download_urls":"","msr_external_url":"","msr_secondary_video_url":"","msr_longbiography":"","msr_microsoftintellectualproperty":1,"msr_main_download":"","msr_publicationurl":"","msr_doi":"","msr_publication_uploader":[{"type":"file","viewUrl":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/10\/DS_SIGCSE_2024.pdf","id":"979926","title":"ds_sigcse_2024","label_id":"243109","label":0}],"msr_related_uploader":"","msr_citation_count":0,"msr_citation_count_updated":"","msr_s2_paper_id":"","msr_influential_citations":0,"msr_reference_count":0,"msr_arxiv_id":"","msr_s2_author_ids":[],"msr_s2_open_access":false,"msr_s2_pdf_url":null,"msr_attachments":[{"id":979926,"url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-content\/uploads\/2023\/10\/DS_SIGCSE_2024.pdf"}],"msr-author-ordering":[{"type":"text","value":"Anjali Singh","user_id":0,"rest_url":false},{"type":"text","value":"Anna Fariha","user_id":0,"rest_url":false},{"type":"text","value":"Christopher Brooks","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Gustavo 
Soares","user_id":39183,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Gustavo Soares"},{"type":"user_nicename","value":"Austin Henley","user_id":41326,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Austin Henley"},{"type":"user_nicename","value":"Ashish Tiwari","user_id":39171,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Ashish Tiwari"},{"type":"user_nicename","value":"Chethan M","user_id":42234,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Chethan M"},{"type":"text","value":"Heeryung Choi","user_id":0,"rest_url":false},{"type":"user_nicename","value":"Sumit Gulwani","user_id":33755,"rest_url":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/microsoft-research\/v1\/researchers?person=Sumit Gulwani"}],"msr_impact_theme":[],"msr_research_lab":[],"msr_event":[],"msr_group":[663303],"msr_project":[],"publication":[],"video":[],"msr-tool":[],"msr_publication_type":"inproceedings","related_content":[],"_links":{"self":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/979917","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item"}],"about":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/types\/msr-research-item"}],"version-history":[{"count":1,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/979917\/revisions"}],"predecessor-version":[{"id":979923,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-item\/979917\/revisions\/979923"}],"wp:attachment":[{"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/media?parent=979917"}],"wp:term":[{"taxonomy":"msr-research-highlight","embe
ddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-research-highlight?post=979917"},{"taxonomy":"msr-research-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/research-area?post=979917"},{"taxonomy":"msr-publication-type","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publication-type?post=979917"},{"taxonomy":"msr-publisher","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-publisher?post=979917"},{"taxonomy":"msr-focus-area","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-focus-area?post=979917"},{"taxonomy":"msr-locale","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-locale?post=979917"},{"taxonomy":"msr-post-option","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-post-option?post=979917"},{"taxonomy":"msr-field-of-study","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-field-of-study?post=979917"},{"taxonomy":"msr-conference","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-conference?post=979917"},{"taxonomy":"msr-journal","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-journal?post=979917"},{"taxonomy":"msr-impact-theme","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-impact-theme?post=979917"},{"taxonomy":"msr-pillar","embeddable":true,"href":"https:\/\/www.microsoft.com\/en-us\/research\/wp-json\/wp\/v2\/msr-pillar?post=979917"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}