6  Communication and Ethics

6.1 Data Science Communication Skills

This section was written by Matt Elliott.

6.1.1 Introduction

Hi! My name is Matt Elliott, and I would like to start this presentation by introducing myself and what I hope to become. I am currently a senior aiming to graduate in Fall 2024 with a Bachelor of Arts in Individualized Data Science under CLAS’s IISP program. My primary advisor is our professor, Jun Yan. Moving forward, I hope to learn valuable skills and ideas from both this course and the Data Science field. Even using Quarto instead of the usual Google Slides is a step toward building new skills! The topic I chose to discuss today in class is “Data Science Communication Skills”. I find this to be one of the most crucial topics in Data Science, since Data Scientists are often the glue that holds projects, companies, and ideas together.

6.1.2 Describing Data Science and its Rise

  • According to IBM, Data Science “combines math and statistics, specialized programming, advanced analytics, artificial intelligence (AI), and machine learning with specific subject matter expertise to uncover actionable insights hidden in an organization’s data.”

  • These insights are then used to “guide decision making” and support “strategic planning.”

  • According to the U.S. Bureau of Labor Statistics, employment of Data Scientists is “projected to grow 35 percent from 2022 to 2032, much faster than the average for all occupations.”

  • The median annual pay for Data Scientists in May 2022 was around $103,500, a figure that reflects how strongly the field is in demand.

6.1.3 Why does Communication Matter?

  • Communication matters in any professional setting, and especially in a field that can be confusing to those viewing it from the outside.

  • The Data Incubator states that “You may work alongside data analysts or other scientists as part of a team, especially when handling large datasets or working on big projects. Beyond this, you may also frequently work with other teams of professionals who don’t work with data. Thus, it’s essential to be an excellent communicator to work with others effectively.” This matters because being cooperative builds partnerships that run smoothly.

  • Data Scientists often present insights to partners in order to advance shared goals in a professional setting.

  • Karen Church for Medium writes that “Communication enables data scientists to gather all the necessary information, clarify needs and expectations of stakeholders, and align their work with broader business goals.”

6.1.4 Inherent Communication Skills

  • Using the right communication methods
  • Friendliness
  • Confidence
  • Volume and tone
  • Empathy
  • Respect
  • Cues

6.1.5 Identify Your Audience

  • The Data Incubator states that “Transferring knowledge across departments is crucial, so it’s vital to share insights and analyses in simple, clear terms that don’t overwhelm individuals with jargon or technical details.”

  • Identifying your audience and speaking their language is an important step, as their background can range from low familiarity to full comprehension of the topic.

  • In a 2018 HBR.org article (as reported in Harvard Business Review), Hugo Bowne-Anderson interviewed 35 data scientists on his podcast and found that their main issues were “lack of management/financial support,” “lack of clear questions to answer,” “results not used by decision makers,” and “explaining data science to others.”

  • Knowing your audience can improve how Data Scientists are treated in the field: communication creates cooperation, which can help avoid the issues listed above.

6.1.6 Data Applications

  • Data scientists have their hands full, as many different fields and professions use data analytics to support their operations

  • These fields include:

    • Healthcare
    • Media/Entertainment
    • Retail
    • Telecommunication
    • Automotive
    • Digital Marketing
    • Cybersecurity
  • Data science communication skills vary across these fields, as some may prefer verbal information and others visual information.

6.1.7 Storytelling

  • Storytelling in the context of Data Science gives the audience a shared goal in understanding the topics and information presented.
  • The goals of promoting improved customer service, innovation, or operation optimization need to be conveyed in a manner that is direct and professional.
  • According to Sonali Verghese for Medium, telling a story within the Data Science field lets you:
    • Explain how you arrived at a particular conclusion
    • Justify rationally why you approached a problem in a specific manner
    • Convey interesting insights in a way that gets people to think or act differently
    • Persuade your audience that your results are conclusive and can be turned into something actionable
    • Express why your findings are valuable and how they fit into the overall picture
  • An inspiring quotation I found while researching this presentation comes from Valentin Mucke for Medium: “Data science is about humans. Data scientists must remember that, and not just when presenting to people with a non-technical background. It’s important to find common ground with everyone you work with to build trust and move forward effectively.”

6.1.8 Data Visualization

  • Expressing data through visualization is a vital step for Data Scientists in the field
  • Examples of Data Visualization for Data Scientists include charts, graphs, and maps
  • Many of these forms of visualization can be combined with the skill sets covered in the next topic

6.1.9 Usable Skill Sets for Data Communication

  • Coding languages: Python, Structured Query Language, R, Visual Basic for Applications, Julia

  • Statistical programming: the process of using computer programming languages to analyze and manipulate data for statistical purposes

  • Statistics and probability: help predict the likelihood of future events and understand patterns in data

  • Machine learning/Artificial intelligence: automates the data analysis process by building and training a data model that makes real-time predictions without human involvement

  • Statistical visualization: the graphical representation of information and data, using visual elements like charts, graphs, and maps, along with visualization tools, to provide an accessible way to see and understand trends, outliers, and patterns in data

  • Data management: the process of collecting, storing, organizing, and maintaining data so that it is accurate and reliably accessible to those who need it throughout the data science project lifecycle

6.1.10 Gather Questions and Feedback

  • According to the Data Incubator, before finalizing a project or closing out a report you should “consider soliciting direct feedback from your audience. It doesn’t matter if you have to prompt them to ask you questions or if they’re impatient to put your knowledge to the test—this form of interaction can help you improve your communication skills and establish a successful career as a data scientist.”

  • Interacting with your audience gives them a better understanding of the topic at hand and helps avoid the ambiguity that arises when communication is absent

6.1.11 Sources

https://towardsdatascience.com/tell-stories-with-data-communication-in-data-science-5266f7671d7
https://hbr.org/2019/01/data-science-and-the-art-of-persuasion
https://towardsdatascience.com/communicating-as-a-data-scientist-why-it-matters-and-how-to-do-it-well-f1c34d28c7c4
https://www.thedataincubator.com/blog/2022/10/13/improve-your-data-science-communication/
https://emeritus.org/in/learn/why-communication-skills-are-important-for-a-data-analyst/
https://medium.com/intercom-rad/the-most-underrated-skill-in-data-science-communication-7ed2fab82801
https://www.ibm.com/topics/data-science
https://www.bls.gov/ooh/math/data-scientists.htm
https://medium.com/analytics-vidhya/introduction-to-data-science-28deb32878e7
https://blog.jostle.me/customerresources/3-actionable-communication-tips
https://www.breathehr.com/en-gb/blog/topic/employee-performance/effective-communication-is-key-to-your-business-success
https://ideas.ted.com/before-your-next-presentation-or-speech-heres-the-first-thing-you-must-think-about/
https://d1jnx9ba8s6j9r.cloudfront.net/blog/wp-content/uploads/2018/12/Data-Science-Applications-Edureka.jpg
https://lectera.com/info/storage/img/20210805/fa586bb6c04bf0989d70_808xFull.jpg
https://thenewstack.io/7-best-practices-for-data-visualization/
https://www.learningtree.com/blog/the-6-major-skill-areas-of-data-science/
https://www.poynter.org/reporting-editing/2019/cohort4/

6.2 Ethical Considerations for Data Scientists

The field of data science, with its vast capabilities for societal impact, necessitates a robust ethical framework. Below, we outline key ethical principles that should guide data scientists in their work, referencing foundational works that have contributed to the ongoing discourse on ethics in data science.

6.2.1 Privacy and Anonymity

Privacy refers to the right of individuals to control information about themselves and decide who can access it. Anonymity is closely related, allowing individuals to act or communicate without revealing their identities. In data science, respecting privacy means ensuring that personal data is used in a way that is consistent with the expectations of the individuals it pertains to and adheres to applicable laws and ethical guidelines. Anonymity protects individuals from potential harm that could arise from the disclosure of their identity alongside their data. Techniques like data anonymization are employed to protect privacy and anonymity, stripping datasets of personally identifiable information to prevent the tracing of data back to an individual.

Ensuring the privacy and anonymity of data subjects is paramount in data science. O’Neil (2016) and Noble (2018) discuss the challenges and implications of privacy in the age of big data, highlighting the necessity for data scientists to employ techniques such as anonymization and secure data handling to protect individuals’ identities and personal information.
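
As a rough illustration of stripping personally identifiable information, the sketch below drops direct identifiers and replaces a record ID with a keyed hash. Strictly speaking this is pseudonymization rather than full anonymization, since remaining fields such as age can still help re-identify someone. The column names, the `pseudonymize` helper, and the sample records are all hypothetical and chosen only for this example.

```python
import hashlib
import hmac

# Hypothetical direct identifiers; a real dataset needs its own PII review.
DIRECT_IDENTIFIERS = {"name", "email"}


def pseudonymize(records, id_field, secret_key):
    """Drop direct identifiers and replace the ID with a keyed hash."""
    cleaned = []
    for record in records:
        # Copy every field except the direct identifiers.
        row = {k: v for k, v in record.items() if k not in DIRECT_IDENTIFIERS}
        # A keyed hash (HMAC) cannot be reversed by hashing guessed IDs
        # unless the secret key is also known.
        digest = hmac.new(
            secret_key, str(record[id_field]).encode("utf-8"), hashlib.sha256
        ).hexdigest()
        row[id_field] = digest[:16]
        cleaned.append(row)
    return cleaned


# Made-up records for illustration only.
records = [
    {"patient_id": 101, "name": "Ana Diaz", "email": "ana@example.com", "age": 34},
    {"patient_id": 102, "name": "Bo Chen", "email": "bo@example.com", "age": 51},
]
safe = pseudonymize(records, "patient_id", secret_key=b"replace-with-a-real-secret")
for row in safe:
    print(row)
```

The same input always maps to the same pseudonym, so records can still be linked across tables without exposing the original ID.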

6.2.2 Bias and Fairness

Bias in data science refers to systematic errors that favor certain outcomes over others, which can stem from the data collection process, algorithmic design, or model interpretation stages. Fairness is the principle that seeks to ensure equitable treatment and outcomes for all individuals, particularly across different demographic groups. Addressing bias and ensuring fairness involve critically evaluating and adjusting datasets and algorithms to prevent discrimination against any individual or group. This can include diversifying training data, employing statistical methods to identify and correct for biases, and designing algorithms that account for fairness metrics.

The presence of bias in data and algorithms represents a significant ethical challenge, potentially leading to discrimination and unfair treatment. Kearns and Roth (2019) explore mechanisms for detecting and mitigating bias to ensure fairness in algorithmic decision-making. Data scientists must critically examine both the data and the algorithms they use to prevent perpetuating or amplifying biases.
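
One simple fairness metric of the kind mentioned above is the demographic parity difference: the gap between the highest and lowest rate of positive predictions across groups. The sketch below computes it in plain Python. The function names, group labels, and predictions are made up for illustration, and demographic parity is only one of several fairness definitions, which may disagree with each other.

```python
from collections import defaultdict


def selection_rates(groups, predictions):
    """Share of positive (1) predictions within each group."""
    totals = defaultdict(int)
    positives = defaultdict(int)
    for group, pred in zip(groups, predictions):
        totals[group] += 1
        positives[group] += pred
    return {g: positives[g] / totals[g] for g in totals}


def demographic_parity_difference(groups, predictions):
    """Gap between the highest and lowest group selection rates."""
    rates = selection_rates(groups, predictions)
    return max(rates.values()) - min(rates.values())


# Hypothetical model outputs: 1 = approved, 0 = denied.
groups = ["A", "A", "A", "A", "B", "B", "B", "B"]
predictions = [1, 1, 1, 0, 1, 0, 0, 0]

rates = selection_rates(groups, predictions)
gap = demographic_parity_difference(groups, predictions)
print(rates)  # {'A': 0.75, 'B': 0.25}
print(gap)    # 0.5
```

A gap near zero means the groups receive positive predictions at similar rates; a large gap is a signal to investigate the data and the model, not proof of discrimination on its own.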

6.2.3 Transparency and Accountability

Transparency in data science involves openness about the methodologies, data sources, and algorithms used in developing models, allowing others to understand and evaluate the decision-making process. Accountability means that data scientists and their organizations take responsibility for the outcomes of their data-driven decisions, including addressing any negative impacts. Achieving transparency and accountability requires thorough documentation, sharing of methodologies and data (where possible), and the creation of mechanisms for auditing and challenging algorithmic decisions.

Transparency in the development and deployment of data science models, along with accountability for their outcomes, is critical. O’Neil (2016) argues for the necessity of making data science processes transparent and accountable, particularly when models influence significant decisions affecting individuals’ lives.
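
Thorough documentation can start small. The sketch below builds a record of one analysis run, noting the data source, the method, its parameters, and a fingerprint of the data, so that others can later check exactly what was used. The field names, the `build_run_record` helper, and the sample values are assumptions made for this example, not an established standard.

```python
import hashlib
import json


def fingerprint(rows):
    """SHA-256 of the rows serialized as JSON with sorted keys."""
    payload = json.dumps(rows, sort_keys=True).encode("utf-8")
    return hashlib.sha256(payload).hexdigest()


def build_run_record(rows, data_source, method, parameters):
    """Collect what an auditor would need to identify this analysis run."""
    return {
        "data_source": data_source,
        "n_rows": len(rows),
        "data_sha256": fingerprint(rows),
        "method": method,
        "parameters": parameters,
    }


# Hypothetical data and settings for illustration only.
rows = [{"id": 1, "score": 0.82}, {"id": 2, "score": 0.41}]
record = build_run_record(
    rows,
    data_source="survey_2024.csv (hypothetical file name)",
    method="logistic regression",
    parameters={"threshold": 0.5},
)
print(json.dumps(record, indent=2))
```

If the data changes even slightly, the fingerprint changes too, which makes silent edits to the input easier to detect during an audit.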

6.2.4 Data Integrity and Quality

Data integrity refers to the accuracy, consistency, and reliability of data throughout its lifecycle. Quality of data means that the data is fit for its intended uses in operations, decision-making, and planning. In data science, ensuring data integrity and quality is critical for building models that accurately represent the world and make reliable predictions. This involves rigorous data collection, cleaning, and validation processes to ensure that data is not corrupted, accurately reflects the phenomena it is supposed to represent, and is used appropriately in models.

Maintaining the integrity and quality of data is fundamental to ethical data science practice. Saltz and Dewar (2019) emphasize the importance of ensuring that data used and produced by data scientists are accurate, valid, and reliable, supporting the credibility of data science findings and applications.
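
The validation step described above can be sketched as a few basic checks: required fields are present, numeric values fall in a plausible range, and no row is duplicated. The `validate_records` helper, the field names, and the range limits below are hypothetical; real projects would tailor the rules to their own data.

```python
def validate_records(records, required, ranges):
    """Return a list of (row_index, problem) tuples for basic quality checks."""
    problems = []
    seen = set()
    for i, rec in enumerate(records):
        # Check 1: required fields must be present and not None.
        for field in required:
            if rec.get(field) is None:
                problems.append((i, f"missing {field}"))
        # Check 2: numeric fields must fall inside their allowed range.
        for field, (low, high) in ranges.items():
            value = rec.get(field)
            if value is not None and not (low <= value <= high):
                problems.append((i, f"{field} out of range"))
        # Check 3: an exact repeat of an earlier row is flagged as a duplicate.
        key = tuple(sorted(rec.items()))
        if key in seen:
            problems.append((i, "duplicate row"))
        seen.add(key)
    return problems


# Made-up records containing one missing value, one implausible age,
# and one exact duplicate.
records = [
    {"id": 1, "age": 34, "income": 52000},
    {"id": 2, "age": None, "income": 48000},
    {"id": 3, "age": 212, "income": 61000},
    {"id": 1, "age": 34, "income": 52000},
]
problems = validate_records(records, required=["id", "age"], ranges={"age": (0, 120)})
for row_index, problem in problems:
    print(row_index, problem)
```

Reporting problems instead of silently dropping rows keeps the cleaning decisions visible, which also supports the transparency goals discussed earlier.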

6.2.5 Respect for Intellectual Property

Respecting intellectual property in data science means acknowledging and adhering to the legal and moral rights of creators and owners of data, algorithms, software, and other resources used in data science projects. This includes proper attribution of sources, complying with licensing agreements, and not using copyrighted materials without permission. Ethical practice requires data scientists to be aware of the origins of the data and tools they use, to ensure that their use respects the rights of the creators and is in line with any terms and conditions of use.

Data scientists must respect intellectual property rights, properly attributing data sources and adhering to the terms of use for data and software. The ethical use of data and software tools, acknowledging creators and complying with licensing, is essential for fostering a culture of integrity within the field (Saltz and Dewar 2019).