Summary:

In response to continually growing instances of algorithmic harm perpetrated through machine learning applications, "fairness," as an approach and set of standard practices, has been offered as a potential remedy to "bias" in machine learning research. This project builds on the extensive conversation about the definitions and limits of fairness as a way of approaching algorithmic harm, as well as on research conducted for INFO 656 Machine Learning and INFO 640 Data Analysis. Through topic modeling, specifically Latent Dirichlet Allocation (LDA), I hoped to answer the question "What do machine learning researchers mean by fairness?" by identifying different approaches to, or understandings of, "fairness" in a collection of machine learning literature.

Methods:

For this third iteration I returned to the datasets created in the first two iterations and combined them into a single dataset. I cleaned the dataset moving back and forth between the pandas Python library and RStudio. Preprocessing steps included tokenization, bigram formation, part-of-speech tagging, and lemmatization, though I ultimately analyzed only unigram tokens. I used the Gensim Python library to create a dictionary and document-term matrix and then fit four LDA models. Finally, I used the pyLDAvis Python library to generate the visualizations on this digital poster.

Title Topics where k=21

Title Topics where k=√n/2

Abstract Topics where k=21

Abstract Topics where k=√n/2