THE LUSOPHONE DIGITAL HUMANITIES AND WHAT THEY (WE) ARE DOING FROM THE SOUTH: TEXTUAL CORPUS ANALYSIS AND FAIR PRINCIPLES TO TACKLE HEGEMONY.

1. Abstract

Introduction

The year 2018 saw significant growth in the number of research groups and laboratories dedicated to Digital Humanities in Brazil; their output, however, was not directed at the international community. Today, the Digital Humanities are beginning to attract greater public interest in Brazil and in other South American countries.

From this perspective, our research discusses what is produced in the Lusophone Digital Humanities in South America. Diagnosing this production is difficult because the global information regime has surrendered to the socio-technical and cultural monopoly mediated by the major technological players Google, Amazon, Facebook, Apple, and Microsoft (Fiormonte & Sordi, 2019). In the Portuguese-speaking world, the language barrier often leaves debate and dialogue less visible. In addition, the English-language literature, precisely because it is English-speaking, escapes the larger question that critical thinking raises: decolonization.

Methodology

The present work draws on the research data of Gomes et al. (2018), who retrieved Digital Humanities academic papers, theses, and books written in Portuguese from Google Scholar. Although Google Scholar is publicly accessible, it does not appear to provide consistent means of meeting the FAIR principles (Findable, Accessible, Interoperable, Reusable; Wilkinson et al., 2016): the information it retrieves is concentrated and opaque, so it may be visible but is not operable.

This empirical study analyzed the 454 abstracts that compose the textual corpus, applying text mining techniques with IRaMuTeQ, the "R" Interface for Multidimensional Analysis of Texts and Questionnaires (Marchand & Ratinaud, 2012).

Results

The similitude analysis (based on graph theory) revealed possible thematic convergences in the Portuguese-language production in the Digital Humanities. The graph showed a central cluster represented by the term digital, which attracts the following terms semantically: conceito, texto, meio, ferramenta, linguagem, and interação. This cluster has two subclusters, identified by artigo and processo, and it also links to other, opposing clusters: novo, estudo, and pesquisa. On the top right, the cluster novo has two subclusters, social and nao. In the first subcluster, the term social showed a possible connection among the highlighted terms comunicacao, informacao, social, and rede. The second subcluster showed a balance among the terms internet, possivel, and acesso. On the bottom left, the cluster estudo brought the term analisar into evidence, and it links to the cluster pesquisa, which lies at the extremity of the graph. From this analysis, it can be inferred that the term set reflects actions, products, secondary research objects, and methods, as well as the problems and challenges posed by the lack of internet access in the South.
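As an illustration of the graph-theoretic idea behind a similitude analysis, the following is a minimal Python sketch, not IRaMuTeQ's own implementation. It assumes that term proximity is measured by plain co-occurrence counts across abstracts and that the graph is reduced to a maximum spanning tree. The function names and the toy token lists are hypothetical and are not taken from the 454-abstract corpus.

```python
from collections import Counter
from itertools import combinations


def cooccurrence_counts(documents):
    """Count, for each pair of terms, how many documents contain both."""
    counts = Counter()
    for tokens in documents:
        for a, b in combinations(sorted(set(tokens)), 2):
            counts[(a, b)] += 1
    return counts


def maximum_spanning_tree(counts):
    """Kruskal's algorithm, taking the heaviest edges first."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    tree = []
    # Sort by weight (descending), then by term pair for a deterministic result.
    for (a, b), weight in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0])):
        root_a, root_b = find(a), find(b)
        if root_a != root_b:  # the edge joins two separate components
            parent[root_a] = root_b
            tree.append((a, b, weight))
    return tree


# Toy, invented token lists standing in for lemmatized abstracts (assumption).
toy_abstracts = [
    ["digital", "texto", "ferramenta"],
    ["digital", "texto", "linguagem"],
    ["digital", "pesquisa", "estudo"],
    ["pesquisa", "estudo", "analisar"],
]

tree = maximum_spanning_tree(cooccurrence_counts(toy_abstracts))
# Terms with many tree edges play the role of cluster centres.
degree = Counter(term for a, b, _ in tree for term in (a, b))
```

In such a tree, a term with many incident edges behaves like the central node of a cluster, which is how a term such as digital can come to dominate the graph.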

The word cloud allowed quick visualization and identification of the main keywords of the textual corpus: digital, analisar, pesquisa, novo, comunicacao, tecnologia, and nao. This result reinforced the findings of the similitude analysis.
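A word cloud is essentially a rendering of term frequencies. The sketch below shows one way such frequencies could be computed in Python. It is an assumption-laden illustration and not the IRaMuTeQ pipeline: the accent-free forms reported above (comunicacao, nao) suggest that diacritics were removed at some stage, so the sketch strips them. The stopword list and the sample sentences are invented for the example.

```python
import re
import unicodedata
from collections import Counter


def normalize(text):
    """Lowercase and strip diacritics, e.g. 'Comunicação' -> 'comunicacao'."""
    decomposed = unicodedata.normalize("NFKD", text.lower())
    return "".join(ch for ch in decomposed if not unicodedata.combining(ch))


def term_frequencies(texts, stopwords=frozenset()):
    """Count the remaining tokens after normalization and stopword removal."""
    counts = Counter()
    for text in texts:
        for token in re.findall(r"[a-z]+", normalize(text)):
            if token not in stopwords:
                counts[token] += 1
    return counts


# Invented sentences, not taken from the corpus (assumption).
toy_texts = [
    "A comunicação digital e a pesquisa em rede.",
    "Pesquisa sobre tecnologia digital: não há acesso.",
]
# Hypothetical minimal stopword list, given in normalized form.
stopwords = frozenset({"a", "e", "em", "sobre", "ha"})

freqs = term_frequencies(toy_texts, stopwords)
# The most frequent terms would be drawn largest in a word cloud.
top_terms = freqs.most_common(3)
```

The choice of stopwords changes the picture considerably: here nao is deliberately kept, since it appears among the keywords reported above.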

Conclusions

The usual bibliometric retrieval based on the Web of Science (WoS) and Scopus databases does not show the plethora of academic papers produced in the Global South. From a decolonizing perspective, this study shows that scraping Google Scholar data can yield broader results for analyzing scientific production in Portuguese. In addition, the use of Zenodo made the research results visible to the public outside Brazil, allowing South American production to integrate Lusophone Digital Humanities into the global context. This represents an important and necessary technopolitical action for researchers from that language community.

References

1. Gomes, J.; Castro, F.; Pimenta, R. (2018). Google Scholar como fonte de medição da produção científica lusófona. LATmetrics - Anais do I Congresso de Altmetria e Ciência Aberta na América Latina. Dataset accessible: .

2. Fiormonte, D. & Sordi, P. (2019). Humanidades Digitales del Sur y GAFAM. Para una geopolítica del conocimiento digital. Liinc em Revista, Rio de Janeiro, v.15, n.1, pp.108-130.

3. Marchand, P. & Ratinaud, P. (2012). L'analyse de similitude appliquée aux corpus textuels: les primaires socialistes pour l'élection présidentielle française. Actes des 11es Journées Internationales d'Analyse Statistique des Données Textuelles, pp.687-699.

4. Wilkinson, M. D., et al. (2016). The FAIR Guiding Principles for scientific data management and stewardship. Scientific Data, 3. DOI: 10.1038/sdata.2016.18