SWiP Project

SWiP Project banner at Nelson Mandela University
SWiP Project 2025

The SWiP project makes use of language, data and knowledge technologies to promote language equality among all of South Africa's official languages. The hegemonic status of English (and, to a lesser extent, Afrikaans) has made English the language of learning and teaching,[1] which downplays an African epistemology;[2] as a result, local African languages are commonly under-resourced.[3] The acronym "SWiP" names the three main partners in a national collaboration: SADiLaR, the free encyclopedia Wikipedia, and PanSALB. They work alongside local speech and language communities within academia to address language equality using digital technologies, especially Wikipedia.[4]

Under apartheid, certain languages were marginalised, including isiNdebele, Siswati, Xitsonga and Tshivenda.[5] To address the underrepresentation of South Africa's indigenous languages, three organisations (SADiLaR, Wikipedia and PanSALB) are collaborating to build better corpora for low-resource languages.[6]

Wikipedia is a common source of language data for natural language processing (NLP).[7] Low-resource languages have limited corpora (text, speech data, annotated text and other forms of linguistic data) for large language models (LLMs) to draw on. The SWiP project has introduced several alternative ways of collecting and compiling corpora of suitable text for low-resource languages, and has rolled these out on a national scale. These corpora can be used to create corpus-based dictionaries or to support semi-automatic translation.[8]

This collaborative project is also intended to promote, preserve, and digitise South Africa's indigenous languages and cultural knowledge by enhancing their presence on digital platforms such as Wikipedia.[9] By partnering with cultural and linguistic organisations, the project was designed to close the digital gap and ensure that local languages and cultural narratives are preserved and shared online.[6]

  1. ^ http://hdl.handle.net/10962/d1003556
  2. ^ Ntombela, B X S (2024). "The hegemony of the English language and the plight of African languages: towards linguistic revolution". African Perspectives of Research in Teaching and Learning. 8 (1): 184–195. doi:10.70875/v8i3article14 (inactive 1 May 2025).
  3. ^ Bird, Steven (2022). "Local Languages, Third Spaces, and other High-Resource Scenarios". Proceedings of the 60th Annual Meeting of the Association for Computational Linguistics (Volume 1: Long Papers). Association for Computational Linguistics. pp. 7817–7829. doi:10.18653/v1/2022.acl-long.539.
  4. ^ https://www.pansalb.org/wp-content/uploads/External-Newsletter-April-2023-March-2024_compressed.pdf
  5. ^ Mudau, Thama; Jonker, Euane; Maila, Anthony; Malema, Maropeng (2024). "SWiP Project Launch" (PDF). PanSALB News. pp. 8–10.
  6. ^ a b "The SWiP project brings isiNdebele to Wikipedia's main platform, expanding access and visibility for official South African languages – SADiLaR". sadilar.org. Retrieved 2025-04-14.
  7. ^ "Wikipedia's value in the age of generative AI". 12 July 2023.
  8. ^ Setaka-Bapela, M; Van Zaanen, M (July 2024). Corpus-based dictionaries for low-resource languages (PDF). The African Association for Lexicography. Retrieved 15 April 2025.
  9. ^ "SWiP project to champion SA's indigenous languages online – SADiLaR". sadilar.org. Retrieved 2025-04-14.
