Publications
-
Signed, Sealed,... Confused: Exploring the Understandability and Severity of Policy Documents
arXiv preprint, arXiv:2502.08743, 2025.
Abstract
In general, Terms of Service (ToS) and other policy documents are verbose and full of legal jargon, which makes them difficult for users to understand. To improve accessibility and transparency, the "Terms of Service; Didn't Read" (ToS;DR) project condenses intricate legal terminology into summaries and overall grades for a website's policy documents. Nevertheless, it remains uncertain whether users can truly grasp the implications of these simplified presentations. We conducted an online survey to assess the perceived understandability and severity of randomly chosen cases from the ToS;DR taxonomy. Preliminary results indicate that, although most users report understanding the cases, they perceive a bias toward service providers in roughly two-thirds of them. Our findings emphasize the need to prioritize user-centric policy formulation. This study has the potential to reveal the extent of information imbalance in digital services and to promote more informed user consent.
-
Demystifying Legalese: An Automated Approach for Summarizing and Analyzing Overlaps in Privacy Policies and Terms of Service
arXiv preprint, arXiv:2404.13087, 2024.
Abstract
The complexities of legalese in terms and policy documents can bind individuals to contracts they do not fully comprehend, potentially leading to uninformed data sharing. This work addresses that challenge by developing language models capable of producing automated, accessible summaries and scores for such documents, improving user understanding and enabling informed decision-making. We compared transformer-based and conventional models using our dataset, with RoBERTa achieving the highest performance at an F1-score of 0.74. Using this best-performing model, we detected redundancies and potential guideline violations by identifying overlaps in GDPR-required documents, emphasizing the need for stronger GDPR compliance.
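As an illustration of the reported evaluation metric only (not the paper's model, dataset, or labels), the macro-averaged F1-score of the kind cited for RoBERTa (0.74) can be computed as follows; the clause labels below are invented for this sketch:

```python
def macro_f1(y_true, y_pred):
    """Macro-averaged F1 over all labels present in the gold annotations."""
    labels = sorted(set(y_true))
    scores = []
    for label in labels:
        tp = sum(1 for t, p in zip(y_true, y_pred) if t == label and p == label)
        fp = sum(1 for t, p in zip(y_true, y_pred) if t != label and p == label)
        fn = sum(1 for t, p in zip(y_true, y_pred) if t == label and p != label)
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        f1 = 2 * precision * recall / (precision + recall) if precision + recall else 0.0
        scores.append(f1)
    # Macro averaging weights every label equally, regardless of frequency.
    return sum(scores) / len(scores)

# Hypothetical clause categories, for illustration only.
gold = ["data_sharing", "retention", "data_sharing", "liability", "retention"]
pred = ["data_sharing", "retention", "liability", "liability", "retention"]
print(round(macro_f1(gold, pred), 2))  # -> 0.78
```

Macro averaging is a common choice for imbalanced multi-class annotation tasks like clause labeling, since rare clause types count as much as frequent ones.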
-
Revolutionizing Digital Consent: An Automated Approach to Simplifying and Deciphering Privacy Policies for Empowered User Understanding
29th International Conference on Intelligent User Interfaces, 2024.
Abstract
Terms of service (ToS) documents provide the guidelines users must follow to use a company's services. Although posting ToS is not always legally required, certain regulations mandate it in specific cases. Users are expected to read and understand these terms before using a service, yet the familiar checkbox of "Yes, I have read and agree" is often called the "biggest lie on the internet." ToS documents are typically lengthy and complex, making them difficult for users to read. Research shows that participants spend an average of 51 seconds reading a ToS that would take 15–17 minutes to read thoroughly, with information overload being a key factor. This work uses Natural Language Processing (NLP), Machine Learning (ML), and Explainable AI (XAI) to build a browser extension that automatically simplifies and categorizes complex documents such as privacy policies, producing summarized and annotated versions for faster comprehension. By bridging this knowledge gap, the technology empowers users to make informed decisions, improves the transparency of digital consent, builds trust, and helps ensure that terms are truly understood before they are agreed to.
-
Systems Engineers’ Effectiveness in an Organization: Text and Visual Analytics Approach
IEEE Systems Journal, 14(4), 5049–5060, 2020.
Abstract
The Helix project is a multiyear longitudinal research study focused on understanding what makes systems engineers effective. Data were gathered through interviews, site visits, and surveys with individuals performing systems engineering (SE) work, supervising systems engineers, using SE products, or otherwise knowledgeable about SE. Responses were documented and analyzed using text mining techniques such as topic modeling and word similarity. The results were interpreted through visualizations generated for three scenarios: full corpus, by organization type, and by domain. These visuals reveal the common language among systems engineers within topics and the relationships between words in the three scenarios. The article also offers insights into the factors that enable systems engineers to be proficient and effective.
-
Text analysis approach to systems engineers’ effectiveness in an organization
2019 IEEE International Systems Conference (SysCon), 1–8.
Abstract
The Helix project of the Systems Engineering Research Center (SERC) seeks solutions to help organizations increase the effectiveness of their systems engineering workforce by gathering data through interviews, site visits, and surveys with systems engineers and their management peers. The study documents and analyzes these responses, using text analysis and visualization techniques to transform large volumes of qualitative data into accessible, interpretable formats. This enables organizations to visually explore trends and insights, guiding targeted changes to improve systems engineering effectiveness.
-
How to Measure Human-AI Prediction Accuracy in Explainable AI Systems
arXiv preprint, arXiv:2409.00069, 2024.
Abstract
Assessing an AI system's behavior, particularly in Explainable AI systems, is sometimes done empirically by measuring people's ability to predict the agent's next move. But how should such measurements be performed? In empirical studies with humans, an obvious approach is to frame the task as binary (i.e., a prediction is either right or wrong), but this does not scale: as output spaces grow, so do floor effects, because the ratio of right answers to wrong answers quickly becomes very small. The crux of the problem is that the binary framing fails to capture the different degrees of "wrongness." To address this, we begin by proposing three mathematical bases upon which to measure "partial wrongness." We then use these bases to perform two analyses in sequential decision-making domains: the first is an in-lab study with 86 participants on a size-36 action space; the second is a re-analysis of a prior study on a size-4 action space. Researchers adopting our operationalization of the prediction task and analysis methodology will improve the rigor of user studies conducted with that task, which is particularly important when the domain features a large output space.
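To make the contrast with binary scoring concrete, here is one plausible partial-credit score over a structured action space. This distance-based variant is an illustrative assumption, not any of the paper's three proposed bases; the 6x6 grid is a hypothetical layout for a size-36 action space:

```python
def partial_credit(predicted, actual, grid=6):
    """Score a prediction on a grid x grid action space, in [0, 1].

    1.0 for an exact match, decreasing linearly with the Manhattan
    distance between the predicted and actual grid cells. A binary
    framing would give 0 to every miss, near or far alike.
    """
    pr, pc = divmod(predicted, grid)
    ar, ac = divmod(actual, grid)
    dist = abs(pr - ar) + abs(pc - ac)
    max_dist = 2 * (grid - 1)  # opposite corners of the grid
    return 1.0 - dist / max_dist

print(partial_credit(0, 0))   # exact match  -> 1.0
print(partial_credit(0, 35))  # opposite corner -> 0.0
print(partial_credit(0, 7))   # near miss   -> 0.8
```

The point of any such score is that near misses and far misses are distinguished, which keeps the measurement informative even when exact matches are rare in a large output space.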
BibTeX
@article{koujalgi2024measure,
  title={How to Measure Human-AI Prediction Accuracy in Explainable AI Systems},
  author={Koujalgi, S. and Anderson, A. and Adenuga, I. and Soneji, S. and Dikkala, R. and Nader, T. G. and others},
  journal={arXiv preprint arXiv:2409.00069},
  year={2024}
}
-
Helix: Developing an understanding of organizational systems engineering effectiveness
INCOSE International Symposium, 29(1), 652–668, 2019.
Abstract
There is significant interest in the DoD, as well as in Congress, in ensuring that DoD can characterize and manage its systems engineering (SE) workforce. Establishing a baseline understanding of the SE workforce is essential for measuring the impact of SERC and other DoD human capital initiatives, including recruitment and retention programs. This research aims to answer one primary research question, how organizations can improve the effectiveness of their SE workforce, supported by three sub-questions focused on impact, critical enabling factors, and shifts needed for different SE approaches. In 2018, the Helix team engaged seven new organizations and added over 100 interviewees to the dataset, bringing total participation to 464 individuals and 29 organizations, supplemented by consultant interviews. Results inform both DoD policy and SERC’s human capital research program.
-
Evolution of the Helix Project: From Investigating the Effectiveness of Individual Systems Engineers to Systems Engineering Organizations
Stevens Institute of Technology, 2019.
Abstract
The U.S. Department of Defense (DoD) and the Defense Industrial Base (DIB) have faced persistent systems engineering challenges in recent years. Launched in 2012, the Helix project is a multi-year longitudinal research study focused on understanding what makes systems engineers effective. Previous Helix work centered on individual systems engineers, forming a critical foundation for the current research. In 2018, the scope expanded beyond workforce focus to include how organizations can strengthen systems engineering as a discipline. This includes identifying organizational characteristics that enhance workforce effectiveness and enable better SE practices. This paper updates research findings and highlights the shift toward studying organization-level enablers of systems engineering success.
-
Key Techniques used for Security in the Cloud
IJSRD – International Journal for Scientific Research & Development, 5(4), 2017.
Abstract
Over recent years, Cloud Computing has advanced rapidly, with more companies moving workloads to the cloud. This growth has intensified the need to protect vast volumes of user data stored and processed on centralized cloud platforms. Core challenges include securing, protecting, and processing user-owned data while maintaining its integrity. This paper reviews and evaluates techniques applied in cloud computing environments to ensure data security and integrity for individual users.
-
Survey on Privacy Preserving Cloud Auditing for Shared Data
International Journal of Engineering Sciences Research Technology, 6(2), 2017.
Abstract
Cloud computing enables both storage and sharing of data at scale, but raises critical concerns over data integrity and security during sharing. Verifying integrity and consistency of stored cloud data is challenging for both the user and the cloud server. Public Auditing methods address this by allowing a trusted Third Party Auditor (TPA) to verify the data. This paper surveys three landmark methodologies in privacy-preserving cloud auditing for shared data, reviewing their mechanisms and comparative strengths.