Ibrahim Khalilov
I build and evaluate AI-enabled systems, with a focus on agent behavior, privacy, security, robustness, and reproducible experimentation.
About Me
I am a PhD student in Computer Science at Johns Hopkins University. My research focuses on AI and agent evaluation, privacy and security, and reproducible systems research.
A major part of my recent work examines how computer-use agents behave under adversarial interface conditions, including fine-print injections and dark-pattern interfaces, and where human oversight can improve outcomes. I also build mobile-system research infrastructure for controlled and repeatable experimentation.
Before pursuing my PhD, I worked as a software engineer at the Virginia Institute for Spaceflight and Autonomy (VISA), Thrillworks, and Fairly AI. That industry experience informs how I connect academic research to practical, real-world applications.
Research
The Obvious Invisible Threat: LLM-Powered GUI Agents’ Vulnerability to Fine-Print Injections
Chaoran Chen, Zhiping Zhang, Bingcan Guo, Shang Ma, Ibrahim Khalilov, Simret A. Gebreegziabher, Yanfang Ye, Ziang Xiao, Yaxing Yao, Tianshi Li, Toby Jia-Jun Li
Toward a Human-Centered Evaluation Framework for Trustworthy LLM-Powered GUI Agents
Chaoran Chen, Zhiping Zhang, Ibrahim Khalilov, Bingcan Guo, Simret A. Gebreegziabher, Yanfang Ye, Ziang Xiao, Yaxing Yao, Tianshi Li, Toby Jia-Jun Li
Dark Patterns Meet GUI Agents: LLM Agent Susceptibility to Manipulative Interfaces and the Role of Human Oversight
Jingwei Tang, Chaoran Chen, Junnan Li, Zhiping Zhang, Bingcan Guo, Ibrahim Khalilov, Simret A. Gebreegziabher, Yanfang Ye, Tianshi Li, Toby Jia-Jun Li
Projects
PriviSense: Reproducible Mobile Systems Evaluation
Published
An on-device Android instrumentation toolkit for running controlled experiments with mobile applications. PriviSense uses dynamic instrumentation to modify sensor and system signals at runtime, making it possible to study how apps respond to different simulated contexts without rewriting the target applications.
Key Highlights:
- Runtime sensor and system-signal spoofing in unmodified Android apps
- Reproducible testing workflows using automation, screenshots, and logs
- Published as an ICSE 2026 demo paper
Safety and Robustness Evaluation for Computer-Use Agents
Ongoing
Research on how LLM-powered computer-use agents behave under adversarial interface conditions. This work studies whether agents recognize or fall for manipulative UI elements, where failures occur, and how different forms of human oversight can improve outcomes.
Key Highlights:
- Evaluated agent behavior under fine-print injections and dark-pattern interfaces
- Contributed to reproducible testbeds for analyzing agent failure modes
- Studied when human oversight helps improve agent robustness
Software Engineering Experience
HFC Specialist
Scale AI
Selected as a specialist with the Human Frontier Collective, contributing to expert evaluation work for frontier AI systems and technical reasoning projects.
Software Engineer
Thrillworks
Built and shipped multiple mobile and web applications for large corporations including President's Choice (PC Financial, PC Insurance) and Ryobi. Created responsive and accessible websites using modern web technologies.
Key Achievements:
- Delivered production applications for major corporations such as PC Financial
- Built mobile and responsive apps with Flutter, Gatsby, React, and Tailwind CSS
- Developed and maintained microservices using NestJS
Robotics Software Intern
Virginia Institute for Spaceflight and Autonomy
Developed applications that control robots over the WamV ROS network, handling remote commands and receiving feedback. Fine-tuned pretrained models for regression and classification tasks and integrated them into control systems.
Key Achievements:
- Built a robot control application on the WamV ROS network architecture
- Fine-tuned ML models for autonomous navigation and control systems
- Implemented communication between ROS nodes and Docker containers
Full-stack Developer
Fairly AI
Built scalable web applications for AI-powered fairness assessment tools. Worked with machine learning teams to integrate ML models into production web applications.
Key Achievements:
- Implemented a real-time notification system using WebSockets
- Built an AI risk management dashboard with Material-UI
- Integrated Firebase authentication and developed a user management system
Get in Touch
Let's Connect
I'm always interested in discussing research collaborations, potential projects, or just having a conversation about AI, mobile development, and the future of technology. Feel free to reach out!