Mobile handheld devices have become companions in people’s daily lives. Frequent use of the available applications, especially video streaming, causes exponential growth in mobile IP traffic. Service providers and application developers need to understand the tradeoff between end-to-end (e2e) performance and cost, since customer expectations of those applications that are not fully met lead to reduced service usage, lower revenue, and a higher churn rate. The user-centric approach, which involves users in assessing the performance of a particular service or application, has become important within the interdisciplinary research field of Quality of Experience (QoE). The ultimate goal is to obtain simplified QoE models for particular applications based on the underlying network-based performance metrics as well as non-technical metrics related to the end user. Android smartphones, which use open-source code and well-documented Application Programming Interfaces (APIs), enable researchers to perform low-level and network-based performance analysis on end-user mobile devices while taking user feedback into account. In this thesis, the influential factors for Android smartphone-based QoE are studied. The relation between the quantified user-perceived QoE metric, i.e., the Mean Opinion Score (MOS), and artifacts in real-time video streaming, such as blockiness and jerkiness caused by network-level metrics, e.g., Packet Delay Variation (PDV), Maximal Burst Size (MBS), and video bit rate, is identified. Challenges in assessing the user-perceived QoE of video, with a focus on memory effects, are discussed. The relation between the objective metric of user reaction time and the user-perceived QoE is presented. Furthermore, different methods for assessing end-user-perceived QoE, such as the Day Reconstruction Method (DRM), the Experience Sampling Method (ESM), and a preliminary online survey, are described.
Further influential factors, e.g., context, user routines, user lifestyle, and Quality of Service (QoS) metrics such as Round Trip Time (RTT) and Server Response Time (SRT), are studied. The thesis concludes with preliminary findings that relate the instantaneous total power consumption to the jerkiness of a real-time video stream, as evidenced by events such as stalling.