Camera projection with the pinhole model

A camera is a mapping between the 3D world (object space) and a 2D image.

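In general, the camera projection matrix P is a 3 x 4 homogeneous matrix with 11 degrees of freedom (12 entries, minus one for the arbitrary overall scale). It factors as: \[ P = K[R \mid t] \]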

| Component | # DOF | Elements | Known as |
|---|---|---|---|
| K | 5 | \(f_x, f_y, s, p_x, p_y\) | Intrinsic parameters; camera calibration matrix |
| R | 3 | \(\alpha, \beta, \gamma\) | Extrinsic parameters |
| t (or \(\tilde{C}\)) | 3 | \((t_x, t_y, t_z)\) | Extrinsic parameters |
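
For reference, the five intrinsic parameters are conventionally arranged in K as shown here. This is the standard textbook layout, assumed since the text above does not write it out; \(s\) is the skew and \((p_x, p_y)\) is the principal point.

\[ K = \begin{bmatrix} f_x & s & p_x \\ 0 & f_y & p_y \\ 0 & 0 & 1 \end{bmatrix} \]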

3D world frame ----- R, t ----> 3D camera frame ------ K -----> 2D image


  • P: Projective camera, maps 3D world points to 2D image points.

  • K: Camera calibration matrix, 3 x 3. Given a 3D point \(X_{cam}\) expressed in the camera coordinate frame, \(x=K[I|0]X_{cam}\) projects it to the 2D image point \(x\).


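  • R and t: Camera rotation and translation, which together form a rigid transformation. \(X_{cam}=(X,Y,Z,1)^T\) is expressed in the camera coordinate frame. In general, 3D points are expressed in a different Euclidean coordinate frame, known as the world coordinate frame. The two frames are related by the rigid transformation (R, t): for inhomogeneous 3-vectors, \(\tilde{X}_{cam}=R\tilde{X}+t\), where \(t=-R\tilde{C}\) and \(\tilde{C}\) is the camera centre in world coordinates.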
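
The pipeline above (world frame, then camera frame, then image) can be sketched in a few lines of plain Python. This is a minimal illustration, not a reference implementation: the helper names (`matmul`, `rotation_z`, `projection_matrix`, `project`) and all numeric values are hypothetical, K is assumed to have the usual upper-triangular layout with zero skew, and the rotation is restricted to a single angle about the z-axis for brevity.

```python
import math

def matmul(A, B):
    """Multiply two matrices given as lists of rows."""
    return [[sum(a * b for a, b in zip(row, col)) for col in zip(*B)]
            for row in A]

def rotation_z(gamma):
    """Rotation by angle gamma (radians) about the z-axis."""
    c, s = math.cos(gamma), math.sin(gamma)
    return [[c, -s, 0.0],
            [s,  c, 0.0],
            [0.0, 0.0, 1.0]]

def projection_matrix(K, R, t):
    """Build the 3x4 matrix P = K [R | t]."""
    Rt = [R[i] + [t[i]] for i in range(3)]  # append t as a fourth column
    return matmul(K, Rt)

def project(P, X_world):
    """Project a 3D world point (X, Y, Z) to inhomogeneous image coordinates."""
    X_h = [[X_world[0]], [X_world[1]], [X_world[2]], [1.0]]  # homogeneous column
    x, y, w = (row[0] for row in matmul(P, X_h))
    return (x / w, y / w)  # divide out the homogeneous scale

# Illustrative (assumed) values: f_x = f_y = 800, s = 0, (p_x, p_y) = (320, 240).
K = [[800.0,   0.0, 320.0],
     [  0.0, 800.0, 240.0],
     [  0.0,   0.0,   1.0]]
R = rotation_z(0.0)     # camera axes aligned with the world axes
t = [0.0, 0.0, 5.0]     # world origin sits 5 units in front of the camera

P = projection_matrix(K, R, t)
u, v = project(P, (1.0, 0.5, 0.0))
print(u, v)  # 480.0 320.0
```

With R equal to the identity, the world point (1, 0.5, 0) becomes (1, 0.5, 5) in the camera frame, and K maps it to (800·1/5 + 320, 800·0.5/5 + 240) = (480, 320).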

Some other terms you may see

  • P: 3 x 4 homogeneous camera projection matrix. For the basic pinhole model, \(P=diag(f,f,1)[I|0]\). This is the special case of \(K[I|0]\) in which the principal point is at the image origin, that is \((p_x, p_y)=(0,0)\), with a single focal length \(f\) and zero skew.
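
As a quick check of what \(diag(f,f,1)[I|0]\) does, it can be applied to a point \((X,Y,Z,1)^T\) in the camera frame. The worked step below assumes \(Z \neq 0\) and uses \(\sim\) for equality up to the homogeneous scale:

\[ \begin{bmatrix} f & 0 & 0 & 0 \\ 0 & f & 0 & 0 \\ 0 & 0 & 1 & 0 \end{bmatrix} \begin{pmatrix} X \\ Y \\ Z \\ 1 \end{pmatrix} = \begin{pmatrix} fX \\ fY \\ Z \end{pmatrix} \sim \begin{pmatrix} fX/Z \\ fY/Z \\ 1 \end{pmatrix} \]

So the image point is \((fX/Z,\ fY/Z)\), the familiar perspective division by depth.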