A new method for calculating isotope distributions

2010. 11. 26.
박형서
Abstract
• New method
• Calculates the isotope distribution from the molecular formula and the elemental isotope abundances
• Uses a Fourier transform method
• Produces peak profiles of finite resolution
• The peak profile function can be adjusted to match results from the experimental instrument
• Fast, accurate, and economical in memory
• Applicable to large molecules
Introduction
• Isotopically complex molecules have mass distributions
• The complexity can arise from the following factors:
• high molecular weight
• a large number of distinct elements in the molecular formula
• the presence of elements containing a large number of isotopes
Introduction
• Polynomial method
• Produces a peak list of mass/intensity pairs
• Each item in the list represents a specific isotopic composition
• To obtain results in a form that can be compared with experimental data, the peak list must be convolved with a finite-width peak shape function
• This adds computational overhead
• Improved polynomial methods
• Yergey: reduces the computational effort by using a pruning strategy
• Kubinyi: assumes that the mass spacing increases by exactly 1 Da and builds a stick spectrum with one stick per dalton
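The polynomial approach with pruning can be sketched as follows. This is only a minimal illustration, not Yergey's or Kubinyi's actual code: the function names, the `prune` threshold, and the rounding used to merge coincident masses are assumptions made for this example. The Cl isotope data are the values quoted later in this presentation.

```python
from collections import defaultdict

# Isotope (mass, abundance) pairs for Cl as quoted in this presentation.
CL = [(34.969, 0.75529), (36.966, 0.24471)]

def multiply_peak_lists(a, b, prune=0.0, decimals=3):
    """Combine two peak lists term by term, like a polynomial product.

    Peaks whose rounded masses coincide are merged; peaks whose intensity
    falls below `prune` are dropped (the pruning step).
    """
    merged = defaultdict(float)
    for mass_a, int_a in a:
        for mass_b, int_b in b:
            merged[round(mass_a + mass_b, decimals)] += int_a * int_b
    return sorted((m, i) for m, i in merged.items() if i >= prune)

def polynomial_distribution(isotopes, count, prune=0.0):
    """Peak list for `count` atoms of one element by repeated multiplication."""
    peaks = [(0.0, 1.0)]
    for _ in range(count):
        peaks = multiply_peak_lists(peaks, isotopes, prune)
    return peaks

cl2 = polynomial_distribution(CL, 2)                     # no pruning
cl20_pruned = polynomial_distribution(CL, 20, prune=1e-3)  # hypothetical threshold
```

Without pruning the intensities sum to one. With a threshold, part of the intensity is discarded at each step, which is the kind of distortion the later slides attribute to pruning.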
Mathematical development
• Convolution theorem
• Convolution in one domain corresponds to multiplication in the transformed domain
• For example, consider Cl and Cl2
• In the mass domain, the isotope distribution of Cl consists of two peaks:
• a δ function at mass 34.969 having a relative intensity of 0.75529
• a δ function at mass 36.966 having a relative intensity of 0.24471
\[ F_{\mathrm{Cl}}(m) = 0.75529\,\delta(m - 34.969) + 0.24471\,\delta(m - 36.966) \tag{1} \]
\[ f_{\mathrm{Cl}}(\mu) = 0.75529\,e^{34.969(i2\pi)\mu} + 0.24471\,e^{36.966(i2\pi)\mu} \tag{2} \]
\[ F_{\mathrm{Cl_2}}(m) = 0.5705\,\delta(m - 69.938) + 0.3696\,\delta(m - 71.935) + 0.0598\,\delta(m - 73.932) \tag{3} \]
\[ f_{\mathrm{Cl_2}}(\mu) = 0.5705\,e^{69.938(i2\pi)\mu} + 0.3696\,e^{71.935(i2\pi)\mu} + 0.0598\,e^{73.932(i2\pi)\mu} \tag{4} \]
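The Cl / Cl2 example can be checked numerically. The sketch below convolves the two-peak Cl list with itself in the mass domain and compares its μ-domain value with the square of the Cl function at one arbitrary μ. The helper names and the test point `mu = 0.0123` are assumptions made for illustration.

```python
import cmath

# Isotope (mass, abundance) pairs for Cl as quoted in Eqs. (1)-(2).
CL = [(34.969, 0.75529), (36.966, 0.24471)]

def mu_domain(peaks, mu):
    """Evaluate f(mu) = sum_k a_k * exp(i*2*pi*m_k*mu) for a peak list."""
    return sum(a * cmath.exp(2j * cmath.pi * m * mu) for m, a in peaks)

def convolve(a, b):
    """Mass-domain convolution of two lists of delta functions."""
    out = {}
    for mass_a, int_a in a:
        for mass_b, int_b in b:
            key = round(mass_a + mass_b, 3)
            out[key] = out.get(key, 0.0) + int_a * int_b
    return sorted(out.items())

cl2 = convolve(CL, CL)            # mass-domain route (three peaks)
mu = 0.0123                       # arbitrary test point in the mu domain
lhs = mu_domain(cl2, mu)          # transform of the convolved list
rhs = mu_domain(CL, mu) ** 2      # product in the mu domain
```

Both routes give the same complex value to within rounding, which is the convolution theorem applied to this case.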
Mathematical development
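• This procedure can be extended to a molecule of any complexity.
• For example, for the hypothetical cluster Mg10Cl19:
\[ f_{\mathrm{Mg_{10}Cl_{19}}}(\mu) = [f_{\mathrm{Mg}}(\mu)]^{10}\,[f_{\mathrm{Cl}}(\mu)]^{19} \tag{5} \]
• Each element in the molecular formula contributes a corresponding factor to the μ-domain expression.
• The isotope distribution can be read directly from the pre-exponential factors and the exponents of the expanded expression.
• Expanding the expression in this way is the polynomial method, which is known to be inefficient, so the paper proposes an efficient method for calculating the isotope distribution.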
Mathematical development
• The new method evaluates the μ-domain function in the form of Eq. (5) at discrete intervals, then Fourier transforms f(μ) to produce a discretely sampled version of the isotope distribution.
• To include a chosen peak shape function S(m), compute:
\[ s(\mu) = \mathrm{IFT}[S(m)] \tag{6} \]
• where IFT denotes the discrete inverse Fourier transform operation, so s(μ) is the inverse Fourier transform of S(m)
\[ F(m) = \mathrm{FT}[s(\mu)\,f(\mu)] \tag{7} \]
• where FT denotes the discrete Fourier transform operation
Mathematical development
• Considerations for efficient computation:
• Use an FFT algorithm, with the number of sampling points a power of 2
• Apply a heterodyne function to f(μ) that shifts the isotope distribution to near mass zero
\[ F(m - m_0) = \mathrm{FT}[e^{-m_0(i2\pi)\mu}\,s(\mu)\,f(\mu)] \tag{8} \]
• where m0 is the average molecular weight
• Use polar coordinates, because much of the calculation takes place in the complex plane
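One way to read the polar-coordinate remark: raising an elemental μ-domain value to the power of its atom count becomes a real power of the magnitude and a multiplication of the phase, and the heterodyne factor of Eq. (8) is just a phase subtraction. The sketch below assumes a hypothetical Cl19 cluster and an arbitrary μ; the function names are made up for this example.

```python
import cmath

# Isotope (mass, abundance) pairs for Cl as quoted in this presentation.
CL = [(34.969, 0.75529), (36.966, 0.24471)]

def element_mu(isotopes, mu):
    """mu-domain value of one element, in the form of Eq. (2)."""
    return sum(a * cmath.exp(2j * cmath.pi * m * mu) for m, a in isotopes)

def power_heterodyned(z, n, m0, mu):
    """Compute z**n * exp(-i*2*pi*m0*mu) using magnitude and phase."""
    r, theta = cmath.polar(z)
    return cmath.rect(r ** n, n * theta - 2 * cmath.pi * m0 * mu)

n_atoms = 19                                  # hypothetical Cl19 cluster
m0 = n_atoms * sum(a * m for m, a in CL)      # average mass of the cluster
mu = 0.05                                     # arbitrary sample point

z = element_mu(CL, mu)
via_polar = power_heterodyned(z, n_atoms, m0, mu)
via_direct = z ** n_atoms * cmath.exp(-2j * cmath.pi * m0 * mu)
```

The two results agree to within rounding; the polar form needs one real power and one multiplication per element instead of repeated complex products.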
Mathematical development
• Algorithm
1. Calculate the μ-domain function for each element (Eq. (2))
2. Calculate the μ-domain function for the molecule (Eq. (5))
3. Choose a peak shape function in the mass domain and calculate its μ-domain form with the fast inverse Fourier transform (Eq. (6))
4. Multiply the two μ-domain functions (Eq. (7))
5. Choose the average molecular weight of the isotope distribution and apply the heterodyne function (Eq. (8))
6. Apply the FFT to the resulting function to calculate the isotope distribution in the mass domain (Eq. (8))
7. Add back the mass used in the heterodyne step to restore the correct mass axis
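The seven steps can be sketched with NumPy as follows. This is a minimal illustration under stated assumptions, not the Mercury implementation: the function name `isotope_profile`, the default `points_per_da=16`, and the use of `numpy.fft` are choices made for this example, and the Gaussian peak shape is written directly in the μ domain (as in Eqs. (9)-(10)) rather than obtained by inverse transforming a mass-domain S(m).

```python
import numpy as np

# Isotope (mass, abundance) pairs for Cl as quoted in this presentation.
CL = [(34.969, 0.75529), (36.966, 0.24471)]

def isotope_profile(composition, n_points=2048, points_per_da=16, w=8.0):
    """Return (mass_axis, intensity) for a list of (isotopes, atom_count)."""
    # Integer grid indices in wrap-around order: 0..N/2-1, then -N/2..-1.
    k = np.fft.fftfreq(n_points, d=1.0 / n_points)
    mu = k * points_per_da / n_points          # mu-domain sample positions

    # Steps 1-2: mu-domain function of each element, raised to its atom count.
    f = np.ones(n_points, dtype=complex)
    m0 = 0.0
    for isotopes, count in composition:
        f_el = sum(a * np.exp(2j * np.pi * m * mu) for m, a in isotopes)
        f *= f_el ** count
        m0 += count * sum(a * m for m, a in isotopes)   # average mass

    # Step 3: Gaussian peak shape written directly in the mu domain.
    s = np.exp(-(k * w / n_points) ** 2)

    # Steps 4-5: multiply, then heterodyne so the distribution sits near zero.
    g = s * f * np.exp(-2j * np.pi * m0 * mu)

    # Step 6: forward FFT to the mass domain (divide by N so the area is 1).
    profile = np.fft.fft(g).real / n_points

    # Step 7: undo the wrap-around order and add m0 back to the mass axis.
    mass_axis = m0 + np.fft.fftshift(k) / points_per_da
    return mass_axis, np.fft.fftshift(profile)

mass, intensity = isotope_profile([(CL, 2)])
```

For Cl2 the tallest point of the profile falls on the grid point nearest 69.938, and the area under each peak reproduces the intensities of Eq. (3) to within the rounding of the quoted values.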
Mathematical development
• Parameters that must be set before the calculation:
• The number of points per dalton
• The mass range to include in the calculation
• A parameter that relates to the width of the peak profile
• Mass range
\[ \text{mass range} = k\,(1 + \sigma^2)^{1/2} \]
• σ is the standard deviation of the whole isotope distribution
• k = 10 is a good choice for large molecular weights
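As a worked example of the mass-range rule, the sketch below assumes the reading mass range = k(1 + σ²)^(1/2), with σ² obtained by summing the elemental variances weighted by atom count. The hypothetical Cl19 cluster and the helper names are assumptions made for illustration.

```python
import math

# Isotope (mass, abundance) pairs for Cl as quoted in this presentation.
CL = [(34.969, 0.75529), (36.966, 0.24471)]

def element_variance(isotopes):
    """Variance of the mass distribution of a single atom of one element."""
    mean = sum(a * m for m, a in isotopes)
    return sum(a * (m - mean) ** 2 for m, a in isotopes)

def mass_range(composition, k=10.0):
    """Mass range k * sqrt(1 + sigma^2) for a list of (isotopes, atom_count)."""
    variance = sum(n * element_variance(iso) for iso, n in composition)
    return k * math.sqrt(1.0 + variance)

range_cl19 = mass_range([(CL, 19)])   # hypothetical Cl19 cluster, k = 10
```

Dividing the number of array points by this range gives the number of points per dalton available for the calculation.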
Sample calculations
• Implementation environment
• The Mercury program, compiled with the Borland Turbo C compiler
• The array holds 4096 double-precision points, i.e. 2048 complex numbers
• For the FFT calculation, the array is arranged in wrap-around order:
• The array holds N complex numbers
• The first half (N/2 points) holds the points sampled at μ ≥ 0
• The second half (N/2 points) holds the points sampled at μ < 0
• The μ-domain counterpart of the peak shape function is assumed to be Gaussian
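The wrap-around ordering is the same layout that NumPy's `numpy.fft.fftfreq` returns, so it can be inspected on a small array. The array length of 8 and the grid density of 4 points per dalton are arbitrary values chosen to keep the output readable.

```python
import numpy as np

N = 8                # small array so the ordering is easy to read
points_per_da = 4    # hypothetical grid density

# Sample positions in the mu domain, in wrap-around order.
mu = np.fft.fftfreq(N, d=1.0 / points_per_da)
first_half, second_half = mu[: N // 2], mu[N // 2 :]
# mu holds 0, 0.5, 1, 1.5 followed by -2, -1.5, -1, -0.5
```

The first half carries the non-negative μ values and the second half the negative ones.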
Sample calculations
\[ s(\mu_j) = e^{-[(j-1)^2 w^2 / N^2]}, \quad 1 \le j \le N/2 \tag{9} \]
\[ s(\mu_j) = e^{-[(j-N-1)^2 w^2 / N^2]}, \quad N/2 + 1 \le j \le N \tag{10} \]
• w is a peak width parameter that the user can set; the default is 8 because that value gives the best results
• With w = 8, a peak in the mass domain extends about ±4 grid points from the peak center
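Eqs. (9) and (10) can be evaluated directly and transformed to see how wide the resulting mass-domain peak is. The sketch below uses NumPy's FFT and the 1-based index j of the slides; the function name `gaussian_mu` is an assumption made for this example.

```python
import numpy as np

def gaussian_mu(n_points, w=8.0):
    """Eqs. (9)-(10): Gaussian sampled in wrap-around order."""
    j = np.arange(1, n_points + 1)                     # 1-based index
    offset = np.where(j <= n_points // 2, j - 1, j - n_points - 1)
    return np.exp(-(offset ** 2) * w ** 2 / n_points ** 2)

N = 2048
s = gaussian_mu(N)
peak = np.fft.fft(s).real        # mass-domain peak centred on grid point 0
relative = peak / peak[0]        # height relative to the peak maximum
```

The transform of a Gaussian is again a Gaussian, here approximately exp(-π²k²/w²) at grid offset k. With w = 8 the height four grid points from the center is about exp(-π²/4), roughly 8% of the maximum, which matches the ±4 grid point description.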
Sample calculations
• Improvements of the proposed new method over the improved polynomial method (Kubinyi):
• Slower for small molecules, but faster for large molecules
• Intensities are calculated accurately
• Peaks are placed at their exact mass positions
• The peak profile can be made to match the experimental instrument, and the method extends to ultrahigh-resolution calculations
• Figure 1: for a large biomolecule, calculating the isotope distribution takes 0.99 s with the new method and 359 s with the improved polynomial method; the region missed by the improved polynomial method appears with the new method
Sample calculations
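• The quality of an isotope distribution calculation is judged by two criteria:
• How well it predicts the average molecular weight of the compound
• How well it predicts the standard deviation of the isotope distribution
• Two ways to obtain the average molecular weight and the standard deviation:
• Directly from the molecular formula and the elemental isotopic abundances
• From the moments of the calculated molecular isotope distribution
• If the values from the two routes agree, the calculated isotope distribution is accurate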
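The two routes can be compared on the Cl2 example. In the sketch below, `from_formula` sums elemental means and variances over the atoms, and `from_moments` takes the first two moments of a calculated distribution; both function names are assumptions, and the Cl2 stick intensities are the rounded values quoted in Eq. (3).

```python
import math

# Isotope (mass, abundance) pairs for Cl as quoted in this presentation.
CL = [(34.969, 0.75529), (36.966, 0.24471)]

def from_formula(composition):
    """Mean and standard deviation directly from elemental data."""
    mean = var = 0.0
    for isotopes, n in composition:
        el_mean = sum(a * m for m, a in isotopes)
        el_var = sum(a * (m - el_mean) ** 2 for m, a in isotopes)
        mean += n * el_mean
        var += n * el_var
    return mean, math.sqrt(var)

def from_moments(masses, intensities):
    """Mean and standard deviation from a calculated distribution."""
    total = sum(intensities)
    mean = sum(m * i for m, i in zip(masses, intensities)) / total
    var = sum(i * (m - mean) ** 2 for m, i in zip(masses, intensities)) / total
    return mean, math.sqrt(var)

# Cl2 stick distribution with the intensities quoted in Eq. (3).
cl2_masses = [69.938, 71.935, 73.932]
cl2_intensities = [0.5705, 0.3696, 0.0598]

expected = from_formula([(CL, 2)])
observed = from_moments(cl2_masses, cl2_intensities)
```

The small remaining difference between `expected` and `observed` comes from the four-digit rounding of the quoted intensities. A pruned or truncated distribution would show a larger gap.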
Sample calculations
• Problems of the improved polynomial method for a large DNA oligomer:
• Not all peaks in the distribution are recorded
• Some peaks have wrong intensity values
• Applying the pruning threshold distorts the spectrum
• Metal cluster compounds are difficult cases for isotope distribution algorithms
• This is because metallic elements are isotope-rich
• Figure 2: for Hg10, calculating the isotope distribution takes 0.6 s with the new method and up to 20 s with the improved polynomial method; the improved polynomial method also misses important peaks
Sample calculations
• For isotopically simpler molecules, the new method and the improved polynomial method show nearly the same performance
• Figure 3: for a simpler compound, calculating the isotope distribution takes 0.88 s with the new method and 8 s with the improved polynomial method; here the peaks missed by the improved polynomial method are relatively small
Discussion
• Pruning strategy in the improved polynomial method
• For large molecules, pruning is needed for computational efficiency but raises reliability problems; without pruning, the peak list grows and the calculation takes longer
• New method
• It uses no pruning strategy, so it avoids this trade-off and is faster and more accurate than the improved polynomial method
Discussion
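• The calculation time of the new method divides into two parts:
• The time needed to build the μ-domain data array
• The time needed to transform the data
• The array-building time is proportional to the number of points in the data array and to the number of distinct isotopes present in the compound.
• The paper assumes that the number of points in the data array is constant.
• Therefore (CN)2 and S2 take nearly the same calculation time
• because four isotopes are involved in each case
• Likewise, Hg1 and Hg100 take the same calculation time
• because seven isotopes are involved in each case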
Discussion
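• With a fixed number of points, the width of each isotope peak grows in proportion to the square root of the molecular weight, so the resolution of the calculation also scales with the square root of the molecular weight.
• This is because the standard deviation of the distribution is proportional to the square root of the molecular weight, so the mass range also grows in proportion to the square root of the molecular weight.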
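The square-root scaling can be made explicit with a short derivation. It assumes that isotopes of different atoms combine independently, that the elemental composition ratio stays fixed as the molecule grows, and that the mass range follows the rule k(1 + σ²)^(1/2) with a constant number of array points N; the symbols n_e, σ_e and Δm are introduced here for illustration.

```latex
\begin{align*}
% Variances of independent atoms add: n_e atoms of element e,
% each with elemental variance \sigma_e^2.
\sigma^2 &= \sum_e n_e \,\sigma_e^2 \\
% For a fixed composition ratio every n_e grows in proportion to
% the molecular weight M.
\sigma &\propto \sqrt{M} \\
% For large molecules \sigma^2 \gg 1, so the mass range follows \sigma.
\text{mass range} &= k\,(1 + \sigma^2)^{1/2} \approx k\,\sigma \propto \sqrt{M} \\
% With N fixed, the grid spacing, and hence the width of a peak that
% spans a fixed number of grid points, scales the same way.
\Delta m &= \frac{\text{mass range}}{N} \propto \sqrt{M}
\end{align*}
```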
Conclusions
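• A new method for calculating isotope distributions
• Developed using the Fourier transform
• Faster and more accurate than the improved polynomial method for large molecules
• Computational effort and memory requirements are relatively modest
• Ultrahigh-resolution calculations are possible by increasing the array size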
