Fujitsu Internship Summary


Internship Report, Fujitsu R&D Center (Beijing)
邱诚
Report Topics

Work at Fujitsu

Introduction to the Auto-Regressive Moving Average (ARMA) Model

Introduction to RHadoop
Work at Fujitsu

Studied data selection methods:
  TBSC
  Mean method
  Indicative segments
Optimized the ARMA and SVR models;
Dynamically combined the ARMA and SVR models.
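Mean Method: Description

Basic steps

1. Find the five days whose values at time points 1 to 9 are closest, by Euclidean distance, to those of the forecast day;
2. Expand this set with days that are close to these five days, by Euclidean distance, at time points 10 to 20;
3. Cluster all days obtained in the first two steps into two groups with k-means;
4. Take the most recent day before the forecast day that falls on the same weekday as the reference day, compute its Euclidean distance to the two cluster centers, and select the closer cluster;
5. Average the days in the selected cluster to produce the forecast.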
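The five steps above can be sketched in Python as follows. This is a minimal sketch under stated assumptions, not the implementation used at Fujitsu: each day is a list of values indexed by time point (index 0 = point 1), the history is in chronological order and ends just before the forecast day, and the expansion rule in step 2 (add each seed day's n_expand nearest days on points 10 to 20) is a hypothetical reading of that step. All function and parameter names are illustrative.

```python
import math
import random


def euclid(a, b):
    """Euclidean distance between two equal-length sequences."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def kmeans2(points, iters=20, seed=0):
    """Plain k-means with k = 2; returns (centers, labels)."""
    rng = random.Random(seed)
    centers = [list(p) for p in rng.sample(points, 2)]
    for _ in range(iters):
        labels = [min((0, 1), key=lambda c: euclid(p, centers[c])) for p in points]
        for c in (0, 1):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its previous center
                centers[c] = [sum(col) / len(members) for col in zip(*members)]
    labels = [min((0, 1), key=lambda c: euclid(p, centers[c])) for p in points]
    return centers, labels


def mean_method(history, weekdays, target_morning, target_weekday,
                morning=slice(0, 9), midday=slice(9, 20), n_expand=2):
    """Hypothetical sketch of the mean method for a single forecast day."""
    # Step 1: the five days closest to the forecast day on points 1-9.
    order = sorted(range(len(history)),
                   key=lambda i: euclid(history[i][morning], target_morning))
    selected = set(order[:5])
    # Step 2 (assumed rule): add each seed day's n_expand nearest days on points 10-20.
    for i in list(selected):
        others = sorted((j for j in range(len(history)) if j != i),
                        key=lambda j: euclid(history[j][midday], history[i][midday]))
        selected.update(others[:n_expand])
    chosen = sorted(selected)
    # Step 3: split the candidate days into two clusters with k-means.
    centers, labels = kmeans2([history[i] for i in chosen])
    # Step 4: reference day = most recent past day with the same weekday.
    ref = max(i for i in range(len(history)) if weekdays[i] == target_weekday)
    best = min((0, 1), key=lambda c: euclid(history[ref], centers[c]))
    # Step 5: average the days of the selected cluster, point by point.
    members = [history[i] for i, lab in zip(chosen, labels) if lab == best]
    if not members:  # degenerate case: fall back to all candidate days
        members = [history[i] for i in chosen]
    return [sum(col) / len(members) for col in zip(*members)]
```

Plain lists keep the sketch dependency-free; with real data, a numerical library and a tuned k-means implementation would be the natural replacement for euclid and kmeans2.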
Introduction to the ARMA Model

Principles of the ARMA model

Optimizing the ARMA model

Using ARMA models in R
ARMA: Basic Principles
Auto-Regressive model
Moving Average model
[Figure: example time series plotted over points X0 to X12, y-axis from 0 to 100]

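The auto-regressive (AR) model describes how the current value depends on past values;

The moving-average (MA) model describes the accumulated effect of past errors (shocks) of the auto-regressive part, as a weighted sum of recent error terms;

The ARMA model combines the prediction of the auto-regressive part with this accumulated error term.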
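Written out, an ARMA(p, q) model is x_t = c + φ_1 x_{t-1} + ... + φ_p x_{t-p} + ε_t + θ_1 ε_{t-1} + ... + θ_q ε_{t-q}. The Python sketch below computes one-step-ahead forecasts from this recursion when the coefficients are already known; it does not estimate them, and treating values and errors before the start of the series as zero is a simplifying assumption of this sketch. The function name is hypothetical.

```python
def arma_one_step(x, phi, theta, c=0.0):
    """One-step-ahead forecasts of an ARMA(p, q) model with known coefficients.

    AR part: c + sum(phi[i] * x[t-1-i]); MA part: sum(theta[j] * e[t-1-j]),
    where e[t] = x[t] - forecast[t] is the error of the previous forecasts.
    Values and errors before the start of the series are treated as 0
    (a simplification made for this sketch).
    """
    p, q = len(phi), len(theta)
    errors, forecasts = [], []
    for t in range(len(x) + 1):
        ar = sum(phi[i] * x[t - 1 - i] for i in range(p) if t - 1 - i >= 0)
        ma = sum(theta[j] * errors[t - 1 - j] for j in range(q) if t - 1 - j >= 0)
        f = c + ar + ma
        forecasts.append(f)
        if t < len(x):
            errors.append(x[t] - f)  # feed the forecast error back into the MA part
    return forecasts  # the last element forecasts the next, unseen value
```

For example, arma_one_step([2.0, 4.0], [0.5], []) is a pure AR(1) forecast and returns [0.0, 1.0, 2.0], while arma_one_step([2.0, 4.0], [], [0.5]) is a pure MA(1) forecast and returns [0.0, 1.0, 1.5].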
Optimizing the ARMA Model

Akaike’s Information Criterion (AIC)

Bias-corrected AIC (AICc)

Bayesian Information Criterion (BIC)

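All of the criteria above apply to ARMA models fitted by maximum likelihood estimation; they are used to compare candidate models (for example, different orders p and q), and the model with the smallest value is preferred.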
The AIC criterion
$\mathrm{AIC} = -2\ln(L) + 2k$, where
$L$: the maximized likelihood;
$k$: the number of model parameters.
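The three criteria can be computed from a fitted model's maximized log-likelihood. The sketch below uses the standard definitions AIC = -2 ln(L) + 2k, AICc = AIC + 2k(k + 1)/(n - k - 1) and BIC = -2 ln(L) + k ln(n); the function name and the assumption that ln(L), k and n are already available from a fit are illustrative.

```python
import math


def information_criteria(loglik, k, n):
    """Return (AIC, AICc, BIC) for a model fitted by maximum likelihood.

    loglik: maximized log-likelihood ln(L)
    k:      number of estimated parameters
    n:      number of observations (AICc requires n > k + 1)
    """
    aic = -2.0 * loglik + 2.0 * k
    aicc = aic + 2.0 * k * (k + 1) / (n - k - 1)  # small-sample correction
    bic = -2.0 * loglik + k * math.log(n)
    return aic, aicc, bic
```

Lower values are better for all three. Because the BIC penalty per parameter is ln(n), it penalizes extra parameters more heavily than AIC once n is 8 or more.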
Using ARMA Models in R

arima (stats package)

auto.arima (forecast package)
The arima function
arima(x,
      order = c(0, 0, 0),
      seasonal = list(order = c(0, 0, 0), period = NA),
      xreg = NULL,
      include.mean = TRUE,
      transform.pars = TRUE,
      fixed = NULL,
      init = NULL,
      method = c("CSS-ML", "ML", "CSS"),
      n.cond,
      optim.method = "BFGS",
      optim.control = list(),
      kappa = 1e6)
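In order = c(p, d, q), p and q are the AR and MA orders and d is the number of times the series is differenced before the ARMA part is fitted. A minimal Python sketch of d-th order differencing follows; the function name is hypothetical and this is not part of R.

```python
def difference(x, d=1):
    """Apply first differencing d times: y[t] = x[t] - x[t-1].

    Each pass shortens the series by one value.
    """
    for _ in range(d):
        x = [b - a for a, b in zip(x, x[1:])]
    return x
```

For example, difference([1, 4, 9, 16]) gives [3, 5, 7], and a second difference gives [2, 2]: a quadratic trend becomes constant, which is why d = 2 can make such a series suitable for an ARMA model.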
Description of the arima parameters in R
The auto.arima function
auto.arima(x,
           d = NA, D = NA,
           max.p = 5, max.q = 5, max.P = 2, max.Q = 2, max.order = 5,
           start.p = 2, start.q = 2, start.P = 1, start.Q = 1,
           stationary = FALSE,
           ic = c("aicc", "aic", "bic"),
           stepwise = TRUE,
           trace = FALSE,
           approximation = (length(x) > 100 | frequency(x) > 12),
           xreg = NULL,
           test = c("kpss", "adf", "pp"),
           seasonal.test = c("ocsb", "ch"),
           allowdrift = TRUE, lambda = NULL, parallel = FALSE,
           num.cores = NULL)
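With stepwise = FALSE, auto.arima searches over the model orders allowed by the max.* limits and keeps the model with the smallest information criterion (AICc by default, per ic). The Python sketch below imitates only a non-seasonal exhaustive search. The callback fit(p, q) is a hypothetical stand-in for a real maximum-likelihood fit returning (log-likelihood, number of parameters), and applying max_order to p + q alone is a simplification of this sketch.

```python
def aicc(loglik, k, n):
    """Bias-corrected AIC; requires n > k + 1."""
    return -2.0 * loglik + 2.0 * k + 2.0 * k * (k + 1) / (n - k - 1)


def grid_search(fit, n, max_p=5, max_q=5, max_order=5):
    """Exhaustive search over ARMA(p, q) orders, minimizing AICc.

    fit(p, q) is a hypothetical callback returning (loglik, k) for the
    fitted model. Returns (score, p, q) of the best candidate.
    """
    best = None
    for p in range(max_p + 1):
        for q in range(max_q + 1):
            if p + q > max_order:  # skip overly large models
                continue
            loglik, k = fit(p, q)
            score = aicc(loglik, k, n)
            if best is None or score < best[0]:
                best = (score, p, q)
    return best
```

The stepwise default instead starts from a few initial models and only explores neighbouring orders, which is much cheaper when each fit is expensive.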
Nowadays, we have lots of data. BIG DATA!
What is R?
Why R?
What is needed?


There is a need for more than counts and
averages on these big data sets
Analyzing all of the data can lead to insights
that sampling or subsets can’t reveal
Why R and Hadoop?
Introduction to RHadoop
What RHadoop is for
The open-source RHadoop project makes it easier
to extract data from Hadoop for analysis with R, and to
run R within the nodes of the Hadoop cluster, essentially transforming Hadoop into a massively parallel
statistical computing cluster based on R.
RHadoop
rhdfs

Manipulate HDFS directly from R

Mimic as much of the HDFS Java API as
possible
rhdfs Functions
rmr



Designed to be the simplest and most
elegant way to write MapReduce programs
Gives the R programmer the tools needed
to perform data analysis in an idiomatic,
R-like way
Provides an abstraction layer to hide the
implementation details
rmr mapreduce Function
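rmr's mapreduce function runs a user-supplied map function and reduce function over data stored in HDFS. As a hedged illustration of the programming model only, not of rmr's API, here is a local, single-process Python sketch; the names map_reduce, mapper and reducer are hypothetical.

```python
from collections import defaultdict


def map_reduce(records, mapper, reducer):
    """Single-process sketch of the MapReduce model.

    mapper(record) yields (key, value) pairs; the pairs are grouped by key
    (the shuffle step Hadoop performs between the two phases); and
    reducer(key, values) returns one (key, result) pair per key.
    """
    groups = defaultdict(list)
    for record in records:
        for key, value in mapper(record):
            groups[key].append(value)
    return dict(reducer(key, values) for key, values in groups.items())
```

For example, averaging readings per weekday: map_reduce([("Mon", 2.0), ("Tue", 3.0), ("Mon", 4.0)], lambda r: [(r[0], r[1])], lambda k, vs: (k, sum(vs) / len(vs))) returns {"Mon": 3.0, "Tue": 3.0}. On a cluster, the map calls and the per-key reduce calls run in parallel on different nodes.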
Thank you!
