Published 2016-06-28 04:51:58 | Source: reader submission


Apache Spark

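Spark is a general-purpose parallel computing framework in the style of Hadoop MapReduce, open-sourced by the UC Berkeley AMP Lab. It shares the strengths of Hadoop MapReduce, but unlike MapReduce, a job's intermediate output can be kept in memory, so it does not have to be written to and read back from HDFS. This makes Spark better suited to iterative MapReduce-style algorithms such as those used in data mining and machine learning.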
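To illustrate why keeping intermediate output in memory helps iterative algorithms, here is a minimal plain-Python sketch. It does not use Spark's API: the toy `step` function, the temporary JSON file standing in for HDFS, and the names `run_with_disk` and `run_in_memory` are all assumptions made for illustration only.

```python
import json
import os
import tempfile


def step(values):
    """One iteration of a toy algorithm: move every value halfway toward the mean."""
    mean = sum(values) / len(values)
    return [(v + mean) / 2 for v in values]


def run_with_disk(values, iterations):
    """MapReduce-style: each iteration reads its input from storage and writes its output back."""
    round_trips = 0
    with tempfile.TemporaryDirectory() as tmp:
        # Hypothetical stand-in for an HDFS path holding intermediate output.
        path = os.path.join(tmp, "intermediate.json")
        with open(path, "w") as f:
            json.dump(values, f)
        for _ in range(iterations):
            with open(path) as f:        # read the previous iteration's output
                current = json.load(f)
            current = step(current)
            with open(path, "w") as f:   # write this iteration's output
                json.dump(current, f)
            round_trips += 1
        with open(path) as f:
            result = json.load(f)
    return result, round_trips


def run_in_memory(values, iterations):
    """Spark-style: intermediate results stay in memory between iterations."""
    current = list(values)
    for _ in range(iterations):
        current = step(current)
    return current, 0


data = [1.0, 2.0, 3.0, 10.0]
disk_result, disk_trips = run_with_disk(data, 5)
mem_result, mem_trips = run_in_memory(data, 5)
```

Both variants compute the same values; the difference is that the disk-backed loop pays one storage round trip per iteration, and that cost grows with the number of iterations. In Spark itself, the in-memory behaviour is typically requested by persisting a dataset before the loop.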


Apache Spark 1.6.2 has been released.

The changelog is as follows:

    Sub-task

  • [SPARK-15613] - Incorrect days to millis conversion

  • [SPARK-15723] - SimpleDateParamSuite test is locale-fragile and relies on deprecated short TZ name

    Bug

  • [SPARK-8428] - TimSort Comparison method violates its general contract with CLUSTER BY

  • [SPARK-10722] - Uncaught exception: RDDBlockId not found in driver-heartbeater

  • [SPARK-11327] - spark-dispatcher doesn't pass along some spark properties

  • [SPARK-11507] - Error thrown when using BlockMatrix.add

  • [SPARK-12655] - GraphX does not unpersist RDDs

  • [SPARK-12712] - test-dependencies.sh script fails when run against empty .m2 cache

  • [SPARK-12941] - Spark-SQL JDBC Oracle dialect fails to map string datatypes to Oracle VARCHAR datatype

  • [SPARK-13023] - Check for presence of 'root' module after computing test_modules, not changed_modules

  • [SPARK-13207] - _SUCCESS should not break partition discovery

  • [SPARK-13227] - Risky apply() in OpenHashMap

  • [SPARK-13242] - Moderately complex `when` expression causes code generation failure

  • [SPARK-13327] - colnames()<- allows invalid column names

  • [SPARK-13352] - BlockFetch does not scale well on large block

  • [SPARK-13444] - QuantileDiscretizer chooses bad splits on large DataFrames

  • [SPARK-13522] - Executor should kill itself when it's unable to heartbeat to the driver more than N times

  • [SPARK-13566] - Deadlock between MemoryStore and BlockManager

  • [SPARK-13622] - Issue creating level db file for YARN shuffle service if URI is used in yarn.nodemanager.local-dirs

  • [SPARK-13631] - getPreferredLocations race condition in spark 1.6.0?

  • [SPARK-13642] - Properly handle signal kill of ApplicationMaster

  • [SPARK-13648] - org.apache.spark.sql.hive.client.VersionsSuite fails NoClassDefFoundError on IBM JDK

  • [SPARK-13652] - TransportClient.sendRpcSync returns wrong results

  • [SPARK-13697] - TransformFunctionSerializer.loads doesn't restore the function's module name if it's '__main__'

  • [SPARK-13705] - UpdateStateByKey Operation documentation incorrectly refers to StatefulNetworkWordCount

  • [SPARK-13711] - Apache Spark driver stopping JVM when master not available

  • [SPARK-13755] - Escape quotes in SQL plan visualization node labels

  • [SPARK-13772] - DataType mismatch about decimal

  • [SPARK-13803] - Standalone master does not balance cluster-mode drivers across workers

  • [SPARK-13806] - SQL round() produces incorrect results for negative values

  • [SPARK-13845] - BlockStatus and StreamBlockId keep on growing result driver OOM

  • [SPARK-13850] - TimSort Comparison method violates its general contract

  • [SPARK-13901] - We get wrong logdebug information when jump to the next locality level.

  • [SPARK-13958] - Executor OOM due to unbounded growth of pointer array in Sorter

  • [SPARK-14006] - Builds of 1.6 branch fail R style check

  • [SPARK-14074] - Use fixed version of install_github in SparkR build

  • [SPARK-14159] - StringIndexerModel sets output column metadata incorrectly

  • [SPARK-14187] - Incorrect use of binarysearch in SparseMatrix

  • [SPARK-14204] - [SQL] Failure to register URL-derived JDBC driver on executors in cluster mode

  • [SPARK-14219] - Fix `pickRandomVertex` not to fall into infinite loops for graphs with one vertex

  • [SPARK-14232] - Event timeline on job page doesn't show if an executor is removed with multiple line reason

  • [SPARK-14243] - updatedBlockStatuses does not update correctly when removing blocks

  • [SPARK-14261] - Memory leak in Spark Thrift Server

  • [SPARK-14298] - LDA should support disable checkpoint

  • [SPARK-14322] - Use treeAggregate instead of reduce in OnlineLDAOptimizer

  • [SPARK-14357] - Tasks that fail due to CommitDeniedException (a side-effect of speculation) can cause job failure

  • [SPARK-14363] - Executor OOM due to a memory leak in Sorter

  • [SPARK-14368] - Support python.spark.worker.memory with upper-case unit

  • [SPARK-14454] - Better exception handling while marking tasks as failed

  • [SPARK-14468] - Always enable OutputCommitCoordinator

  • [SPARK-14495] - Distinct aggregation cannot be used in the having clause

  • [SPARK-14563] - SQLTransformer.transformSchema is not implemented correctly

  • [SPARK-14665] - PySpark StopWordsRemover default stopwords are Java object

  • [SPARK-14671] - Pipeline.setStages needs to handle Array non-covariance

  • [SPARK-14679] - UI DAG visualization causes OOM generating data

  • [SPARK-14739] - Vectors.parse doesn't handle dense vectors of size 0 and sparse vectors with no indices

  • [SPARK-14757] - Incorrect behavior of Join operation in Spark SQL JOIN : "false" in the left table is joined to "null" on the right table

  • [SPARK-14915] - Tasks that fail due to CommitDeniedException (a side-effect of speculation) can cause job to never complete

  • [SPARK-14965] - StructType throws exception for missing field

  • [SPARK-15165] - Codegen can break because toCommentSafeString is not actually safe

  • [SPARK-15209] - Web UI's timeline visualizations fails to render if descriptions contain single quotes

  • [SPARK-15260] - UnifiedMemoryManager could be in bad state if any exception happen while evicting blocks

  • [SPARK-15262] - race condition in killing an executor and reregistering an executor

  • [SPARK-15528] - conv function returns inconsistent result for the same data

  • [SPARK-15601] - CircularBuffer's toString() to print only the contents written if buffer isn't full

  • [SPARK-15736] - Gracefully handle loss of DiskStore files

  • [SPARK-15754] - org.apache.spark.deploy.yarn.Client changes the credential of current user

  • [SPARK-15892] - Incorrectly merged AFTAggregator with zero total count

  • [SPARK-15975] - Improper Popen.wait() return code handling in dev/run-tests

  • [SPARK-16017] - YarnClientSchedulerBackend now registers backends as IPs instead of Hostnames which causes all tasks to run with RACK_LOCAL locality.

  • [SPARK-16035] - The SparseVector parser fails checking for valid end parenthesis

  • [SPARK-16086] - Python UDF failed when there is no arguments

  • [SPARK-16173] - Can't join describe() of DataFrame in Scala 2.10

    Documentation

  • [SPARK-14618] - RegressionEvaluator doc out of date

  • [SPARK-15223] - spark.executor.logs.rolling.maxSize wrongly referred to as spark.executor.logs.rolling.size.maxBytes

    Improvement

  • [SPARK-13599] - Groovy-all ends up in spark-assembly if hive profile set

  • [SPARK-13601] - Invoke task failure callbacks before calling outputstream.close()

  • [SPARK-13663] - Upgrade Snappy Java to 1.1.2.1

  • [SPARK-13810] - Add Port Configuration Suggestions on Bind Exceptions

  • [SPARK-14058] - Incorrect docstring in Window.orderBy

  • [SPARK-14107] - PySpark spark.ml GBT algs need seed Param

  • [SPARK-14149] - Log exceptions in tryOrIOException

  • [SPARK-14242] - avoid too many copies in network when a network frame is large

  • [SPARK-14787] - Upgrade Joda-Time library from 2.9 to 2.9.3

  • [SPARK-15205] - Codegen can compile the same source code more than twice

  • [SPARK-15827] - Publish Spark's forked sbt-pom-reader to Maven Central

    New Feature

  • [SPARK-11515] - QuantileDiscretizer should take random seed

  • [SPARK-13465] - Add a task failure listener to TaskContext


