Reading files in Scala
The file scalaIO.txt in the root directory of drive E: has the following content:
Example code for reading the file:
// Read a file line by line
import scala.io.Source

val file = Source.fromFile("E:\\scalaIO.txt")
for (line <- file.getLines()) {
  println(line)
}
file.close()
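Note 1: in file = Source.fromFile("E:\\scalaIO.txt"), the fromFile() method is defined on the Source object, which is brought in by import scala.io.Source.
Note 2: file.getLines() returns an iterator (Iterator[String]); it is also defined in the scala.io package.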
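To illustrate the iterator from the caller's side (this is not the library source), here is a minimal sketch of the same read loop, assuming Scala 2.13 or later for scala.util.Using. The object name ReadFileSketch, the helper readLines, and the temporary file are hypothetical; the temporary file stands in for E:\scalaIO.txt. Because getLines() returns a lazy iterator, the lines are copied into a List before the source is closed.

```scala
import java.nio.file.Files
import scala.io.Source
import scala.util.Using

object ReadFileSketch {
  // Hypothetical helper: read all lines and always close the underlying file.
  // Assumes the file is UTF-8 encoded.
  def readLines(path: String): List[String] =
    Using.resource(Source.fromFile(path, "UTF-8")) { source =>
      // getLines() is a lazy, single-pass Iterator, so materialize it
      // before the source is closed.
      source.getLines().toList
    }

  def main(args: Array[String]): Unit = {
    // A temporary file stands in for E:\scalaIO.txt so the sketch runs anywhere.
    val tmp = Files.createTempFile("scalaIO", ".txt")
    Files.write(tmp, "first line\nsecond line".getBytes("UTF-8"))
    readLines(tmp.toString).foreach(println)
    Files.delete(tmp)
  }
}
```

Using.resource closes the source even when reading throws, which a trailing file.close() call on its own does not guarantee.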
Reading network resources in Scala
// Read a network resource
import scala.io.Source

val webFile = Source.fromURL("http://spark.apache.org")
webFile.foreach(print)
webFile.close()
The source code of the fromURL() method is as follows:
/** same as fromURL(new URL(s)) */
def fromURL(s: String)(implicit codec: Codec): BufferedSource =
  fromURL(new URL(s))(codec)
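The second parameter list in this signature is an implicit Codec, so the character encoding can also be passed explicitly instead of relying on whichever implicit is in scope. Below is a minimal sketch, under the assumption that the content is UTF-8. FromUrlCodecSketch and fetch are hypothetical names, and a file: URL pointing at a temporary file is used in place of a real site so the example runs offline.

```scala
import java.nio.file.Files
import scala.io.{Codec, Source}

object FromUrlCodecSketch {
  // Hypothetical helper: read everything behind a URL as text,
  // decoding it with an explicitly supplied codec.
  def fetch(url: String, codec: Codec): String = {
    val source = Source.fromURL(url)(codec)
    try source.mkString finally source.close()
  }

  def main(args: Array[String]): Unit = {
    // A local file: URL replaces a real web address for this sketch.
    val tmp = Files.createTempFile("page", ".html")
    Files.write(tmp, "<title>demo</title>".getBytes("UTF-8"))
    println(fetch(tmp.toUri.toString, Codec.UTF8))
    Files.delete(tmp)
  }
}
```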
The content read from the network resource is as follows:
<!DOCTYPE html>
<html lang="en">
<head>
<meta charset="utf-8">
<meta http-equiv="X-UA-Compatible" content="IE=edge">
<meta name="viewport" content="width=device-width, initial-scale=1.0">
<title> Apache Spark™ - Lightning-Fast Cluster Computing </title>
<meta name="description" content="Apache Spark is a fast and general engine for big data processing, with built-in modules for streaming, SQL, machine learning and graph processing.">
...
</head>
<body>
...
<footer class="small"> <hr> Apache Spark, Spark, Apache, and the Spark logo are trademarks of <a href="http://www.apache.org">The Apache Software Foundation</a>. </footer>
...
</body>
</html>
Process finished with exit code 0
// Read a network resource
import scala.io.Source

val webFile = Source.fromURL("http://www.baidu.com/")
webFile.foreach(print)
webFile.close()
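Reading a Chinese-language site produces the following encoding error (the fix is left to the reader, since it is not the focus of this article):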
Exception in thread "main" java.nio.charset.MalformedInputException: Input length = 1
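One common cause of this exception is a mismatch between the codec used for decoding and the actual encoding of the bytes; whether that is what happened for this particular site is an assumption. The sketch below reproduces the error offline and shows two possible ways around it: passing the matching codec, or asking the codec to replace malformed input instead of throwing. CodecSketch and decode are hypothetical names, and US-ASCII only stands in for "the wrong codec".

```scala
import java.io.ByteArrayInputStream
import java.nio.charset.{CodingErrorAction, StandardCharsets}
import scala.io.{Codec, Source}
import scala.util.Try

object CodecSketch {
  // Hypothetical helper: decode raw bytes with the given codec.
  def decode(bytes: Array[Byte], codec: Codec): String = {
    val source = Source.fromInputStream(new ByteArrayInputStream(bytes))(codec)
    try source.mkString finally source.close()
  }

  def main(args: Array[String]): Unit = {
    // Two Chinese characters (written as Unicode escapes), encoded as UTF-8.
    val bytes = "\u4f60\u597d".getBytes(StandardCharsets.UTF_8)

    // Wrong codec: decoding is expected to fail with MalformedInputException.
    println(Try(decode(bytes, Codec(StandardCharsets.US_ASCII))))

    // Matching codec: the text decodes cleanly.
    println(decode(bytes, Codec.UTF8))

    // Lenient codec: malformed bytes become U+FFFD instead of throwing.
    val lenient = Codec(StandardCharsets.US_ASCII)
      .onMalformedInput(CodingErrorAction.REPLACE)
    println(decode(bytes, lenient))
  }
}
```

For a URL, the same codec value can be passed as Source.fromURL(url)(codec). Replacement is lossy, so choosing the charset that matches the page is usually preferable.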