<?xml version="1.0" encoding="UTF-8"?><!-- generator="wordpress.com" -->
<rss version="2.0"
	xmlns:content="http://purl.org/rss/1.0/modules/content/"
	xmlns:wfw="http://wellformedweb.org/CommentAPI/"
	xmlns:dc="http://purl.org/dc/elements/1.1/"
	>

<channel>
	<title>clustering &#171; WordPress.com Tag Feed</title>
	<link>http://wordpress.com/tag/clustering/</link>
	<description>Feed of posts on WordPress.com tagged "clustering"</description>
	<pubDate>Fri, 05 Sep 2008 05:55:51 +0000</pubDate>

	<generator>http://wordpress.com/tags/</generator>
	<language>en</language>

<item>
<title><![CDATA[Uncover the hood of JEE Clustering]]></title>
<link>http://hailunyan.wordpress.com/?p=43</link>
<pubDate>Thu, 04 Sep 2008 03:12:54 +0000</pubDate>
<dc:creator>allany</dc:creator>
<guid>http://hailunyan.wordpress.com/?p=43</guid>
<description><![CDATA[1. Uncover the hood of J2EE Clustering
2. Scaling Your Java EE Applications
]]></description>
<content:encoded><![CDATA[<p>1. <a href="http://www.theserverside.com/tt/articles/article.tss?l=J2EEClustering">Uncover the hood of J2EE Clustering</a></p>
<p>2. <a href="http://www.theserverside.com/tt/articles/article.tss?l=ScalingYourJavaEEApplications">Scaling Your Java EE Applications</a></p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[New Tool]]></title>
<link>http://bklynwriter.wordpress.com/?p=16</link>
<pubDate>Wed, 03 Sep 2008 04:03:40 +0000</pubDate>
<dc:creator>bklynwriter</dc:creator>
<guid>http://bklynwriter.wordpress.com/?p=16</guid>
<description><![CDATA[Clustering. Mind-mapping. Brainstorming.
Whatever you want to call it, I LOVE it. I did my first clu]]></description>
<content:encoded><![CDATA[<p>Clustering. Mind-mapping. Brainstorming.</p>
<p>Whatever you want to call it, I LOVE it. I did my first clustering exercise today, and even though my inner editor tried to slow me down, I think it was a success. I think that the more comfortable I become with this tool, the better it will be.</p>
<p>I cannot believe I'd never used this tool before. I'd certainly heard of it, but somehow I didn't make the connection as to how it could work for me. This will allow me to be freer in my writing. I am still working on the concept of the infamous <em>shitty first draft</em>. I had this idea in my head that the document I labeled "first draft" had to ultimately resemble a publishable manuscript. I took the long scenic route to the realization. A "first draft" can range anywhere from a bunch of notecards wrapped with a rubber band to a 200K opus. It is ground zero. It is the starting line. It is the story translated from head to paper. And once it is on paper, you, the writer, shape and mold it until it is ready for you to let it go out into the world.</p>
<p><a href="http://bklynwriter.files.wordpress.com/2008/09/deb-signature.png"><img class="alignnone size-full wp-image-22" src="http://bklynwriter.wordpress.com/files/2008/09/deb-signature.png" alt="" width="68" height="40" /></a></p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[Meanshift Clustering]]></title>
<link>http://gebaar.wordpress.com/?p=22</link>
<pubDate>Tue, 26 Aug 2008 00:29:02 +0000</pubDate>
<dc:creator>dhiaurrahman</dc:creator>
<guid>http://gebaar.wordpress.com/?p=22</guid>
<description><![CDATA[I have written the meanshift algorithm in C++ (can be accessed in here) as a &#8220;copy&#8221; of M]]></description>
<content:encoded><![CDATA[<p>I have written the meanshift algorithm in C++ (can be accessed in <a href="http://image.chonnam.ac.kr/scraps/data/mean_shift.zip">here</a>) as a "copy" of Matlab code published in <a href="http://www.mathworks.com/matlabcentral/fileexchange/loadFile.do?objectId=10161&#38;objectType=file">here</a>. </p>
<p>Well, the code is not clean. </p>
<p>To test the performance, I fixed the index generation in both systems. (the modified Matlab script is available in <a href="http://image.chonnam.ac.kr/scraps/data/mean_shift_matlab.zip">here </a>- loading the data from a .mat file) </p>
<p>If you have any questions, please don't hesitate to contact me. I'm happy to have discussion with anybody! :D</p>
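<p>For readers new to mean shift, here is a minimal Python sketch of the idea, assuming a flat (uniform) kernel: every point repeatedly moves to the average of the data points within a fixed radius (the bandwidth) until it stops moving, and points that converge to the same mode form one cluster. It is a generic illustration, not a translation of the C++ or Matlab code linked above; the function name mean_shift and the rule of merging modes closer than half the bandwidth are illustrative assumptions.</p>

```python
import math


def mean_shift(points, bandwidth, tol=1e-6, max_iter=300):
    """Flat-kernel mean shift: move each start point to the mean of the data
    points inside a window of radius 'bandwidth' until it stops moving, then
    merge the resulting modes into cluster centres."""

    def dist(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

    modes = []
    for p in points:
        x = list(p)
        for _ in range(max_iter):
            # Data points inside the window around the current estimate.
            # The window is never empty: some point always lies within
            # 'bandwidth' of the mean of the previous window.
            window = [q for q in points if dist(q, x) <= bandwidth]
            new_x = [sum(coord) / len(window) for coord in zip(*window)]
            shift = dist(new_x, x)
            x = new_x
            if shift <= tol:
                break
        modes.append(x)

    # Assumption of this sketch: modes closer than half the bandwidth
    # are treated as the same cluster.
    centers, labels = [], []
    for m in modes:
        for k, c in enumerate(centers):
            if dist(m, c) <= bandwidth / 2:
                labels.append(k)
                break
        else:
            centers.append(m)
            labels.append(len(centers) - 1)
    return centers, labels
```

<p>For example, mean_shift([(0, 0), (0.5, 0), (0, 0.5), (10, 10), (10.5, 10), (10, 10.5)], bandwidth=2) finds two centres and labels the points [0, 0, 0, 1, 1, 1]. The naive window search touches every point for every point, so each pass is O(n²), which is one reason a compiled implementation can pay off.</p>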
]]></content:encoded>
</item>
<item>
<title><![CDATA[Free Dell Server and Storage Stencils, including EMC]]></title>
<link>http://thebackroomtech.wordpress.com/?p=625</link>
<pubDate>Thu, 14 Aug 2008 15:55:29 +0000</pubDate>
<dc:creator>Julie</dc:creator>
<guid>http://thebackroomtech.wordpress.com/?p=625</guid>
<description><![CDATA[I&#8217;m working on a Visio drawing of our proposed Groupwise upgrade environment, and needed some ]]></description>
<content:encoded><![CDATA[<p>I'm working on a Visio drawing of our proposed Groupwise upgrade environment, and needed some better stencils to represents servers and SAN storage than the ones that come with Visio 2003.</p>
<p>I found some very nice stencils at <a href="http://www.visiocafe.com/" target="_blank">visiocafe.com</a> for <a href="http://www.visiocafe.com/dell.htm" target="_blank">Dell servers and storage</a> and <a href="http://www.visiocafe.com/emc.htm" target="_blank">EMC storage</a>, including Dell branded EMC storage and EqualLogic.</p>
<p>I can use these along with my <a href="http://www.novell.com/communities/node/5784/novell-visio-stencils-groupwiseclusteringedirectory" target="_blank">eDirectory, clustering and Groupwise stencils</a> to detail everything the administrators will need to know to build my design.</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[Putting the Olympics on a different map]]></title>
<link>http://protomat.wordpress.com/?p=6</link>
<pubDate>Thu, 14 Aug 2008 04:52:19 +0000</pubDate>
<dc:creator>tcubed1</dc:creator>
<guid>http://protomat.wordpress.com/?p=6</guid>
<description><![CDATA[
Mike Katz put out the challenge to get in the Olympic MATLAB spirit and analyze some of the Olympic]]></description>
<content:encoded><![CDATA[<div class="content">
<p>Mike Katz put out the challenge to get in the Olympic MATLAB spirit and analyze some of the Olympic data (<a href="http://blogs.mathworks.com/desktop/">http://blogs.mathworks.com/desktop/</a>) using a medal statistics site as a starting point (<a href="http://www.nbcolympics.com/medals/2008standings/index.html">http://www.nbcolympics.com/medals/2008standings/index.html</a>) and the urlread function.</p>
<p>While I won't address analyzing or predicting the results per se, this is a fun problem for trying out some techniques for visualizing high-dimensional data. I'll be the first to admit that for 3 variables (gold, silver, and bronze medals) there are simpler visualizations (hundreds!), but my purpose here is to introduce the techniques on data where we already know roughly what the result should look like.</p>
<p>Ted</p>
<h3>Contents</h3>
<div>
<ul>
<li><a href="#1">Reading current Olympic Results</a></li>
<li><a href="#2">Sammon mapping</a></li>
<li><a href="#3">Self-Organizing Maps</a></li>
<li><a href="#4">Gold Medals</a></li>
<li><a href="#5">Silver Medals</a></li>
<li><a href="#6">Bronze Medals</a></li>
<li><a href="#7">Summary</a></li>
</ul>
</div>
<h3>Reading current Olympic Results<a name="1"></a></h3>
<p>First, let's read the HTML page, from which we'll extract our data.</p>
<pre class="codeinput"><span style="color:#000080;">s=urlread(<span class="string">'http://www.nbcolympics.com/medals/2008standings/index.html'</span>);

<span class="comment">% We'll parse the data table in HTML page according to the &#60;tr&#62; and &#60;td&#62;</span>
<span class="comment">% elements, careful to strip out the anchor tags.</span>
k1=findstr(s,<span class="string">'&#60;tbody&#62;'</span>);
k2=findstr(s,<span class="string">'&#60;/tbody&#62;'</span>);
s0=s(k1:k2);
rowstart=findstr(s0,<span class="string">'&#60;tr'</span>);rowend=[rowstart(2:end)-1 length(s0)];

n=length(rowstart);country=cell(n,1);medals=zeros(n,3);
<span class="comment">% loop through each country (i.e. row)</span>
<span class="keyword">for</span> i=1:length(rowstart),
   tmprow=s0(rowstart(i):rowend(i));
   datstart=findstr(tmprow,<span class="string">'&#60;td'</span>);datend=[datstart(2:end)-1 length(tmprow)];
   <span class="comment">% loop through the elements of interest in each row (i.e.</span>
   <span class="comment">% country,gold,silver, bronze)</span>
   <span class="keyword">for</span> j=[2 4 5 6],
      tmpdat=tmprow(datstart(j):datend(j));
      c=regexp(tmpdat,<span class="string">'[^&#60;&#62;]*'</span>,<span class="string">'match'</span>);
      <span class="keyword">switch</span>(j),
         <span class="keyword">case</span> 2, country{i}=c{3};
         <span class="keyword">case</span> 4, medals(i,1)=str2double(c{2});  <span class="comment">% gold medal</span>
         <span class="keyword">case</span> 5, medals(i,2)=str2double(c{2});  <span class="comment">% silver medal</span>
         <span class="keyword">case</span> 6, medals(i,3)=str2double(c{2});  <span class="comment">% bronze medal</span>
      <span class="keyword">end</span>;
   <span class="keyword">end</span>;
<span class="keyword">end</span>;</span></pre>
<h3>Sammon mapping<a name="2"></a></h3>
<p>One way to visualize the data is to perform a Sammon projection into two dimensions. This is similar in spirit to principal component analysis: we create a mapping of our three variables that is represented in a "best" sense in two dimensions (for Sammon's mapping, "best" means preserving the pairwise distances between countries as well as possible). The axes don't have a particularly useful definition, other than that they represent the dimensions of this projection space. The USA and China are far away from everyone else, and fairly distant from each other as well, because of differences in the medals they've won (at the time of writing, China has more gold and the USA has more silver).</p>
<pre class="codeinput"><span style="color:#000080;">p=sammon(medals,2);
plot(p(:,1),p(:,2),<span class="string">'b.'</span>);
text(p(:,1),p(:,2),country);</span></pre>
<pre class="codeoutput">computing mutual distances
iterating</pre>
<div class="content"><a href="http://protomat.wordpress.com/files/2008/08/anlz_01.png"><img class="alignnone size-full wp-image-8" src="http://protomat.wordpress.com/files/2008/08/anlz_01.png" alt="" width="460" height="345" /></a></div>
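<p>The "best" sense that Sammon's mapping optimizes can be written as a stress function. With d*<sub>ij</sub> the distance between countries i and j in medal space and d<sub>ij</sub> their distance on the 2-D plot, the stress is E = (1 / Σ d*<sub>ij</sub>) Σ (d*<sub>ij</sub> - d<sub>ij</sub>)² / d*<sub>ij</sub>, with both sums taken over all pairs of countries. The minimal Python sketch below only evaluates this stress for a given layout; it does not perform the iterative optimization that sammon() carries out. The function name sammon_stress is hypothetical, and skipping pairs with zero original distance (countries with identical medal counts) is an assumption of this sketch.</p>

```python
import math
from itertools import combinations


def sammon_stress(high, low):
    """Sammon's stress between points in the original space ('high') and
    their images in the projection ('low'); 0 means distances are preserved."""

    def dist(a, b):
        return math.sqrt(sum((u - v) ** 2 for u, v in zip(a, b)))

    pairs = list(combinations(range(len(high)), 2))
    d_orig = [dist(high[i], high[j]) for i, j in pairs]
    d_proj = [dist(low[i], low[j]) for i, j in pairs]
    # Pairs that coincide in the original space (e.g. two countries with
    # identical medal counts) would divide by zero, so this sketch skips them.
    # They contribute nothing to the normalizing sum anyway.
    kept = [(do, dp) for do, dp in zip(d_orig, d_proj) if do > 0]
    scale = sum(do for do, _ in kept)
    return sum((do - dp) ** 2 / do for do, dp in kept) / scale
```

<p>A stress of 0 means every pairwise distance survived the projection. Dividing each squared error by d*<sub>ij</sub> makes errors on small distances count more, which is why Sammon's mapping tends to preserve local neighbourhoods.</p>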
<h3>Self-Organizing Maps<a name="3"></a></h3>
<p>A really interesting way to view some data is with a dimensionality reduction technique called Self-Organizing Maps (SOM), or similarity maps. SOM is a neural network technique in which nodes are arranged in a 2-D lattice. Whereas principal component analysis (PCA) finds the best hyperplane through the data, think of a SOM as an elastic sheet that spreads and stretches over the data during learning. The SOM tends to place many nodes where there is a lot of data, and few where it is sparse (or non-existent).</p>
<p>Without going into detail at the moment (others have done so far better), we can use the medal information to create a low-dimensional representation of how similar the countries are to each other, using a distance metric (by default, the Euclidean norm). Unlike the Sammon mapping, which is a projection, SOM is also a clustering technique: each country is assigned to the map unit (cell) whose prototype vector it matches best.</p>
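<p>To make the "elastic sheet" picture concrete, here is a minimal pure-Python sketch of online SOM training, assuming a rectangular lattice, Euclidean distance, a Gaussian neighbourhood, and a learning rate and neighbourhood width that both shrink linearly. The names train_som and best_matching_unit are hypothetical, and this is not how the SOM Toolbox's som_make works internally (it has its own, by default batch, training procedure).</p>

```python
import math
import random


def best_matching_unit(codebook, x):
    """Lattice position whose prototype vector is closest (Euclidean) to x."""
    return min(codebook, key=lambda node: sum((w - v) ** 2 for w, v in zip(codebook[node], x)))


def train_som(data, rows, cols, epochs=100, lr0=0.5, seed=0):
    """Online SOM training on a rows x cols lattice.

    Returns a dict mapping lattice positions (r, c) to prototype vectors.
    Learning rate and neighbourhood width both shrink linearly over training.
    """
    rng = random.Random(seed)
    nodes = [(r, c) for r in range(rows) for c in range(cols)]
    # Initialise each prototype with a copy of a randomly chosen data vector.
    codebook = {node: list(rng.choice(data)) for node in nodes}
    sigma0 = max(rows, cols) / 2
    total_steps = epochs * len(data)
    step = 0
    for _ in range(epochs):
        for x in rng.sample(data, len(data)):  # shuffled pass over the data
            frac = step / total_steps
            lr = lr0 * (1 - frac)
            sigma = max(sigma0 * (1 - frac), 0.5)
            br, bc = best_matching_unit(codebook, x)
            for (r, c), w in codebook.items():
                # Gaussian neighbourhood on the lattice around the BMU:
                # the BMU moves most, distant lattice nodes barely move.
                h = math.exp(-((r - br) ** 2 + (c - bc) ** 2) / (2 * sigma ** 2))
                for k in range(len(w)):
                    w[k] += lr * h * (x[k] - w[k])
            step += 1
    return codebook
```

<p>After training, each country would be assigned to best_matching_unit(codebook, medal_vector), which plays the same role as som_bmus in the Matlab code.</p>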
<p>The Laboratory of Computer and Information Science at Helsinki University of Technology has a SOM Toolbox for Matlab (<a href="http://www.cis.hut.fi/projects/somtoolbox/">http://www.cis.hut.fi/projects/somtoolbox/</a>) with numerous mapping, clustering, and visualization tools (including the Sammon mapping used earlier). I'm using the SOM Toolbox in this demonstration. Of course, MathWorks has its own SOM implementation in the Neural Network Toolbox, but I'll leave exploring that for another time.</p>
<p>The most work we'll do here is creating a customized label matrix for each map cell.</p>
<pre class="codeinput"><span style="color:#000080;"><span class="comment">% create the SOM data structure</span>
sD=som_data_struct(medals,<span class="string">'labels'</span>,country);
sD.comp_names={<span class="string">'Gold'</span>,<span class="string">'Silver'</span>,<span class="string">'Bronze'</span>};
<span class="comment">% make the SOM</span>
smap=som_make(sD,<span class="string">'name'</span>,<span class="string">'Olympic Medals'</span>,<span class="string">'msize'</span>,[8 8],<span class="string">'tracking'</span>,0);

<span class="comment">% use observation labels</span>
maxlbl=5;

<span class="comment">% create the observation labels, [n_mapunits,n_rows_per_label]</span>
maplen=length(smap.labels);
maplbl=cell(maplen,maxlbl);[maplbl{:}]=deal(<span class="string">''</span>);
<span class="comment">% get total number of medals, the unit counts (hits), and the best matching</span>
<span class="comment">% units (bmus) for each country.</span>
nummedals=sum(medals,2);
hits=som_hits(smap,sD);bmus=som_bmus(smap,sD);
<span class="keyword">for</span> i=1:maplen,
   idx=find(bmus==i);
   <span class="keyword">if</span>(isempty(idx)),<span class="keyword">continue</span>;<span class="keyword">end</span>;
   <span class="comment">% sort the countries by the number of medals</span>
   [sv,si]=sort(nummedals(idx),<span class="string">'descend'</span>);
   <span class="comment">% only include up to the maximum number of labels per cell</span>
   k=si(1:min(length(si),maxlbl));

   <span class="comment">% put the country and total number of medals into the label matrix</span>
   <span class="keyword">for</span> j=1:length(k),
      v=country{idx(k(j))};
      <span class="keyword">if</span>(ischar(v)),maplbl{i,j}=[char(v) <span class="string">' ('</span> num2str(nummedals(idx(k(j)))) <span class="string">')'</span>];<span class="keyword">end</span>;
   <span class="keyword">end</span>;
<span class="keyword">end</span>;
smap.labels=maplbl;

<span class="comment">% whew!  OK, let's start showing some maps!</span></span></pre>
<h3>Gold Medals<a name="4"></a></h3>
<p>The "location" on the map for each country stays the same, but the color coding of the particular component "plane" gives you a visual indication of that slice of the data. Here, China and the USA are clearly leading (at the time of writing) in gold medals. The color scale shows the value of that component in each map unit's prototype vector; here, the number of gold medals of the cluster prototype. It may not represent the actual counts very well, but hey, we're mapping similarity here!</p>
<pre class="codeinput"><span style="color:#000080;">smap.name=<span class="string">'Gold Medals'</span>;
som_show(smap,<span class="string">'comp'</span>,1);
h1=som_show_add(<span class="string">'label'</span>,smap,<span class="string">'TextSize'</span>,8,<span class="string">'TextColor'</span>,[0 0 0]);</span></pre>
<p><a href="http://protomat.wordpress.com/files/2008/08/anlz_02.png"><img class="alignnone size-full wp-image-15" src="http://protomat.wordpress.com/files/2008/08/anlz_02.png" alt="" width="460" height="345" /></a></p>
</div>
<div class="content">
<h3>Silver Medals<a name="5"></a></h3>
<p>Similarly, the Silver Medal slice gives a different look at the data: countries that are "similar" in their Silver Medal performance fall in the same bands of color.</p>
<pre class="codeinput"><span style="color:#000080;">smap.name=<span class="string">'Silver Medals'</span>;
som_show(smap,<span class="string">'comp'</span>,2);
h1=som_show_add(<span class="string">'label'</span>,smap,<span class="string">'TextSize'</span>,8,<span class="string">'TextColor'</span>,[0 0 0]);</span></pre>
<p><a href="http://protomat.wordpress.com/files/2008/08/anlz_03.png"><img class="alignnone size-full wp-image-16" src="http://protomat.wordpress.com/files/2008/08/anlz_03.png" alt="" width="460" height="345" /></a></p>
<h3>Bronze Medals<a name="6"></a></h3>
<p>And again, the bronze medals are shown here, with the USA leading (at the time of writing).</p>
<pre class="codeinput"><span style="color:#000080;">smap.name=<span class="string">'Bronze Medals'</span>;
som_show(smap,<span class="string">'comp'</span>,3);
h1=som_show_add(<span class="string">'label'</span>,smap,<span class="string">'TextSize'</span>,8,<span class="string">'TextColor'</span>,[0 0 0]);</span></pre>
<p><a href="http://protomat.wordpress.com/files/2008/08/anlz_04.png"><img class="alignnone size-full wp-image-14" src="http://protomat.wordpress.com/files/2008/08/anlz_04.png" alt="" width="460" height="345" /></a></p>
<h3>Summary<a name="7"></a></h3>
<p>Well, there you go. Not a lot of explanation here today, but hopefully we've introduced some cool ways to visualize high dimensional data. Check out the SOM Toolbox.</p>
<p>Kudos to Mike for suggesting the problem and providing a URL to scrape!</p>
<p>Isn't this fun? ;)</p>
<p>Ted</p>
<p class="footer">Published with MATLAB® 7.1</p>
<p class="footer">P.S. Formatting this for WordPress (or Blogger) was not fun; I may have to find a new home for these posts.</p>
</div>
]]></content:encoded>
</item>
<item>
<title><![CDATA[Oracle will stop supporting raw devices as of 12g]]></title>
<link>http://lscheng.wordpress.com/?p=45</link>
<pubDate>Sun, 10 Aug 2008 12:42:21 +0000</pubDate>
<dc:creator>lscheng</dc:creator>
<guid>http://lscheng.wordpress.com/?p=45</guid>
<description><![CDATA[A few days ago Oracle announced that it will stop supporting raw devices on all platfor]]></description>
<content:encoded><![CDATA[<p>A few days ago Oracle announced that it will stop supporting raw devices on all platforms starting with Oracle 12g. That is still some way off, but we should start preparing, because I still see people setting up raw devices on RAC 10gR2, which is totally unnecessary.</p>
<p>What will happen to the OCR and the Voting Disk? Up to now, including 11gR1, you either use raw devices or a Cluster Filesystem (CFS is not very common because of its cost, except for OCFS2), and most people opt for raw. Well, if this has been announced for 12g, let's get ready to see something new in 11gR2; some feature will come out to address this issue.</p>
<p>I'm not sure... but I think ASM is going to evolve quite a bit in 11g Release 2.</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[Is Personalized Search Easy?]]></title>
<link>http://philipobrien.wordpress.com/?p=29</link>
<pubDate>Thu, 07 Aug 2008 12:54:02 +0000</pubDate>
<dc:creator>philipobrien</dc:creator>
<guid>http://philipobrien.wordpress.com/?p=29</guid>
<description><![CDATA[According to research out of Microsoft presented at 2008 ACM Web Search and Data Mining Conference b]]></description>
<content:encoded><![CDATA[<p>According to <a title="How Hard is Search? With Personalization? With Backoff?" href="http://sifaka.cs.uiuc.edu/~qmei2/pub/wsdm08-entropy.pdf">research out of Microsoft</a> presented at the <a title="Web Search and Data Mining 2008" href="http://www.wsdm2008.org/">2008 ACM Web Search and Data Mining Conference</a> back in February, it is. I read this article a couple of days after the conference and found it interesting. Search engine companies, even the Big 3, have been struggling with personalized search for a couple of years now. <a title="Ken Church Homepage" href="http://research.microsoft.com/users/church/">Church</a> et al., however, suggest that personalized search should be easier because it reduces the space complexity of search to a handful of bits, measured using <a href="http://en.wikipedia.org/wiki/Information_entropy">entropy</a>. So why, then, have most recent solutions to personalization really been solutions to customization, with minimal progress toward truly personalized search?</p>
<p>The notion that personalization reduces the complexity of search by reducing the search space is a great one, and one that's surely true. Depending on how much faith you put in conditional entropy models, the claim that personalization helps is good news! Personalized search is "easier" because the search space is smaller. I like this idea as it's intuitively sensible. Search engines have a large set of possibly relevant pages, but for a single web user, only a tiny fraction are ever visited. There are hundreds of bookstores in any large city, but <em>an individual</em> is only likely to visit a few (4 or 5?) of these in a typical year. Predicting this should be easier than predicting "which bookstore will be visited by a customer today", right?</p>
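<p>To make the entropy argument concrete, here is a toy Python sketch. It compares how many bits are needed to predict which URL is clicked, H(URL), with the same quantity once the user is known, H(URL | user). The four-click log is invented purely for illustration; it is not data from, or the method of, the Church et al. paper.</p>

```python
import math
from collections import Counter, defaultdict


def entropy(outcomes):
    """Shannon entropy (in bits) of the empirical distribution of outcomes."""
    counts = Counter(outcomes)
    n = len(outcomes)
    return -sum(c / n * math.log2(c / n) for c in counts.values())


def conditional_entropy(pairs):
    """H(Y | X) in bits for a list of (x, y) observations: the entropy of y
    within each x group, weighted by how often that x occurs."""
    groups = defaultdict(list)
    for x, y in pairs:
        groups[x].append(y)
    n = len(pairs)
    return sum(len(ys) / n * entropy(ys) for ys in groups.values())


# Hypothetical toy click log: (user, url) pairs.
log = [("alice", "a.com"), ("alice", "b.com"), ("bob", "c.com"), ("bob", "d.com")]
print(entropy([url for _, url in log]))  # 2.0 bits: four equally likely URLs
print(conditional_entropy(log))          # 1.0 bit once we know the user
```

<p>Knowing the user halves the uncertainty in this toy case, which is the sense in which personalization "reduces the search space"; the hard part, as argued below, is knowing which grouping of users to condition on.</p>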
<p>If so, why has so much research and development gone into personalized search with only subtle progress in improving the user's search experience?</p>
<p>It's because, although personalized search as a notion is easier than vanilla search (because users are grouped into clusters based on common interests, persona, geography, and so on), the perfect characterization of a user or set of users has thus far eluded us. Clustering users by, for example, their profession works, but is incomplete. I have vastly different interests than my co-workers, and this will throw off any current search engine that personalizes based on profession. But at the other extreme, every time we introduce a new dimension into our characterization of a search user, we risk over-fitting our model and shrinking the set of search results returned.</p>
<p>This is the classic precision-recall trade-off, which has been addressed and improved upon but not perfected. Personalization can be approached and aided gradually by tweaking the way we model and group users, but significant progress cannot be made until we make significant breakthroughs in how we understand users. The question, then, is not "How hard is personalized search?" but "How hard is user modelling for personalization?"</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[MySQL Clustering on Ubuntu]]></title>
<link>http://bieg.wordpress.com/2008/08/03/mysql-clustering-ubuntu/</link>
<pubDate>Sun, 03 Aug 2008 13:41:22 +0000</pubDate>
<dc:creator>bieg</dc:creator>
<guid>http://bieg.wordpress.com/2008/08/03/mysql-clustering-ubuntu/</guid>
<description><![CDATA[I spent some time getting MySQL clustering working with Ubuntu after reading a guide on Howto Forge.]]></description>
<content:encoded><![CDATA[<p>I spent some time getting MySQL clustering working with Ubuntu after reading a guide on Howto Forge. The guide however went into the details of compiling and installing MySQL from source so I'm creating this to show the steps needed to get it set up on a fresh Ubuntu installation.</p>
<p>For a correct setup you will need 3 machines. The first machine will serve as the management node, and the other two will be storage nodes.</p>
<p>At the time of writing, the current stable version of Ubuntu is 8.04.1, and the MySQL version it installs is 5.0.51.</p>
<p>During the configuration I log onto the machines and use the command</p>
<pre>sudo su -</pre>
<p>to get a root shell for the whole session, which saves me from having to type sudo in front of every command. Use your own discretion.</p>
<h3><span style="color:#000000;">Installing MySQL</span></h3>
<p>Using apt, this is straightforward. Just type the following command on all three machines to install MySQL server.</p>
<pre>apt-get install mysql-server</pre>
<p>When prompted, set the root password for the MySQL database. You'll need to remember it. Once MySQL server is installed, we'll proceed to configure the management node.</p>
<h3>Configuring the Management Node</h3>
<p>Create and edit the file <strong>/etc/mysql/ndb_mgmd.cnf</strong>. Copy and paste the text below, changing the IP addresses to match your setup as necessary.</p>
<pre>[NDBD DEFAULT]
NoOfReplicas=2
DataMemory=80M    # How much memory to allocate for data storage
IndexMemory=18M   # How much memory to allocate for index storage
# For DataMemory and IndexMemory, we have used the
# default values. Since the "world" database takes up
# only about 500KB, this should be more than enough for
# this example Cluster setup.
[MYSQLD DEFAULT]
[NDB_MGMD DEFAULT]
[TCP DEFAULT]
# Section for the cluster management node
[NDB_MGMD]
# IP address of the management node (this system)
HostName=192.168.1.5

# Section for the storage nodes
[NDBD]
# IP address of the first storage node
HostName=192.168.1.6
DataDir=/var/lib/mysql-cluster
BackupDataDir=/var/lib/mysql-cluster/backup
DataMemory=512M
[NDBD]
# IP address of the second storage node
HostName=192.168.1.7
DataDir=/var/lib/mysql-cluster
BackupDataDir=/var/lib/mysql-cluster/backup
DataMemory=512M

# one [MYSQLD] per storage node
[MYSQLD]
[MYSQLD]</pre>
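<p>Before starting the management node, it can help to sanity-check this file. MySQL Cluster arranges data nodes into node groups of NoOfReplicas nodes each, so the number of [NDBD] sections is expected to be a multiple of NoOfReplicas (here, 2 data nodes with NoOfReplicas=2 give one node group). The Python helper below, summarize_ndb_config, is a hypothetical minimal sketch that assumes the simple format used above: [SECTION] headers, key=value lines, and # comments. It is not a MySQL tool.</p>

```python
def summarize_ndb_config(text):
    """Parse a management-node config like ndb_mgmd.cnf and check that the
    number of [NDBD] data nodes is a multiple of NoOfReplicas."""
    sections = []  # (SECTION NAME, {lower-cased key: value}) in file order
    for raw in text.splitlines():
        line = raw.split("#", 1)[0].strip()  # drop comments and whitespace
        if not line:
            continue
        if line.startswith("[") and line.endswith("]"):
            sections.append((line[1:-1].strip().upper(), {}))
        elif "=" in line and sections:
            key, value = line.split("=", 1)
            sections[-1][1][key.strip().lower()] = value.strip()

    defaults = [opts for name, opts in sections if name == "NDBD DEFAULT"]
    if not defaults or "noofreplicas" not in defaults[0]:
        raise ValueError("this sketch expects NoOfReplicas in [NDBD DEFAULT]")
    replicas = int(defaults[0]["noofreplicas"])
    data_nodes = [opts.get("hostname") for name, opts in sections if name == "NDBD"]
    if len(data_nodes) % replicas != 0:
        raise ValueError("%d data nodes cannot form node groups of %d"
                         % (len(data_nodes), replicas))
    return {
        "replicas": replicas,
        "data_nodes": data_nodes,
        "node_groups": len(data_nodes) // replicas,
        "mysqld_slots": sum(1 for name, _ in sections if name == "MYSQLD"),
    }
```

<p>Run against the file above, it should report NoOfReplicas 2, data nodes 192.168.1.6 and 192.168.1.7, one node group, and two [MYSQLD] slots.</p>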
<h3>Configuring the Storage Nodes</h3>
<p>As you can see in the file we created in the previous step, the cluster will use <strong>/var/lib/mysql-cluster</strong> on the storage machines. This directory is created when you install MySQL server, but it is owned by root. We need to create the backup directory and change the ownership to mysql.</p>
<pre>mkdir /var/lib/mysql-cluster/backup</pre>
<pre>chown -R mysql:mysql /var/lib/mysql-cluster</pre>
<p>Now we'll need to edit the MySQL configuration so that the storage nodes will communicate with the Management Node.</p>
<p>Edit <strong>/etc/mysql/my.cnf</strong></p>
<p>Search for <strong>[mysqld]</strong> and add the following.</p>
<pre>[mysqld]
<strong>ndbcluster
# IP address of the cluster management node
ndb-connectstring=192.168.1.5</strong></pre>
<p>Then scroll down to the bottom until you see <strong>[MYSQL_CLUSTER]</strong>. Uncomment that section and edit it so it looks like this:</p>
<pre>[MYSQL_CLUSTER]
ndb-connectstring=192.168.1.5</pre>
<p>The connect string appears twice in my.cnf because one copy is read by the MySQL server (mysqld) and the other by the NDB data node daemon (ndbd). Save the changes to the file.</p>
<p>Make sure you complete the changes on both data nodes.</p>
<h3>Start the Management Node</h3>
<p>Start the Management Node using</p>
<pre>/etc/init.d/mysql-ndb-mgm restart</pre>
<p>The process shouldn't be running yet, but using restart doesn't hurt. Once it has started, we can access the management console with the command <strong>ndb_mgm</strong>. At the prompt, type <strong>show;</strong> and you will see:</p>
<pre>ndb_mgm&#62; show;
Connected to Management Server at: localhost:1186
Cluster Configuration
---------------------
[ndbd(NDB)]    2 node(s)
id=2 (not connected, accepting connect from 192.168.1.6)
id=3 (not connected, accepting connect from 192.168.1.7)

[ndb_mgmd(MGM)]    1 node(s)
id=1    @192.168.1.5  (Version: 5.0.51)

[mysqld(API)]    2 node(s)
id=4 (not connected, accepting connect from any host)
id=5 (not connected, accepting connect from any host)</pre>
<p>As you can see the management node is waiting for connections from the data nodes.</p>
<h3>Start the Data Nodes</h3>
<p>On the data nodes, issue the commands</p>
<pre>/etc/init.d/mysql restart
/etc/init.d/mysql-ndb restart</pre>
<p>Go back to the management node, type show; again, and now you should see something similar to</p>
<pre>id=2    @192.168.1.6  (Version: 5.0.51, starting, Nodegroup: 0)
id=3    @192.168.1.7  (Version: 5.0.51, starting, Nodegroup: 0)</pre>
<p>Once they have started properly, the show command should display</p>
<pre>ndb_mgm&#62; show;
Cluster Configuration
---------------------
[ndbd(NDB)]    2 node(s)
id=2    @192.168.1.6  (Version: 5.0.51, Nodegroup: 0, Master)
id=3    @192.168.1.7  (Version: 5.0.51, Nodegroup: 0)
[ndb_mgmd(MGM)]    1 node(s)
id=1    @192.168.1.5  (Version: 5.0.51)
[mysqld(API)]    2 node(s)
id=4    @192.168.1.7  (Version: 5.0.51)
id=5    @192.168.1.6  (Version: 5.0.51)</pre>
<p>Congratulations, your cluster is now set up.</p>
<h3>Testing the cluster</h3>
<p>Run the following on both data nodes to create the test database. Because clustering in this version of MySQL is done per table rather than per database, the database itself has to be created manually on both data nodes.</p>
<pre>$&#62; mysql -u root -p
Enter password:
Welcome to the MySQL monitor.  Commands end with ; or \g.
Your MySQL connection id is 8
Server version: 5.0.51a-3ubuntu5.1 (Ubuntu)

Type 'help;' or '\h' for help. Type '\c' to clear the buffer.

mysql&#62; create database clustertest;
Query OK, 1 row affected (0.00 sec)</pre>
<p>Once this is done, on ONE of the data nodes, create a test table and add an entry.</p>
<pre>mysql&#62; use clustertest;
Database changed
mysql&#62; create table test (i int) engine=ndbcluster;
Query OK, 0 rows affected (0.71 sec)

mysql&#62; insert into test values (1);
Query OK, 1 row affected (0.05 sec)

mysql&#62; select * from test;
+------+
&#124; i    &#124;
+------+
&#124;    1 &#124;
+------+
1 row in set (0.03 sec)</pre>
<p>We've just created a table called test, added a value to it, and made sure that the table contains one entry. Note that <strong>engine=ndbcluster</strong> must be used to let MySQL know that this table should be clustered among the data nodes. Let's make sure that the table has in fact been created on the other data node and contains one entry.</p>
<pre>mysql&#62; use clustertest;
Reading table information for completion of table and column names
You can turn off this feature to get a quicker startup with -A

Database changed
mysql&#62; show tables;
+-----------------------+
&#124; Tables_in_clustertest &#124;
+-----------------------+
&#124; test                  &#124;
+-----------------------+
1 row in set (0.01 sec)

mysql&#62; select * from test;
+------+
&#124; i    &#124;
+------+
&#124;    1 &#124;
+------+
1 row in set (0.04 sec)</pre>
<p>As you can see, the cluster is working.</p>
<h3>Moving an existing database to the cluster</h3>
<p>Now that we have the cluster working, we can easily change an existing database to be clustered. All you need to do is run the following command on each of the tables.</p>
<pre>alter table my_test_table engine=ndbcluster;</pre>
<p>The table and all its data will be copied to the data nodes, and you can now access or change it through any SQL node in the cluster. Very simple.</p>
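<p>If a database has many tables, typing that statement for each one gets tedious. As a small convenience, here is a hypothetical Python helper, ndb_alter_statements, that generates one ALTER TABLE ... ENGINE=NDBCLUSTER statement per table name (for example, names copied from the output of SHOW TABLES), quoting each name with backticks. It only builds the SQL text; you would still paste it into the mysql client on one of the SQL nodes. Keep in mind that in MySQL 5.0 NDB tables are held in memory, so the converted data has to fit within the DataMemory configured earlier.</p>

```python
def ndb_alter_statements(tables):
    """Return one 'ALTER TABLE ... ENGINE=NDBCLUSTER;' statement per table name."""

    def quote_identifier(name):
        # MySQL quotes identifiers with backticks; an embedded backtick is doubled.
        return "`" + name.replace("`", "``") + "`"

    return ["ALTER TABLE %s ENGINE=NDBCLUSTER;" % quote_identifier(t) for t in tables]


# Hypothetical table names, e.g. copied from the output of SHOW TABLES.
for stmt in ndb_alter_statements(["customers", "orders"]):
    print(stmt)
```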
]]></content:encoded>
</item>
<item>
<title><![CDATA[Powershell and Moving CCR Mailbox Server Instances  Powershell Script...]]></title>
<link>http://telnetport25.wordpress.com/2008/07/29/powershell-and-moving-ccr-mailbox-server-instances-powershell-script/</link>
<pubDate>Tue, 29 Jul 2008 15:39:00 +0000</pubDate>
<dc:creator>Andy Grogan</dc:creator>
<guid>http://telnetport25.wordpress.com/2008/07/29/powershell-and-moving-ccr-mailbox-server-instances-powershell-script/</guid>
<description><![CDATA[Over the next few posts I would like to focus a little more on Powershell and some of the really coo]]></description>
<content:encoded><![CDATA[<p>Over the next few posts I would like to focus a little more on Powershell and some of the really cool automation tasks that can be accomplished with it in relation to Exchange 2007.</p>
<p>In this article I would like to present an example script that is somewhat similar to the CCR shutdown script I wrote about in this post: <a title="http://telnetport25.wordpress.com/2007/12/15/exchange-2007-ensuring-a-clean-ccr-node-shutdown/" href="http://telnetport25.wordpress.com/2007/12/15/exchange-2007-ensuring-a-clean-ccr-node-shutdown/">http://telnetport25.wordpress.com/2007/12/15/exchange-2007-ensuring-a-clean-ccr-node-shutdown/</a></p>
<p>You might be thinking: if you have already written about it, why rehash it? Well, the existing script in the post above produced unpredictable results for some readers of the blog (see the comments on that post), and it also required a certain amount of user intervention in order to work (e.g. you needed to provide the names of your cluster nodes).</p>
<p>Additionally, the script that I would like to present in this post is NOT a shutdown script per se; it is a script that allows administrators who are new to Exchange CCR clusters to safely move the Exchange resources between nodes by simply executing a batch file (or running the script via the Exchange Management Shell).</p>
<p><strong>Design goals of the script:</strong></p>
<p>Before I wrote the script that this post is based around, I sat down and listed the design goals that I wished to achieve within the code:</p>
<ul>
<li>No static variables: I did not want the administrator to have to manually add the names of the cluster nodes, or indeed the Exchange cluster, as variables in the script; I wanted these to be detected dynamically</li>
<li>The script needed to be able to determine which node within the cluster is running the active instance of Exchange</li>
<li>The script needed to be able to determine the passive instance within the cluster</li>
<li>The script needed to be able to decide which node it should move the clustered instance to, irrespective of which node it is executed on</li>
</ul>
<p><strong>Assumptions:</strong></p>
<p>As well as the design goals, I needed to set some assumptions that define the environment in which the script is supported. These were as follows:</p>
<ul>
<li>The script is designed to work on a TWO node CCR cluster</li>
</ul>
<p>The script that I came up with is available for download below:</p>
<p><img src="http://domain564941.sites.fasthosts.com/images/Icons/PSIcon.jpg" alt="Script" width="53" height="54" /><a href="http://domain564941.sites.fasthosts.com/powershell/MoveClusteredMailboxServerScript.ps1">MoveClusteredMailboxServerScript.ps1 [4KB]</a></p>
<p>As always when working with PowerShell scripts (especially larger ones), I recommend that you download and install PowerGUI (<a href="http://www.powergui.org.uk">http://www.powergui.org.uk</a>). PowerGUI contains a brilliant syntax-aware script editor for PowerShell which makes reading and debugging code much easier.</p>
<p>In order to execute the script you will need to set the PowerShell execution policy to "RemoteSigned". You can do this by opening the Exchange Management Shell and typing <strong><em>Set-ExecutionPolicy RemoteSigned</em></strong>. You will also need to ensure that you have downloaded the script to at least one node within your cluster.</p>
<p>To execute the script within the Exchange Management Shell, navigate to the folder where you downloaded it, type the command <strong><em>.\MoveClusteredMailboxServerScript.ps1</em></strong> and press Enter (see below):</p>
<p><a href="http://telnetport25.files.wordpress.com/2008/07/image.png"><img style="border-right:0;border-top:0;border-left:0;border-bottom:0;" src="http://telnetport25.files.wordpress.com/2008/07/image-thumb.png" border="0" alt="image" width="483" height="50" align="left" /></a> </p>
<p> </p>
<p><a href="http://telnetport25.files.wordpress.com/2008/07/image1.png"><img style="border-right:0;border-top:0;border-left:0;border-bottom:0;" src="http://telnetport25.files.wordpress.com/2008/07/image-thumb1.png" border="0" alt="image" width="500" height="126" /></a></p>
<p><strong>Key aspects of the script:</strong></p>
<p>I have given you this script mainly to show what you can accomplish with PowerShell and Exchange management, and I hope that you will use it and modify it to your own ends. However, there are some aspects of the script which I would like to go through in a little more detail.</p>
<ul>
<li><strong>Determining the Exchange cluster that the node belongs to</strong>
<p>In order to complete this task I first needed to find out the name of the local machine (as this would be a node in a particular Exchange cluster). This is pretty straightforward in PowerShell and is accomplished using the following:</p>
<p><span style="font-size:x-small;color:#0000ff;font-family:Courier New;">$CompStat = Get-WmiObject win32_computersystem<br />
$Localhst = $CompStat.Name<br />
</span></p>
<p>Here the <strong>$Localhst</strong> variable contains the <strong>NETBIOS</strong> name of the local machine. The name of the Exchange cluster to which the node belongs is then determined via the following:</p>
<p><span style="color:#0000ff;font-family:Courier New;">$Seed = Get-MailboxServer &#124; Where-Object { $_.RedundantMachines -eq $Localhst }<br />
</span></p>
<p>Essentially, the above code filters the output of the <strong>Get-MailboxServer</strong> cmdlet down to the mailbox server whose <strong>RedundantMachines</strong> property contains the local host name, and stores that server object in the <strong>$Seed</strong> variable.</p></li>
<li><strong>Determining if the local machine is the Active or the Passive Node</strong>
<p>This is the part of the script that I am most proud of. I searched and searched for a simple way to get PowerShell to return the NETBIOS name of the active node within an Exchange cluster. It is not easy, so I used the <strong><em>Get-ClusteredMailboxServerStatus</em></strong> cmdlet. It returns many properties, but the one that I was most interested in was the <strong>"OperationalMachines"</strong> value, which gives you the status and the role of each node. If the local machine is represented as <strong>Machine Name &#60;Active&#62;</strong>, then it is currently the active node:</p>
<p><strong><span style="color:#0000ff;font-family:Courier New;">$Machine = Get-ClusteredMailboxServerStatus -identity $Seed.Name &#124; Select OperationalMachines &#124; Where-Object {$_.OperationalMachines -eq "$Localhst &#60;Active&#62;"}<br />
</span></strong></p>
<p>Essentially, I have used the value in the <strong>$Seed</strong> variable (which should hold the clustered mailbox server) with the <strong>Get-ClusteredMailboxServerStatus</strong> cmdlet. You will also notice that I have another command, which looks like the following:</p>
<p><span style="color:#0000ff;font-family:Courier New;"><strong>$MachineQ = Get-ClusteredMailboxServerStatus -identity $Seed.Name &#124; Select OperationalMachines &#124; Where-Object {$_.OperationalMachines -eq "$Localhst &#60;Active, Quorum Owner&#62;"}</strong></span></p>
<p>This performs pretty much the same check as the first command, but also takes into account whether the local machine owns the quorum.</p>
<p>In use, if both the <strong>$Machine</strong> and <strong>$MachineQ</strong> variables are <strong>$NULL</strong>, then the local machine is not the active node. Therefore, in a two-node CCR cluster, it must be the passive member (unless the cluster is down).</p></li>
</ul>
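<p>The downloadable script is written in PowerShell and relies on Exchange cmdlets, so it only runs inside the Exchange Management Shell. To illustrate just the decision logic described above, here is a rough Python sketch over plain data. It assumes each OperationalMachines entry is a node name, optionally followed by a bracketed flag list (as in the strings the script compares against), and that a passive node appears without flags. The function names are hypothetical, and this is not a replacement for the script itself.</p>

```python
import re

# Assumed format of one OperationalMachines entry: a node name, optionally
# followed by a bracketed flag list such as "Active" or "Active, Quorum Owner".
ENTRY = re.compile(r"^(\S+)(?:\s+<([^>]*)>)?$")


def find_cms(local_host, servers):
    """Mirror of the $Seed lookup: return the clustered mailbox server whose
    RedundantMachines list contains the local node (case-insensitive)."""
    for cms, nodes in servers.items():
        if local_host.upper() in (n.upper() for n in nodes):
            return cms
    return None


def parse_operational(entries):
    """Map each node name to its set of flags, e.g. {"NODEA": {"Active", "Quorum Owner"}}."""
    result = {}
    for entry in entries:
        m = ENTRY.match(entry.strip())
        if m is None:
            raise ValueError(f"unexpected OperationalMachines entry: {entry!r}")
        flags = m.group(2) or ""
        result[m.group(1).upper()] = {f.strip() for f in flags.split(",") if f.strip()}
    return result


def local_role(local_host, entries):
    """Same idea as the $Machine/$MachineQ test: the local node is active if its entry
    carries the Active flag (with or without Quorum Owner); otherwise treat it as passive."""
    flags = parse_operational(entries).get(local_host.upper(), set())
    return "active" if "Active" in flags else "passive"


def choose_target(entries):
    """Pick the node to move the clustered mailbox server to: in a two-node cluster,
    always the node that is not currently active, whichever node runs the code."""
    nodes = parse_operational(entries)
    if len(nodes) != 2:
        raise ValueError("this sketch assumes a two-node CCR cluster")
    active = [name for name, flags in nodes.items() if "Active" in flags]
    if len(active) != 1:
        raise RuntimeError("no single active node found (is the cluster down?)")
    return next(name for name in nodes if name != active[0])
```

<p>For example, with the entries NODEA &#60;Active, Quorum Owner&#62; and NODEB (hypothetical node names), local_role reports NODEA as active and choose_target returns NODEB, whichever node the code runs on.</p>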
<p>I hope that you like the script. It is the first of a few that I would like to present to you, which might help make managing your clusters a little easier. Bear in mind that this script could be modified to do far more than move clustered Exchange instances between nodes. If you come up with any cool ideas, let me know.</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[Document Clustering]]></title>
<link>http://yudiagusta.wordpress.com/?p=157</link>
<pubDate>Thu, 24 Jul 2008 00:57:32 +0000</pubDate>
<dc:creator>Yudi Agusta</dc:creator>
<guid>http://yudiagusta.wordpress.com/?p=157</guid>
<description><![CDATA[Document Clustering is the task of grouping documents based on the characteristics that]]></description>
<content:encoded><![CDATA[<p>Document Clustering is the task of grouping documents according to the characteristics they contain. In essence, document clustering analysis has two stages. The first transforms the documents into quantitative data. The second analyzes that quantitative data with a chosen clustering method. Many different clustering methods can be used for the second stage. See my posts on <a href="http://yudiagusta.wordpress.com/clustering/">clustering</a>, <a href="http://yudiagusta.wordpress.com/k-means/">k-means</a>, <a href="http://yudiagusta.wordpress.com/mixture-modelling/">mixture modelling</a> or <a href="http://yudiagusta.wordpress.com/category/clustering/">other posts on clustering</a>.</p>
<p>The usual difficulty in document clustering is how to represent documents as quantitative data. Several approaches are common. One of them is the vector space model, which represents each document as a vector of the terms that appear in the documents being analyzed. One such representation is the term-frequency (TF) vector, which can be written as:</p>
<p>\(d_{tf} = (tf_1, tf_2, \ldots, tf_m)\)</p>
<p>where<br />
\(tf_i\) is the frequency of the i-th term in a document.</p>
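<p>Here is a minimal Python sketch of this vector. It assumes the documents have already been split into tokens and that the order of a fixed vocabulary defines positions 1 to m. The function name is illustrative. Under those assumptions, the TF vector is simply a count per term:</p>

```python
from collections import Counter


def tf_vector(doc_tokens, vocabulary):
    """Return (tf_1, ..., tf_m): how often each vocabulary term occurs in the document."""
    counts = Counter(doc_tokens)
    return [counts[term] for term in vocabulary]
```

<p>For instance, the tokens cluster node cluster with the vocabulary (cluster, node, server) give the vector (2, 1, 0).</p>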
<p>This model is usually improved by giving each term a weight. The reason is that terms which occur frequently in many documents have no discriminating power, so they need to be de-emphasized. This is generally done by multiplying each frequency by \(\log(N/df_i)\), where \(N\) is the total number of documents and \(df_i\) is the number of documents containing the i-th term. The result is the following tf-idf representation:</p>
<p>\(d_{tfidf} = (tf_1 \log(N/df_1), tf_2 \log(N/df_2), \ldots, tf_m \log(N/df_m))\)</p>
<p>To accommodate documents of different lengths, each document vector is normalized to unit length, so that \(\lVert d_{tfidf} \rVert = 1\). In other words, every document is a vector on the unit hypersphere.</p>
<p>Reference:<br />
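<p>The tf-idf weighting and unit-length normalization above can be sketched in Python as follows. The sketch assumes pre-tokenized documents, a vocabulary built from the collection itself, and the natural logarithm. The choice of log base only rescales every weight by the same factor, and normalization removes that factor. The function names are hypothetical.</p>

```python
import math
from collections import Counter


def tfidf_unit_vectors(docs):
    """docs: list of token lists. Returns (vocabulary, unit-length tf-idf vectors)."""
    vocabulary = sorted({t for doc in docs for t in doc})
    n_docs = len(docs)
    # df_i: number of documents that contain term i at least once.
    df = Counter(t for doc in docs for t in set(doc))
    vectors = []
    for doc in docs:
        tf = Counter(doc)
        vec = [tf[t] * math.log(n_docs / df[t]) for t in vocabulary]
        norm = math.sqrt(sum(x * x for x in vec))
        # Scale to unit length so ||d_tfidf|| = 1; leave an all-zero vector as is.
        vectors.append([x / norm for x in vec] if norm > 0 else vec)
    return vocabulary, vectors


def cosine(u, v):
    """For unit-length vectors the dot product equals the cosine similarity."""
    return sum(a * b for a, b in zip(u, v))
```

<p>Consider the toy documents (cluster node), (cluster server) and (cluster node node). The term cluster appears in all three, so \(\log(N/df) = \log 1 = 0\) and it drops out entirely. After normalization, the first and third documents point in the same direction (cosine similarity 1), even though they have different lengths.</p>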
G. Salton (1989). Automatic Text Processing: The Transformation, Analysis, and Retrieval of Information by Computer. Addison-Wesley.</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[Aug 5: Using Java Technology at the World's Largest Web Site, Yahoo! ]]></title>
<link>http://sdforumjavasig.wordpress.com/?p=15</link>
<pubDate>Wed, 23 Jul 2008 18:36:48 +0000</pubDate>
<dc:creator>sudhish</dc:creator>
<guid>http://sdforumjavasig.wordpress.com/?p=15</guid>
<description><![CDATA[SDForum JAVA SIG: Aug 5 Meeting at Cubberley Community Center #H1
Serving hundreds of millions of us]]></description>
<content:encoded><![CDATA[<p>SDForum JAVA SIG: <strong>Aug 5</strong> Meeting at Cubberley Community Center #H1</p>
<p>Serving hundreds of millions of users presents unique challenges. This session describes how Yahoo! has introduced and adopted a Java technology-based serving platform to scale to its global audience and overcome the challenges of integration, security, reliability, and scalability.</p>
<h2>Presenters:</h2>
<p>Joshua Blatt, Technical Yahoo!, and Dean Yu, Technical Yahoo!</p>
<h2>Location:</h2>
<p>Cubberley Community Center, Room H1 <a title="Cubberley Map" href="http://www.city.palo-alto.ca.us/civica/filebank/blobdload.asp?BlobID=8865" target="_self">[See Map]</a></p>
<h2>Agenda</h2>
<p>6:45-7:00 Doors open. Networking. Pizza.<br />
7:00-9:00 Presentations</p>
<h2>Price</h2>
<p>$15 at the door for non-SDForum members<br />
No charge for SDForum members<br />
No registration required</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[GO-Fuzzy C-Means Clustering]]></title>
<link>http://sysbioasu.wordpress.com/?p=32</link>
<pubDate>Tue, 22 Jul 2008 23:56:52 +0000</pubDate>
<dc:creator>sysbioasu</dc:creator>
<guid>http://sysbioasu.wordpress.com/?p=32</guid>
<description><![CDATA[Neha presents.
Slides (PPT)
]]></description>
<content:encoded><![CDATA[<p>Neha presents.</p>
<p><a href="http://sysbio.fulton.asu.edu/seminardocs/Neha22Jul.ppt">Slides (PPT)</a></p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[A cluster of guest virtual machines with MS Virtual Server 2005]]></title>
<link>http://gpacific.wordpress.com/?p=4</link>
<pubDate>Mon, 21 Jul 2008 19:27:35 +0000</pubDate>
<dc:creator>Giosuè</dc:creator>
<guid>http://gpacific.wordpress.com/?p=4</guid>
<description><![CDATA[Online I have found many articles describing the configuration of a Windows 2003 cluster with Vi]]></description>
<content:encoded><![CDATA[<p>Online I have found many articles describing how to configure a Windows 2003 cluster with Virtual Server 2005. Some are excellent, others very brief. I thought I would collect some of the procedures described here, adding a few notes of my own. Consider this a work <em>in progress (not least because I still have to get used to blogging as a tool)</em></p>
<p><a href="http://gpacific.files.wordpress.com/2008/07/snap2_50.png"><img class="alignnone size-medium wp-image-5" src="http://gpacific.wordpress.com/files/2008/07/snap2_50.png?w=251" alt="" width="251" height="300" /></a></p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[What's new in Failover Clustering in Windows Server 2008]]></title>
<link>http://rottyzone.wordpress.com/?p=39</link>
<pubDate>Wed, 09 Jul 2008 21:54:34 +0000</pubDate>
<dc:creator>rottys</dc:creator>
<guid>http://rottyzone.wordpress.com/?p=39</guid>
<description><![CDATA[Andrei Ionut Pop, systems engineer, certified Microsoft technology specialist, and also a]]></description>
<content:encoded><![CDATA[<p><strong>Andrei Ionut Pop</strong>, a systems engineer, certified Microsoft technology specialist and very active member of the <strong>ItBoard</strong> community, is starting a series of articles devoted to Windows Server 2008. The first in the series explains the new clustering features in Windows Server 2008.</p>
<div style="text-align:center;"><img src="http://www.hit.ro/assets/articole/2008/07/gal_mare_windowsserver2008.jpg" border="0" alt="windowsserver2008.jpg" width="400" height="340" /></div>
<p>High availability is usually associated with large datacenters, dedicated and very expensive hardware, and a high level of technical expertise. Even so, the technology is far more accessible than most people imagine. Since it is a topic people tend to avoid, at least here, I thought it would be a good idea to start a series of articles on it. The series is aimed both at readers less familiar with the subject and at those for whom 99.999 is a way of life.</p>
<p><!--more-->Since Windows Server 2008 has just been released, I will start with a general overview of the Failover Clustering role: its improvements over the previous version in Windows Server 2003 and its new features.</p>
<p>For the first time we have a <strong>configuration validation method</strong>. Using a validation tool that runs as a wizard, we can run various tests against the systems, the storage and the network infrastructure:</p>
<ul>
<li>Whether the servers meet the minimum hardware requirements and run the same operating system version with the same updates. A cluster cannot be built from servers running different architectures, 32-bit (x86) and 64-bit (x64).</li>
<li>Whether the network redundancy requirements are met. Note that a configuration in which all cluster nodes are connected through a single network segment also passes validation (although you get a "single point of failure" warning).</li>
<li>Whether the storage supports the required SCSI commands and responds correctly to the actions the cluster requests.</li>
</ul>
<p>Also for the first time, we have <strong>GPT support</strong> (GUID Partition Table) for disks. This means the disk size limit rises to 16 exabytes, up from the previous 2 TB, a restriction imposed by the use of MBR. While we are on the subject of storage, here are a few other improvements over the previous version:</p>
<p><strong>Persistent reservation </strong>– eliminates SCSI bus resets. Previously, when the controller requested access to one disk, it automatically interrupted access to all the other disks on the same bus. In other words, only SAS, Fibre Channel and iSCSI storage can now be used, not directly attached SCSI.</p>
<p><strong>Disk management</strong> – the administrator can perform various operations on SAN volumes. This capability was actually introduced in Windows Server 2003 R2.</p>
<p><strong>Maintenance mode</strong> – gives the administrator exclusive access to the cluster disks.</p>
<p><strong>Disk Signatures</strong> – disks connected to the cluster used to be identified by their disk signatures. SCSI Inquiry Data is now used instead, which fixes some potential problems the previous version had in certain disaster recovery scenarios.<br />
New disks can be added to the cluster while applications are running.</p>
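<p>The 2 TB ceiling for MBR disks mentioned above follows from MBR's 32-bit sector addresses. As a worked check, assuming the conventional 512-byte sector size:</p>

```latex
2^{32}\ \text{sectors} \times 512\ \tfrac{\text{B}}{\text{sector}}
  = 2^{32} \times 2^{9}\ \text{B}
  = 2^{41}\ \text{B}
  = 2\ \text{TiB}
  \approx 2.2 \times 10^{12}\ \text{B}
```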
<p>Installation and configuration have been improved (they now take only 5 screens). Setup can also be captured and performed entirely through scripting. This capability also helps with migration, since we can capture certain configuration elements from the 2003 version and then apply them to the 2008 version.</p>
<p>The <strong>management interface</strong> has also been simplified. It is oriented more toward managing applications than toward managing the cluster itself. For example, if you use clustering for file sharing, it is now much easier to see which shared folders are clustered and which node hosts a given folder, through an option called "scoping".</p>
<p>One of the most important changes is in the <strong>quorum</strong> model. In Windows Server 2003 we had three options: "single disk quorum", MNS (Majority Node Set) and MNS with FSW (File Share Witness). Now, in Windows Server 2008, we have four options:</p>
<p>1. Node majority – the cluster keeps running as long as a majority of the nodes are available.</p>
<p>2. The second option uses both the nodes and the quorum disk (now called the "witness disk"). For example, in a two-node cluster, the nodes keep running even if the witness disk becomes unavailable (two out of three).</p>
<p>3. The classic, well-known model with a single witness disk.</p>
<p>4. MNS mode with a file share witness (note that DFS cannot be used as the file share witness).</p>
<p>There are also some new features on the networking, infrastructure and access side.</p>
<p>1. Naturally, <strong>IPv6 support</strong> is included, both for communication between nodes and for communication between nodes and clients.</p>
<p>2. We can now have <strong>nodes in different subnets</strong>. This means that a geo-cluster has suddenly become much easier to build. A geo-cluster is a cluster whose servers are located in different geographic areas, a configuration found in virtually every disaster recovery scenario. We also now have a <strong>configurable</strong> <strong>heartbeat timeout</strong> (the maximum response time between two nodes of a cluster), so the 500 ms restriction no longer applies. In addition, the heartbeat now uses TCP instead of UDP.</p>
<p>3. We can now use DNS alone. NetBIOS and WINS support are no longer required, which removes a source of broadcast traffic.</p>
<p>4. The cluster service now runs under the <strong>Local System account</strong>.</p>
<p>The only drawback I would mention is that there is no direct upgrade path from a Windows Server 2003 cluster to a Windows Server 2008 cluster. The only option is migration, although a number of configuration settings can be transferred.</p>
<p>That is a brief overview of the features of the new Windows Server 2008 Failover Clustering. In the next articles I will try to provide a step-by-step configuration, with screenshots, for a two-node cluster. I will also cover Network Load Balancing in more detail, along with a scenario for it.</p>
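<p>The vote counting behind the majority-based modes (options 1, 2 and 4) can be sketched roughly in Python. This is a simplified model with several assumptions. Each node has one vote, and a configured witness (disk or file share) adds one more. The cluster stays up while strictly more than half of all votes are present. The legacy disk-only model (option 3) does not use vote counting and is not modeled. The function name is illustrative, and Windows' actual quorum logic involves more detail than this.</p>

```python
def has_quorum(total_nodes, nodes_up, witness_configured=False, witness_up=False):
    """Simplified majority vote: one vote per node, plus one for a configured witness
    (disk or file share). Quorum requires strictly more than half of all votes."""
    total_votes = total_nodes + (1 if witness_configured else 0)
    votes_present = nodes_up + (1 if witness_configured and witness_up else 0)
    return 2 * votes_present > total_votes
```

<p>With two nodes and a witness disk, losing the witness still leaves two of three votes, so has_quorum(2, 2, True, False) is true. Without a witness, losing one of two nodes leaves one of two votes, which is not a majority.</p>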
]]></content:encoded>
</item>
<item>
<title><![CDATA[How to Guide: SQL Server 2005 Clustering]]></title>
<link>http://hptv04.wordpress.com/?p=477</link>
<pubDate>Thu, 03 Jul 2008 16:54:47 +0000</pubDate>
<dc:creator>hptv</dc:creator>
<guid>http://hptv04.wordpress.com/?p=477</guid>
<description><![CDATA[This paper shows how SQL Server 2005 is implemented on a failover cluster, how to install and config]]></description>
<content:encoded><![CDATA[<p style="text-align:justify;">This paper shows how SQL Server 2005 is implemented on a failover cluster, how to install and configure SQL Server 2005 for failover clustering, and best practices for SQL Server 2005 clustering. As SQL Server 2005 environments move from smaller installations to larger mission-critical enterprises, the need for the database environment to be highly available becomes more apparent. SQL Server 2005 has many different mechanisms to achieve high availability. But one of the most commonly used methods to achieve a highly available mission-critical database environment is SQL Server 2005's ability to make use of clustered environments.</p>
<p style="text-align:justify;">...download here: <a href="http://hptv04.files.wordpress.com/2008/07/sql2005_clustering.pdf">sql2005_clustering</a></p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[MySQL Database Sprawl]]></title>
<link>http://researchitblog.wordpress.com/?p=10</link>
<pubDate>Thu, 03 Jul 2008 03:40:01 +0000</pubDate>
<dc:creator>Gary Stiehr</dc:creator>
<guid>http://researchitblog.wordpress.com/?p=10</guid>
<description><![CDATA[Have you noticed a sprawl of MySQL database installations across your servers?  I&#8217;ve seen this]]></description>
<content:encoded><![CDATA[<p>Have you noticed a sprawl of MySQL database installations across your servers?  I've seen this at various organizations.  Usually it is because some tool or another utilizes it, such as a ticketing system or a content management system.</p>
<p>While discussing server upgrades/consolidation recently, it was mentioned that perhaps we could consolidate all of these scattered OLTP-like instances into one <a title="MySQL Cluster" href="http://www.mysql.com/products/database/cluster/" target="_blank">MySQL Cluster</a>.  Some of the benefits mentioned during the discussion were reducing effort related to configuration tracking and tuning, simplifying database backups and overall higher availability.  On the other hand, sometimes clustering setup and ongoing operation can add complexity to the environment, which may offset the benefits.</p>
<p>How have you approached MySQL database sprawl?  What has your experience been with MySQL clustering?</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[Clustering large datasets]]></title>
<link>http://biswaroop.wordpress.com/?p=46</link>
<pubDate>Wed, 02 Jul 2008 12:52:03 +0000</pubDate>
<dc:creator>biswaroop</dc:creator>
<guid>http://biswaroop.wordpress.com/?p=46</guid>
<description><![CDATA[I was reading about the BIRCH clustering algorithm which is used to cluster large data sets. Its qui]]></description>
<content:encoded><![CDATA[<p>I was reading about the <a href="http://www.lans.ece.utexas.edu/course/ee380l/1999fall/papers/list2/p103-zhang.pdf">BIRCH clustering algorithm</a>, which is used to cluster large data sets. It's quite popular in the database community. I will try to summarize the paper here.</p>
<p>The algorithm tries to address the problem of <a href="http://en.wikipedia.org/wiki/Data_clustering">clustering</a> large data sets, where the entire dataset cannot fit into memory. The popular families of clustering algorithms, and their flaws in the context of large datasets, are:</p>
<p><strong><em>Probability based approaches</em></strong>: These assume that probability distributions on separate attributes are statistically independent of each other.</p>
<p><em><strong>Distance based approaches</strong></em>: These assume that all data points are available at once and can be scanned frequently.</p>
<p>The primary problem is that these algorithms don't consider the fact that the dataset can be too large to fit into main memory. BIRCH is specially suited for large datasets. It can typically find a good clustering with a single scan of the data and improve its quality with additional scans. Its I/O cost is linear in the size of the data.</p>
<p>The keystones of BIRCH are the <strong>Clustering Feature</strong> (CF vector), which describes a cluster, and the <strong>CF Tree</strong>. Each cluster is summarized as a CF vector, a triplet comprising a) the number of points in the cluster, b) the linear sum of the vectors in the cluster and c) the square sum of the vectors in the cluster. The CF vector is efficient because it stores much less than all the data points in the cluster. It is also sufficient: it supports all the calculations needed to make clustering decisions in BIRCH.</p>
<p>A CF Tree is a balanced tree used to store the clusters in a hierarchical manner. The algorithm for inserting points into the tree ensures that we end up with a good clustering at the end of a first scan of the data. I refer the reader to the paper for the details of this data structure.</p>
<p>Clustering is then performed on the CF tree, which is a much reduced representation of the dataset and therefore does fit into main memory. There might be flaws in the clustering because of skew in the input order of the data. For example, the same data point, if inserted twice at different times, might end up in different leaves of the tree. These anomalies are taken care of in the post-processing steps.</p>
<p>An overview of the BIRCH clustering process is shown in the figure below:<a href="http://biswaroop.files.wordpress.com/2008/07/birch1.jpg"><img class="aligncenter size-full wp-image-49" src="http://biswaroop.wordpress.com/files/2008/07/birch1.jpg" alt="" width="439" height="315" /></a></p>
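<p>The CF triplet and its additivity property can be shown in a few lines of Python. This is an illustrative sketch. The class name is hypothetical, and SS is taken as the sum of the squared norms of the points (a scalar). The radius is the root-mean-square distance of the points to the centroid, computed from the CF alone.</p>

```python
import math
from dataclasses import dataclass


@dataclass
class CF:
    """BIRCH clustering feature: (N, LS, SS) = (point count, linear sum, square sum)."""
    n: int
    ls: tuple   # per-dimension sum of the points
    ss: float   # sum of squared norms of the points

    @classmethod
    def from_point(cls, point):
        return cls(1, tuple(point), sum(x * x for x in point))

    def __add__(self, other):
        # Additivity: merging two clusters needs only their CF vectors, not their points.
        return CF(self.n + other.n,
                  tuple(a + b for a, b in zip(self.ls, other.ls)),
                  self.ss + other.ss)

    def centroid(self):
        return tuple(x / self.n for x in self.ls)

    def radius(self):
        # R = sqrt(SS/N - ||LS/N||^2): RMS distance of the members to the centroid.
        c2 = sum(x * x for x in self.centroid())
        return math.sqrt(max(self.ss / self.n - c2, 0.0))
```

<p>For example, merging the single-point CFs for (0, 0) and (2, 0) gives N = 2, LS = (2, 0), SS = 4, with centroid (1, 0) and radius 1, all without revisiting the original points.</p>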
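<p>The full CF tree (branching factor, node splits, rebuilding with a larger threshold) is too large for a short example. The leaf-level rule at its core can be sketched, though. A new point is absorbed by the closest existing entry if the merged entry's radius stays within a threshold T. Otherwise, the point starts a new entry. The Python sketch below keeps a flat list of entries instead of a tree and uses the radius as the threshold test. The names are hypothetical, and this is not the paper's complete insertion algorithm.</p>

```python
import math


def cf_merge(a, b):
    """Add two clustering features given as (N, LS, SS) tuples."""
    return (a[0] + b[0], tuple(x + y for x, y in zip(a[1], b[1])), a[2] + b[2])


def cf_centroid(cf):
    return tuple(x / cf[0] for x in cf[1])


def cf_radius(cf):
    c2 = sum(x * x for x in cf_centroid(cf))
    return math.sqrt(max(cf[2] / cf[0] - c2, 0.0))


def insert_point(entries, point, threshold):
    """Simplified, tree-less leaf insertion: absorb the point into the nearest entry
    if the merged radius stays within `threshold`, otherwise open a new entry."""
    new = (1, tuple(point), sum(x * x for x in point))
    if entries:
        # Nearest entry by Euclidean distance from its centroid to the point.
        i = min(range(len(entries)), key=lambda k: math.dist(cf_centroid(entries[k]), point))
        merged = cf_merge(entries[i], new)
        if cf_radius(merged) <= threshold:
            entries[i] = merged
            return entries
    entries.append(new)
    return entries
```

<p>Each point is compared only against the entries that already exist when it arrives, so the result can depend on the input order. This is one source of the anomalies mentioned in the next paragraph.</p>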
]]></content:encoded>
</item>
<item>
<title><![CDATA[Clustering an EV Server]]></title>
<link>http://robwilc.wordpress.com/?p=45</link>
<pubDate>Tue, 01 Jul 2008 15:03:44 +0000</pubDate>
<dc:creator>Rob Wilcox</dc:creator>
<guid>http://robwilc.wordpress.com/?p=45</guid>
<description><![CDATA[I was setting up EV 2007 Service Pack 2 on a Windows 2003 2-node cluster for the last couple of days]]></description>
<content:encoded><![CDATA[<p>I was setting up EV 2007 Service Pack 2 on a Windows 2003 2-node cluster for the last couple of days (so I could work on a repro that I have to do). One interesting thing that I found was that when I was trying to configure the failover node, it wasn't listing all the cluster resource groups, including the one containing my EV resources.</p>
<p>What I mean is that when I run the Enterprise Vault Configuration Wizard I see this:</p>
<p><a href="http://robwilc.files.wordpress.com/2008/07/2008-07-01_155453.jpg"><img class="aligncenter size-full wp-image-46" src="http://robwilc.wordpress.com/files/2008/07/2008-07-01_155453.jpg" alt="" width="526" height="387" /></a></p>
<p>If I select the middle option to add a new node to an existing cluster, I see this:</p>
<p><a href="http://robwilc.files.wordpress.com/2008/07/2008-07-01_155523.jpg"><img class="aligncenter size-full wp-image-47" src="http://robwilc.wordpress.com/files/2008/07/2008-07-01_155523.jpg" alt="" width="524" height="386" /></a></p>
<p>I should also see my other resource group called EVGroup.</p>
<p>I read through the notes on the wizard screen:</p>
<p>Does my group contain all those resources? Yes.</p>
<p>Is it online on the node in the cluster? Yes.</p>
<p>Why then is EVGroup not listed? A little snippet of information in the Installing and Configuring Guide helped me find out what the problem was:</p>
<p><strong>Configuring a failover node</strong></p>
<p>Perform this procedure on the nodes that are to act as failover nodes.</p>
<p>To configure a failover node</p>
<p>1 On the node’s Windows Start menu, click All Programs &#62; Enterprise Vault &#62; Enterprise Vault Configuration. The first page of the Enterprise Vault Configuration wizard appears.</p>
<p>2 Click Configure the node as a failover node for an existing clustered server, and then click Next.</p>
<p><strong>3 The wizard prompts you for the name of the resource group for which you want to add the node as a failover node. Select any resource group that is configured to fail over to this node. The resource group must be online on one of the nodes that you have configured as an Enterprise Vault primary node, and its  resources must all have the failover node as a possible owner.</strong></p>
<p><strong>Select the name of the resource group, and then click Next.</strong></p>
<p>4 On the next wizard page, enter the password for the Vault Service account, and then click Next.</p>
<p>5 The next wizard page lists the actions the wizard will take if you proceed. To continue, click Next, and then click OK to confirm the actions taken.</p>
<p>6 The final wizard page displays a list of the actions the wizard has performed, and the results. Click Finish to exit the wizard.</p>
<p>When I double-checked, the EV resources didn't have this node listed as a possible owner:</p>
<p><a href="http://robwilc.files.wordpress.com/2008/07/2008-07-01_160109.jpg"><img class="aligncenter size-full wp-image-48" src="http://robwilc.wordpress.com/files/2008/07/2008-07-01_160109.jpg" alt="" width="399" height="446" /></a></p>
<p>Now, one way to fix this would be to edit each of the resources and add the second node as a possible owner. Another way is to take the resources offline on the active node, fail the group over to the passive node (Node B in my case), and then fail the resources back to the active node (Node A in my case). If you then bring the resources back online on Node A, the configuration wizard can complete on the passive node!</p>
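<p>The requirement quoted in step 3 amounts to a simple filter over the cluster's resource groups. The Python sketch below models that filter with plain dictionaries. The data layout and the group, resource and node names are hypothetical, and the code does not query a real cluster. It only shows why a group whose resources lack the failover node as a possible owner is left off the wizard's list.</p>

```python
def eligible_groups(groups, failover_node, primary_nodes):
    """Return the resource groups the wizard should offer for `failover_node`:
    the group is online on an EV primary node, and every one of its resources
    lists the failover node as a possible owner."""
    result = []
    for name, group in groups.items():
        online_on_primary = group["state"] == "Online" and group["owner"] in primary_nodes
        all_owned = all(failover_node in res["possible_owners"]
                        for res in group["resources"].values())
        if online_on_primary and all_owned:
            result.append(name)
    return result
```

<p>With the possible-owner lists described above (the second node missing), the function returns an empty list for that node, which matches what the wizard showed. Once every resource lists both nodes, the group is offered.</p>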
]]></content:encoded>
</item>
<item>
<title><![CDATA[Using Google Charts for Cluster Performance Monitoring]]></title>
<link>http://feedblog.org/?p=1695</link>
<pubDate>Mon, 30 Jun 2008 07:45:16 +0000</pubDate>
<dc:creator>burtonator</dc:creator>
<guid>http://feedblog.org/?p=1695</guid>
<description><![CDATA[I&#8217;ve been looking at replacing Munin with our own higher level proprietary monitoring system f]]></description>
<content:encoded><![CDATA[<p>I've been looking at replacing Munin with our own higher level proprietary monitoring system for keeping track of cluster-wide statistics.</p>
<p>This is needed for a new feature we're trying to ship with Spinn3r so that we can expose some of our internal statistics to our customers.</p>
<p>We weren't able to do this before because our internal monitoring was a bit of a hack due to the lack of quality in the open source monitoring tool chain.</p>
<p>About 80% of the work in a performance monitoring tool is in the charting component and I'm glad to see that the Google chart API basically rocks.</p>
<p>It's very well done. The only real complaint I have is that you can't submit a URL longer than 2000 bytes. This seems to be a limitation of their own internal tooling, as Firefox, Safari, etc. can support URLs up to 65k.</p>
<p>This limitation is more pathological than you would think because you can't place two metrics on top of each other in the same graph.  They need to move into two separate graphs or you'll generate a long URL and Google will return an HTTP 40x response.</p>
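<p>The chart linked below encodes its data with the chart API's extended encoding (the chd=e: parameter in that URL). In this encoding, each value from 0 to 4095 is written as two characters from a 64-character alphabet. The following Python sketch encodes one or more series this way and checks the URL length against the 2000-byte limit mentioned above. The helper names, the linear scaling to 0..4095 and the 600-point series in the example are assumptions for illustration. Only the cht, chs and chd parameters that appear in that URL are used.</p>

```python
from urllib.parse import urlencode

# Extended-encoding alphabet: 64 symbols, two symbols per value (0..4095).
ALPHABET = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789-."
MAX_URL = 2000  # limit reported above


def encode_series(values, max_value):
    """Scale values linearly to 0..4095 and encode each one as two characters."""
    out = []
    for v in values:
        scaled = round(4095 * v / max_value) if max_value else 0
        scaled = min(max(scaled, 0), 4095)
        out.append(ALPHABET[scaled // 64] + ALPHABET[scaled % 64])
    return "".join(out)


def chart_url(series, max_value, size="800x375"):
    """Build a line-chart URL; multiple series are comma-separated in chd."""
    data = "e:" + ",".join(encode_series(s, max_value) for s in series)
    query = urlencode({"cht": "ls", "chs": size, "chd": data}, safe=":,")
    return "http://chart.apis.google.com/chart?" + query


def fits(url):
    return len(url) <= MAX_URL
```

<p>With 600 points per series, one series produces a URL of about 1,260 characters. Adding a second series pushes it past 2,400 characters, which is the overflow described above.</p>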
<p>The other cool thing is the API is FAST.  Check out the render time of this <a href="http://chart.apis.google.com/chart?cht=ls&#38;chs=800x375&#38;chd=e:dyeYeJebd-d9didCdgdWePdOc-bsbPaGZjZCZOaobAbXbNbFaiaIaZaZayaMZZYMXCWAVdUwULT2SaSTRTRQQuQzRORPRKQqRIPOOHMiLpKqJ9JsImKMKtMSOWQwSwUaVdWXXXYZZgaZa2bKc.c.eCeveyeuffgOh9j4kznRoDo3pNoJm0lgkbiph5g7gsgagafie.eAeOeZemfCd6dseefJd4eweEdgdkcfcvdJdJdOdsd2eweUeifcgmh1h.iejPj3jzkVjWjgjmkGkGjajniYi5iGiehtg6ftfOemeBdcdHdVdyc0bEZpZva-bDbLbgbvbhb4boayZ-aAZDY0XrXaWVWCWIWKWOWkXUXxXQW.WYXpZPaZaCZ-bNb3baaRahZVZUZcasb8cIcsdGdedkdmdpdyeiere1ewexgFfnfJdvgjiPh2f0fjfxgNf3gLgMgngEfLgrgigBhMhViOiVjxjQi2imjRiihnh7hYhmeTc8c0exfSftfsfifSfLeid5eDdYfAgAeme3e-fmfdgMgGfaeYeLdscucDbucQcLeaf5edeeejf5glgrg-hchOgaejdgdZdBcDbYaKZcZvZ6alasbWc.d6eweifOc2bJc3fdhBhXiIk-l0m.nerwsluivQwBwuyk041r3y5i-M.-.f.a.o.y.m.Z.r.F-E7l6j555c37383c3jzv0kzd0G0KzxzKwyxQvQunp-o.pzo2oroGpYoFnwmRmqmBlikikdkPkBkGj1kRjTjajMihjhiTi1hoiEhShphUhAhFgnhCe1fte5e8fGegeHd5dCb4bQb8cAcYb1bKbqcEbhbtbKa7aWaBZKYvYLYGXqW4W.WFWGWGUmUlUaULUVUZUKSRRhROREREPvQARYSeUFVfWeXuZaaHbZbhcQcVcTckcicrcwc0c8eyfnhdjHlPmznJoGqHqSrAo5oEm4lnkjjMiFkHk8mom7nbn7oAoWoDn0nlnWluk8jDhqgbffebePcodldhfJfWgqhXifjWjRhUfReQdcc9cTb8bkbpb4b.cMcVcZcbccb.b-cCbFabY.XYVnUxS9SNQ-RdReRXS5RyTJS6S0SvSrSrSiScSUR7RiReRdQ8QOPwPvPsPlPoPmPlPfPUPcONNmOcQAQwSoTjWkX3aLb1cGd5eae5fEgyg-iDjAj7myqetouqwGwuydy63x3-4J3D1R1JyWyPvlvWtdr8r.qcqDqRqgpZpvpDpupZoelmjTi3hlhtgZgGblcAcKb4a4a8ananaSZ.aIbDa2arazajbFakZ8ZtYiYvXBWkV5WJWNVmVNVHUqT2TgTTToTbTqTuUJVvVzViWEWjXDWvWIWBV0VyVhUyUqUbURTaTuTiTeTbTjTyT0T9TSVQXyZIa3Y-aYbCcddCdDdadqeIeHeyeufZfYfsgji2j6lZmDoTomp2Vj&#38;chtt=Feed+Content+Growth+(24+hours)&#38;chxt=x,y&#38;chxl=0:%7C10:27%20PM%7C4:27%20AM%7C10:27%20AM%7C4:27%20PM%7C10:27%20PM%7C1:%7C%7C17333%7C34666%7C51999%7C69332%7C86664&#38;chg=10,10&#38;chco=bb2b27DD&#38;chm=B,bb2b27DD,0,0,0&#38;chdl=CURRENT:%2029,184%20%20AVG:%2041,282%20%20%20MIN:%2011,655%20%20%20MAX:%2086,664&#38;chdlp=b">chart which is 300k pixels</a>.... </p>
<p>I can now render graphs in about 0.09 seconds, which is an order of magnitude faster than Munin with rrdgraph on a quad-core 2.6 GHz Opteron.</p>
<p>I hope they also expose the charting component used in Google Finance. It is pretty sweet as well.</p>
<p><img src="http://burtonator.files.wordpress.com/2008/06/200806292227.jpg" height="225" width="463" border="0" hspace="4" vspace="4" alt="200806292227" /></p>
<p><img src="http://burtonator.files.wordpress.com/2008/06/200806292230.jpg" height="225" width="463" border="0" hspace="4" vspace="4" alt="200806292230" /></p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[Sequence Clustering]]></title>
<link>http://yudiagusta.wordpress.com/?p=148</link>
<pubDate>Tue, 24 Jun 2008 06:19:10 +0000</pubDate>
<dc:creator>Yudi Agusta</dc:creator>
<guid>http://yudiagusta.wordpress.com/?p=148</guid>
<description><![CDATA[Clustering on Sequential Pattern is a sub-discipline of Data Mining and Soft Computing.
Clustering ]]></description>
<content:encoded><![CDATA[<p>Clustering on Sequential Pattern is a sub-discipline of Data Mining and Soft Computing.</p>
<p>Clustering on Sequential Pattern is the process of grouping data in which the data being grouped are sequential patterns, and the features that appear earlier in the data determine the probability of the next feature appearing. Clustering on Sequential Pattern can be carried out with various clustering methods, one of which is <a href="http://yudiagusta.wordpress.com/mixture-modelling/">mixture modelling</a>.</p>
<p>Sequence data can be modelled as clusters using probability theory in two ways:<br />
1. Treating the sequence data as a <a href="http://yudiagusta.wordpress.com/2008/06/24/markov-chain/">Markov Chain</a> model<br />
2. Using a <a href="http://yudiagusta.wordpress.com/2008/06/25/hidden-markov-models/">Hidden Markov Model</a> as the model of the sequence data</p>
<p>In the first case, the sequential pattern can be modelled with a <a href="http://yudiagusta.wordpress.com/2008/06/24/markov-chain/">Markov Chain</a>. The order of the Markov Chain determines how many preceding features determine the value of the next feature. A first-order Markov Chain is the most common choice: only the single preceding feature determines the value of the next feature and its probability. A Markov Chain of order n means that the n preceding features determine the value of the next feature and its probability.</p>
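<p>A minimal sketch of the first case, assuming sequences are strings or lists of hashable symbols. It estimates order-n transition probabilities by counting transitions. It applies add-alpha smoothing, an assumption added here so that unseen transitions do not get probability zero. It then scores a sequence by its log-likelihood. The first <code>order</code> symbols are not scored, because the sketch leaves out an initial-state distribution. <code>fit_markov</code> and <code>log_likelihood</code> are illustrative names.</p>

```python
import math
from collections import defaultdict


def fit_markov(sequences, order=1, alpha=1.0):
    """Estimate P(next symbol | previous `order` symbols) by counting,
    with add-alpha smoothing so unseen transitions keep a small probability."""
    symbols = sorted({s for seq in sequences for s in seq})
    counts = defaultdict(lambda: defaultdict(float))
    for seq in sequences:
        for t in range(order, len(seq)):
            counts[tuple(seq[t - order:t])][seq[t]] += 1

    def prob(context, symbol):
        row = counts.get(tuple(context), {})
        total = sum(row.values())
        return (row.get(symbol, 0.0) + alpha) / (total + alpha * len(symbols))

    return prob


def log_likelihood(seq, prob, order=1):
    """Sum of log P(x_t | previous `order` symbols); the first `order` symbols are not scored."""
    return sum(math.log(prob(seq[t - order:t], seq[t]))
               for t in range(order, len(seq)))


model = fit_markov(["ABABAB", "ABAB"], order=1)
print(log_likelihood("ABAB", model) > log_likelihood("AAAA", model))  # True
```

<p>Passing <code>order=2</code> or higher gives the order-n chain described above. The cost is many more contexts to estimate from the same data.</p>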
<p>In the second case, the sequential pattern can be modelled with a <a href="http://yudiagusta.wordpress.com/2008/06/25/hidden-markov-models/">Hidden Markov Model</a>, which is an extension of the Markov Chain model. Compared with a Markov Chain, a Hidden Markov Model has one additional variable, a hidden variable. It models the number and type of sources from which the parts of the sequence originate.</p>
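<p>Under a Hidden Markov Model, the probability of a sequence is a sum over all possible hidden-state paths. The standard forward algorithm computes this sum efficiently. Below is a minimal sketch in Python that rescales at each step to avoid numerical underflow on long sequences. The two-state model and its probabilities are made up for illustration, and <code>hmm_log_likelihood</code> is a hypothetical name.</p>

```python
import math


def hmm_log_likelihood(obs, start, trans, emit):
    """Forward algorithm with per-step rescaling; returns log P(obs | model).

    start[i]    = P(z_1 = i)
    trans[i][j] = P(z_t = j | z_{t-1} = i)
    emit[i][o]  = P(x_t = o | z_t = i)
    """
    n = len(start)
    alpha = [start[i] * emit[i][obs[0]] for i in range(n)]
    log_p = 0.0
    for t in range(len(obs)):
        if t:  # propagate through the hidden-state transitions, then emit
            alpha = [sum(alpha[i] * trans[i][j] for i in range(n)) * emit[j][obs[t]]
                     for j in range(n)]
        scale = sum(alpha)
        log_p += math.log(scale)
        alpha = [a / scale for a in alpha]  # rescale to avoid underflow
    return log_p


# Two hypothetical hidden sources: one mostly emits "A", the other mostly "B".
start = [0.5, 0.5]
trans = [[0.9, 0.1], [0.1, 0.9]]
emit = [{"A": 0.9, "B": 0.1}, {"A": 0.2, "B": 0.8}]
print(hmm_log_likelihood("AAAABBBB", start, trans, emit))
```

<p>The sum of the log scale factors equals the log of the total probability over all paths. This is the quantity a clustering method would compare across cluster models.</p>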
<p><a href="http://yudiagusta.wordpress.com/mixture-modelling/">Mixture modelling</a> of sequence data is carried out by fitting a Markov Chain model or a Hidden Markov Model to the sequence data in question. The distance measure is the log-likelihood of a sequence under the sequence model that represents each resulting cluster. This modelling yields three things: the most suitable number of clusters, the data that belong to each cluster, and the proportion (relative size) of each cluster.</p>
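<p>A sketch of mixture modelling with first-order Markov Chain components, fitted by expectation-maximisation (EM). The number of clusters <code>k</code> is fixed in advance. The post says the modelling also finds the most suitable number of clusters. That step, for example comparing a model-selection criterion across several values of <code>k</code>, is not shown. Add-one smoothing and random initial responsibilities are assumptions of this sketch, and <code>em_markov_mixture</code> is a hypothetical name.</p>

```python
import math
import random


def em_markov_mixture(seqs, k, iters=50, seed=0):
    """EM for a mixture of first-order Markov chains.
    Returns mixing proportions (relative cluster sizes) and, for each
    sequence, its posterior probability of belonging to each cluster."""
    symbols = sorted({s for seq in seqs for s in seq})
    idx = {s: i for i, s in enumerate(symbols)}
    m, rng = len(symbols), random.Random(seed)
    # Random soft assignments break the symmetry between clusters.
    resp = []
    for _ in seqs:
        r = [rng.random() for _ in range(k)]
        resp.append([x / sum(r) for x in r])
    for _ in range(iters):
        # M-step: weighted counts with add-one smoothing.
        weights = [(sum(r[c] for r in resp) + 1) / (len(seqs) + k) for c in range(k)]
        init = [[1.0] * m for _ in range(k)]
        trans = [[[1.0] * m for _ in range(m)] for _ in range(k)]
        for seq, r in zip(seqs, resp):
            for c in range(k):
                init[c][idx[seq[0]]] += r[c]
                for a, b in zip(seq, seq[1:]):
                    trans[c][idx[a]][idx[b]] += r[c]
        for c in range(k):
            init[c] = [x / sum(init[c]) for x in init[c]]
            trans[c] = [[x / sum(row) for x in row] for row in trans[c]]
        # E-step: log-likelihood of each sequence under each cluster, normalised.
        resp = []
        for seq in seqs:
            logs = [math.log(weights[c]) + math.log(init[c][idx[seq[0]]])
                    + sum(math.log(trans[c][idx[a]][idx[b]]) for a, b in zip(seq, seq[1:]))
                    for c in range(k)]
            top = max(logs)
            w = [math.exp(x - top) for x in logs]
            resp.append([x / sum(w) for x in w])
    return weights, resp


seqs = ["ABABABABAB", "ABABABAB", "ABABABABABAB",   # alternating
        "AAAAAAAAAA", "AAAAAAAA", "AAAAAAAAAAAA"]   # repetitive
weights, resp = em_markov_mixture(seqs, k=2)
labels = [max(range(2), key=lambda c: r[c]) for r in resp]
print(weights, labels)
```

<p><code>weights</code> corresponds to the relative size of each cluster. Taking the argmax of each row of <code>resp</code> gives a hard cluster assignment.</p>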
<p>Several variations of sequence analysis also arise in bioinformatics. There, sequence alignment must also be performed to determine whether or not a mutation has occurred from one gene to another. In web sequence analysis, a sequence is often similar to another one except that part of one sequence does not appear in the other.</p>
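<p>Global alignment of this kind is commonly done with the Needleman-Wunsch dynamic program. Below is a minimal sketch of its score computation. The match, mismatch and gap scores are arbitrary assumptions, and <code>global_alignment_score</code> is an illustrative name. The same function works on DNA strings and on lists of page visits where one session skipped a page.</p>

```python
def global_alignment_score(a, b, match=1, mismatch=-1, gap=-2):
    """Needleman-Wunsch score for aligning a and b end to end.
    Works on strings (e.g. DNA) or lists (e.g. page visits)."""
    prev = [j * gap for j in range(len(b) + 1)]  # empty prefix of a against b
    for i in range(1, len(a) + 1):
        cur = [i * gap] + [0] * len(b)
        for j in range(1, len(b) + 1):
            diag = prev[j - 1] + (match if a[i - 1] == b[j - 1] else mismatch)
            cur[j] = max(diag,              # align a[i-1] with b[j-1]
                         prev[j] + gap,     # a[i-1] against a gap in b
                         cur[j - 1] + gap)  # b[j-1] against a gap in a
        prev = cur
    return prev[-1]


print(global_alignment_score("GATTACA", "GATACA"))  # 4
visits_1 = ["home", "products", "cart", "checkout"]
visits_2 = ["home", "cart", "checkout"]
print(global_alignment_score(visits_1, visits_2))  # 1
```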
<p>References:<br />
Smyth P. (1997). Clustering Sequences with Hidden Markov Models. In Mozer M. C. et al. (eds.), Advances in Neural Information Processing Systems, vol. 9, The MIT Press, p. 648.<br />
Rabiner L. R. (1989). A Tutorial on Hidden Markov Models and Selected Applications in Speech Recognition. Proceedings of the IEEE, vol. 77, no. 2, pp. 257-286.</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[Clustering: intrinsic dimensionality experiment 1]]></title>
<link>http://kevd1337.wordpress.com/?p=13</link>
<pubDate>Wed, 18 Jun 2008 16:38:46 +0000</pubDate>
<dc:creator>kevd1337</dc:creator>
<guid>http://kevd1337.wordpress.com/?p=13</guid>
<description><![CDATA[
TEST POST.
Write-up coming soon&#8230;
]]></description>
<content:encoded><![CDATA[<p><a href="http://kevd1337.files.wordpress.com/2008/06/intrinsicdimensionality_poc1.png"><img class="alignnone size-medium wp-image-12" src="http://kevd1337.wordpress.com/files/2008/06/intrinsicdimensionality_poc1.png?w=300" alt="Screen capture of intrinsic dimensionality experiment 1" width="300" height="187" /></a></p>
<p>TEST POST.</p>
<p>Write-up coming soon...</p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[HCORE and Large-Scale Bayesian Network Learning]]></title>
<link>http://sysbioasu.wordpress.com/?p=28</link>
<pubDate>Tue, 17 Jun 2008 23:41:05 +0000</pubDate>
<dc:creator>sysbioasu</dc:creator>
<guid>http://sysbioasu.wordpress.com/?p=28</guid>
<description><![CDATA[Sungwon presents.
Slides (PDF)
Paper 1 (PDF)
Paper 2 (PDF)
]]></description>
<content:encoded><![CDATA[<p>Sungwon presents.</p>
<p><a href="http://sysbio.fulton.asu.edu/seminardocs/Sungwon17Jun1.pdf">Slides (PDF)</a><br />
<a href="http://sysbio.fulton.asu.edu/seminardocs/Sungwon17Jun2.pdf">Paper 1 (PDF)</a><br />
<a href="http://sysbio.fulton.asu.edu/seminardocs/Sungwon17Jun3.pdf">Paper 2 (PDF)</a></p>
]]></content:encoded>
</item>
<item>
<title><![CDATA[2K7SP1 CCR on 2k8 steps]]></title>
<link>http://johnacook.wordpress.com/2008/06/17/good-deal/</link>
<pubDate>Tue, 17 Jun 2008 22:29:50 +0000</pubDate>
<dc:creator>johnacook</dc:creator>
<guid>http://johnacook.wordpress.com/2008/06/17/good-deal/</guid>
<description><![CDATA[Deploying an Exchange 2007 SP1 CCR Cluster on a Windows Server 2008 Failover Cluster – Part 1: Pre]]></description>
<content:encoded><![CDATA[<p><a href="http://www.msexchange.org/articles_tutorials/exchange-server-2007/high-availability-recovery/deploying-exchange-2007-sp1-ccr-cluster-windows-server-2008-failover-cluster-part1.html">Deploying an Exchange 2007 SP1 CCR Cluster on a Windows Server 2008 Failover Cluster – Part 1: Prerequisites &#38; Configuring the failover cluster nodes</a></p>
]]></content:encoded>
</item>

</channel>
</rss>
