
Archive for the ‘Technological Updates’ Category

Big Data – Hadoop HDFS and MapReduce

September 27, 2012

The big data buzz is growing day by day, so here is a more detailed look at Hadoop: HDFS and MapReduce.

HDFS, the Hadoop Distributed File System, is designed to store large amounts of data across the many servers of a cluster. What counts as large needs no explanation (especially when we are talking Big Data). A file in a Hadoop cluster is broken down into blocks (64 MB by default) and the blocks are distributed across the nodes of the cluster.
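To get a feel for the block arithmetic, here is a minimal sketch in Python. It is not Hadoop code; the function name `split_into_blocks` is made up for illustration, and it only assumes the 64 MB default quoted above and that the last block holds whatever is left over.

```python
BLOCK_SIZE_MB = 64  # default block size quoted in the post


def split_into_blocks(file_size_mb, block_size_mb=BLOCK_SIZE_MB):
    """Return the sizes (in MB) of the blocks a file would be split into.

    Hypothetical helper for illustration only, not part of Hadoop.
    """
    if file_size_mb <= 0:
        return []
    full_blocks = file_size_mb // block_size_mb
    remainder = file_size_mb % block_size_mb
    sizes = [block_size_mb] * full_blocks
    if remainder:
        sizes.append(remainder)  # the last block only holds the leftover data
    return sizes


print(split_into_blocks(200))  # [64, 64, 64, 8]
```

So a 200 MB file ends up as four blocks, and each of those blocks can live on a different node.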

The blocks are placed according to a rack-aware block placement algorithm. It decides which nodes, and which racks, receive the copies of each block, based on the replication factor, which is 3 by default. Spreading the copies over more than one rack means the data survives the loss of a single node or even a whole rack.
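The idea of rack-aware placement can be sketched roughly as follows. This is a simplified illustration, not the Hadoop implementation: the topology, the node names and the function `place_replicas` are all assumptions, and the policy shown (first copy on the writer's node, second copy on a node in another rack, third copy on a different node of that second rack) is a simplified reading of the commonly described default for a replication factor of 3.

```python
import random


def place_replicas(topology, writer_node, replication=3, rng=None):
    """Pick the nodes that receive the copies of one block (replication <= 3).

    topology maps a rack name to its list of node names (hypothetical layout).
    Assumes at least two racks, and two nodes in the remote rack.
    """
    rng = rng or random.Random()
    rack_of = {node: rack for rack, nodes in topology.items() for node in nodes}
    chosen = [writer_node]  # first copy stays on the writer's own node
    if replication >= 2:
        remote_racks = [r for r in topology if r != rack_of[writer_node]]
        remote_rack = rng.choice(remote_racks)
        second = rng.choice(topology[remote_rack])  # second copy: another rack
        chosen.append(second)
    if replication >= 3:
        candidates = [n for n in topology[remote_rack] if n != second]
        chosen.append(rng.choice(candidates))  # third copy: same rack as the second
    return chosen[:replication]


topology = {"rack1": ["n1", "n2"], "rack2": ["n3", "n4"]}
print(place_replicas(topology, "n1"))  # e.g. ['n1', 'n3', 'n4'] or ['n1', 'n4', 'n3']
```

With this layout the block survives the failure of any single node and also the failure of either rack.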

The basic architecture of an HDFS cluster consists of two main types of node:

1. Name Node:

This is much like the Master Node in the Greenplum database: it is the “master” in the master-slave concept. The name node manages the file system namespace. It maintains the file system tree and the metadata for all the files and directories in the tree. This information is stored persistently on the local disk in the form of two files: the namespace image and the edit log.

Now the question arises: what happens if the single name node crashes (as we have only one primary name node)? The primary name node is a Single Point of Failure (SPOF). To protect its metadata, Hadoop provides a secondary name node, which periodically copies the FsImage and EditLog from the name node and merges them into a checkpoint. Note that the secondary name node is not a hot standby: it does not take over automatically when the primary fails, but its checkpoint can be used to rebuild the namespace.
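One way to picture how the namespace image and the edit log relate: the image is a snapshot of the file system tree, and the edit log is the list of changes made since that snapshot. Replaying the log on top of the image gives an up-to-date image. The sketch below is a toy model in Python; the operation names and the dictionary layout are assumptions made for illustration and have nothing to do with Hadoop's real on-disk formats.

```python
def apply_edit_log(fsimage, edit_log):
    """Replay logged operations on a copy of the namespace image.

    fsimage:  dict mapping a path to its metadata (toy model).
    edit_log: list of (operation, path) tuples recorded since the image was taken.
    """
    namespace = dict(fsimage)  # never modify the old image in place
    for op, path in edit_log:
        if op == "create":
            namespace[path] = {"type": "file"}
        elif op == "mkdir":
            namespace[path] = {"type": "dir"}
        elif op == "delete":
            namespace.pop(path, None)
        else:
            raise ValueError(f"unknown operation: {op}")
    return namespace


fsimage = {"/": {"type": "dir"}, "/logs": {"type": "dir"}}
edit_log = [("create", "/logs/a.txt"), ("mkdir", "/tmp"), ("delete", "/logs/a.txt")]
checkpoint = apply_edit_log(fsimage, edit_log)
print(sorted(checkpoint))  # ['/', '/logs', '/tmp']
```

Once the merged image is saved, the edit log can start again from empty, which keeps it from growing without bound.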

2. Data Node:

Data nodes are the workhorses of HDFS. They store and retrieve blocks when they are told to (by the name node), and they report back to the name node periodically with lists of the blocks that they are storing. The data nodes are where the bulk of the data resides.

 

MapReduce is the second major part of the Hadoop architecture. It is the programming logic, or the brain, as I would like to say. The MapReduce model was introduced by Google for processing large data sets in parallel; Hadoop's implementation of it is written in Java.

The MapReduce programming model works in two parts: the mapping part (done by the Mapper) and the reduction part (done by the Reducer).

The Mapper works on the blocks of data held on the data nodes. You can think of each Mapper as an individual worker (in the master-slave concept) that extracts from its share of the data whatever the client asked for.

What remains is to aggregate the results produced by all the Mappers. This work is done by the Reducer, which iterates over the values collected for each key and sends back a combined output value.

A MapReduce job passes through several intermediate stages. Let’s have a look at the following diagram:

From the diagram above we can see that the user gives something as the input; in this case the input is a question and its subsequent answer. These files are stored on the data nodes of HDFS. The MapReduce program reads the given data and breaks it into an intermediate stage made up of key/value pairs. [If you studied Compiler Design in your college days, the key/value stage may remind you of lexical analysis, semantic analysis, etc.]

After this stage, the shuffling and sorting of the data takes place. It is hard to see from the diagram, but the second part of the picture shows why the sorting phase is needed: the map output is spread over many servers or nodes, and MapReduce uses the key to shuffle and sort it so that all the values for a given key end up together.

Then comes the reducer phase, which accepts the data coming from the shuffle/sort phase and combines it into a smaller set of values. This data is sent back to the user/client.
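The three stages can be imitated on a single machine with a few lines of Python. This is only a sketch of the idea using the classic word-count example (which is my choice, not what the diagram shows); real Hadoop jobs are written against the Hadoop API and run across many nodes, and the function names here are made up.

```python
from collections import defaultdict


def mapper(line):
    """Map stage: emit a (word, 1) pair for every word in one line of input."""
    for word in line.lower().split():
        yield word, 1


def shuffle_and_sort(pairs):
    """Shuffle/sort stage: group the values by key and order the keys."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return sorted(grouped.items())


def reducer(key, values):
    """Reduce stage: combine all the values of one key into a single value."""
    return key, sum(values)


def run_job(lines):
    intermediate = [pair for line in lines for pair in mapper(line)]
    return [reducer(key, values) for key, values in shuffle_and_sort(intermediate)]


print(run_job(["big data is big", "data is everywhere"]))
# [('big', 2), ('data', 2), ('everywhere', 1), ('is', 2)]
```

Notice that the reducer never sees the raw lines, only the grouped key/value pairs, which is why the work can be spread over many machines.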

The entire process above is controlled by a JobTracker, which coordinates the job run and makes sure everything goes fine. The TaskTrackers run the tasks that the job has been split into.

So this is a brief description of HDFS and MapReduce. I didn’t go deep into the core functionality of MapReduce as it requires a full-scale knowledge of the Java programming language. I hope I was able to give a short but detailed explanation of Hadoop. Thanks and take care.


Cloud Computing : Architecture


Hey guys!!! I hope everyone is clear on the overview of cloud computing, which I discussed in my previous blog. Our discussion of cloud computing will not be complete until we discuss the architecture and the technical side of the system. So, without wasting much time on idle chatter, let's begin our discussion of the architecture of cloud computing.

Cloud architecture, the systems architecture of the software systems involved in the delivery of cloud computing, typically involves multiple cloud components communicating with each other over a loose coupling mechanism such as a messaging queue. When talking about a cloud computing system, it’s helpful to divide it into two sections:

1. The Front End or the Intercloud:
The front end includes the client’s computer (or computer network) and the application required to access the cloud computing system. Not all cloud computing systems have the same user interface. Services like Web-based e-mail programs leverage existing Web browsers like Internet Explorer or Firefox. Other systems have unique applications that provide network access to clients.

Cloud Computing Architecture


2. The Back End or The Cloud Engineering :
On the back end of the system are the various computers, servers and data storage systems that create the “cloud” of computing services. In theory, a cloud computing system could include practically any computer program you can imagine, from data processing to video games. Usually, each application will have its own dedicated server.

[N.B: Cloud engineering is the application of engineering disciplines to cloud computing. It brings a systematic approach to the high level concerns of commercialisation, standardisation, and governance in conceiving, developing, operating and maintaining cloud computing systems. It is a multidisciplinary method encompassing contributions from diverse areas such as systems, software, web, performance, information, security, platform, risk, and quality engineering.]

If a cloud computing company has a lot of clients, there’s likely to be a high demand for storage space. Some companies require hundreds of digital storage devices. A cloud computing system needs at least twice the number of storage devices it requires to keep all its clients’ information stored. That’s because these devices, like all computers, occasionally break down. A cloud computing system must make a copy of all its clients’ information and store it on other devices. The copies enable the central server to access backup machines to retrieve data that otherwise would be unreachable. Making copies of data as a backup is called redundancy.

The architecture of the cloud is evolving rapidly. Hopefully, in the future of computing, we will be able to say “we build our home in the cloud”. There are issues such as privacy and data maintenance, but there are loads of advantages too. We will discuss them in later blogs. Stay tuned for more!!!

How to Fix Win32 Generic Host Error?

June 26, 2009

What is the Generic Host Process for Win32?

Generic Host Process for Win32 Services, or svchost.exe, is a legitimate and essential component of Windows that hosts services which run from dynamic-link libraries (DLLs). Multiple instances of svchost.exe can run at the same time, so in most cases it is not a problem if you see five, six or even more copies of svchost.exe in your process list: they host different groups of services. However, several known spyware programs and trojans pretend to be the legitimate svchost.exe. They usually use the same name or a lookalike such as svchosts.exe (which often causes svchosts.exe page faults), Generic.exe or svcchost.exe, among others. Please note that the legitimate svchost.exe resides in the Windows\System32 folder and should not appear in the startup list.
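The rule of thumb above (right name, right folder) can be written down as a small check. This is just a sketch in Python: the function name is made up, the default Windows folder `C:\Windows` is an assumption, and passing this check does not prove a file is clean. It only applies the rule stated in this post.

```python
from pathlib import PureWindowsPath


def looks_like_genuine_svchost(image_path, windows_dir=r"C:\Windows"):
    """Return True if the path is svchost.exe inside <windows_dir>\\System32.

    Hypothetical helper. PureWindowsPath compares paths case-insensitively,
    so 'C:\\WINDOWS\\system32' matches 'C:\\Windows\\System32'.
    """
    path = PureWindowsPath(image_path)
    expected_dir = PureWindowsPath(windows_dir) / "System32"
    return path.name.lower() == "svchost.exe" and path.parent == expected_dir


print(looks_like_genuine_svchost(r"C:\WINDOWS\system32\svchost.exe"))   # True
print(looks_like_genuine_svchost(r"C:\Windows\System32\svchosts.exe"))  # False
print(looks_like_genuine_svchost(r"C:\Users\me\AppData\svchost.exe"))   # False
```

The second example fails on the name (note the extra "s") and the third fails on the folder, which are exactly the two tricks described above.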

Fixing Generic host Win32 Error:

Step1:

  1. Go to ‘Run’ and open ‘Regedit’
  2. Navigate to:
    HKEY_LOCAL_MACHINE > SYSTEM > CurrentControlSet > Services > Browser > Parameters
  3. Find the value
    Name: IsDomainMaster
    and set
    Data: False
  4. Restart your PC

Step2:

  1. Go to ‘Run’ and open ‘cmd’
  2. Type ‘netsh’ in command console then press enter
  3. Then type ‘winsock’ and press enter, then type ‘reset’ and press enter
  4. Restart Your PC

Step3:

  1. Open ‘cmd’
  2. Type ‘regedit’ and press enter. This opens the Registry Editor.
  3. Find the following key: HKEY_LOCAL_MACHINE\System\CurrentControlSet\Services\NetBT\Parameters
  4. In the right-hand window pane you will find an option called TransportBindName.
    Double-click TransportBindName, delete the existing default value and click Ok (this gives TransportBindName a blank value)
  5. Close Port 135: Navigate to the following registry key:
    HKEY_LOCAL_MACHINE\Software\Microsoft\Ole
  6. In the right hand window pane you will see an option called EnableDCOM. Double-click EnableDCOM and change the Y to an N and click Ok. Close the Registry Editor and restart your computer.

Hope this will solve your problem. 🙂

Conficker Worm: Removal

April 3, 2009

On 23 October 2008, Microsoft released an emergency out-of-band patch for vulnerability MS08-067, which the worm exploits to spread. The patch applies only to Windows XP SP2, Windows XP SP3, Windows 2000 SP4 and Windows Vista; Windows XP SP1 and earlier are no longer supported.

Microsoft has since released a removal guide for the worm, and recommends using the current release of its Malicious Software Removal Tool to remove the worm, then applying the patch to prevent re-infection.

For Manual Removal of the worm, please follow this link:

I found this link quite useful…

I hope these updates on the Win32 Conficker worm will help you. I will try to add more posts as soon as I get more useful information.

Thank You

Conficker Worm: Characteristics(contd.)

April 2, 2009

When executed, the worm copies itself under a random name to the %Sysdir% folder
(where %Sysdir% is the Windows system folder, e.g. C:\Windows\System32).
It modifies the following registry keys to create a randomly-named service on the affected system:

  • HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\{random}\Parameters\"ServiceDll" = "Path to worm"
  • HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Services\{random}\"ImagePath" = %SystemRoot%\system32\svchost.exe -k netsvcs
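When checking a machine by hand, it helps to list every service that is hosted by "svchost.exe -k netsvcs" together with the DLL it loads, so that an unfamiliar random-looking name stands out. The sketch below works on a plain Python dictionary standing in for an exported snapshot of the Services key; the snapshot layout, the service names and the function name are all made up for illustration, and it does not read the real registry. Plenty of legitimate services also run under netsvcs, so the output is a list to review, not a verdict.

```python
def netsvcs_services(services):
    """Return {service name: ServiceDll} for services hosted by 'svchost.exe -k netsvcs'.

    services is a hypothetical, flattened snapshot:
    {service name: {"ImagePath": ..., "ServiceDll": ...}}
    """
    found = {}
    for name, values in services.items():
        image_path = values.get("ImagePath", "").lower()
        if image_path.endswith("svchost.exe -k netsvcs"):
            found[name] = values.get("ServiceDll", "")
    return found


snapshot = {  # made-up example data
    "examplesvc": {
        "ImagePath": r"%SystemRoot%\system32\svchost.exe -k netsvcs",
        "ServiceDll": r"%SystemRoot%\system32\examplesvc.dll",
    },
    "qzkvtwpa": {
        "ImagePath": r"%SystemRoot%\system32\svchost.exe -k netsvcs",
        "ServiceDll": r"C:\Windows\System32\xkqjz.dll",
    },
    "otherapp": {"ImagePath": r"C:\Program Files\Other\other.exe"},
}

for name, dll in netsvcs_services(snapshot).items():
    print(name, "->", dll)
```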

It attempts connections to one or more of the following websites to obtain the public IP address of the affected computer.

It starts an HTTP server on a random port on the infected machine to host a copy of the worm.
It continuously scans the subnet of the infected host for vulnerable machines and executes the exploit. If the exploit is successful, the remote computer connects back to the HTTP server and downloads a copy of the worm.
Later variants of W32/Conficker.worm use scheduled tasks and Autorun.inf files to replicate onto non-vulnerable systems or to reinfect previously infected systems after they have been cleaned.

Conficker Worm: Characteristics

April 2, 2009

It copies itself to the following paths:

  • %Sysdir%\[Random].dll
  • %Program Files%\Internet Explorer\[Random].dll
  • %Program Files%\Movie Maker\[Random].dll
  • %Program Files%\Windows Media Player\[Random].dll
  • %Program Files%\Windows NT\[Random].dll

It disables the following services:

  • WerSvc , ERSvc , BITS , wuauserv , WinDefend , wscsvc

It hooks the following functions in dnsapi.dll :

  • Query_Main , DnsQuery_W , DnsQuery_UTF8 , DnsQuery_A

It hooks the following functions in ws2_32.dll:

  • sendto

The worm deletes the following registry key to disable restarting in safe mode:

  • HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\SafeBoot

It deletes the following registry keys:

  • HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\explorer\ShellServiceObjects\{FD6905CE-952F-41F1-9A6F-135D9C6622CC}
  • HKEY_LOCAL_MACHINE\Software\Microsoft\Windows\CurrentVersion\Run\Windows Defender

It terminates processes whose names contain any of the following strings:

  • wireshark / unlocker / tcpview / sysclean / scct_ / regmon / procmon / procexp / ms08-06 / mrtstub / mrt. / mbsa. / klwk / kido / kb958 / kb890 / hotfix / gmer /  filemon / downad / confick / avenger / autoruns

To block users' access to security-related sites, it prevents network access to any domain that contains one of the following strings:

  • windowsupdate / wilderssecurity / virus / virscan / trojan / trendmicro / threatexpert / threat / technet / symantec / sunbelt / spyware / spamhaus / sophos / secureworks / securecomputing / safety.live / rootkit / rising / removal / quickheal / ptsecurity / prevx / pctools / panda / onecare / norton / norman / nod32 / networkassociates / mtc.sri / msmvps / msftncsi / mirage / microsoft / mcafee / malware / kaspersky / k7computing / jotti / ikarus / hauri / hacksoft / hackerwatch / grisoft / gdata / freeav / free-av / fortinet / f-secure / f-prot / ewido / etrust / eset / esafe / emsisoft / dslreports / drweb / defender / cyber-ta / cpsecure / conficker / computerassociates / comodo / clamav / centralcommand / ccollomb / castlecops / bothunter / avira / avgate / avast / arcabit / antivir / anti- / ahnlab / agnitum
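Because the match is on a substring anywhere in the domain name, a short list blocks a very large number of sites. The sketch below shows the matching idea in Python with only a handful of the strings from the list above; it is an illustration of substring matching, not a reconstruction of the worm's code, and the function name is made up.

```python
# A small sample of the strings listed above (not the full list).
BLOCKED_SUBSTRINGS = ("windowsupdate", "virus", "symantec", "microsoft",
                      "mcafee", "kaspersky", "anti-")


def is_blocked(domain, substrings=BLOCKED_SUBSTRINGS):
    """Return True if the domain name contains any of the listed strings."""
    name = domain.lower()
    return any(s in name for s in substrings)


for domain in ("www.microsoft.com", "update.symantec.com", "example.org"):
    print(domain, is_blocked(domain))
# www.microsoft.com True
# update.symantec.com True
# example.org False
```

This is also why "various security-related Web sites cannot be accessed" is a useful symptom: if ordinary sites load but every security vendor's site fails, the machine deserves a closer look.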

Conficker Worm: A new threat to computer

April 2, 2009

Conficker, also known as Downup, Downadup and Kido, is a computer worm targeting the Microsoft Windows operating system that was first detected in October 2008.
An early variant of the worm propagated through the Internet by exploiting a vulnerability in the network stack of Windows 2000, Windows XP, Windows Vista, Windows Server 2003, Windows Server 2008, Windows 7 Beta, and Windows Server 2008 R2 Beta that was discovered earlier that month. The worm has been unusually difficult for network operators and law enforcement to counter because of its combined use of advanced malware techniques.

Method of Infection:

This worm exploits the MS08-067 Microsoft Windows Server Service vulnerability in order to propagate.
Machines should be patched and rebooted to protect against this worm re-infecting the system after cleaning.
Upon detection of this worm, the system should be rebooted to clean memory correctly; this may require more than one reboot.
Scheduled tasks have been seen to be created on the system to re-activate the worm.

Autorun.inf files have been seen to be used to re-activate the worm.

Symptoms:

If your computer is infected with this worm, you may not experience any symptoms, or you may experience any of the following symptoms:

  • Account lockout policies are being tripped.
  • Automatic Updates, Background Intelligent Transfer Service  (BITS), Windows Defender, and Error Reporting Services are disabled.
  • Domain controllers respond slowly to client requests.
  • The network is congested.
  • Various security-related Web sites cannot be accessed.

For more information about Win32/Conficker.b, visit the following Web pages: