30 August 2014

When Not To Use Hadoop

Hadoop has become a necessity for almost all analytical applications that have huge data processing requirements. It also offers open source flexibility as well as a range of subprojects to facilitate processing, ingestion, and downstream handling of inputs and outputs. However, Hadoop is not appropriate for all business applications. Often, a first litmus test when deciding whether to use Hadoop should be to answer a few specific questions around the loading and processing of data. If one can load the data into a standard database without much trouble, then Hadoop is surely not the way to go. Is a dataset of a few hundred MB a business case for Hadoop? What about a few hundred GB? Hadoop is also not a replacement for standard databases. In general, it has problems dealing with small files, so a large number of small files is going to be suboptimal for Hadoop compared to a large number of large files. This is primarily why the platform works off a MapReduce approach and why the underlying layer is HDFS: standard approaches are just unable to handle such large data processing efficiently, albeit at a cost. Working with XML/RDF types of data will also pose many problems and require pre-processing to deserialize into other processing formats such as SequenceFiles, Avro, Protocol Buffers, and Thrift. Hadoop is also not appropriate for direct real-time processing needs, although stream processing has become available. It is most appropriate as a flexible data warehouse where generally static data is stored for analysis, rather than a rapidly changing dataset. It is useful for merging and unlocking large amounts of corporate and even web data from various data sources, providing analytical processing for useful insights, and filtering to other systems.
Hadoop in the cloud can save much headache for operations management. However, it still requires a careful strategy for managing an appropriate cluster and for capacity planning over namenodes. Otherwise, costs can get out of hand in the cloud very quickly due to the high computational processing requirements of Big Data. The term Big Data also needs some clarity. Datasets on the order of terabytes and petabytes at web scale are aptly classed as Big Data, where not only is one working with unstructured data but the data is also so large that it could not sensibly fit into a standard data architecture for continuous processing. Here Hadoop could work wonderfully with HBase as a storage layer for the unstructured data, filtering more structured data downstream to other, more appropriate systems. Increasingly, NoSQL approaches have also started to provide their own equivalent support for MapReduce. For example, MongoDB provides MapReduce functionality and, among its varying use cases, is also widely used for real-time advertising, although MapReduce on MongoDB may not compare to the level of processing that could be done on Hadoop at scale. One obviously needs to understand firstly their data, and secondly what they plan to do with it. The links below provide further views on why Hadoop may not be the right approach for solving particular business problems.
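
To put rough numbers on the small-files point made earlier in this post, here is a minimal sketch in Python. It is only a back-of-the-envelope estimate, not a measurement. The 128 MB block size and the figure of roughly 150 bytes of namenode heap per file or block object are assumptions (a commonly quoted rule of thumb), and the function names are made up for illustration.

```python
import math

# Assumption: HDFS block size of 128 MB (it is configurable per cluster).
BLOCK_SIZE = 128 * 1024**2
# Assumption: rule-of-thumb namenode heap cost per file or block object.
BYTES_PER_OBJECT = 150


def namenode_objects(num_files, file_size, block_size=BLOCK_SIZE):
    """Count the objects the namenode tracks: one per file plus one per block."""
    blocks_per_file = math.ceil(file_size / block_size)
    return num_files * (1 + blocks_per_file)


def namenode_heap_bytes(num_files, file_size, block_size=BLOCK_SIZE):
    """Estimate the namenode heap needed for the given file layout."""
    return namenode_objects(num_files, file_size, block_size) * BYTES_PER_OBJECT


if __name__ == "__main__":
    one_tib = 1024**4
    # The same 1 TiB of data, stored as 1 MiB files or as 1 GiB files.
    for label, size in (("1 MiB files", 1024**2), ("1 GiB files", 1024**3)):
        count = one_tib // size
        print(label, namenode_objects(count, size), namenode_heap_bytes(count, size))
```

Under these assumptions, 1 TiB held as 1 MiB files needs 2,097,152 namenode objects (about 300 MiB of heap), against 9,216 objects for the same data held as 1 GiB files. The data volume is identical; only the file layout changes the overhead.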

Mule In Perspective

Service Oriented Architectures are a big step towards the integration of disparate systems. However, over time the approach to Web Services has branched out from SOAP to REST. Many integration approaches have also emerged, from components to mediators as well as the full enterprise service bus. Almost every software engineering area has a significant set of design patterns with which to approach large scale solutions. Mule has over the years become a strong contender in the enterprise service bus area. It provides a very open and holistic approach to integration, facilitated by connectors as well as a visual flow mechanism. However, the platform does have its many quirks and drawbacks that leave one wondering whether quality assurance was compromised for the sake of releases. The visual flow mechanism is also a very buggy and limiting perspective for a developer who may want to work directly with XML to gain flexibility. Even their training course instructors acknowledge significantly buggy areas of the platform, especially within Mule Studio. One has to really get their head around the whole idea of visual flows and how to wire them in the most optimized and efficient way. Using Mule will most likely also lead to vendor lock-in, as well as complexities when upgrading versions, where backward compatibility of flow components can only be regarded as questionable. These days one rarely has a full need for such a heavyweight enterprise service bus within enterprise architectures. Often, using mediators and the like can be sufficient. Loose coupling is paramount for service oriented delivery of business applications. However, with Mule one could question whether loose coupling comes at the cost of excessive XML and rigid methods of implementation.
These days even integration services provide multiple forms of functionality, up to full Big Data support for ETL. Although Mule does support batch processing, one could argue that such implementations should really be kept separate from the use of an ESB. Alternatives that can provide a more flexible option for integration include Camel, even if, strictly speaking, the two cater to different functional domains. Utilizing Mule in new projects and within large teams could require an investment in time. But one is always left wondering whether using such a technology is perhaps just over-engineering a problem that can be better solved through more loosely coupled approaches and even a wide range of open source libraries.

23 August 2014

Cheerleaders

What is the point of a cheerleader? Well, essentially, as the title says, they are supposed to lead the crowd into a cheer for their team during a sporting game. However, the whole aspect of cheerleading has turned into an almost gratuitous and sexualized activity, as well as pretty much a sexist affair, during certain sporting events. One would wonder, in a modern society where women are looking for equal rights, should they really be taking on such professions to begin with? One would also wonder why male cheerleaders get frowned upon and are quite uncommon as a result. The same could be asked of why so many women choose to go into such professions only to later lay claim to more feminist ideals of equality. Cheerleading is not a high paying profession, so why do so many women find it interesting compared to modelling, where they could command comparatively higher pay? Are they just looking to be discovered? Is cheerleading a way for them to head into more seedy professions? Are there no real professions available for women in our society? Are such women just craving attention and popularity? Or can this be seen more as an animal instinct, where women try to attract the most able of men? It appears to be about equality when it suits them. Should men still be expected to hold doors for women, as gentlemen were expected to do in the past? Should women be expected more and more to look after themselves? We still find the gold digger analogy, where women have no real ambition other than to find a wealthy man who can provide for them. In what way does this describe equality for women? Perhaps such ideals of some women taint the bigger picture of what most women actually want out of society.
Obviously, it would be unfair to generalize. It is an undeniable fact that cheerleading makes sporting events interesting and entertaining. Models in adverts are also often used to entice consumers. Models are also used in fashion to showcase new designs. Many do feel that the female body is an art form that should be celebrated. However, where does one draw the line between what is equality and what is deemed hypocritical?

Semantic Pricing

For many businesses it is critical to have an accurate price at which to sell their products and services. Price also provides them with a measure of profitability and growth, as well as an indicator of how well pricing balances supply and demand. One needs to understand competitors in the market, measure consumer demand, and then calculate the optimal price. As a result, companies often use complicated pricing analytics, as consumer markets can change on a daily basis. Ecommerce is a major mover in pricing analytics and there is plenty of specialized software that caters to decision makers with such services. However, it seems one could benefit even more from semantic pricing of goods and services in markets. Furthermore, the Semantic Web with Linked Data could provide a more connected form of real-time pricing that can impact the business in a positive way on a daily basis. Semantics add more context, which is often needed for business strategy and forecasting. Semantic pricing could also come into effect within the locales and regions of consumer markets. It can also add more granularity to seasonal and holiday variances, as well as to variations in promotions and deals.

HarperCollins OpenBook API

HarperCollins have been focusing more recently on ebooks. As a next pivotal step, they have unleashed the OpenBook API to give everyone access and creative ownership to author their own books as well as build interesting mashups. Although still in beta, it is provided as both a Data API and a Content API, making it very flexible in the scope of future functionality. Perhaps an added bonus here could be the use of OData services as well as the Semantic Web, maybe even exposing metadata annotations and linking through more articulated eReaders. There is even scope here for providing a JSON-LD format for graphical linked data of the concepts and relations of stories. Another creative step the project could take is towards building out a collaboration platform for shared story creation, as an access point for editors, writers, readers, and publishers. An intelligent editor assistant would be quite valuable in this respect to guide writers towards specific story structures, plot lines, appropriate character building, and even creative endings. In such a manner, the intelligent assistant could be taught to learn the patterns of successful stories and guide the proliferation of new story structures with adaptive editing. Often, the start of a story could itself involve a deep brainstorming session, for which intelligent agents could provide much guided support. Collaborative filtering for recommendations could be yet another way in which writers, readers, and publishers could work towards successful story development.
Publishing companies can also benefit from sentiment analysis, not only to understand the moving trends of reader interests over the web but also to understand reader opinions on their products as well as their brands. Such analysis could also help market engagement by connecting more socially with the reading community, not only to increase interest in books but also to provide a point of knowledge about consumer intent. Furthermore, the web in all its social forms could provide a focal view for predictive analytics on the success of a particular book.
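
As a rough illustration of the JSON-LD idea mentioned above, the following Python sketch builds a small linked data document for the characters and relations of a story. It is not based on the OpenBook API itself. The function names, the `rel:` namespace at example.org, and the sample story are all hypothetical; only the schema.org terms (`Book`, `Person`, `author`, `character`, `name`) are real vocabulary.

```python
import json

# Hypothetical namespace for story-specific relations; not a published vocabulary.
REL_NS = "https://example.org/story-relations#"


def slug(name):
    """Turn a character name into a local JSON-LD node identifier."""
    return "#" + "-".join(name.lower().split())


def story_to_jsonld(title, author, relations):
    """Build a JSON-LD document from (subject, relation, object) name triples."""
    nodes = {}
    for subject, relation, obj in relations:
        # Create one Person node per character the first time it is seen.
        for name in (subject, obj):
            nodes.setdefault(name, {"@id": slug(name), "@type": "Person", "name": name})
        # Record the relation as a link from the subject node to the object node.
        nodes[subject].setdefault("rel:" + relation, []).append({"@id": slug(obj)})
    return {
        "@context": {"@vocab": "https://schema.org/", "rel": REL_NS},
        "@type": "Book",
        "name": title,
        "author": {"@type": "Person", "name": author},
        "character": list(nodes.values()),
    }


if __name__ == "__main__":
    # Placeholder story data, invented purely for illustration.
    doc = story_to_jsonld("Example Story", "A. Writer", [("Ana", "mentorOf", "Ben")])
    print(json.dumps(doc, indent=2))
```

A document of this shape could in principle be loaded by any JSON-LD aware tool to draw the character graph, which is the kind of graphical view of story concepts and relations suggested above.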

Stirfry

Stirfry can be the most appetizing meal. It is also one type of meal that offers unlimited variety for creativity and flavors, all wrapped into one. It is also very healthy and easy to make. For many facing increasing time and budget constraints, as well as for the health-conscious consumer, a stirfry could be an ideal meal: a mouth-watering mix of tasty flavors, something fresh, and something that can be topped off with a nice bottle of wine. Such meals are also flexible and adaptable for singles and for small to large families with children, as well as being party friendly. The meal can go down well with any dessert and even a side starter. All one really needs is a mixed set of ingredients, the thrill of using a wok, and a mouthful of taste buds to satisfy.

18 August 2014

Azure

Microsoft have never been all that great with the internet. With their leap into big data and the platform as a service cloud, they have embarked on yet another ambitious feat. However, the Azure platform leaves one desiring more and yet not getting enough of the basics. The fundamentals are lacking, especially the aspects that are supposed to make platform as a service seamless. One not only has to find their way around the maze of iconic user interface interactions, but also has to keep playing around with access logins. The credential side of it alone should make one wander off back to AWS for good. One is also expected to utilize the ugly looking PowerShell. Not to mention, it would be quite laughable if they were running Big Data services on Windows based commodity hardware. Windows is a guaranteed show stopper for most cloud based services. Semantic Web and Linked Data services especially will drag in performance if run in such a manner. For many, Linux and Unix are the default standard to have in the cloud environment. Azure attempts to provide support for Linux, Python, and even Java. However, to what degree are we going to see their support team keep making their sales pitch to customers every time they decide to opt out of the .NET environment? Also, giving up on Windows in the cloud would imply Microsoft have finally accepted defeat and that the OS is inferior to Linux and Unix. For all intents and purposes, Azure is really only good for .NET based developers and businesses. One will often find Heroku, Google, and even Amazon open to comparison with Azure. However, one often finds an Azure support team totally beside themselves, bewildered and perplexed over why a customer would even dare mention these other cloud providers.
There is a rare level of arrogance that accompanies the Microsoft brand and all the products that come with it. Yet, over time they are failing by miles to play catch up with the likes of Apple, Google, Amazon, and even Oracle.

16 August 2014

Semantic Web For DevOps

The Semantic Web can unleash a whole spectrum of insights for devops teams, from rich semantics in real-time monitoring to connected architectures. It can provide an entirely new dimensional view of the enterprise system and a way to organize events, logs, and even jobs. It could even provide a new outlook on linked automation for the cloud. Even the management of deployments and libraries could be opened up with semantics. Yet organizations still seem to hold back on new ways of doing things and stick with the usual ways of approaching their architectural complexities. Over time, organizations that take on the Semantic Web as part of their architecture will be ready for the future. There is much to be gained where businesses actively control costs in the cloud and where mission critical aspects are often the order of the day. Web 3.0 is getting closer and closer to reality in many domains. As projects converge, we are bound to see a tipping point at which such approaches become the standard and not just a perennial topic for research.

Rest.li

Rest.li is yet another RESTful approach to development, one which provides a holistic view of an entire end-to-end architecture. It has been developed by the LinkedIn engineering team. The features are so rich that even a direct link to their site can provide an entire informational spectrum of documentation.

Awesome Libraries

Each programming language has its own flavor and community, from which a diverse set of libraries emerges. Libraries are useful for developers as they provide reuse for solving specific implementation needs, as well as a way of standing on the shoulders of giants through a more stable solution. At times, hunting through the web in search of good libraries can be quite problematic and time consuming. One wants a way of amalgamating all the libraries and frameworks associated with a particular language in one place for easy access, which can also be regularly updated, although such lists may not be fully exhaustive. The following is a shared list of curated open source libraries, each specific to a programming language.

awesome-python
awesome-java
awesome-scala
awesome-javascript
awesome-groovy
awesome-kotlin

awesome-awesomeness

RESTX

RESTX is a relatively new and interesting approach to RESTful APIs. It scales up performance by utilizing a custom built dependency injection container. It also provides a feature rich as well as modular approach to pluggable development, meeting the flexibility needs of any business domain. There are even useful API docs built in, as well as developer friendly testing and an admin console. There is also a unique integration with MongoDB, although it does not limit one to a specific backend.

2 August 2014

Organizational Ethics

It is time that organizations had a separate department to study the moral and ethical dilemmas of employees and their employers, as well as their business practices. Such departments could be spun off from compliance and even governance services. We live in a growing capitalist economy where businesses are dictated by shareholder value without forethought or care for either employees or customers. We need to strive for more ethically run businesses that have their own internal audit department, through which scrutiny for compliance can be provided in a fair manner and with an established code of conduct. Such departments can also track employee ethics both internally and externally, not only to combat discrimination in the workplace but also to protect the reputation of the business as a whole. They could also act as third-party mediators that handle a whole suite of investigations on which a manager's or human resources' time may otherwise be wasted. While formal and publicly run organizations for this may not be the answer, more thorough approaches need to be taken to protect the rights of employees as well as employers in the workplace. A linked data approach to integrated ethical boundaries could also be a stepping stone in a more interconnected direction. Reducing staff as a way of cutting business costs seems to be an almost typical answer from management, who may ultimately be the real culprits to bear the blame for the missed performance. It seems that for many businesses, fairness goes out the door as soon as they start losing money on the balance sheet. However, it is usually such practices that lead not only to more mistakes and dire performance consequences but also to distaste among current as well as former employees.
Institutional discrimination in the workplace is also a big dilemma that is often overlooked. Another aspect crucial to businesses is asking whether 'something is the right thing to do given the current circumstances', an ethical question that many organizations lack in their internal processes as well as in their strategy. Perhaps it is time in civilized societies that we started caring more for our environment and our employees, and not just for shareholder value. This will not only help businesses take responsibility for their actions but also hold them accountable to their employees and customers, as well as globally for their cumulative effects on an economy.