Welcome to data science's dirty secret: real-world data is messy. Data scientists must spend a good deal of time playing software developer, writing code to clean up data before they can actually do anything constructive with it. It's a necessary evil, but you can still make the most of it. This practical book walks you through several real-world examples to demonstrate the theory and practice behind working with and cleaning up dirty data. No one tool solves all of the problems well. Wise data scientists learn many tools and learn where each one shines. To that end, this book takes a polyglot approach: most examples will involve R and Python, but expect the occasional smattering of Groovy and sed/awk fun.
- RRP £31.99
Whether you're building the newest and hottest social media web site or developing an internal-use-only enterprise business intelligence application, scaling your data model has never been more important. Traditional relational databases, while familiar, present significant challenges and complications when trying to scale up to such "big data" needs. Into this world steps MongoDB, a leading NoSQL database, to address these scaling challenges while also simplifying the process of development. However, in all the hype surrounding big data, many sites have launched their business on NoSQL databases without an understanding of the techniques necessary to effectively use the features of their chosen database. MongoDB Applied Design Patterns provides the much-needed connection between the features of MongoDB and the business problems that it is suited to solve. The book's focus on the practical aspects of the MongoDB implementation makes it an ideal purchase for developers charged with bringing MongoDB's scalability to bear on the particular problems they have been tasked to solve.
- RRP £27.99
- Save £1.50 (5%)
The Web is getting faster, and the data it delivers is getting bigger. How can you handle everything efficiently? This book introduces Spark, an open source cluster computing system that makes data analytics fast to run and fast to write. You'll learn how to run programs faster, using primitives for in-memory cluster computing. With Spark, your job can load data into memory and query it repeatedly much quicker than with disk-based systems like Hadoop MapReduce. Written by the developers of Spark, this book will have you up and running in no time. You'll learn how to express MapReduce jobs with just a few simple lines of Spark code, instead of spending extra time and effort working with Hadoop's raw Java API.
* Quickly dive into Spark capabilities such as collect, count, reduce, and save
* Use one programming paradigm instead of mixing and matching tools such as Hive, Hadoop, Mahout, and S4/Storm
* Learn how to run interactive, iterative, and incremental analyses
* Integrate with Scala to manipulate distributed datasets like local collections
* Tackle partitioning issues, data locality, default hash partitioning, user-defined partitioners, and custom serialization
* Use other languages by means of pipe() to achieve the equivalent of Hadoop streaming
- RRP £31.99
- Save £6.40 (20%)
Get up to speed on the nuances of NoSQL databases and what they mean for your organization
- RRP £24.99
This easy-to-read guide to NoSQL databases provides the type of no-nonsense overview and analysis that you need to learn, including what NoSQL is and which database is right for you. Featuring specific evaluation criteria for NoSQL databases, along with a look into the pros and cons of the most popular options, NoSQL For Dummies provides the fastest and easiest way to dive into the details of this incredible technology. You'll gain an understanding of how to use NoSQL databases for mission-critical enterprise architectures and projects, and real-world examples reinforce the primary points to create an action-oriented resource for IT pros.
If you're planning a big data project or platform, you probably already know you need to select a NoSQL database to complete your architecture. But with options flooding the market and updates and add-ons coming at a rapid pace, determining what you require now, and in the future, can be a tall task. This is where NoSQL For Dummies comes in!
* Learn the basic tenets of NoSQL databases and why they have come to the forefront as data has outpaced the capabilities of relational databases
* Discover major players among NoSQL databases, including Cassandra, MongoDB, MarkLogic, Neo4J, and others
* Get an in-depth look at the benefits and disadvantages of the wide variety of NoSQL database options
* Explore the needs of your organization as they relate to the capabilities of specific NoSQL databases
Big data and Hadoop get all the attention, but when it comes down to it, NoSQL databases are the engines that power many big data analytics initiatives. With NoSQL For Dummies, you'll go beyond relational databases to ramp up your enterprise's data architecture in no time.
Includes Coverage of Oracle and Microsoft SQL Implementations
In just 24 lessons of one hour or less, Sams Teach Yourself SQL in 24 Hours, Sixth Edition, helps you use SQL to build effective databases, efficiently retrieve data, and manage everything from performance to security. This book's straightforward, step-by-step approach shows you how to work with database structures, objects, queries, tables, and more. In just hours, you will be applying advanced techniques, including views, transactions, web connections, and powerful Oracle and SQL Server extensions. Every lesson builds on what you've already learned, giving you a rock-solid foundation for real-world success. Step-by-step instructions carefully walk you through the most common SQL tasks. Practical, hands-on examples show you how to apply what you learn. Quizzes and exercises help you test your knowledge and stretch your skills. Notes and tips point out shortcuts and solutions.
Learn how to...
* Define efficient database structures and objects
* "Normalize" raw databases into logically organized tables
* Edit relational data and tables with DML
* Manage transactions
* Write effective, well-performing queries
* Categorize, summarize, sort, group, and restructure data
* Work with dates and times
* Join tables in queries, use subqueries, and combine multiple queries
* Master powerful query optimization techniques
* Administer databases and manage users
* Secure databases and protect data
* Use views, synonyms, and the system catalog
* Extend SQL to the enterprise and Internet
* Master important Oracle and Microsoft extensions to ANSI SQL
Register your product at informit.com/register for convenient access to downloads, updates, and corrections as they become available.
- RRP £29.49
- Save £4.10 (14%)
Build advanced authentication solutions for any cloud or web environment
Active Directory has been transformed to reflect the cloud revolution, modern protocols, and today's newest SaaS paradigms. This is an authoritative, deep-dive guide to building Active Directory authentication solutions for these new environments. Author Vittorio Bertocci drove these technologies from initial concept to general availability, playing key roles in everything from technical design to documentation. In this book, he delivers comprehensive guidance for building complete solutions. For each app type, Bertocci presents high-level scenarios and quick implementation steps, illuminates key concepts in greater depth, and helps you refine your solution to improve performance and reliability. He helps you make sense of highly abstract architectural diagrams and nitty-gritty protocol and implementation details. This is the book for people motivated to become experts. Active Directory Program Manager Vittorio Bertocci shows you how to:
* Address authentication challenges in the cloud or on-premises
* Systematically protect apps with Azure AD and AD Federation Services
* Power sign-in flows with OpenID Connect, Azure AD, and AD libraries
* Make the most of OpenID Connect's middleware and supporting classes
* Work with the Azure AD representation of apps and their relationships
* Provide fine-grained app access control via roles, groups, and permissions
* Consume and expose Web APIs protected by Azure AD
* Understand new authentication protocols without reading complex spec documents
- RRP £29.49
- Save £4.00 (13%)
This new edition of Essential SQLAlchemy is the tool developers need to understand the technology. Rather than being a simple tutorial or API reference, this book builds an application step by step. The application covers many of the most common uses of SQLAlchemy, showing how to manage complexity through real-world examples. Using easy, common language, the author teaches you how to turn knowledge into usable work.
- RRP £31.99
The best-selling author of Big Data is back, this time with a unique and in-depth insight into how specific companies use big data. Big data is on the tip of everyone's tongue. Everyone understands its power and importance, but many fail to grasp the actionable steps and resources required to utilise it effectively. This book fills the knowledge gap by showing how major companies are using big data every day, from an up-close, on-the-ground perspective. From technology, media and retail, to sports teams, government agencies and financial institutions, learn the actual strategies and processes being used to learn about customers, improve manufacturing, spur innovation, improve safety and so much more. Organised for easy dip-in navigation, each chapter follows the same structure to give you the information you need quickly. For each company profiled, learn what data was used, what problem it solved and the processes put in place to make it practical, as well as the technical details, challenges and lessons learned from each unique scenario.
* Learn how predictive analytics helps Amazon, Target, John Deere and Apple understand their customers
* Discover how big data is behind the success of Walmart, LinkedIn, Microsoft and more
* Learn how big data is changing medicine, law enforcement, hospitality, fashion, science and banking
* Develop your own big data strategy by accessing additional reading materials at the end of each chapter
- RRP £29.99
- Save £6.00 (20%)
"Search" transforms the essential but challenging topic of search algorithms into a fantasy-noir mystery for the digital age. This is a unique introduction to search algorithms and how they work, written by a Google engineer with specializations in algorithms and machine learning. Meet Frank Runtime. Disgraced ex-detective. Hard-boiled private eye. Search expert. When the police headquarters is hit with a robbery, Frank and his extensive search skills are called upon to catch the culprits. Pulling out his best algorithms, Frank scours smugglers' boats with binary search, tails spies with a search tree, escapes burning prisons with breadth-first search, and burns down cafeterias with queues. He's joined by know-it-all rookie Officer Notation and Socks the inept wizard as he follows leads in a best-first search that unravels a deep conspiracy. Each chapter introduces a new twist and a new concept, ending with a technical summary. From finding informants with exhaustive search to lock-picking with priority queues, Frank's mission will give you an understanding of:
* The algorithms behind best-first and depth-first search, iterative deepening, parallelizing, binary search, index inversion, and more
* Basic computational concepts like strings, arrays, stacks, and queues
* How to adapt search algorithms to unusual data structures
* The most efficient algorithm to use in a situation, and when to apply common-sense heuristic methods
For computer science students and amateur sleuths alike, "Search" is the most efficient route to understanding algorithms.
- RRP £14.99
- Save £0.60 (4%)
Learn how to take full advantage of Apache Kafka, the distributed, publish-subscribe queue for handling real-time data feeds. With this comprehensive book, you'll understand how Kafka works and how it's designed. Authors Neha Narkhede, Gwen Shapira, and Todd Palino show you how to deploy production Kafka clusters; secure, tune, and monitor them; write rock-solid applications that use Kafka; and build scalable stream-processing applications.
* Learn how Kafka compares to other queues, and where it fits in the big data ecosystem
* Dive into Kafka's internal design
* Pick up best practices for developing applications that use Kafka
* Understand the best way to deploy Kafka in production monitoring, tuning, and maintenance tasks
* Learn how to secure a Kafka cluster
* Get detailed use-cases
- RRP £43.99
- Save £8.80 (20%)
How can data compression save you? If you're in the business of building or marketing mobile apps and services, this book is an essential read. Mobile users are consuming and producing so much data today that they routinely exceed their data plans. They leave sites that don't load quickly and delete apps to save space on their devices. Data compression can help you turn the tide right away. This book uses diagrams and games to help you learn about data compression and its value in today's mobile and big data spaces. No advanced coding knowledge is required. You'll learn several core algorithms for data compression that you can apply immediately. Authors Colt and Aleks Haecky also take you on a fun and engaging tour of the history and theory of data compression by introducing key figures of the compression industry, including Claude Shannon. If you want to attract and retain mobile users with quick-loading apps that will help them save money, this book is a must. People involved with big data will also benefit, whether they're students, mid-level engineers, or CTOs. Understanding Compression is a perfect merging of algorithms, history, business advice, and offbeat humor.
- RRP £29.50
- Save £5.90 (19%)
Apache Hadoop is the technology at the heart of the Big Data revolution, and Hadoop skills are in enormous demand. Now, in just 24 lessons of one hour or less, you can learn all the skills and techniques you'll need to deploy each key component of a Hadoop platform in your local environment or in the cloud, building a fully functional Hadoop cluster and using it with real programs and datasets. Each short, easy lesson builds on all that's come before, helping you master all of Hadoop's essentials, and extend it to meet your unique challenges. Apache Hadoop in 24 Hours, Sams Teach Yourself covers all this, and much more:
* Understanding Hadoop and the Hadoop Distributed File System (HDFS)
* Importing data into Hadoop, and processing it there
* Mastering basic MapReduce Java programming, and using advanced MapReduce API concepts
* Making the most of Apache Pig and Apache Hive
* Implementing and administering YARN
* Taking advantage of the full Hadoop ecosystem
* Managing Hadoop clusters with Apache Ambari
* Working with the Hadoop User Environment (HUE)
* Scaling, securing, and troubleshooting Hadoop environments
* Integrating Hadoop into the enterprise
* Deploying Hadoop in the cloud
* Getting started with Apache Spark
Step-by-step instructions walk you through common questions, issues, and tasks; Q-and-As, Quizzes, and Exercises build and test your knowledge; "Did You Know?" tips offer insider advice and shortcuts; and "Watch Out!" alerts help you avoid pitfalls. By the time you're finished, you'll be comfortable using Apache Hadoop to solve a wide spectrum of Big Data problems.
- RRP £36.99
- Save £5.10 (13%)
Uncover the secrets of SQL and start building better relational databases today! This fun and friendly guide will help you demystify database management systems so you can create more powerful databases and access information with ease. Updated for the latest SQL functionality, SQL For Dummies, 8th Edition covers the core SQL language and shows you how to use SQL to structure a DBMS, implement a database design, secure your data, and retrieve information when you need it.
* Includes new enhancements of SQL:2011, including temporal data functionality which allows you to set valid times for transactions to occur and helps prevent database corruption
* Covers creating, accessing, manipulating, maintaining, and storing information in relational database management systems like Access, Oracle, SQL Server, and MySQL
* Provides tips for keeping your data safe from theft, accidental or malicious corruption, or loss due to equipment failures and advice on eliminating errors in your work
Don't be daunted by database development anymore - get SQL For Dummies, 8th Edition, and you'll be on your way to SQL stardom.
- RRP £21.99
- Save £1.20 (5%)
Let Hadoop For Dummies help harness the power of your data and rein in the information overload
Big data has become big business, and companies and organizations of all sizes are struggling to find ways to retrieve valuable information from their massive data sets without becoming overwhelmed. Enter Hadoop and this easy-to-understand For Dummies guide. Hadoop For Dummies helps readers understand the value of big data, make a business case for using Hadoop, navigate the Hadoop ecosystem, and build and manage Hadoop applications and clusters.
* Explains the origins of Hadoop, its economic benefits, and its functionality and practical applications
* Helps you find your way around the Hadoop ecosystem, program MapReduce, utilize design patterns, and get your Hadoop cluster up and running quickly and easily
* Details how to use Hadoop applications for data mining, web analytics and personalization, large-scale text processing, data science, and problem-solving
* Shows you how to improve the value of your Hadoop cluster, maximize your investment in Hadoop, and avoid common pitfalls when building your Hadoop cluster
From programmers challenged with building and maintaining affordable, scalable data systems to administrators who must deal with huge volumes of information effectively and efficiently, this how-to has something to help you with Hadoop.
- RRP £21.99
- Save £4.40 (20%)
ALTHR 7 years +
An easy-to-grasp introduction to coding concepts for kids
Coding For Kids For Dummies breaks coding into a series of small projects, each designed to teach elementary-to-middle-school-aged students a core concept to build a game, application, or other tool. In this hands-on, friendly guide, readers will get access to a leading coding tool that has been designed specifically for kids, showing them how to create the projects provided in the book as well as how to implement them into their own creative work. Written by a teacher and leading advocate of coding education, Coding For Kids For Dummies explains to kids in plain English how to apply the math and logic skills they already have to the subject of coding. In no time, they'll be grasping basic coding concepts, completing their very own technical feats, and arming themselves with the computer science experience and know-how to prepare for a future working with technology.
* Lay-flat binding allows for easy access as students work on projects
* Full-color, large-print design makes the information more approachable to kids
* Kids interested in computer science get a competitive edge
* The author has dedicated her career to enhancing coding and other STEM education in schools
If you're a student who wants to learn coding, a parent who wants to help your kid pursue an interest in coding, or a teacher who is in need of a supplemental course book for your computer science class, Coding For Kids For Dummies has you covered.
- RRP £21.99
- Save £1.10 (5%)
Data Science and Big Data Analytics is about harnessing the power of data for new insights. The book covers the breadth of activities and methods and tools that Data Scientists use. The content focuses on concepts, principles and practical applications that are applicable to any industry and technology environment, and the learning is supported and explained with examples that you can replicate using open-source software.
- RRP £47.50
- Save £9.50 (19%)
This book will help you:
* Become a contributor on a data science team
* Deploy a structured lifecycle approach to data analytics problems
* Apply appropriate analytic techniques and tools to analyzing big data
* Learn how to tell a compelling story with data to drive business action
* Prepare for EMC Proven Professional Data Science Certification
Ready to unlock the power of your data? With the fourth edition of this comprehensive guide, you'll learn how to build and maintain reliable, scalable, distributed systems with Apache Hadoop. This book is ideal for programmers looking to analyze datasets of any size, and for administrators who want to set up and run Hadoop clusters. You'll find illuminating case studies that demonstrate how Hadoop is used to solve specific problems. This edition includes new case studies, updates on Hadoop 2, a refreshed HBase chapter, and new chapters on Crunch and Flume. Author Tom White also suggests learning paths for the book.
* Store large datasets with the Hadoop Distributed File System (HDFS)
* Run distributed computations with MapReduce
* Use Hadoop's data and I/O building blocks for compression, data integrity, serialization (including Avro), and persistence
* Discover common pitfalls and advanced features for writing real-world MapReduce programs
* Design, build, and administer a dedicated Hadoop cluster - or run Hadoop in the cloud
* Load data from relational databases into HDFS, using Sqoop
* Perform large-scale data processing with the Pig query language
* Analyze datasets with Hive, Hadoop's data warehousing system
* Take advantage of HBase for structured and semi-structured data, and ZooKeeper for building distributed systems
- RRP £47.99
- Save £9.60 (20%)
Prominent FileMaker developers Susan Prosser and Stuart Gripman know how to work magic with FileMaker Pro, and they share their knowledge in this bestselling book. You'll learn how to build a database and organize all of your information quickly and efficiently.
- RRP £31.99
- Save £6.80 (21%)
Get a solid grounding in Oozie, the workflow scheduler for Hadoop jobs. With this practical guide, two experienced Hadoop practitioners teach you Oozie concepts and caveats through lots of examples. You'll learn how to set up an Oozie server and run jobs, then dive into Oozie workflow techniques: coordinating workflows, bundling applications, and writing to them. Advanced topics show you how to use Oozie to submit MapReduce, Pig, and Hive jobs directly, and how to use Oozie's security capabilities.
- RRP £31.99
- Save £6.70 (20%)
Learn the algorithms and tools you need to build MapReduce applications with Hadoop and Spark for processing gigabyte, terabyte, or petabyte-sized datasets on clusters of commodity hardware. With this practical book, author Mahmoud Parsian, head of the big data team at Illumina, takes you step by step through the design of machine-learning algorithms, such as Naive Bayes and Markov Chain, and shows you how to apply them to clinical and biological datasets, using MapReduce design patterns.
* Apply MapReduce algorithms to clinical and biological data, such as DNA-Seq and RNA-Seq
* Use the most relevant regression/analytical algorithms used for different biological data types
* Apply t-test, joins, top-10, and correlation algorithms using MapReduce/Hadoop and Spark
- RRP £55.99
- Save £11.20 (20%)
Businesses are gathering data today at exponential rates and yet few people know how to access it meaningfully. If you're a business or IT professional, this short hands-on guide teaches you how to pull and transform data with SQL in significant ways. You will quickly master the fundamentals of SQL and learn how to create your own databases. Author Thomas Nield provides exercises throughout the book to help you practice your newfound SQL skills at home, without having to use a database server environment. Not only will you learn how to use key SQL statements to find and manipulate your data, but you'll also discover how to efficiently design and manage databases to meet your needs. You'll also learn how to:
* Explore relational databases, including lightweight and centralized models
* Use SQLite and SQLiteStudio to create lightweight databases in minutes
* Query and transform data in meaningful ways by using SELECT, WHERE, GROUP BY, and ORDER BY
* Join tables to get a more complete view of your business data
* Build your own tables and centralized databases by using normalized design principles
* Manage data by learning how to INSERT, DELETE, and UPDATE records
- RRP £26.99
- Save £5.40 (20%)
This book is an introduction and deep-dive into the many uses of dynamic SQL in Microsoft SQL Server. Dynamic SQL is key to large-scale searching based upon user-entered criteria. It's also useful in generating value-lists, in dynamic pivoting of data for business intelligence reporting, and for customizing database objects and querying their structure. Executing dynamic SQL is at the heart of applications such as business intelligence dashboards that need to be fluid and respond instantly to changing user needs as those users explore their data and view the results. Yet dynamic SQL is feared by many due to concerns over SQL injection attacks. Reading Dynamic SQL: Applications, Performance, and Security is your opportunity to learn and master an often misunderstood feature, including security and SQL injection. All aspects of security relevant to dynamic SQL are discussed in this book. You will learn many ways to save time and develop code more efficiently, and you will practice directly with security scenarios that threaten companies around the world every day. Dynamic SQL: Applications, Performance, and Security helps you bring the productivity and user-satisfaction of flexible and responsive applications to your organization safely and securely. Your organization's increased ability to respond to rapidly changing business scenarios will build competitive advantage in an increasingly crowded and competitive global marketplace.
* Discusses many applications of dynamic SQL, both simple and complex.
* Explains each example with demos that can be run at home and on your laptop.
* Helps you to identify when dynamic SQL can offer superior performance.
* Pays attention to security and best practices to ensure safety of your data.
What You Will Learn
* Build flexible applications that respond fast to changing business needs.
* Take advantage of unconventional but productive uses of dynamic SQL.
* Protect your data from attack through best-practices in your implementations.
* Know about SQL Injection and be confident in your defenses against it.
* Run at high performance by optimizing dynamic SQL in your applications.
* Troubleshoot and debug dynamic SQL to ensure correct results.
Who This Book is For
Dynamic SQL: Applications, Performance, and Security is for developers and database administrators looking to hone and build their T-SQL coding skills. The book is ideal for advanced users wanting to plumb the depths of application flexibility and troubleshoot performance issues involving dynamic SQL. The book is also ideal for beginners wanting to learn what dynamic SQL is about and how it can help them deliver competitive advantage to their organizations.
Beginning Queries with SQL is a friendly and easily read guide to writing queries with the all-important - in the database world - SQL language. Anyone who does any work at all with databases needs to know something of SQL, and that is evidenced by the strong sales of such books as Learning SQL (O'Reilly) and SQL Queries for Mere Mortals (Pearson). Beginning Queries with SQL is written by the author of Beginning Database Design, an author who is garnering great reviews on Amazon due to the clarity and succinctness of her writing.
Imagine what you could do if scalability wasn't a problem. With this hands-on guide, you'll learn how the Cassandra database management system handles hundreds of terabytes of data while remaining highly available across multiple data centers. This expanded second edition, updated for Cassandra 3.0, provides the technical details and practical examples you need to put this database to work in a production environment. Authors Jeff Carpenter and Eben Hewitt demonstrate the advantages of Cassandra's non-relational design, with special attention to data modeling. If you're a developer, DBA, or application architect looking to solve a database scaling issue or future-proof your application, this guide helps you harness Cassandra's speed and flexibility.
* Understand Cassandra's distributed and decentralized structure
* Use the Cassandra Query Language (CQL) and cqlsh, the CQL shell
* Create a working data model and compare it with an equivalent relational model
* Develop sample applications using client drivers for languages including Java, Python, and Node.js
* Explore cluster topology and learn how nodes exchange data
* Maintain a high level of performance in your cluster
* Deploy Cassandra on site, in the Cloud, or with Docker
* Integrate Cassandra with Spark, Hadoop, Elasticsearch, Solr, and Lucene
- RRP £39.99
- Save £8.00 (20%)